Getting Started with Digitisation


This guide was last revised 3 June 2009

Today we can easily discover, share and use our knowledge and creativity using technologies in ways vastly different from the pre-digital era. Our ability to do this will only increase over time. The Make it Digital approach is to identify elements of good practice for digital content creation based on an understanding of the digital content life cycle. Key to the life cycle and how long your digital content will survive is the use of open standards.

Make it Digital has one detailed Getting Started with Digital guide:

  1. Digitising Family History and Whakapapa

Making content digital

Being digital can be hard work. We are analogue beings in an analogue world, where rules around time, location and materials greatly constrain what we can learn and create. Digital technologies disrupt many of those rules by changing the constraints. Today we can easily discover, share and use our knowledge and creativity digitally in ways that are impractical in analogue form. Our ability to do this digitally will only increase over time.

For those of us interested in making New Zealand’s knowledge and creativity available and accessible in digital form, it can be really hard to know where to start. That’s why we started with these Make it Digital guides. We were tasked with creating an online, up-to-date source of easy-to-follow advice accessible to anyone in New Zealand wanting to make their content available digitally. We want the guides to be collaborative, and aimed at staff, volunteers and interested individuals who may not be professionally trained in producing or managing digital content. In particular, the guides should suggest practical courses of action rather than just be a bibliography of the often conflicting resources to be found on the web.

As web-based documents, these guides are a continual work in progress. Where there are things missing, confusing or just plain wrong, let us know by leaving a comment. If you are stuck with a specific problem that the guides can’t answer, use our Ask a Question forum to get help. If you are an expert or a willing contributor, drop us a line to get involved in writing or updating the next set of guides.

Our approach to good practice

All digital content comes with what has been called ‘the digital mortgage’ – the ongoing investment of time and money needed to make sure the content remains findable, usable and generally of value over time. That investment starts at day one when content is created by digital capture of an original work or through digital copying of an analogue work (digitisation).

For the Make it Digital guides, good practice is not necessarily about using stringent techniques, standards and equipment (although they can often get a better result). Good practice is more about good planning and having at the outset a good understanding of what is involved in managing digital content over an extended period. Digital content does not have the physical cues of analogue content to remind us that we need to look after it. Digital content also loses all value forever without software and a machine that can interpret it. These factors have to be addressed by good practice, in particular around managing the digital content lifecycle and making use of open digital standards that will last.

The digital content life cycle

The Make it Digital guides are organised around a digital content life cycle that shows a connected series of stages that should be managed for any digital content initiative. The life cycle below emphasises that, with good practice, digital content can remain used and useful for an indefinite period. The life cycle also requires us to think differently about what it means to have original content and what it means to preserve content digitally.

[Figure: Digital Life Cycle]

The digital content life cycle has seven stages:

  1. Selecting: for analogue or new content, selecting what should be made digital
  2. Creating: putting content in a form to make it usable
  3. Describing: describing content so it can be organised
  4. Managing: managing content to keep it usable and available
  5. Discovering: organising content to make it findable
  6. Using & Reusing: ensuring content can be used and re-purposed
  7. Preserving: managing content to keep it usable and available long-term.

Each of these stages forms the basis of discussion for a Make it Digital guide. The guides in each of the following sections walk through a discussion of the basics you need to know about each stage of the life cycle, some principles to bear in mind for planning, and the standards, tools and resources we think can support you in developing good practice.

The role of open standards

Digital technologies tend to disrupt the old rules of the analogue world. As our use of digital content grows it is vital to have agreement on new rules to govern the digital world. For those rules to be effective, they need to be well described, readily available to anyone, and widely adopted. These characteristics form the core of open standards, and are likely to be the best chance for digital content to remain usable in the long term.

Arguably the biggest success story in the use of open standards has been the development of the internet and specifically the World Wide Web. Open web standards such as HTTP, HTML, URL, XML and JPEG have ensured that the most fundamental web technologies have been able to operate across any software or hardware environment.

The principles of open web standards can be readily translated to other aspects of digital content, such as content formats, metadata schemas, repository and database structures. They allow software writers and hardware developers to incorporate the standards into their designs, while open availability encourages widespread use in a variety of different contexts. Open standards are a viable strategy for overcoming the impact of technology obsolescence.

Minimum characteristics of open standards

There is no one definition of what makes a standard ‘open’. As a minimum, we recommend that digital content creators and managers look for software, hardware, schemes and formats that have the following three open characteristics:

  1. the description and specification for the standard is publicly documented and available
  2. the standard can be implemented or used free of any royalties, contracts or patent licence fees
  3. the standard is in common or mainstream use, including by organisations with long-standing reputations

These characteristics require more than just popular use – for example, Microsoft’s Word .doc format and Fraunhofer’s .mp3 format are both proprietary format standards despite their popularity (while the less popular OpenDocument format and Ogg Vorbis format are open format standards). While the TIFF image format is owned by Adobe Systems Incorporated, it is publicly documented, free of royalties and in common use, giving it a sufficient degree of openness for it to be recommended.

Ideal characteristics of open standards

In addition to the above minimum characteristics, open standards are ideally:

  • in common or mainstream use in multiple countries
  • endorsed or approved by a formal standards body (e.g. ISO, W3C)
  • issued and maintained by a non-profit body independent of commercial or pecuniary interests
  • developed and agreed by a consensus of interested parties

In practice, it can take many years for open standards to emerge that have all of the ideal characteristics, and even longer for them to be supported in software and hardware. Furthermore, as patents expire, some published proprietary standards can effectively become open.

In the Make it Digital guides, we have focused on identifying standards that meet the minimum characteristics of openness described above. In some cases, however, it will be difficult or impossible to follow an open standard. Good documentation and common use – often characteristics of ‘industry’ standards – may then have to be a necessary compromise until an open standard emerges.

Seven good practice tips for making it digital

The best digital technology cannot overcome poor decision-making and implementation. Content creators and owners should carefully consider and plan for all the stages of a digital project before spending the first dollar in bringing it to life.

For those who are just getting started with digitising or creating new content using digital formats, we have put together seven good practice tips that can help make it digital.

Tip 1: Have a clear purpose for your content

Making content digital doesn’t automatically make it of value. Broad plans to digitise or make large volumes of digital content available, made without researching who will use the content and why, may result in an expensive, under-used resource. The volume of content possible digitally is so large that being selective can greatly improve the value of what you are creating. For instance, one image of a hat described as belonging to Katherine Mansfield is likely to be of more value than several random hat images with no descriptions.

A vital first step is to select and match your content to an identified need. You should undertake some basic research to identify the purpose of your digitisation. Be clear about what outcome you expect – for example, are you aiming to protect original items by digitising them? Is it more important to teach people how to create digital videos or to permanently keep access to the videos that they make? Are you expecting an orderly and structured user experience or will you encourage any quantity of diverse content to be created for searching through?

Tip 2: Choose appropriate formats for creating content

Knowing your purpose and expected outcome for users are two basics in getting started with a digital project, and both should inform your choice of technology and formats for creating content.

Whether digitising content or creating new digital content, your hardware decisions should be based on fitness for purpose. Flat-bed scanners will generally do a better job of copying photographic prints and text than a free-standing digital camera, while dedicated scanners for books and film will produce a better result for those media. Audio recorders should, if possible, create easily editable lossless formats, while video cameras should allow an option of directly exporting the native format from the recording medium for archiving and later editing.

Using lossless or uncompressed formats for copying or creating source files will allow the greatest flexibility for making edits and access copies while keeping the digital master safely archived. Choosing openly published interoperable formats also provides the best chance of your content being usable across different software platforms and into the future.

Tip 3: Set aside resources to describe your content

It’s not uncommon for a digital content project to start out by digitising or creating new content only to find later that the hundreds of digital items made are hard to navigate or sort. Having a strategy for unique naming of files and for embedding basic descriptive information into digital objects is an essential part of making digital content usable.

Descriptive information, or metadata, has always been important in professional collection management. Before digital storage there was always an option for users to browse shelves or filing cabinets to find what they wanted. Today the huge volumes of digital content being created mean that browsing by simple date, alphabetical or numerical order quickly becomes impractical. Most of this content is also not managed by information professionals. To ensure your digital content can be stored, found and used over time, it needs to have good metadata attached or tagged to it that describes what the content is, where it came from, and who can use it. If you find you are generating digital content with filenames such as IMG_001.JPG, 1.WAV, or UNTITLED.AVI, rename them now!
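As a rough illustration of a unique, descriptive naming strategy, the Python sketch below builds a filename from a few basic details. The pattern used here (collection, title, date and a sequence number) and the function names are assumptions made for the example, not a convention prescribed by these guides. Accents and macrons are dropped only to keep names portable between systems; the full wording belongs in the metadata.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lower-case text and reduce it to ASCII letters, digits and hyphens."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def descriptive_name(collection: str, title: str, date: str,
                     sequence: int, extension: str) -> str:
    """Build a sortable filename (hypothetical pattern, for illustration).

    `date` is assumed to already be in YYYY-MM-DD form.
    """
    parts = [slugify(collection), slugify(title), date, f"{sequence:04d}"]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


# A file that might otherwise be called IMG_001.JPG:
print(descriptive_name("Mansfield", "Hat, front view", "2009-06-03", 1, "JPG"))
# prints: mansfield_hat-front-view_2009-06-03_0001.jpg
```

The zero-padded sequence number keeps files in order when sorted by name, and the date in year-month-day form sorts chronologically.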

Tip 4: Work out in advance how your content will be managed

Any digital content that amounts to more than a few dozen items needs to be managed as a collection. This means continuing to develop it over time by adding, removing and updating content as required, and being prepared to migrate the individual items between different hardware and software platforms. It also means planning for ongoing back-up and maintenance, along with using a repository, content management or database system that supports open standards. It will be easier to plan and make these decisions ahead of your content creation so that you have a realistic view of the resources and formats required for the volume of content being generated.

Part of management is also deciding who has authority to access and make changes to the collection and who has overall responsibility and ownership of the content within. Although someone with IT skills may be needed to administer the hardware and software, a different set of skills is needed to make judgements about changes to the content. This process can be assisted by the creation of a collection policy that documents what the collection is about, how changes are made, who can access the collection and who is responsible overall.

Tip 5: Structure your content for easy discovery

Once the content you’ve created has been described and properly stored in a repository, database or content management system, it is important that users can easily access the information contained within. Without physical cues like shelf numbers, file colours or box size, it can be very difficult for anyone not familiar with your content to quickly search through your collection and find what they need. Understanding how users are going to arrive at your content and discover what you have for them is a critical step towards ensuring a well-used resource.

If you have web-based content available to the public, your website or content system should be designed to expose your collections to search engines. The vast majority of public users will come to your content through a commercial search engine like Google or Yahoo, not through your front page URL. Optimising your web content so search engines can index it will ensure your content can be discovered more easily.
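One common way to expose item pages to search engines is to publish a sitemap file that lists their URLs. The guides do not name a specific technique, so treat the Python sketch below as one possible approach under that assumption. The function name and the example.org addresses are hypothetical; the namespace is the one defined by the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML, as a string, with one <url> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical item pages from a collection website.
print(build_sitemap([
    "https://example.org/items/1",
    "https://example.org/items/2",
]))
```

The resulting text would normally be saved as a file on the website so that search engines can fetch it and find every item page, not just the front page.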

For users who come directly to your front page, or where you have authentication or a login that restricts access to subscribers, navigation aids and search tools should be designed to help expose content that a searcher may not know you hold. A blank search box or an A-Z structure is off-putting for users who don’t know exactly what they are looking for. Discovery features like tag clouds, multiple navigation choices, showcases and widgets can help your content be found more easily.

Tip 6: Inform potential users of what they can do with your content

A decade ago most digital content users were using computers less powerful than some of today’s mobile phones, and accessed content with a CD-ROM drive or dial-up internet connection. Today’s PCs have high-resolution monitors and powerful processors, and are likely to be connected to the internet via broadband. As a result, the expectations users have of their experience and interaction with your content are very different – they want to download, interact, copy and re-use content for their own purposes and projects, and the web is often the first place they go.

With these expectations in mind, users need to be instantly informed about what they can and can’t do with your content rather than being given blanket legal statements, locked down formats or requirements to send off for written permission. Having clear rights statements and where possible licence statements that focus on permitted behaviour rather than prohibited behaviour will help protect the integrity of your content while encouraging users to return to you for their next experience.

Tip 7: Implement a back-up and long-term storage plan

Analogue print, audio and moving image technologies are centred on the production of physical copies of content that, when stored appropriately, only decay gradually over time. Damage is usually visible to the eye, and when discovered the content can often be restored or at least made sense of. Digital technologies, in contrast, rely solely on machine-readable media that can decay or fail rapidly without visible signs of damage. Such failure can happen in days, weeks or months, and may lead to the complete and irreversible loss of your content. That means back-up and long-term storage options have to be a core part of your digital project planning from day one.

A back-up requires you to have a minimum of two copies of your content – your currently accessed content and a separate, up-to-date copy. Back-ups should be planned daily or weekly depending on the volumes of content available or being created, and should ideally involve a further off-site back-up to protect against theft, fire or natural disaster.
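A minimal Python sketch of the two-copy idea follows: it copies every file in a collection folder to a separate back-up folder and then compares checksums to confirm that each copy matches. The function names and folder layout are assumptions for illustration, and the back-up folder is assumed to sit outside the collection folder. Many people will prefer dedicated back-up software; this only shows the principle.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, backup_dir: Path) -> list:
    """Copy every file under source_dir into backup_dir and verify it.

    Returns the relative paths of any files whose copy does not match
    the original; an empty list means every copy was verified.
    """
    failures = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies content and timestamps
        if sha256_of(source) != sha256_of(target):
            failures.append(str(relative))
    return failures
```

Running the same function against a second destination, such as a removable drive that is then kept elsewhere, would give the further off-site copy.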

Long-term storage requires a different strategy from back-up. It involves planning for archival copies of your content to be migrated between different storage media over time, and a contingency for transfer of content to another agency should your organisation or service face closure. This is where your choices of appropriate formats, descriptions, collections policy and rights statements really come into their own.
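When archival copies are migrated between storage media, it helps to be able to confirm that nothing was lost or altered along the way. One possible approach, sketched below in Python, is to record a checksum for every file before the move and compare against it afterwards. The function names are hypothetical and the guides do not prescribe this method; for very large files the checksum would be better computed in chunks rather than by reading each file whole.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(manifest: dict, root: Path) -> list:
    """List manifest entries that are missing or altered under root."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In use, a manifest would be built from the old storage medium, the files migrated, and `changed_files` run against the new location; an empty list suggests the archival copies arrived intact.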