Metadata enhancement
Digital New Zealand can easily connect to your content when you set up an XML sitemap.
However, the structure and quality of what we find on those pages determine how easily we can harvest metadata about your content.
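If you have not set up a sitemap before, here is the general shape of one, following the sitemaps.org protocol. This is a minimal sketch using Python's standard library. The function name `build_sitemap` and the example.org URLs are hypothetical placeholders for your own pages.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) for a list of (loc, lastmod) pairs.

    Hypothetical helper: lastmod may be None, in which case it is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URL: list one entry per content item page on your site.
    print(build_sitemap([("https://example.org/items/letter-to-hazel", "2009-05-01")]))
```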
Make it Digital guide to describing digital content
Dublin Core and Digital New Zealand harvesting
DigitalNZ works by storing information about New Zealand content and providing access to descriptions, thumbnails and web links through web tools (see diagram of how DigitalNZ works).
Dublin Core is the primary structure we use behind the scenes in our “contributed metadata” store.
Your metadata will already be Dublin Core if we are harvesting it using OAI-PMH.
But when we collect information about your content using XML Sitemaps, we look for structure in your page mark-up that can be ‘translated’ (or mapped) into Dublin Core elements.
The structure can be in the body or the head of the page. If it is already Dublin Core, our job is that much easier, but it doesn’t have to be.
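For OAI-PMH contributors, each record's metadata arrives as an `oai_dc:dc` element whose children are already Dublin Core elements, so no mapping is needed. This minimal Python sketch pulls those elements out of a sample payload. The sample record and the `dc_fields` helper are illustrative assumptions, not our actual harvester code.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative oai_dc payload (the part of an OAI-PMH record inside <metadata>).
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>[Letter to Hazel]</dc:title>
  <dc:creator>Malthus, Cecil</dc:creator>
  <dc:date>1914-08-17</dc:date>
  <dc:format>letter</dc:format>
</oai_dc:dc>"""


def dc_fields(xml_text):
    """Return {element: [values]} for every Dublin Core element in the XML.

    Hypothetical helper: it walks the whole tree, so it also works on a
    full OAI-PMH response where oai_dc:dc is nested inside <record>.
    """
    prefix = "{" + DC_NS + "}"
    fields = {}
    for el in ET.fromstring(xml_text).iter():
        if el.tag.startswith(prefix):
            name = el.tag[len(prefix):]
            fields.setdefault(name, []).append((el.text or "").strip())
    return fields


if __name__ == "__main__":
    print(dc_fields(SAMPLE_RECORD))
```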
What information does Digital New Zealand collect?
Digital New Zealand mainly collects information that translates to Dublin Core elements.
A title is the only piece of information you MUST provide for each content item. It will be displayed in search result lists together with the item's URL and your name as the content provider. For images, a thumbnail URL is also required.
Rights information is also useful to what we are trying to achieve with Digital New Zealand. You can read more about that on our Kete rights page.
Ideally your content items should also have:
- A description
- Date information (often this will be the copyright date for the content)
- Creator information (the person or people who created the content originally)
- Coverage information (the location of the content, often a geographic placename)
- Subject(s) (an indication of what the content is about)
We also look for:
- Thumbnail images we can link to
- People who made the content available in its current form (the publisher)
- Language identification
- Type of content (in other words, the category it fits into, such as Manuscripts)
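One way to check your own records against the lists above before we harvest them is a small validation script. This sketch is hypothetical: the `ContentItem` fields and the `check_item` function are our illustration, not part of any DigitalNZ tool. It treats a missing title (or a missing thumbnail URL for an image) as a problem, and simply reports which recommended fields are empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentItem:
    """Hypothetical record shape for one content item on your site."""
    url: str
    title: Optional[str] = None
    is_image: bool = False
    thumbnail_url: Optional[str] = None
    description: Optional[str] = None
    date: Optional[str] = None
    creators: List[str] = field(default_factory=list)
    coverage: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)


# Fields the guidance says items should ideally have.
RECOMMENDED = ("description", "date", "creators", "coverage", "subjects")


def check_item(item):
    """Return (problems, missing_recommended) for one content item."""
    problems = []
    if not item.title:
        problems.append("missing title")
    if item.is_image and not item.thumbnail_url:
        problems.append("image without thumbnail URL")
    # Empty strings, None and empty lists all count as missing.
    missing = [name for name in RECOMMENDED if not getattr(item, name)]
    return problems, missing
```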
In future we will display more of this information as facets in our search tool and make it queryable through the DigitalNZ API.
More information about DigitalNZ search fields
Structuring the information for easy harvest
Information that is structured for people to read is not necessarily easy for our harvester to collect.
This code, for example, displays useful information for a person reading about a content item but it lacks machine-readable structure, making it difficult for DigitalNZ to harvest:
<p><strong>Title:</strong> [Letter to Hazel]<br />
<strong>Date:</strong> 17 August 1914<br />
<strong>Pagination:</strong> Page 1 of 1<br />
<strong>Author:</strong> Cecil Malthus<br />
<strong>Format:</strong> Letter<br />
<strong>This is part of:</strong> Cecil Malthus : World War I papers [letters, telegrams, documents]</p>
…
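To see why this is hard, consider what a harvester has to do with mark-up like this. The Python sketch below is a hypothetical illustration, not how our harvester actually works. It recovers values only by matching this exact `<strong>Label:</strong> value<br />` pattern, and only by using a site-specific table that maps display labels such as “Author” to Dublin Core elements. Any change to the labels or the mark-up silently breaks it, unmapped labels like “Pagination” are dropped, and the date stays in its human-readable form.

```python
import re

# Hypothetical, site-specific mapping from display labels to Dublin Core elements.
LABEL_TO_DC = {
    "Title": "title",
    "Date": "date",
    "Author": "creator",
    "Format": "format",
}

# Matches "<strong>Label:</strong> value" up to the next <br /> or </p>.
PAIR = re.compile(r"<strong>([^<:]+):</strong>\s*(.*?)\s*(?:<br\s*/?>|</p>)", re.S)


def scrape_labelled_pairs(html):
    """Guess Dublin Core values from label/value pairs in display HTML."""
    record = {}
    for label, value in PAIR.findall(html):
        element = LABEL_TO_DC.get(label.strip())
        if element:  # labels missing from the table are silently dropped
            record.setdefault(element, []).append(value)
    return record
```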
You can make information about your content more machine-readable by putting meta tags in the head of your pages, or by identifying its structure with attributes on tags in the body (which you can also target with stylesheets).
Here are a couple of examples of how the above information could be identified:
In <meta> tags in the web page <head> section
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="[Letter to Hazel]">
<meta name="DC.date" scheme="ISO8601" content="1914-08-17">
<meta name="DC.creator" content="Malthus, Cecil">
<meta name="DC.format" content="letter">
</head>
Advantages:
- Information in internationally standardised form.
- Other DC-harvesters can also collect this metadata.
- Metadata can be encoded using machine-readable values instead of human-readable ones.
Disadvantages:
- May duplicate data from the page display (so if the data changes, you need to update it in two places).
- May not be appropriate for complex content pages with lots of hierarchies.
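Because the element names are standardised, any harvester can read meta tags like these without knowing anything about your site. This is a minimal sketch using Python's built-in `html.parser`. The `DCMetaParser` class and `harvest_dc_meta` function are hypothetical names, and the sketch ignores the `scheme` attribute.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> values, keyed by element name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        content = attrs.get("content")
        # "DC.title" -> "title"; html.parser already lower-cases tag names.
        if name.lower().startswith("dc.") and content is not None:
            element = name[len("DC."):].lower()
            self.fields.setdefault(element, []).append(content)


def harvest_dc_meta(html):
    """Return {element: [values]} from the DC meta tags in a page."""
    parser = DCMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```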
In the text in the <body> of the web pages
Ideally, the innermost tag around each value should identify the field in which that value should be stored.
This example uses property attributes on span elements to identify the data structure.
<p><strong>Title:</strong> <span property="dc:title">[Letter to Hazel]</span><br />
<strong>Date:</strong> <span property="dc:date">17 August 1914</span><br />
<strong>Pagination:</strong> Page 1 of 1<br />
<strong>Author:</strong> <span property="dc:creator">Cecil Malthus</span><br />
<strong>Format:</strong> <span property="dc:format">Letter</span><br />
<strong>This is part of:</strong> <span property="dc:description">Cecil Malthus : World War I papers [letters, telegrams, documents]</span></p>
Advantages:
- It is absolutely clear where data values start and end.
- Flexibility: your own class attributes can be used instead.
Disadvantage:
- It may be more work to add this mark-up.
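With this mark-up, the values can be read straight from the body by looking for `property` attributes, with no site-specific label table. This is a simplified sketch, not a full RDFa processor: it assumes the values sit in `span` elements whose `property` starts with `dc:`, and it does not resolve prefixes or handle other RDFa attributes. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DCPropertyParser(HTMLParser):
    """Collect the text inside <span property="dc:..."> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._element = None  # DC element name while inside a property span
        self._depth = 0       # span nesting depth within the current property span
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._element is not None:
            if tag == "span":
                self._depth += 1  # nested span: keep collecting its text
            return
        prop = dict(attrs).get("property") or ""
        if tag == "span" and prop.startswith("dc:"):
            self._element = prop[len("dc:"):]
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._element is None or tag != "span":
            return
        self._depth -= 1
        if self._depth == 0:  # closing tag of the property span itself
            value = "".join(self._buffer).strip()
            self.fields.setdefault(self._element, []).append(value)
            self._element = None

    def handle_data(self, data):
        if self._element is not None:
            self._buffer.append(data)


def harvest_dc_properties(html):
    """Return {element: [values]} from dc: property spans in a page body."""
    parser = DCPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```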
Improving information quality
When it comes to making New Zealand content easier to find, share, and use, data values are just as important as metadata structure.
In particular, the more consistent your information is with other content providers, the more discoverable it will be to users searching, browsing, or querying DigitalNZ content.
You can improve the quality of your metadata by:
- Using standard vocabularies and authorities wherever possible and appropriate, for consistency
- Repeating elements or structural units for separate pieces of data (for example, don’t put all the keywords in one ‘keyword’ tag; put each one in a separate tag)
- Not duplicating the same data in more than one place, as far as possible
- Using structural elements consistently (i.e. the same type of data appears in the same tag throughout your content)
- Including standardised rights and terms of use links.
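Two of these points, machine-readable values and repeated elements, can be illustrated with a short Python sketch. The helper names are hypothetical. `to_iso_date` assumes dates are written like “17 August 1914” with English month names (as in the default C locale), and `split_keywords` assumes the keywords in a combined field are separated by semicolons.

```python
from datetime import datetime
from html import escape


def to_iso_date(display_date):
    """Convert a display date like '17 August 1914' to ISO 8601 ('1914-08-17')."""
    return datetime.strptime(display_date.strip(), "%d %B %Y").date().isoformat()


def split_keywords(keyword_field, separator=";"):
    """Split one combined keyword string into separate, de-duplicated subjects."""
    subjects = []
    for part in keyword_field.split(separator):
        term = part.strip()
        if term and term not in subjects:
            subjects.append(term)
    return subjects


def subject_meta_tags(subjects):
    """Emit one DC.subject meta tag per subject instead of one combined tag."""
    return "\n".join(
        f'<meta name="DC.subject" content="{escape(s)}">' for s in subjects
    )
```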
Need more help?
Don’t hesitate to drop us a line if you want specific advice on how to improve the structure or quality of your metadata, or a free assessment of how easy it will be for DigitalNZ to collect information about your content.
