Using Dublin Core
Diane I. Hillmann
Metadata Management Associates LLC
All content following this page was uploaded by Diane I. Hillmann on 27 October 2015.
TABLE OF CONTENTS
1. Introduction
2. Syntax Issues
    2.1. HTML
    2.2. RDF/XML
    2.3. Metadata Contained in a Resource
    2.4. Stand-Alone Metadata
5. Qualifiers
6. Examples
7. Glossary
1. Introduction
Metadata has been with us since the first librarian made a list of the items on a shelf of handwritten scrolls. The term "meta" comes from a
Greek word that denotes "alongside, with, after, next." More recent Latin and English usage would employ "meta" to denote something
transcendental, or beyond nature. Metadata, then, can be thought of as data about other data. It is the Internet-age term for information that
librarians traditionally have put into catalogs, and it most commonly refers to descriptive information about Web resources.
A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question. For example, a metadata system
common in libraries -- the library catalog -- contains a set of metadata records with elements that describe a book or other library item: author,
title, date of creation or publication, subject coverage, and the call number specifying location of the item on the shelf.
The linkage between a metadata record and the resource it describes may take one of two forms:
1. elements may be contained in a record separate from the item, as in the case of the library's catalog record; or
2. the metadata may be embedded in the resource itself.
Examples of embedded metadata carried along with the resource itself include the Cataloging In Publication (CIP) data printed on the
verso of a book's title page and the TEI header in an electronic text. Many metadata standards in use today, including the Dublin Core standard,
do not prescribe either type of linkage, leaving the decision to each particular implementation.
Although the concept of metadata predates the Internet and the Web, worldwide interest in metadata standards and practices has exploded with
the increase in electronic publishing and digital libraries, and the concomitant "information overload" resulting from vast quantities of
undifferentiated digital data available online. Anyone who has attempted to find information online using one of today's popular Web search
services has likely experienced the frustration of retrieving hundreds, if not thousands, of "hits" with limited ability to refine the search or
make it more precise. The wide-scale adoption of descriptive standards and practices for electronic resources will improve retrieval of relevant
resources from the "Internet commons." As noted by Weibel and Lagoze, two leaders in the field of metadata development:
"The association of standardized descriptive metadata with networked objects has the potential for substantially improving
resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and
allowing access to the surrogate content that is distinct from access to the content of the resource itself." (Weibel and Lagoze,
1997)
It is this need for "standardized descriptive metadata" that the Dublin Core addresses.
The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core
standard comprises fifteen elements, the semantics of which have been established through consensus by an international, cross-disciplinary
group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship.
Another way to look at Dublin Core is as a "small language for making a particular class of statements about resources" (Baker, 2000). In this
language, there are two classes of terms--elements (nouns) and qualifiers (adjectives)--which can be arranged into a simple pattern of
statements. The resources themselves are the implied subjects in this language. In the diverse world of the Internet, Dublin Core can be seen as
a "metadata pidgin for digital tourists": easily grasped, but not necessarily up to the task of expressing complex relationships or concepts.
The Dublin Core element set is outlined in Section 4. Each element is optional and may be repeated. Each element also has a limited set of
qualifiers, attributes that may be used to further refine (not extend) the meaning of the element. The Dublin Core Metadata Initiative (DCMI)
has defined standard ways to "qualify" elements with various types of qualifiers. A set of recommended qualifiers conforming to DCMI "best
practice" is available, with a formal registry in progress.
Although the Dublin Core favors document-like objects (because traditional text resources are fairly well understood), it can be applied to other
resources as well. Its suitability for use with particular non-document resources will depend to some extent on how closely their metadata
resembles typical document metadata and also what purpose the metadata is intended to serve. (Implementors interested in using Dublin Core
for diverse resources are encouraged to browse the Dublin Core Projects pages for ideas on using Dublin Core metadata for their resources.)
The Dublin Core element set has been kept as small and simple as possible to allow a non-specialist to create simple descriptive
records for information resources easily and inexpensively, while providing for effective retrieval of those resources in the
networked environment.
Discovery of information across the vast commons of the Internet is hindered by differences in terminology and descriptive
practices from one field of knowledge to the next. The Dublin Core can help the 'digital tourist' -- a non-specialist searcher -- find
his or her way by supporting a common set of elements, the semantics of which are universally understood and supported. For
example, scientists concerned with locating articles by a particular author, and art scholars interested in works by a particular artist,
can agree on the importance of a "creator" element. Such convergence on a common, if slightly more generic, element set
increases the visibility and accessibility of all resources, both within a given discipline and beyond.
International scope
The Dublin Core Element Set was originally developed in English, but versions are being created in many other languages,
including Finnish, Norwegian, Thai, Japanese, French, Portuguese, German, Greek, Indonesian, and Spanish. The Special Interest
Group on Dublin Core in Multiple Languages is coordinating efforts to link these versions in a distributed registry using the
Resource Description Framework technology being developed by the World Wide Web Consortium (W3C).
Although the technical challenges of internationalization on the World Wide Web have not been directly addressed by the Dublin
Core development community, the involvement of representatives from almost every continent has ensured that the development
of the standard considers the multilingual and multicultural nature of the electronic information universe.
Extensibility
While balancing the need for simplicity in describing digital resources with the need for precise retrieval, Dublin Core developers
have recognized the importance of providing a mechanism for extending the DC element set for additional resource discovery
needs. It is expected that other communities of metadata experts will create and administer additional metadata sets. Metadata
elements from these sets could be linked with Dublin Core metadata to meet the need for extensibility. This model allows different
communities to use the DC elements for core descriptive information which will be usable across the Internet, while allowing
domain-specific additions which make sense within a more limited arena. Specific instructions for implementing such a model are
currently under development.
This document is intended to be an entry point for users of Dublin Core. It will assist non-specialists in creating simple descriptive
records for information resources (for example, electronic documents, JPEG images, video clips). Specialists may find the document a useful
point of reference to the documentation of Dublin Core, as it changes and grows.
The guide will show in a non-technical fashion how Dublin Core metadata may be used by anyone to make their material more accessible. This
guide discusses the layout and content of Dublin Core metadata elements, how to use them in composing a complete Dublin Core metadata
record, as well as how to qualify elements to support use by a wide variety of communities.
Another important goal of this document is to promote "best practices" for describing resources using the Dublin Core element set. The Dublin
Core community recognizes that consistency in creating metadata is an important key to achieving complete retrieval and intelligible display
across disparate sources of descriptive records. Inconsistent metadata effectively hides desired records, resulting in uneven, unpredictable or
incomplete search results.
2. Syntax Issues
In this guide, we have chosen to represent Dublin Core examples in several different syntaxes: HTML (the Web's Hypertext
Markup Language format), RDF/XML (the Resource Description Framework using eXtensible Markup Language), and a generic form
(Element="value"). HTML provides an easily understood format for demonstrating Dublin Core's underlying concepts, but more complex
applications using qualification may find that using RDF/XML makes more sense. When considering an appropriate syntax, it is important to
note that Dublin Core concepts are equally applicable to virtually any file format, as long as the metadata is in a form suitable for interpretation
both by search engines and by human beings.
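As a rough illustration of the generic form, the sketch below renders a list of element/value pairs as Element="value" lines. The function name to_generic and the sample record are illustrative assumptions, not part of any DCMI specification.

```python
def to_generic(pairs):
    """Render (element, value) pairs in the generic Element="value" form."""
    return "\n".join(f'{element}="{value}"' for element, value in pairs)


# Sample record; the values are placeholders chosen for illustration.
record = [
    ("Title", "A Guide to Growing Roses"),
    ("Creator", "Rose Bush"),
    ("Date", "2001-01-20"),
]

print(to_generic(record))
```

A list of pairs is used rather than a dictionary so that an element can appear more than once.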
2.1. HTML
"Encoding Dublin Core Metadata in HTML" (Kunze, 1999) provides guidance for using HTML with unqualified Dublin Core, whether the
metadata be embedded in the resource or in a separate file.
HTML can also be used to express qualified Dublin Core, although there are limitations inherent in doing so. The current thinking on how this
might best be accomplished is contained in the working draft "Recording qualified Dublin Core metadata in HTML meta elements."
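To make the HTML option concrete, here is a minimal sketch that writes unqualified Dublin Core as HTML meta elements. It assumes the widely used "DC.element" naming pattern for the name attribute; the helper name to_html_meta is hypothetical, and the exact conventions should be checked against Kunze (1999) before use.

```python
from html import escape


def to_html_meta(pairs, prefix="DC"):
    """Return one <meta> line per (element, value) pair.

    Assumes the "DC.<element>" naming pattern; values are escaped so that
    quotes and ampersands cannot break the attribute.
    """
    return "\n".join(
        f'<meta name="{prefix}.{element}" content="{escape(value, quote=True)}">'
        for element, value in pairs
    )


print(to_html_meta([("title", "A Guide to Growing Roses"), ("creator", "Rose Bush")]))
```

The resulting lines would be placed inside the head section of the HTML document being described.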
2.2. RDF/XML
RDF (Resource Description Framework) allows multiple metadata schemes to be read by humans as well as parsed by machines. It uses XML
(eXtensible Markup Language) to express structure, thereby allowing metadata communities to define the actual semantics. This decentralized
approach recognizes that no one scheme is appropriate for all situations, and further that schemes need a linking mechanism independent of a
central authority to aid description, identification, understanding, usability, and/or exchange.
RDF allows multiple objects to be described without specifying the detail required. The underlying glue, XML, simply requires that all
namespaces be defined; once defined, they can be used to the extent needed by the provider of the metadata.
For example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:description>Describes process for planting and nurturing different kinds of rose bushes.</dc:description>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>
This simple example uses Dublin Core by itself to describe an audio recording of a guide to growing rose bushes. With XML and RDF, Dublin
Core can now be mixed with other metadata vocabularies. For example, the simple Dublin Core description above might be used alongside
other vocabularies such as vCard that can describe the author's affiliation and contact information, or a more specialised "rose description"
vocabulary that describes the rose bushes in greater detail.
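A description like the one in this section can be read with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree module to pull the Dublin Core elements out of a shortened copy of the rose guide record. It is a plain XML reading under the assumption that the record has this simple shape, not a full RDF parser, and the function name read_dc is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def read_dc(rdf_xml):
    """Return {resource URI: [(element, value), ...]} for Dublin Core children."""
    root = ET.fromstring(rdf_xml)
    dc_prefix = f"{{{DC_NS}}}"  # ElementTree writes qualified tags as "{namespace}local"
    records = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        pairs = []
        for child in desc:
            # Elements from other namespaces (for example vCard) are skipped.
            if child.tag.startswith(dc_prefix):
                pairs.append((child.tag[len(dc_prefix):], (child.text or "").strip()))
        records[about] = pairs
    return records


records = read_dc(RDF_XML)
for uri, pairs in records.items():
    print(uri, pairs)
```

Because elements are matched by namespace URI rather than by the dc: prefix, the same code works when a provider binds the Dublin Core namespace to a different prefix.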
2.3. Metadata Contained in a Resource
Some implementations using Dublin Core have chosen to embed their metadata within the resource itself. This approach is taken most often
with documents encoded using HTML, but is also sometimes possible with other kinds of documents. Simple tools have been developed to
make provision of Dublin Core metadata within HTML-encoded pages fairly easy. One such tool, DC.dot, extracts metadata information from
an HTML document, and formats it so that it can be edited, then cut and pasted back into the HTML header of the original document.
2.4. Stand-Alone Metadata
Stand-alone metadata can exist in any kind of database, and generally provides a link to the described resource. This approach is likely to be
most practical for many non-textual resources, and is increasingly used for text as well, primarily to support easier maintenance and sharing of
metadata.
Each element is optional and repeatable. Metadata elements may appear in any order. The ordering of multiple occurrences of the same
element (e.g., Creator) may have a significance intended by the provider, but ordering is not guaranteed to be preserved in every user
environment. For instance, RDF/XML supports ordering, but HTML does not.
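One way to picture these rules is to hold a record as an ordered list of element/value pairs: nothing is required, any element may repeat, and the provider's order is kept for as long as the record stays in this form. The sketch below is an illustrative assumption about how an application might store such a record; the names record and group_by_element and the sample values are hypothetical.

```python
from collections import defaultdict


def group_by_element(pairs):
    """Collect the values of repeated elements, keeping their original order."""
    grouped = defaultdict(list)
    for element, value in pairs:
        grouped[element].append(value)
    return dict(grouped)


# Creator appears twice; no other element is required to be present.
record = [
    ("Creator", "First Author"),
    ("Title", "An Example Title"),
    ("Creator", "Second Author"),
]

grouped = group_by_element(record)
print(grouped["Creator"])
```

A system that stores values in an unordered structure would lose the sequence of the two Creator values, which is why a provider cannot rely on ordering being preserved everywhere.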
Content data for some elements may be selected from a "controlled vocabulary," which is a limited set of consistently used and carefully
defined terms. This can dramatically improve search results because computers are good at matching words character by character but weak at
understanding the way people refer to one concept using different words, i.e. synonyms. Without basic terminology control, inconsistent or
incorrect metadata can profoundly degrade the quality of search results. For example, without a controlled vocabulary, "candy" and "sweet"
might be used to refer to the same concept. Controlled vocabularies may also reduce the likelihood of spelling errors when recording metadata.
One cost of a controlled vocabulary is in needing an administrative body to review, update and disseminate the vocabulary. For example, the
US Library of Congress Subject Headings (LCSH) and the US National Library of Medicine Medical Subject Headings (MeSH) are formal
vocabularies, indispensable for searching rigorously cataloged collections. However, both require significant support organizations. Another
cost is having to train searchers and creators of metadata so that they know when using MeSH, for example, to enter "myocardial infarction"
instead of the more colloquial "heart attack."
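The effect of terminology control can be sketched with a small lookup table that maps colloquial entry terms to a single preferred term. The table below is a toy assumption built from the example in the text; it does not reflect how MeSH or LCSH are actually structured or distributed, and the names PREFERRED_TERMS and normalize_subject are hypothetical.

```python
# Toy vocabulary: entry term -> preferred term (illustrative only).
PREFERRED_TERMS = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def normalize_subject(term, vocabulary=PREFERRED_TERMS):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    return vocabulary.get(key)


print(normalize_subject("Heart Attack"))
print(normalize_subject("candy"))
```

Returning None for an unknown term lets a metadata creation tool prompt the cataloger instead of silently storing an uncontrolled value.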
This section lists each Core element by its full name and label. For each element there is a reference description (DCMES 1.1) and there are
guidelines to assist in creating metadata content, whether it is done "from scratch" or by converting an existing record in another format. Links
to examples and to recommended Dublin Core Qualifiers for each element are also provided.
The elements are listed in the order they were developed, but there are other useful ways to group them. In the following table, you can see that
some elements relate to the content of the item, some to the item as intellectual property, still others to the particular instantiation, or version, of
the item.
Subject
Title
5. Qualifiers
In July of 2000, the Dublin Core Metadata Initiative issued its list of recommended Dublin Core Qualifiers. At the time of the ratification of
these qualifiers, the DCMI recognized two broad classes of qualifiers:
- Element Refinement. These qualifiers make the meaning of an element narrower or more specific. A refined element shares the
meaning of the unqualified element, but with a more restricted scope. A client that does not understand a specific element refinement
term should be able to ignore the qualifier and treat the metadata value as if it were an unqualified (broader) element. The definitions of
element refinement terms for qualifiers must be publicly available.
- Encoding Scheme. These qualifiers identify schemes that aid in the interpretation of an element value. These schemes include controlled
vocabularies and formal notations or parsing rules. A value expressed using an encoding scheme will thus be a token selected from a
controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a
formal notation (e.g., "2000-01-01" as the standard expression of a date). If an encoding scheme is not understood by a client or agent,
the value may still be useful to a human reader. The definitive description of an encoding scheme for qualifiers must be clearly identified
and available for public use.
The use of qualifiers as an additional level of detail introduces the situation where a client can encounter collections of resources that are
described using Dublin Core with qualifiers that are unknown to the client application. This can happen either because the client does not
support qualifiers and the collection does, or because the collection supports specialized qualifiers developed by implementors for specific
local or domain needs.
The useful interpretation of such descriptions will depend on the ability to ignore the unknown qualifiers and fall back on the broader meaning
of the element in its unqualified form. The guiding principle for the qualification of Dublin Core elements, also known as the "Dumb-Down
Principle," is that a client should be able to ignore any refinement and use the description as if it were unqualified. While this may result in
some loss of specific meaning, the remaining element value (minus the qualifier) must continue to be generally correct.
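The Dumb-Down Principle can be sketched as a function that discards qualifiers and keeps only the element and its value. The Statement type, its field names, and the qualifier names used in the sample ("created", "W3CDTF") are illustrative assumptions, not a DCMI-defined data model.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    element: str
    value: str
    refinement: Optional[str] = None  # element refinement qualifier, if any
    scheme: Optional[str] = None      # encoding scheme qualifier, if any


def dumb_down(statements):
    """Drop all qualifiers, leaving each statement in its unqualified form."""
    return [Statement(s.element, s.value) for s in statements]


qualified = [
    Statement("Date", "2000-01-01", refinement="created", scheme="W3CDTF"),
    Statement("Title", "A Guide to Growing Roses"),
]

for statement in dumb_down(qualified):
    print(f'{statement.element}="{statement.value}"')
```

After dumbing down, the date statement is less specific, but it is still a correct Date for the resource, which is the test the principle asks every qualifier to pass.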
7. Glossary
Copyright © 1995-2001 DCMI All Rights Reserved. DCMI liability, trademark/service mark, document use and software licensing rules apply.