Metadata Interoperability: A Study of Methodology

Lois Mai Chan, Ph.D.
Professor, School of Library and Information Science
University of Kentucky
U.S.A.
loischan@uky.edu

ABSTRACT: The rapid growth of Internet resources and digital collections and libraries is accompanied by a proliferation of metadata schemas. Each metadata schema has been designed based on the requirements of the particular user community, intended users, type of materials, subject domain, the depth of description, etc. Problems arise when building large digital libraries or repositories with metadata records prepared according to diverse schemas. Most users do not and should not have to know or understand the underlying structure of the digital collection; but in reality, they are experiencing difficulties in resource discovery and access. How to enable a “one-stop” seamless search presents considerable challenges. This presentation reviews some of the methods that have been or are currently used to achieve or improve interoperability among metadata schemas.

Introduction

Recent decades have witnessed a proliferation of metadata schemas for description of digital resources. Each has been designed based on the requirements of the particular user community, intended users, type of resources, depth of description, etc. Problems arise when building a large digital library or repository with participants using different description methods or metadata records prepared according to diverse schemas. The diversity of standards for description of resources of many different types poses particular challenges for users as well as for those who are responsible for managing these resources. In Roy Tennant’s words, “Users should be able to discover through one search what digital objects are freely available from a variety of collections, rather than having to search each collection individually” (Tennant 2001). Users should not have to know or understand the methods used to describe and represent the contents of the digital collection; but in reality, they are experiencing difficulties in retrieval. How to enable the sharing and exchange of data and facilitate, for the user, a “one-stop” seamless search, also referred to as “federated search,” presents considerable challenges. To achieve that, the different metadata schemas must be made interoperable to enable conversion and exchange of data. Thus, the purpose of interoperability is to facilitate the exchange and sharing of data prepared according to different metadata schemas and to enable cross-domain searching.

In recent literature, a great deal has been written about interoperability between and among different metadata schemas. This presentation reviews and analyzes some of the methods currently used to achieve interoperability.

Definition of Interoperability

Many attempts have been made to define the concept of interoperability. A few examples are given below:

“The ability of multiple systems, using different hardware and software platforms, data structures, and interfaces, to exchange and share data” (NISO 2004)

“The ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system” (ALCTS 2004)

“The compatibility of two or more systems such that they can exchange information and data and can use the exchanged information and data without any special manipulation” (Taylor 2004)

Trends and Models of Metadata Interoperability Projects

In recent years, numerous projects have been undertaken by the many players and stakeholders in the information community to achieve interoperability among different metadata schemas. Some of these efforts are outlined below. This paper focuses on descriptive metadata and presents an analysis of the conceptual models used to achieve interoperability. Seven of these models are discussed below:

Uniform standard
Application profiling/adaptation/modification
Derivation
Crosswalk/mapping
Switching schema
Lingua franca
Metadata framework/container

It should be noted that these models are not mutually exclusive. Sometimes, within a particular project, we may see more than one model being used.

Uniform Standard

In this approach, all participants of a consortium, repository, etc., use the same schema, such as MARC/AACR or the Dublin Core. Using the same standard maintains a high level of consistency. This, of course, has been the approach in the library community for over a century. It is the ultimate solution to the interoperability problem. However, although it is a conceptually simple solution, it is not always feasible or practical, particularly in heterogeneous environments serving different user communities where components or participating collections contain different types of resources already described by a variety of specialized schemas. This method is only viable at the beginning or early stages of building a digital library or repository, before different schemas have been adopted by different participants of the collection or repository. Examples of uniform standardization include the MARC/AACR standards used in union catalogs of library collections and the Electronic Theses and Dissertations Metadata Standard (ETD-MS) based on the Dublin Core used by members of the Networked Digital Library of Theses and Dissertations (NDLTD).

Application Profiling/Adaptation/Modification

In the heterogeneous information environment, different communities manage information that has different characteristics and requirements. There often is no one metadata schema that meets all needs, that is, “one-size-does-not-fit-all.” To accommodate individual needs, in this approach, an existing schema is used as the basis for description in a particular digital library or repository, while individual needs are met through specific guidelines or through adaptation or modification by:

Creating an application profile (a set of policies) for application by a particular interest group or user community.
Adapting an existing schema, with modifications to cater to local or specific needs (for example, a modified DTD of an existing schema).

This model ensures a similar basic structure and common elements, but with varying depths and details. Examples of application profiling include the Library-Application Profile (for using Dublin Core) and the Biological Data Profile of the National Biological Information Infrastructure (NBII), which is based on FGDC/CSDGM (Content Standard for Digital Geospatial Metadata) of the Federal Geographic Data Committee. Examples of adaptation/modification include:

Canadian adaptation of Dublin Core
GEM adaptation of Dublin Core
ETD-MS (using 13 Dublin Core elements and an additional element)
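
The idea of an application profile as a set of policies layered on an existing schema can be sketched in code. The following is a minimal illustration only: the 15 element names are those of the unqualified Dublin Core, but the profile itself (its required elements and the local extension "degree") is hypothetical and is not taken from any of the profiles named above.

```python
# The 15 elements of unqualified Dublin Core, used here as the base schema.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical profile: policies for one user community (an assumption,
# not an actual published profile).
THESIS_PROFILE = {
    "base": DC_ELEMENTS,
    "required": {"title", "creator", "date"},
    "local_extensions": {"degree"},
}


def validate(record, profile):
    """Return a list of problems found when checking a record against a profile."""
    present = set(record)
    allowed = profile["base"] | profile["local_extensions"]
    problems = []
    for element in sorted(profile["required"] - present):
        problems.append(f"missing required element: {element}")
    for element in sorted(present - allowed):
        problems.append(f"element not in profile: {element}")
    return problems
```

Because every record still uses the base element names, records created under different profiles of the same schema remain comparable on their common elements.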

Derivation

In a collection of digital databases where different components have different needs and different requirements regarding depths, an existing complex schema such as the MARC format may be used as the “source” or “model” from which new and simpler individual schemas may be derived. This approach would ensure a similar basic structure and common elements, while allowing different components to vary in depth and details. For example, both the MODS (Metadata Object Description Schema) and MARC Lite are derived from the MARC21 standard, and the TEI Lite is derived from the full Text Encoding Initiative (TEI).

Crosswalk/Mapping

A crosswalk is defined as “a mapping of the elements, semantics, and syntax from one metadata scheme to those of another” (NISO 2004). This is by far the most common method used to enable interoperability between and among metadata schemas. The predominant method used is direct mapping or establishing equivalency between and among elements in different schemas. Equivalent fields or elements are mapped in order to allow conversion from one to the other. Most of the crosswalk effort to date has been in the form of mapping between two metadata schemas; mapping among multiple schemas has also been attempted.
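
Direct mapping can be pictured as a lookup table from source elements to target elements. The sketch below is illustrative only: the table pairs a handful of MARC field tags with Dublin Core element names in a simplified way and is an assumption for demonstration, not an authoritative crosswalk; subfields and indicators are ignored.

```python
from collections import defaultdict

# Simplified, illustrative mapping table (an assumption, not an official crosswalk).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "700": "creator",    # two source fields share one target element
    "650": "subject",
    "260": "publisher",
}


def crosswalk(record, mapping):
    """Convert a list of (field, value) pairs using a mapping table.

    Returns the converted record and the list of source fields that
    have no equivalent in the target schema.
    """
    converted = defaultdict(list)
    unmapped = []
    for field, value in record:
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            converted[target].append(value)
    return dict(converted), unmapped
```

Fields without an equivalent are reported rather than silently dropped, which makes the information lost in conversion visible.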

There have been a substantial number of crosswalks. Some examples are:

MARC21 to Dublin Core
MARC to UNIMARC
VRA to Dublin Core
ONIX for books to MARCXML
FGDC to MARC
EAD to ISAD(G)
ETD-MS to MARCXML
Dublin Core/MARC/GILS
ADL/FGDC/MARC/GILS
MARC/LOM/DC
Etc., etc., etc.

The crosswalk approach appears to be more workable when mapping from a complex schema to a simpler one; in other words, it is a “one-way street.” An example is the crosswalk between the Dublin Core and MARC. Because the two differ in depth and complexity, the crosswalk works relatively well when mapping MARC fields to Dublin Core elements but not vice versa, because MARC is a much more complex schema. One of the problems identified by Marcia L. Zeng is the different degrees of equivalency: one-to-one, one-to-many, many-to-one, and one-to-none (Zeng 2001). Also, while a crosswalk works well when the number of schemas involved is small, mapping among multiple schemas is not only extremely tedious and labor intensive but also requires enormous intellectual effort. For example, a one-way crosswalk requires one mapping process (A-->B), and a two-way crosswalk requires two mapping processes (A-->B and B-->A). The process becomes more and more cumbersome the more schemas are involved: a crosswalk involving three schemas would require six (or three pairs of) mapping processes, a four-schema crosswalk would require twelve (or six pairs of) mapping processes, and a five-schema crosswalk would require twenty (or ten pairs of) mapping processes. In general, n schemas require n(n-1) mapping processes.
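
The arithmetic above, and the four degrees of equivalency, can be made concrete with a short sketch. The function names and the sample element names used to exercise them are hypothetical; the classification rule is one plausible reading of the four degrees, not a procedure given by Zeng.

```python
from collections import Counter


def directed_mappings(n_schemas):
    """Number of one-way mapping processes needed to crosswalk every pair."""
    return n_schemas * (n_schemas - 1)


def classify_equivalence(source_elements, pairs):
    """Label each source element with its degree of equivalency.

    `pairs` is a list of unique (source_element, target_element) tuples.
    """
    sources_per_target = Counter(target for _, target in pairs)
    degrees = {}
    for source in source_elements:
        targets = [t for s, t in pairs if s == source]
        if not targets:
            degrees[source] = "one-to-none"
        elif len(targets) > 1:
            degrees[source] = "one-to-many"
        elif sources_per_target[targets[0]] > 1:
            degrees[source] = "many-to-one"
        else:
            degrees[source] = "one-to-one"
    return degrees
```

Only the one-to-one case converts cleanly in both directions; the other three degrees are where a reverse mapping loses or has to guess information.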

Switching Schema

In this model, an existing schema is used as the switching mechanism among multiple schemas. Instead of mapping between every pair in the group, each of the individual metadata schemas is mapped to the switching schema. This model drastically reduces the number of mapping processes required. The switching schema usually contains elements on a fairly broad level. Examples of using switching schemas include the Picture Australia project and the Open Archives Initiative (OAI). Both use the Dublin Core as the switching schema.

The Picture Australia project is a digital library project encompassing a variety of institutions including libraries, the National Archives, and the Australian War Memorial, many of which came with legacy metadata records. Records from participants are collected in a central location (the National Library of Australia) and then translated into a “common record format,” with fields based on the Dublin Core (Tennant 2001). The OAI stipulates that “it is compulsory that all open archives be able to generate metadata for all resources in unqualified Dublin Core (DC)…This will ensure that service providers who do not understand any other metadata format will at least be able to glean the basic information about resources from their DC renditions.” (Suleman and Fox 2001).
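
The switching model described in this section can be sketched as a set of tables that each map one local schema to a common hub. The schema names and field names below are hypothetical, and the hub uses only two Dublin Core-like element names; the sketch does not reproduce the actual Picture Australia or OAI implementations.

```python
# One table per local schema, each mapping local fields to hub elements.
# All schema and field names are hypothetical.
TO_HUB = {
    "library": {"245": "title", "100": "creator"},
    "archive": {"unittitle": "title", "origination": "creator"},
    "museum": {"object_name": "title", "maker": "creator"},
}


def to_hub(schema, record):
    """Translate a local record into hub elements, dropping unmapped fields."""
    mapping = TO_HUB[schema]
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def from_hub(schema, hub_record):
    """Translate a hub record into a local schema by inverting its table."""
    reverse = {hub: local for local, hub in TO_HUB[schema].items()}
    return {reverse[e]: v for e, v in hub_record.items() if e in reverse}


def convert(source, target, record):
    """Convert between two local schemas by passing through the hub."""
    return from_hub(target, to_hub(source, record))
```

Here three tables serve three schemas, where pairwise crosswalks would need six directed mappings. The cost is that anything the hub cannot express, such as an extent statement in the archive record, is lost in transit.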

Lingua Franca

If no existing schema is found to be suitable for use as a switching schema, an alternative is the use of a lingua franca. A lingua franca acts as a superstructure, but is not a “schema” in itself. In this method, multiple existing metadata schemas are treated as satellites of a superstructure (lingua franca) which consists of elements common or most widely used by individual metadata schemas. This model facilitates cross-domain searching, but is not necessarily helpful in data conversion or data exchange. However, the lingua franca model allows the retention of the richness and granularity of individual schemas.

The lingua franca superstructure is built from a set of core attributes that are common to many or most of the existing schemas used by participants in a digital library or repository. An example is the ROADS template, which uses a set of broad, generic attributes.

The question is, then: how does one determine what the “most common attributes” are? A possibility is to make use of the core attributes, identified by the IFLA Working Group on the Use of Metadata Schemas (IFLA 2003), as occurring in the most widely used metadata schemas. These common core attributes are: Subject, Date, Conditions of use, Publisher, Name assigned to the resource, Language/mode of expression, Resource identifier, Resource type, Author/creator, and Version. The results of a survey conducted by the IFLA Working Group indicate that certain elements are more universally or frequently occurring than others. Based on this evidence, it could be argued that in a particular environment (a digital library, a repository, etc.), a consistent or central index (NISO 2004, p. 2) or a combined index -- a master index merging the most commonly occurring elements in various metadata schemas from different collections -- can be used as a tool for federated searches. Such an index enables a layered service, offering access at a high level, involving an entire digital library or repository, while at the same time allowing the browsing of rich metadata descriptions within individual collections.

This model may be applied to different information environments, print, visual, audio, geospatial, etc. The common attributes shared by components or participants within a particular environment can be defined according to their user needs. For example, in a multilingual environment, it is expected that language would be an important attribute; and in an environment encompassing resources from various parts of the world, geographic location would be significant.

Some of the advantages of defining and using a set of core attributes are:

Each digital collection within the same repository or gateway can use its own chosen metadata elements for more detailed description
A common interface provides uniform searching capabilities for participating or member collections
The master index provides access points to all participating collections
The richness and granularity of the metadata contents of individual components of the digital repository can be retained, that is, no dumbing down would be necessary

The master index allows users to enter from a common search interface and be directed to the appropriate component(s) or service(s) within the digital library, where they may browse the rich descriptions in the metadata records contained in the individual component parts, which may have been prepared according to different metadata schemas. Alternatively, a further, more refined search may be conducted using the unique elements of the individual metadata schema, such as controlled vocabulary, publisher name, or conditions of use.
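
A master index of this kind can be sketched as an inverted index built only from the core attributes, with each entry pointing back to the full record in its home collection. The collection names, local field names, and the choice of three core attributes below are hypothetical and serve only to illustrate the layered arrangement.

```python
from collections import defaultdict

# Hypothetical mappings from each collection's local fields to core attributes.
CORE_MAPPINGS = {
    "photo_archive": {"caption": "title", "photographer": "creator", "topic": "subject"},
    "thesis_repo": {"dc_title": "title", "dc_creator": "creator", "keywords": "subject"},
}


def build_master_index(collections):
    """Index the core attributes of every record in every collection.

    Keys are (core_attribute, word); values are sets of
    (collection_name, record_id) pointers to the full local records.
    """
    index = defaultdict(set)
    for name, records in collections.items():
        mapping = CORE_MAPPINGS[name]
        for record_id, record in records.items():
            for local_field, value in record.items():
                core = mapping.get(local_field)
                if core is None:
                    continue  # remains searchable only within its own collection
                for word in value.lower().split():
                    index[(core, word)].add((name, record_id))
    return index


def search(index, attribute, term):
    """Return pointers to matching records across all collections."""
    return sorted(index.get((attribute, term.lower()), set()))
```

The local records are never rewritten, so fields outside the core set keep their original richness and remain available once the user is directed to the individual collection.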

Metadata Framework/Container

In this approach, a metadata framework is used as a shell or container within which elements from multiple metadata schemas can be accommodated. Two prominent examples are discussed below:

Resource Description Framework (RDF)

The Resource Description Framework (RDF) is a data model developed by the World Wide Web Consortium (W3C) for the description of resources on the Web that “provides a mechanism for integrating multiple metadata schemes” (NISO 2004). Expressed in XML, multiple namespaces may be defined to allow elements from different schemas to be combined in a single resource description. An RDF record links multiple descriptions, created at different times for different purposes, to each other. The following example shows how different metadata schemas (as indicated by namespaces) can be packaged together (Iannella 1999):

<?xml version="1.0"?>
<RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:DC="http://purl.org/DC#"
     xmlns:AGLS="http://naa.gov.au/AGLS#">
  <Description about="http://dstc.com.au/report.html">
    <DC:Title>The Future of Metadata</DC:Title>
    <DC:Creator>Jacky Crystal</DC:Creator>
    <DC:Date>1998-01-01</DC:Date>
    <DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
    <AGLS:Function>Information Management – Internet</AGLS:Function>
  </Description>
</RDF>
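
To show how the namespaces keep the two schemas apart inside a single description, the following sketch reads a shortened copy of the record with Python's standard XML parser and groups the elements by namespace. The function name is hypothetical, and a general XML parser is used here in place of a dedicated RDF toolkit, so this illustrates the packaging only, not RDF processing as such.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example record, using the same namespace URIs.
RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:DC="http://purl.org/DC#"
     xmlns:AGLS="http://naa.gov.au/AGLS#">
  <Description about="http://dstc.com.au/report.html">
    <DC:Title>The Future of Metadata</DC:Title>
    <DC:Creator>Jacky Crystal</DC:Creator>
    <AGLS:Function>Information Management - Internet</AGLS:Function>
  </Description>
</RDF>"""


def elements_by_namespace(xml_text):
    """Group the elements of each description by the namespace they come from."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for description in root:
        for child in description:
            # ElementTree writes qualified names as "{namespace}localname".
            namespace, _, local = child.tag[1:].partition("}")
            grouped.setdefault(namespace, {})[local] = child.text.strip()
    return grouped
```

A consumer that understands only Dublin Core can read the entries under that namespace and ignore the AGLS entry, while a consumer that knows both schemas can use the full description.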

Metadata Encoding and Transmission Standard (METS)

The Metadata Encoding and Transmission Standard (METS) is a standard for packaging descriptive, administrative, and structural metadata into one XML document for interactions with digital repositories. It provides a framework for combining several internal metadata structures with external schemas (such as MODS or MIX). It is “a standard that provides a method to encapsulate all the information about an object—whether digital or not” (Tennant May 15, 2004).

The descriptive metadata section may point to descriptive metadata external to the METS document (e.g., a MARC record in an OPAC or an EAD finding aid maintained on a WWW server), or contain internally embedded descriptive metadata, or both. Multiple instances of both external and internal descriptive metadata may be included in the descriptive metadata section. The following example shows a file section from a digital library object for an oral history which has three different versions: a TEI-encoded transcript, a master audio file in WAV format, and a derivative audio file in MP3 format (METS 2004):

<fileSec>
  <fileGrp ID="VERS1">
    <file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS2">
    <file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS3" VERSDATE="2001-05-18">
    <file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
    </file>
  </fileGrp>
</fileSec>
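
As a rough illustration of how a repository might read such a file section, the sketch below lists each version with its MIME type, size, and location. It follows the simplified form of the example as printed here, without namespaces and with the location given as element text; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the file section in the simplified form used in this paper.
FILE_SEC = """<fileSec>
  <fileGrp ID="VERS1">
    <file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS2">
    <file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
    </file>
  </fileGrp>
  <fileGrp ID="VERS3" VERSDATE="2001-05-18">
    <file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
      <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
    </file>
  </fileGrp>
</fileSec>"""


def list_files(xml_text):
    """Return one dictionary per file, in document order."""
    root = ET.fromstring(xml_text)
    files = []
    for group in root.findall("fileGrp"):
        for file_el in group.findall("file"):
            files.append({
                "group": group.get("ID"),
                "id": file_el.get("ID"),
                "mimetype": file_el.get("MIMETYPE"),
                "size": int(file_el.get("SIZE")),
                "groupid": file_el.get("GROUPID"),  # None when the attribute is absent
                "url": file_el.findtext("FLocat").strip(),
            })
    return files
```

The shared GROUPID value is what ties the WAV master and the MP3 derivative together as versions of the same audio content. Full METS documents are namespaced and typically carry the file location in an xlink:href attribute, so code for production records would need to account for both.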

Conclusion – Trends

In the open, networked environment encompassing multiple user communities using a multitude of standards for description of digital resources, the need for interoperability among metadata schemas is overriding. Currently, mapping metadata schemas still requires enormous effort, even with all the assistance computer technology can provide. If the information community is to provide optimal access to all the information available across the board of digital libraries and repositories, information professionals must give high priority to the task of creating, and maintaining, the highest feasible level of interoperability among extant and new information services.

REFERENCES

ALCTS/CCS/Committee on Cataloging: Description and Access Task Force on Metadata. (last updated 2004). Summary Report. http://www.libraries.psu.edu/tas/jca/ccda/tf-meta3.html

Guenther, Rebecca, and Sally McCallum. (2002). New metadata standards for digital resources: MODS and METS. ASIST Bulletin, 29(2).

Heery, Rachel M., Andy Powell, and Michael William Day. (Mar. 1998). Metadata: CrossROADS and interoperability [computer file]. Ariadne (Online) no. 14.

IFLA Working Group on the Use of Metadata Schemas. (2003). Guidance on the Structure, Content, and Application of Metadata Records for Digital Resources and Collections: Report of the IFLA Cataloguing Section Working Group on the Use of Metadata Schemas: Draft – For Worldwide Review. http://www.ifla.org/VII/s13/guide/metaguide03.pdf

Iannella, Renato. (1999). An Idiot's Guide to the Resource Description Framework. http://archive.dstc.edu.au/RDU/reports/RDF-Idiot/

Johnston, Pete. (2003). Metadata and Interoperability in a Complex World [computer file]. Ariadne (Online) no. 37, p. 2.

McCallum, Sally H. (2003). Library of Congress metadata landscape. Zeitschrift für Bibliothekswesen und Bibliographie, 4.

METS: A Tutorial & Overview. (2004). http://www.loc.gov/standards/mets/METSOverview.v2.html

National Information Standards Organization. (2004). Understanding Metadata. http://www.niso.org/standards/resources/UnderstandingMetadta.pdf

Suleman, Hussein, and Edward Fox. (2001). The Open Archives Initiative: Realizing Simple and Effective Digital Library Interoperability. Journal of Library Administration, 35(1/2), 125-145.

Taylor, Arlene. (2004). The Organization of Information. 2nd ed. Westport, CT: Libraries Unlimited.

Tennant, Roy. (February 15, 2001). Different Paths to Interoperability. Library Journal, 126(3), 118-119.

Tennant, Roy. (May 15, 2004). It’s Opening Day for METS. Library Journal, 129(9), 28.

Tennant, Roy. (July 2004). Metadata’s Bitter Harvest. Library Journal, 129(12), 32.

Tennant, Roy. (Dec. 2003). The Engine of Interoperability. Library Journal, 128(20), 33.

Tennant, Roy. (May 15, 2002). The Importance of Being Granular. Library Journal, 127(9), 32-33.

Zeng, Marcia Lei. (2001). Supporting Metadata Interoperability: Trends and Issues. In: Global Digital Library Development in the New Millennium. Ching-Chih Chen ed. Beijing: Tsinghua University Press. pp. 405-412.