Wade-Giles to Pinyin Conversion:
Australian Experience and Local Issues
OCLC Users' Group Pinyin Conversion Task Force
June 21, 1999
This report addresses technical issues relating to the Wade-Giles to Pinyin conversion, with the aim of helping OCLC CJK member libraries understand the conversion process and helping librarians prepare for it. The report has two parts: I) a summary of the Australian conversion project undertaken in 1996; II) a list of suggestions on local conversion issues, which the Task Force recommends that libraries begin discussing with the relevant local library staff as early as possible.
Part I. Summary of the Australian Conversion Project
In 1996, the National Library of Australia (NLA) developed a conversion program that automatically converted 500,000 CJK (Chinese, Japanese, and Korean) records from Wade-Giles romanization to Pinyin.
The conversion program NLA developed was written to perform its major functions in the following sequence:
- Divide the bibliographic file into Chinese, Japanese, Korean, and other material based on the language code in the record (008 field of the MARC record). Since Japanese and Korean records contain fewer Chinese words, the program uses different logic to process these records. For example, in Japanese and Korean records the author (1xx) and uniform title (240) fields were processed for conversion, but not the title (245) field. For non-CJK material, records were not processed for pinyin conversion even though they might contain Chinese data.
- Identify Wade-Giles strings in the selected USMARC fields and subfields that are likely to contain Wade-Giles words (the list of fields/subfields differs among the CJK languages) by referring to a "word-table" that lists all valid Wade-Giles words with their pinyin equivalents.
- Based on the "word-table", convert Wade-Giles strings to their pinyin equivalents. Leave the bibliographic data untouched when no Wade-Giles string is found in the record.
- Flag the field/subfield for human review if the program cannot decide whether the subfield contains Wade-Giles.
- Delete duplicate data, such as two identical title added entries resulting from the conversion.
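The sequence above can be sketched in a few lines of Python. This is only an illustration of the general logic, not NLA's actual program: the record layout (plain dictionaries), the tiny word-table, the field lists per language, and the rule "flag for review when only some words match" are all assumptions made for the example. A real program would work on full MARC records and a complete word-table.

```python
# Illustrative word-table (Wade-Giles syllable -> pinyin). Hypothetical and
# far from complete; a real table lists every valid Wade-Giles syllable.
WORD_TABLE = {
    "chung": "zhong", "kuo": "guo", "li": "li", "shih": "shi",
    "tien": "dian", "chin": "jin",
}

# Fields eligible for conversion per language code (assumed, simplified).
# Japanese and Korean records: author (1xx) and uniform title (240) only.
FIELDS_BY_LANG = {
    "chi": {"100", "240", "245", "740"},
    "jpn": {"100", "240"},
    "kor": {"100", "240"},
}


def convert_string(text):
    """Return (new_text, status): 'converted', 'untouched' or 'review'."""
    out, hits, misses = [], 0, 0
    for word in text.split():
        syllables = word.split("-")
        if all(s.lower() in WORD_TABLE for s in syllables):
            # Simplification: hyphenated syllables are joined into one word.
            joined = "".join(WORD_TABLE[s.lower()] for s in syllables)
            out.append(joined.capitalize() if word[0].isupper() else joined)
            hits += 1
        else:
            out.append(word)
            misses += 1
    if hits == 0:
        return text, "untouched"      # no Wade-Giles found: leave data alone
    if misses:
        return text, "review"         # mixed content: let a person decide
    return " ".join(out), "converted"


def process_record(record):
    """Convert eligible subfields in place; return (tag, code) pairs to review."""
    eligible = FIELDS_BY_LANG.get(record["lang"], set())  # non-CJK: nothing
    flagged, seen, kept = [], set(), []
    for field in record["fields"]:
        if field["tag"] in eligible:
            converted = []
            for code, value in field["subfields"]:
                new_value, status = convert_string(value)
                if status == "review":
                    flagged.append((field["tag"], code))
                converted.append((code, new_value))
            field["subfields"] = converted
        # Drop fields that have become exact duplicates after conversion.
        key = (field["tag"], tuple(tuple(sf) for sf in field["subfields"]))
        if key not in seen:
            seen.add(key)
            kept.append(field)
    record["fields"] = kept
    return flagged


record = {"lang": "chi", "fields": [
    {"tag": "740", "subfields": [("a", "Chung-kuo li shih")]},
    {"tag": "740", "subfields": [("a", "Chung-Kuo li shih")]},
]}
print(process_record(record), record["fields"])
```

In this sketch the two added entries differ only in capitalization before conversion, become identical afterwards, and are reduced to one field, which mirrors the duplicate-deletion step described above.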
Since the beginning of the conversion, NLA has maintained parallel Pinyin and Wade-Giles databases for its services. Its National CJK Service will review later whether to continue maintaining both the Wade-Giles and Pinyin databases for its CJK libraries.
Overall, the conversion was a success. The conversion program flagged 12% of all fields for review; the review took two months and a total of 920 work hours. The following problems were found during the conversion process:
- Unexpected diacritics -- Ayns that were incorrectly keyed as alifs or apostrophes.
- Unexpected tag and subfield combinations -- $d in personal name headings containing Wade-Giles, such as "chin shih", which were not converted.
- Unexpected subfield coding -- 245 $b incorrectly coded as $h, which was not converted.
- Records with Chinese titles but with language code "jpn" -- Incorrect language coding. The program did not process 245 if the language code was not "chi".
- Japanese records with parallel titles in Chinese -- The program did not process 245 if the language code was not "chi".
- Diacritics omitted -- Some words were converted to the wrong words because diacritics had been omitted, e.g., "Tien-chin" (keyed without the aspiration mark after the "T") was converted to "Dianjin" instead of "Tianjin"; "yu" (keyed without the umlaut) was converted to "you" instead of "yu".
- Some notes became incongruous -- For example, "Added title also in pinyin: ..." Technically, these notes could have been deleted, but the decision was made to keep them.
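The first and sixth problems in the list above both come from the fact that a word-table lookup is sensitive to small marks. The following Python sketch illustrates this under stated assumptions: the five-entry table is hypothetical, the ayn and alif are taken to be U+02BB and U+02BC (their usual Unicode equivalents), and the function names are invented for the example. Folding alifs and apostrophes into ayns is only reasonable in fields already known to hold Wade-Giles, and it cannot recover a mark that was never keyed.

```python
import unicodedata

AYN = "\u02bb"  # aspiration mark used in Wade-Giles romanization
# Marks that are sometimes keyed in place of the ayn (assumed list).
MISKEYED = {"\u02bc": AYN, "'": AYN, "\u2019": AYN}

# Hypothetical five-entry word-table; note the near-identical keys.
WORD_TABLE = {
    "t" + AYN + "ien": "tian",
    "tien": "dian",
    "chin": "jin",
    "y\u00fc": "yu",   # yu with umlaut
    "yu": "you",
}


def normalize_marks(syllable):
    """Lowercase, compose accents, and fold alif/apostrophe into ayn."""
    composed = unicodedata.normalize("NFC", syllable.lower())
    return "".join(MISKEYED.get(ch, ch) for ch in composed)


def to_pinyin(word):
    """Convert one hyphenated Wade-Giles word; raises KeyError if unknown."""
    result = "".join(WORD_TABLE[normalize_marks(s)] for s in word.split("-"))
    return result.capitalize() if word[:1].isupper() else result


print(to_pinyin("T'ien-chin"))   # aspiration mark keyed as an apostrophe
print(to_pinyin("Tien-chin"))    # aspiration mark omitted
print(to_pinyin("yu\u0308"))     # umlaut present (decomposed form)
print(to_pinyin("yu"))           # umlaut omitted
```

With the mark present (in any of the keyed forms) the first word converts to "Tianjin"; with the mark omitted the program has no way to tell that "Tien" was not intended, and produces "Dianjin". The same holds for "yu" with and without the umlaut.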
Conversion At Local Libraries:
Two conversion options were offered to the twenty-two member libraries of the Australian Bibliographic Network (ABN, now called the Kinetica database):
- Libraries that had added holdings to ABN for their CJK materials could extract a Pinyin version of the MARC records for all the CJK material they own and load them into the library's local system. (Libraries whose holdings were not on the ABN database would need to add their holdings to it retrospectively in order to be able to extract the Pinyin records.)
- Alternatively, they could perform a local conversion using the conversion program, provided by the National Library free-of-charge, directly on their local data.
According to Mr. Ching-Ping Tang, Senior Librarian of the Australian National CJK Service, twenty member libraries ordered tapes of records that carried their holdings in the Pinyin database and loaded them into their local systems to replace their Wade-Giles records. Currently, two libraries, the Australian National University and the National Library of Australia, are still using Wade-Giles due to lack of resources. They might eventually choose to convert their data locally.
Two problems remain after the conversion:
- Authority files were not converted.
- Wade-Giles data in non-CJK language records were not converted.
Part II. Issues to be Addressed Locally in Preparation for the Conversion
Currently RLG is testing the conversion specifications drafted by LC, and OCLC is closely monitoring the progress. Although no detailed information is available yet from either RLG/LC or OCLC, the Task Force, having learned the general conversion process from the Australian project, would like to present in this section a list of local issues to consider when making your own conversion plans.
If your library is a part of a larger library network or consortium which shares a common bibliographic database, it might be more effective and economical to cooperate in planning and implementation among all member libraries. Otherwise, you need to make your own plan. The Task Force has identified the following important issues that OCLC member libraries need to address when planning for the conversion.
General Issues Regardless of Conversion Methods:
- Form a committee or group consisting of librarians and systems people to plan for the conversion as soon as possible.
- Define the scope of your conversion project: Do you want to convert only CJK records, or all records that contain Chinese data, such as names, uniform titles, subject headings, etc.?
- Estimate how many records will be affected.
- How to proceed with the conversion of authority files? Will the global replacement capability in the authority control function be useful for this project?
- Do you plan to change the Cuttering to reflect the change of romanization? If yes, how do you handle re-labeling and re-shelving of materials?
- How to control the quality of converted records? Who is responsible for reviewing and cleaning up converted records? Is additional manpower needed?
- Budget: The costs of conversion and cleanup need to be budgeted as early as possible. Do you need to apply for special funding or a grant to cover the costs?
- Implications for public services? How to educate users--bibliographic instruction (BI), flyers, etc.?
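For the scope and estimate questions above, a rough first count can be taken from the language code in the 008 field (character positions 35-37 in a bibliographic record). The Python sketch below is a minimal illustration only: the record layout and the "has_chinese_heading" flag are hypothetical stand-ins for whatever export or report your local system can actually produce.

```python
from collections import Counter

CJK_CODES = {"chi", "jpn", "kor"}


def language_code(field_008):
    """Language code at 008 positions 35-37 (0-based slice 35:38)."""
    return field_008[35:38]


def estimate_scope(records):
    """Tally records into CJK, non-CJK with Chinese data, and unaffected."""
    counts = Counter()
    for rec in records:
        if language_code(rec["008"]) in CJK_CODES:
            counts["cjk"] += 1
        elif rec.get("has_chinese_heading"):
            # Hypothetical flag: e.g. a non-CJK record with a Chinese name
            # or subject heading, found by some separate local check.
            counts["non_cjk_with_chinese_data"] += 1
        else:
            counts["unaffected"] += 1
    return counts


def make_008(lang):
    """Build a dummy 40-character 008 carrying only a language code."""
    return " " * 35 + lang + "  "


sample = [
    {"008": make_008("chi")},
    {"008": make_008("jpn")},
    {"008": make_008("eng"), "has_chinese_heading": True},
    {"008": make_008("eng")},
]
print(estimate_scope(sample))
```

The second category is the one that decides the scope question: it shows how many records would be left with Wade-Giles data if only CJK-language records were converted.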
Issues Related to Possible Conversion Options for OCLC Member Libraries:
It appears at present that there is no single quick conversion method that fits the different needs of all OCLC CJK member libraries. So far, three possible conversion options have been conveyed to OCLC for consideration. Each option carries unavoidable extra cost and has its own pros and cons:
Option A: OCLC converts master records in WorldCat to Pinyin and member libraries load the records into their systems.
Pros: Libraries' local records will be updated with the most up-to-date bibliographic data from the master records in WorldCat. Requires only a low level of local technical support to load the converted records.
Cons: Local data you input/edited in the OCLC records or data you input/edited directly on your local system that do not exist in OCLC's WorldCat records will be lost.
If you plan to keep the local data, you need to:
- Understand how OCLC records are loaded into your system;
- Identify local fields that you want to keep, such as local call no. and local notes;
- Talk to your systems staff or systems vendor to see whether they can modify the OCLC bib loader program to prevent complete overlay of records.
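The idea of preventing a complete overlay can be illustrated with a short Python sketch. It is not how any particular loader works: fields are modeled as simple (tag, value) pairs, and the protected tags 090 and 590 are only examples of a local call number and a local note; the tags your library actually uses need to be confirmed with your systems staff.

```python
def overlay_record(local_fields, incoming_fields, protected_tags):
    """Replace a local record with the incoming one, keeping protected fields.

    Fields are (tag, value) pairs. Local fields whose tag is protected are
    kept; incoming fields with a protected tag are dropped so they cannot
    overwrite local data. Everything else comes from the incoming record.
    """
    kept_local = [f for f in local_fields if f[0] in protected_tags]
    from_incoming = [f for f in incoming_fields if f[0] not in protected_tags]
    # sorted() is stable, so fields with the same tag keep their order.
    return sorted(from_incoming + kept_local, key=lambda f: f[0])


# Example tags only (assumption): 090 local call number, 590 local note.
PROTECTED = {"090", "590"}

local = [
    ("090", "LOCAL CALL NO."),
    ("245", "Chung-kuo li shih"),
    ("590", "Local note"),
]
incoming = [
    ("245", "Zhongguo li shi"),
    ("650", "Subject heading from master record"),
]

for tag, value in overlay_record(local, incoming, PROTECTED):
    print(tag, value)
```

In this sketch the title comes from the converted master record while the local call number and local note survive the load, which is the behavior the three points above are meant to secure.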
Option B: OCLC provides conversion services on libraries' archival records/tapes
Pros: Libraries' local notes/data will be preserved. Requires only a low level of local technical support to load the converted records.
Cons: Local data you input/edited directly on your local system that do not exist in your institutional archival records in OCLC will be missing. The most up-to-date information in WorldCat will not be captured in your local records.
If you plan to keep such local data, you also need to consider the three points mentioned in Option A.
Option C: Libraries do conversion locally
Pros: You can tailor the conversion to your own institutional needs.
Cons: Requires a high level of local technical support for writing and running the conversion program.
If your library decides to do the conversion locally, you need to ask questions such as: Do you want to use a conversion program provided by RLG/OCLC, if available, or write your own conversion program? (You need to study LC's specs when they are available and decide whether the RLG/OCLC-provided conversion program is suitable for your local system.) In either case, you need to work with your systems staff or systems vendor to draw up a conversion plan.
Although the pinyin conversion of LC's database is scheduled to take place in a year or so, it is not too early to think seriously about local plans now. The Task Force plans to prepare another report on LC and RLG's conversion projects when more concrete information is available. Meanwhile, the Research Libraries Group is sponsoring a forum on Pinyin conversion at ALA, scheduled for Sunday, June 27, 9:30 am-12:30 pm at the Embassy Suites (Lafitte 1) in New Orleans. The forum will focus on the impact of these changes on libraries' local systems and catalogs. Librarians concerned about this issue should plan to attend.
1. Groom, Linda. "Converting Wade-Giles cataloging to Pinyin: the development and implementation of a conversion program for the Australian National CJK Services." Library Resources and Technical Services 41, no. 3: 254-63.
2. E-mail correspondence with Linda Groom and Ching-Ping Tang, Senior Librarian, Australian National CJK Service, NLA, in May 1999.
Task Force Members:
Sarah S. Elman (University of California, Los Angeles)
Wen-ling Liu (Indiana University)
Phyllis Wang (University of California, Davis)
Hsi-chu Bolick, Chair (University of North Carolina at Chapel Hill)
Originally posted on the OCLC-CJK listserv on 21 June 1999.