OCLC Pinyin Conversion Progress Report

July 3, 2001

Jay Weitz
Product Manager
OCLC Pinyin Conversion Project
weitzj@oclc.org


During June 2001, OCLC completed the conversion of Chinese language bibliographic records in WorldCat, the OCLC Online Union Catalog, from the Wade-Giles transliteration scheme to pinyin. Approximately 710,000 records coded "chi" in the Language fixed field (008/35-37) were converted. Two categories of records were deliberately deferred so that they could be converted along with the non-Chinese language records later in 2001. The first category consists of records with numerous Language codes in field 041, especially records that contain Japanese and/or Korean codes as well as Chinese. The second consists of score and sound recording records that are coded "N/A" in the Language fixed field but are identifiably Chinese. Because of the special challenges associated with these records, they will be treated using the even more conservative non-Chinese record conversion algorithms. OCLC expects to convert these two categories, together with the non-Chinese language records, during October 2001 and to complete the work before the end of the year.
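The selection rule described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not OCLC's actual conversion software: the function names, the returned labels, and the `max_041_codes` threshold are hypothetical (the report says "numerous" codes without giving a number), and the test for "identifiably Chinese" scores and sound recordings is not modeled at all.

```python
# Illustrative sketch only; names and the threshold are assumptions.
CJK_NEIGHBOR_CODES = {"jpn", "kor"}  # Japanese and Korean language codes


def language_code(field_008: str) -> str:
    """Return the three-character language code at 008/35-37."""
    return field_008[35:38]


def june_pass_decision(field_008, codes_041, max_041_codes=2):
    """Classify a record for the June 2001 Chinese-language pass.

    max_041_codes is a hypothetical cutoff standing in for "numerous".
    """
    if language_code(field_008) != "chi":
        # Includes scores and sound recordings coded "N/A"; these are
        # handled later with the non-Chinese records.
        return "not selected"
    has_jpn_or_kor = bool(CJK_NEIGHBOR_CODES & set(codes_041))
    if has_jpn_or_kor or len(codes_041) > max_041_codes:
        return "deferred"
    return "convert"
```

For example, a record coded "chi" with no field 041 would be classed as "convert", while one whose field 041 carries both "chi" and "jpn" would be classed as "deferred".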

In addition, OCLC has restored Chinese vernacular data to about 4,200 records from which it was inadvertently lost during the loading of Library of Congress Chinese records converted by the Research Libraries Group in an earlier phase of the conversion process.

For the past two years, OCLC has worked in close cooperation with the Library of Congress and the Research Libraries Group in planning and testing this massive conversion process. The conversion was designed to be conservative: as much data as could be safely and reliably converted would be converted, while the chances for erroneous conversions were kept to a minimum. Please remember that for the conversion to work correctly, all bibliographic records that have been created in pinyin or that have been manually converted to pinyin should contain the marker field 987. This includes records submitted to OCLC for batchloading. The presence of field 987 ensures that the record is not inadvertently converted again by the pinyin conversion software.
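The role of the marker field can be shown with a minimal sketch. Everything here is an assumption made for illustration: the dictionary representation of a record, the function names, the toy one-entry conversion table, and the placeholder content written to field 987 (this report does not specify what the field contains). The only point taken from the text is that a record carrying field 987 is skipped.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: MARC tag -> list of field values.
Record = Dict[str, List[str]]

MARKER_TAG = "987"


def is_already_pinyin(record: Record) -> bool:
    """A record carrying field 987 is treated as already in pinyin."""
    return MARKER_TAG in record


def convert_once(record: Record, convert: Callable[[Record], Record]) -> Record:
    """Apply `convert` unless the marker field is present."""
    if is_already_pinyin(record):
        return record  # skip: created in or already converted to pinyin
    converted = convert(record)
    # Placeholder marker content; the real field contents are not given here.
    converted.setdefault(MARKER_TAG, ["PINYIN"])
    return converted


def toy_convert(record: Record) -> Record:
    """Stand-in converter with a one-entry Wade-Giles to pinyin table."""
    table = {"Pei-ching": "Beijing"}
    out = {tag: list(values) for tag, values in record.items()}
    if "245" in out:
        out["245"] = [table.get(value, value) for value in out["245"]]
    return out
```

Running `convert_once` a second time on its own output returns the record unchanged, which is the protection against double conversion that the marker field is meant to provide.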

As with any conversion of this size and complexity, some records will inevitably be converted incorrectly. The Library of Congress is currently working to identify and fix records in areas where the conversion is known to be problematic. OCLC users with Chinese language skills can correct many records on their own as they come across them. (Remember that you must be using OCLC CJK software to lock and replace any record with vernacular data.) Minimal Level records (Encoding Levels K, M, 2, 3, 5, 7, and all 4s that do not include field 042 with code "pcc") may be locked, corrected, and replaced by any OCLC user with a Full Level authorization or above. If you are a participant in CONSER or OCLC's Enhance program (as many OCLC CJK users are), you are additionally encouraged to fix any incorrectly converted records with Encoding Levels higher than minimal, in accordance with your CONSER, Regular Enhance, or National Level Enhance authorization.
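The Minimal Level rule above can be restated as a short check. This is a sketch for illustration only: the function names and the boolean `has_full_level_or_above` are hypothetical, and the CONSER and Enhance rules for higher Encoding Levels are deliberately not modeled.

```python
# Encoding Levels named in the text as Minimal Level, apart from level 4.
MINIMAL_LEVELS = {"K", "M", "2", "3", "5", "7"}


def is_minimal_level(encoding_level: str, codes_042=()) -> bool:
    """True for K, M, 2, 3, 5, 7, and for 4 unless field 042 has "pcc"."""
    if encoding_level in MINIMAL_LEVELS:
        return True
    return encoding_level == "4" and "pcc" not in codes_042


def may_replace_minimal(encoding_level, codes_042, has_full_level_or_above):
    """Whether a user may lock, correct, and replace under the minimal rule.

    Returns False for higher Encoding Levels; those depend on CONSER or
    Enhance authorizations, which this sketch does not cover.
    """
    return has_full_level_or_above and is_minimal_level(encoding_level, codes_042)
```

Under this sketch, a level 4 record whose field 042 contains "pcc" is not treated as minimal, so correcting it falls under the CONSER or Enhance authorizations instead.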

OCLC users are strongly urged to report, via the usual error reporting mechanisms, any erroneously converted records that they are not authorized to correct themselves. The full range of paper and electronic means of reporting errors in OCLC records is described in Chapter 5, "Quality Assurance," of OCLC's "Bibliographic Formats and Standards" (in print, pp. 59-72).

For an outline and explanation of the conversion process that was jointly developed by LC, RLG, and OCLC, as well as answers to many questions about the conversion, see the LC "Pinyin Conversion Project" web site.

You may contact OCLC's Midwest Region Marketing Manager Ms. Chris Mottayaw (e-mail: chris_mottayaw@oclc.org; phone: 800-848-5878 x6476; fax: 614-718-7444) for more information about the local pinyin conversion service options. Questions concerning the specifications for the bibliographic conversion should be addressed to Philip Melzer, Team Leader, Korean/Chinese Cataloging Team, Library of Congress at pmel@loc.gov.


Originally posted on EASTLIB, 8 August 2001.