OCLC Pinyin Conversion Progress Report

October 12, 2001

Jay Weitz
Product Manager
OCLC Pinyin Conversion Project

During October 2001, OCLC completed the conversion of non-Chinese language bibliographic records in WorldCat, the OCLC Online Union Catalog, from the Wade-Giles transliteration scheme to pinyin. A total of 24,909 records with Language Codes other than "chi" in the Language fixed field (008/35-37) were converted. These records were carefully selected by algorithms designed to find identifiable Wade-Giles text, and they represent all of the non-Chinese bibliographic records that could be safely converted. They include the two categories of bibliographic records purposely deferred during the Chinese language conversion that was completed earlier in 2001: records with multiple Language codes in field 041, especially records that contain Japanese and/or Korean codes in addition to Chinese; and score and sound recording records that are coded "N/A" in the Language fixed field but that are identifiably Chinese.
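As a rough illustration of the first selection criterion only, the sketch below reads the Language code from character positions 35-37 of an 008 field and checks whether it is anything other than "chi". The function names and the plain-string representation of field 008 are assumptions made for this example; the sketch does not reproduce the algorithms that identified Wade-Giles text, which are described in the LC specifications.

```python
# Illustrative sketch only; not the conversion software's actual logic.
# Field 008 is assumed to be available as a plain 40-character string.

def language_code(field_008):
    """Return the three-character Language code at 008/35-37, or None
    if the field is too short to contain those positions."""
    if len(field_008) < 38:
        return None
    # MARC character positions are zero-based, so 35-37 is slice [35:38].
    return field_008[35:38]


def is_non_chinese(field_008):
    """True when the Language code is present and is not "chi"."""
    code = language_code(field_008)
    return code is not None and code != "chi"
```

A record passing this test would still need to contain identifiable Wade-Giles text before it could be considered for conversion.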

For specifications on the conversion of these non-Chinese language records, see http://www.loc.gov/catdir/pinyin/nonchi-contents.pdf on the Library of Congress Pinyin Conversion Project Web site.

OCLC has worked in close cooperation with the Library of Congress and the Research Libraries Group for more than two years in the planning and testing of this massive conversion process. The conversion was designed to be conservative: as much data as could be safely and reliably converted would be converted, while the chances for erroneous conversions were kept to a minimum. Please remember that all bibliographic records that have been created in pinyin or that have been manually converted to pinyin should contain the marker field 987. This includes records submitted to OCLC for batchloading. The presence of field 987 will ensure that the record is not inadvertently converted again by the pinyin conversion software.
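The role of the marker field can be pictured with a minimal sketch, assuming a simplified record model in which each record is a dictionary holding a list of (tag, value) pairs. That model and the function names are hypothetical and are not OCLC's actual software; the only point shown is that a record carrying field 987 is skipped.

```python
# Illustrative sketch only. The record model (a dict with a "fields"
# list of (tag, value) pairs) is an assumption made for this example.

PINYIN_MARKER_TAG = "987"


def has_pinyin_marker(record):
    """Return True if the record carries at least one field 987."""
    return any(tag == PINYIN_MARKER_TAG for tag, _ in record.get("fields", []))


def records_to_convert(records):
    """Yield only the records that lack the 987 marker, so that records
    already in pinyin are not converted a second time."""
    for record in records:
        if not has_pinyin_marker(record):
            yield record
```

Under this model, a record created in pinyin but submitted without field 987 would not be skipped, which is why the marker matters for batchloaded records.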

Of course, as with any conversion of this size and complexity, some records will inevitably be converted incorrectly. The Library of Congress is currently working on both identifying and fixing records in known areas of problematic conversions. OCLC users with Chinese language skills can correct many records on their own as they come across them. (Remember that you must be using OCLC CJK software to lock and replace any record with vernacular data.) Minimal Level records (Encoding Levels K, M, 2, 3, 5, and 7, and Encoding Level 4 records that do not include field 042 with code "pcc") may be locked, corrected, and replaced by any OCLC user with a Full Level authorization or above. If you are a participant in CONSER or OCLC's Enhance program (as many OCLC CJK users are), you are additionally encouraged to fix any incorrectly converted records with Encoding Levels higher than minimal, in accordance with your CONSER, Regular Enhance, or National Level Enhance authorization.
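The Minimal Level rule above can be restated as a short sketch. The function name and the way the field 042 codes are passed in are assumptions for illustration; the Encoding Level values and the "pcc" exception are taken directly from the paragraph above.

```python
# Illustrative sketch only; it restates the rule given in the text and
# is not a description of how the OCLC system checks authorizations.

ALWAYS_MINIMAL = {"K", "M", "2", "3", "5", "7"}


def is_minimal_level(encoding_level, field_042_codes=()):
    """Return True if a record counts as Minimal Level for the purpose
    of being locked, corrected, and replaced.

    encoding_level  -- the record's Encoding Level as a one-character string
    field_042_codes -- the codes found in field 042, if the field is present
    """
    if encoding_level in ALWAYS_MINIMAL:
        return True
    if encoding_level == "4":
        # Level 4 is treated as minimal unless field 042 contains "pcc".
        return "pcc" not in field_042_codes
    return False
```

For example, `is_minimal_level("4", ["pcc"])` is false under this rule, so such a record would fall under the CONSER or Enhance authorizations instead.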

OCLC users are strongly urged to report, via the usual error reporting mechanisms, any erroneously converted records that they are not authorized to correct themselves. The full range of paper and electronic means of reporting errors in OCLC records can be found in Chapter 5, "Quality Assurance," of OCLC's "Bibliographic Formats and Standards" (in print, pp. 59-72).

For an outline and explanation of the conversion process that was jointly developed by LC, RLG, and OCLC, as well as answers to many questions about the conversion, see the LC "Pinyin Conversion Project" Web site.

For more information about OCLC's local pinyin conversion service options, you may contact OCLC's Midwest Region Marketing Manager, Ms. Chris Mottayaw (e-mail: chris_mottayaw@oclc.org; phone: 800-848-5878 x6476; fax: 614-718-7444). Questions concerning the specifications for the bibliographic conversion should be addressed to Philip Melzer, Team Leader, Korean/Chinese Cataloging Team, Library of Congress, at pmel@loc.gov.

Originally posted on EASTLIB, 12 October 2001.