OCLC Pinyin Conversion Progress


As previously announced, OCLC began converting Chinese language bibliographic records in WorldCat, the OCLC Online Union Catalog, from the Wade-Giles transliteration scheme to pinyin in May 2001. The first phase of the conversion, expected to be completed by the end of August 2001, covers all Chinese language bibliographic records (identified by fixed field Language Code "chi"), starting from the highest OCLC number and working backwards through WorldCat. The complete set of approximately 8,900 converted Chinese language CONSER serial records has already been loaded into WorldCat and redistributed by the Library of Congress. Once all Chinese language records are converted, OCLC will go on to convert non-Chinese language records that contain identifiable Wade-Giles data. This second phase of the conversion is expected to be complete before the end of 2001.

To date, aside from the 8,900 CONSER records, some 80,000 non-serial Chinese language records have been converted, all in the WorldCat partitions for OCLC record numbers 42,000,000 and higher. This represents most of the Chinese language cataloging added to WorldCat since August 1999. OCLC expects to continue the conversion in roughly weekly chunks of three million records, working backwards through the database.
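As a rough illustration of the backward, chunked schedule described above, the following Python sketch yields record-number ranges from the top of a database downward. It assumes that chunks are contiguous ranges of OCLC record numbers, which this message does not state. The function name and the starting number in the usage note are hypothetical, and this is not OCLC's conversion software.

```python
def backward_chunks(highest, chunk_size=3_000_000, lowest=1):
    """Yield (lower, upper) record-number ranges, highest range first.

    Assumption: one chunk is a contiguous range of record numbers.
    """
    upper = highest
    while upper >= lowest:
        # The last chunk may be smaller than chunk_size.
        lower = max(lowest, upper - chunk_size + 1)
        yield (lower, upper)
        upper = lower - 1


if __name__ == "__main__":
    # Toy starting point, chosen only to keep the output short.
    for lower, upper in backward_chunks(7_000_000):
        print(f"{lower:>9,} - {upper:>9,}")
```

With a toy starting number of 7,000,000, the sketch produces three ranges, the last of which is shorter than three million.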

OCLC has worked in close cooperation with the Library of Congress and the Research Libraries Group on the planning and testing of this massive conversion for nearly two years. The conversion was designed to be conservative: as much data as could be safely and reliably converted would be converted, while the chances of erroneous conversion were kept to a minimum. Please remember that for the conversion to work correctly, all bibliographic records that have been created in pinyin or that have been manually converted to pinyin should contain the marker field 987. The presence of field 987 ensures that the record is not inadvertently converted again by the pinyin conversion software.
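The role of the marker field can be pictured with a short Python sketch. The record layout here (a dict with a "lang" code and a list of tag/value pairs) is a simplification invented for illustration, as are the function names and the placeholder 987 value. The actual conversion software and the real contents of field 987 are not described in this message.

```python
MARKER_TAG = "987"  # marker field named in this message


def has_pinyin_marker(record):
    """Return True if the record carries at least one field 987."""
    return any(tag == MARKER_TAG for tag, _value in record["fields"])


def eligible_for_phase_one(record):
    """Phase one: Language Code "chi" and no 987 marker present."""
    return record["lang"] == "chi" and not has_pinyin_marker(record)


# Hypothetical sample records; the 987 value is a placeholder only.
already_pinyin = {
    "lang": "chi",
    "fields": [("245", "Hong lou meng"), ("987", "(marker)")],
}
wade_giles = {"lang": "chi", "fields": [("245", "Hung lou meng")]}

if __name__ == "__main__":
    print(eligible_for_phase_one(already_pinyin))  # marked: skipped
    print(eligible_for_phase_one(wade_giles))      # unmarked: converted
```

A record that lacks the marker looks unconverted to a check like this one, which is why a record created in pinyin without field 987 risks being converted a second time.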

Of course, as with any conversion of this size and complexity, some records will inevitably be converted incorrectly. The Library of Congress is currently identifying and fixing records in known areas of problematic conversion. OCLC users with Chinese language skills can correct many records on their own as they come across them. (Remember that you must be using OCLC CJK software to lock and replace any record with vernacular data.) Minimal Level records (Encoding Levels K, M, 2, 3, 5, 7, and all 4s that do not include field 042 with code pcc) may be locked, corrected, and replaced by any OCLC user with a Full Level authorization or above. If you are a participant in CONSER or OCLC's Enhance program (as many OCLC CJK users are), you are also encouraged to fix any incorrectly converted records with Encoding Levels higher than minimal, in accordance with your CONSER, Regular Enhance, or National Level Enhance authorization.
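The Minimal Level rule above can be written out as a small Python check. This is only a sketch of the rule as worded in this message. The function name and its arguments are hypothetical, and the check does not model authorization levels or the CJK software requirement.

```python
# Encoding Levels that are always Minimal Level, per this message.
ALWAYS_MINIMAL = {"K", "M", "2", "3", "5", "7"}


def is_minimal_level(encoding_level, codes_042=()):
    """Return True if a record counts as Minimal Level under the rule above.

    codes_042 holds the codes found in the record's 042 field(s), if any.
    """
    if encoding_level in ALWAYS_MINIMAL:
        return True
    if encoding_level == "4":
        # Level 4 is minimal only when no 042 field carries code pcc.
        return "pcc" not in codes_042
    return False


if __name__ == "__main__":
    print(is_minimal_level("K"))           # True
    print(is_minimal_level("4"))           # True
    print(is_minimal_level("4", ["pcc"]))  # False
    print(is_minimal_level("I"))           # False
```

Records for which the check returns False fall under the CONSER and Enhance authorizations described in the same paragraph.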

In addition, OCLC users are strongly urged to report, via the usual error reporting mechanisms, any erroneously converted records that they are not authorized to correct themselves. The full range of paper and electronic means of reporting errors in OCLC records can be found in Chapter 5, "Quality Assurance," of OCLC's "Bibliographic Formats and Standards" in print (pp. 59-72) and on the OCLC Web site (http://www.oclc.org/oclc/bib/chap5.htm).

For an outline and explanation of the conversion process that was jointly developed by LC, RLG, and OCLC, as well as answers to many questions about the conversion, see the LC "Pinyin Conversion Project" Web site at http://lcweb.loc.gov/catdir/pinyin. Questions concerning the specifications for the bibliographic conversion should be addressed to Philip Melzer, Team Leader, Korean/Chinese Cataloging Team, Library of Congress, at pmel@loc.gov.

There is additional information on the OCLC site at http://www.oclc.org/oclc/pinyin/index.htm. Also see the OCLC Web site for information on the array of local data conversion options OCLC is offering to both members and non-members. You may contact OCLC's Midwest Region Marketing Manager Ms. Chris Mottayaw (e-mail: chris_mottayaw@oclc.org; phone: 800-848-5878 x6476; fax: 614-718-7444) for more information about the local pinyin conversion service options.


Message originally posted on OCLC-CJK list, May 18, 2001.