Pinyin Romanization: Word Division Recommendation

Philip Melzer
Regional and Cooperative Cataloging Division
Library of Congress

New developments in technology and computer programming have prompted the Library of Congress to again consider converting from Wade-Giles romanization of Chinese to the pinyin system. The Library has begun discussing options for conversion with the major utilities and the library community.

In 1990, the Library of Congress indicated that it wished to explore, with OCLC and RLG, possibilities of machine conversion of existing MARC records to pinyin. It was recognized then that a preliminary step in preparing for conversion would involve discussion and agreement on a standard for word division.

It has been almost forty years since the People's Republic of China adopted pinyin, and pinyin is now generally recognized as the standard for romanization of Chinese throughout most of the world. Yet there seems to be no generally accepted international standard for pinyin word division. Although the government of China has issued standards for word division, publishers and authors often do not conform to them, and dictionaries published in China do not follow consistent word division practices.

European romanization practices and word division practices also seem to vary greatly. For example,

We feel that the absence of an international standard for word division provides us with the opportunity to propose a system that best meets our needs.

The National Library of Australia (NLA) has sought to maintain consistency in applying syllable aggregation practice as it converts its files from Wade-Giles to pinyin. NLA pinyin word division guidelines are embodied in two of its CJK (Chinese, Japanese, Korean) Technical Committee resolutions, and read as follows:

That, where a cataloguer inputs Pinyin data into the National CJK System, each Chinese character should be input as one Pinyin syllable, except for proper and geographic names, where the syllables should be joined.

That... In the Pinyin database:

- where the preferred form in LCNA is a pure Wade-Giles form, the preferred form in the Pinyin database should be that Wade-Giles form converted to Pinyin

- where the preferred form in LCNA is an "established" form i.e. not a pure Wade-Giles form, for instance Chiang Kai-shek and Confucius, the preferred form in the Pinyin database should be that "established" form.

Note: LCNA is the Library of Congress Name Authority file.

In other words, individual syllables of surnames and forenames, instead of being hyphenated, would be connected, as would individual syllables of geographic names. Terms for jurisdictions and topographical features would be separated from geographic names. All other syllables would be separated from each other.
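The rule summarized above can be illustrated with a minimal Python sketch. It is not NLA or Library of Congress software. The `Element` class, its `kind` tags ("surname", "forename", "place", "other"), and the `divide` function are hypothetical names introduced only for illustration, and the sketch assumes that a cataloguer has already identified which syllables belong to a name and has supplied the capitalization.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """A run of pinyin syllables, one syllable per Chinese character."""
    syllables: List[str]
    kind: str  # hypothetical tag: "surname", "forename", "place", or "other"


# Kinds whose syllables are connected; everything else stays separated.
JOINED_KINDS = {"surname", "forename", "place"}


def divide(elements: List[Element]) -> str:
    """Apply the word division rule to a sequence of tagged elements."""
    words: List[str] = []
    for element in elements:
        if element.kind in JOINED_KINDS:
            # Names: connect the syllables, with no hyphen and no space.
            words.append("".join(element.syllables))
        else:
            # All other syllables, including terms for jurisdictions and
            # topographical features, are written separately.
            words.extend(element.syllables)
    return " ".join(words)


title = [
    Element(["Mao"], "surname"),
    Element(["Ze", "dong"], "forename"),
    Element(["xuan", "ji"], "other"),
]
print(divide(title))  # Mao Zedong xuan ji

city = [
    Element(["Bei", "jing"], "place"),
    Element(["shi"], "other"),  # term for the jurisdiction, kept separate
]
print(divide(city))  # Beijing shi
```

In this sketch the surname and forename are tagged as separate elements, so they are joined internally but remain separate from each other. Orthographic details such as capitalization are left to the caller.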

The Library of Congress proposes following the practice that has been adopted by the National Library of Australia. We believe that this approach would offer a number of distinct advantages:

  1. It is familiar. It offers the least possible change from our present practice, and therefore would be easy to learn and apply.

  2. It is straightforward. It eliminates the use of the hyphen entirely. The guidelines for its application can be written in a few words, making it easy to communicate and put into practice.

  3. As NLA converts its database from Wade-Giles to pinyin, it is also converting hundreds of thousands of American MARC records. By adopting the same system of word division, libraries in this country and the major utilities could use Library of Congress bibliographic records just as they have been converted by NLA, without the further step of accommodating different word division practices.

  4. Separating most syllables would make it possible for records to be changed to suit other institutional needs (i.e., it would allow for future syllable aggregation or connection).

  5. It does not seem to conflict with any established Chinese or European policy or practice, because practices vary so greatly.
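To illustrate the fourth point, the following Python sketch shows one way an institution might later aggregate separated syllables into words. It is only a sketch under stated assumptions: the `aggregate` function and the word list passed to it are hypothetical, and the greedy longest-match strategy is one possible approach, not a method described in this proposal.

```python
from typing import List, Set, Tuple


def aggregate(text: str, lexicon: Set[Tuple[str, ...]]) -> str:
    """Join runs of separated syllables that appear in a local word list.

    `lexicon` is a hypothetical, institution-supplied set of multi-syllable
    words, each stored as a tuple of syllables.
    """
    syllables = text.split()
    words: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, down to two syllables.
        for n in range(len(syllables) - i, 1, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in lexicon:
                words.append("".join(candidate))
                i += n
                break
        else:
            # No multi-syllable match: keep the syllable as it is.
            words.append(syllables[i])
            i += 1
    return " ".join(words)


local_words = {("xuan", "ji")}
print(aggregate("Mao Zedong xuan ji", local_words))  # Mao Zedong xuanji
```

Because the stored record keeps most syllables separate, an institution can apply a step of this kind without altering the shared record itself.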

For all of the above reasons, we believe that this would be the most economical approach for the Library of Congress, the major utilities, and individual libraries. Furthermore, it would be easy for users to learn and apply.

This proposal will be submitted to professional organizations for review and comment. The Library would also appreciate receiving comments on this proposal from scholars and other library users. Please send all comments to the author at:

Copyright © 1996 Philip Melzer.
Submitted to CLIEJ August 13, 1996.