James K. LIN
Many people have commented on the merits of word division, yet few have provided a clear definition of what constitutes Chinese "word division". The stated purpose of the "RLG Chinese Aggregation Guidelines," now used by RLIN, is to "parse strings of Chinese characters into semantic units. These aggregated units will then become terms indexed by the RLIN system". RLG has been very clever in avoiding the word "word" in the definition cited above. RLG evidently knew that the Guidelines are not "word division" guidelines; rather, they are "phrase division" guidelines.
What, then, is the definition of Chinese word division? There are many answers, yet no single definition has won general acceptance. Here is one example, from Paul Kratochvil's "The Chinese Language Today": "An MSC (modern standard Chinese) word is the smallest unit which may function as an immediate constituent of MSC segmental sentences." He goes on to argue "that words in MSC can be established only on the basis of actual structural features of MSC utterances, and that it is next to impossible to set up MSC words, or rather to find words in MSC by using the gauge of the set of features popularly associated with words in European languages." He further notes: "The point is not that features of meaning have nothing to do with the status of MSC words, but that meaning is only a partial aspect of the total behavior of the smallest operational units called words, and as such it is not a property sufficient for establishing these units."
A popular myth is that Pinyin and word division are twins: you cannot have one without the other. An LC policy statement dated January 3, 1991, "LC Position on the Use of Pinyin Romanization," states: "A major difficulty impeding the adoption of Pinyin is the lack of standards for word division." This is the common misconception about the relationship between Pinyin and word division.
Both Pinyin and Wade-Giles are systems of symbols for representing the pronunciation of Chinese characters. Choosing one system or the other is a matter of phonetics, much as Webster's or the International Phonetic Alphabet represents the pronunciation of English. Word division, by contrast, concerns combining Chinese characters into the equivalent of "words" in European languages. It belongs to morphology, the study of words. There are at least four or five popular systems for romanizing Chinese; Pinyin and Wade-Giles are two of them. American East Asian libraries have used the Wade-Giles system for decades without word division.
The People's Republic of China adopted the Pinyin system in 1958. A guideline on the "proper ways to write (i.e., link)" Chinese names in Pinyin was introduced in 1976. Among other things, it stipulates that the surname and forename be separated by a space and that the syllables of the forename be joined. A guideline for writing place names was published the same year. It was not until 1982 that a guideline on writing running text in Pinyin was announced.
Discussion of how to link romanized syllables dates back to the 1920s, long before Pinyin's time.
I want to emphasize that the guidelines use the term "Zheng4 Ci2 Fa3", which literally means "orthography." In other words, they are guidelines on when and how to write Pinyin forms. The guidelines take syntactic structure into account and therefore often link elements larger than what we would normally call a "word". Because syntactic structures vary from one utterance to another, the guidelines leave much room for personal interpretation.
Most commentators fail to mention that the guidelines apply only to Modern Common Chinese. They are not intended to govern Classical Chinese, whose syntactic structures differ vastly from those of Modern Common Chinese.
I will outline my key points as follows:
"That, where a cataloguer inputs Pinyin data into the National CJK system, each Chinese character should be input as one Pinyin syllable, except for proper and geographic names, where the syllables should be joined."
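The input rule quoted above, combined with the 1976 name guideline (surname separated, forename syllables joined), can be expressed as a small formatting routine. The following is only a hypothetical sketch, not part of the National CJK system or any utility: the `Segment` type, its `kind` labels, and the assumption of one-syllable surnames are illustrative choices, and the Pinyin apostrophe rule (as in Xi'an) and tone marks are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A run of Pinyin syllables, one per Chinese character (hypothetical type)."""
    syllables: List[str]
    kind: str = "common"  # assumed labels: "common", "personal", or "place"


def romanize(segments: List[Segment]) -> str:
    """Keep one syllable per character, joining syllables only within names."""
    words: List[str] = []
    for seg in segments:
        if seg.kind == "common":
            # Ordinary text: each character remains a separate syllable.
            words.extend(seg.syllables)
        elif seg.kind == "personal":
            # Assumes a one-syllable surname; the forename syllables are joined.
            surname, *forename = seg.syllables
            words.append(surname.capitalize())
            if forename:
                words.append("".join(forename).capitalize())
        elif seg.kind == "place":
            # Geographic names: all syllables joined into one word.
            words.append("".join(seg.syllables).capitalize())
        else:
            raise ValueError(f"unknown segment kind: {seg.kind!r}")
    return " ".join(words)


print(romanize([
    Segment(["mao", "ze", "dong"], "personal"),
    Segment(["zai"]),
    Segment(["bei", "jing"], "place"),
]))  # prints: Mao Zedong zai Beijing
```

Compound surnames such as Ouyang or Sima would need an extra field recording the surname length.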
Name and subject authority records must be converted to Pinyin before the start date (Day 1) of Pinyin conversion. Since LC maintains both files, LC is obliged to carry out their conversion. These are absolute prerequisites for Day 1 operation. Any attempt to "shortcut" these requirements (e.g., changing authority records only when they relate to the works being cataloged) would bring catastrophic consequences to East Asian libraries around the world. Some people have compared this practice to the conversion from AACR I to AACR II. I believe those who entertain such an idea underestimate the complexity of Pinyin conversion.
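To suggest what converting even one name heading involves, here is a minimal sketch that maps a Wade-Giles heading of the form "Surname, Forename, dates" to Pinyin syllable by syllable and joins the forename syllables. The five-entry table is a hypothetical sample built around the example author used elsewhere in this paper (Hsiao, Hung, with the AACR I form Chang, Nai-ying); a real conversion would need the complete syllable table and would still have to handle cases a table cannot settle, such as a missing aspiration apostrophe (ch'ang versus chang).

```python
# Hypothetical five-entry sample of Wade-Giles -> Pinyin syllable equivalents.
# A real conversion would need the complete syllable table.
WG_TO_PINYIN = {
    "hsiao": "xiao",
    "hung": "hong",
    "chang": "zhang",  # unaspirated chang; aspirated ch'ang would be Pinyin chang
    "nai": "nai",
    "ying": "ying",
}


def convert_syllables(word: str) -> str:
    """Convert a hyphenated Wade-Giles name part and join the syllables."""
    syllables = word.lower().split("-")
    # A KeyError here flags a syllable missing from the sample table.
    return "".join(WG_TO_PINYIN[s] for s in syllables).capitalize()


def convert_heading(heading: str) -> str:
    """Convert 'Surname, Forename, dates'; trailing elements are kept unchanged."""
    surname, forename, *rest = [part.strip() for part in heading.split(",")]
    return ", ".join([convert_syllables(surname), convert_syllables(forename), *rest])


print(convert_heading("Hsiao, Hung, 1911-1942"))      # Xiao, Hong, 1911-1942
print(convert_heading("Chang, Nai-ying, 1911-1942"))  # Zhang, Naiying, 1911-1942
```

Note that the converted heading also files in a different place (H versus X), which is one reason the authority files cannot be converted piecemeal.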
The ambiguity of the rules governing vernacular subject heading proposals contributes to the "mixed bag" situation. Catalogers are confused and concerned about it.
Two questions are posed here:
Ideally, all past bibliographic records would be converted to Pinyin. For the sake of implementing Pinyin, however, I could personally live with split files for the time being, until either LC or the two utilities convert them later. Expecting individual institutions to convert bibliographic records themselves, one at a time, would be ineffective and inefficient.
How to handle classification schedules derived exclusively from Wade-Giles is another matter of great concern. We must reach agreed-upon solutions before the conversion.
We must decide whether or not to freeze the call numbers.
If we do not freeze the old numbers, we will have to resort to "see" references all the time and will end up relying on Pinyin and Wade-Giles side by side forever. The cataloging staff will thereby bear the added burden of dealing with two systems.
If we freeze the old numbers and make no references to the old Wade-Giles-based numbers, new schedules will have to be established wherever schedules are arranged by Wade-Giles forms, especially the numbers for 20th-century authors in PL2735.5-PL2929.5. Every existing author will probably end up with two author numbers, one based on Wade-Giles and the other on Pinyin. Few names or subjects will be able to keep their numbers, because Cuttering is determined not only by the first letter of a word; the second and third letters also influence the result. Nor are surnames the only determining factors in Cuttering; forenames also play important roles. The determining factors differ from caption to caption.
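Why later letters and forenames matter can be illustrated with a simplified version of the LC Cutter Table, which assigns a digit to the letter following the initial (with separate rules after vowels, S, and Qu) and uses an expansion table for any further letters. This is an approximation for illustration, not LC's shelflisting procedure: `lc_cutter` is a hypothetical name, letters the table does not list are assumed to take the digit of the nearest listed letter before them, the "Qa-Qt: use 2-29" rule is simplified, and real practice adjusts numbers against the existing shelflist. With those caveats, the forename Nai-ying yields N3 (agreeing with the .N3 in PL2740.N3 cited below), while the Wade-Giles forename Hung and the Pinyin forename Hong yield different digits.

```python
from typing import List, Tuple

# Simplified LC Cutter Table as sorted (first letter of range, digit) pairs.
Table = List[Tuple[str, int]]
AFTER_VOWEL: Table = [("b", 2), ("d", 3), ("l", 4), ("n", 5), ("p", 6), ("r", 7), ("s", 8), ("u", 9)]
AFTER_S: Table = [("a", 2), ("e", 4), ("h", 5), ("m", 6), ("t", 7), ("u", 8), ("w", 9)]  # "ch" -> 3 below
AFTER_QU: Table = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("t", 8), ("y", 9)]
AFTER_CONSONANT: Table = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("u", 8), ("y", 9)]
EXPANSION: Table = [("a", 3), ("e", 4), ("i", 5), ("m", 6), ("p", 7), ("t", 8), ("w", 9)]


def lookup(table: Table, letter: str) -> int:
    """Digit of the last listed letter not after `letter` (assumed for unlisted letters)."""
    digit = table[0][1]
    for start, value in table:
        if letter >= start:
            digit = value
    return digit


def lc_cutter(word: str, n_digits: int = 1) -> str:
    """Approximate an LC-style Cutter number for `word` with `n_digits` digits."""
    w = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if len(w) < 2 or n_digits < 1:
        raise ValueError("need at least two letters and one digit")
    first = w[0]
    if first in "aeiou":
        digit, rest = lookup(AFTER_VOWEL, w[1]), w[2:]
    elif first == "s":
        if w[1:3] == "ch":
            digit, rest = 3, w[3:]
        else:
            digit, rest = lookup(AFTER_S, w[1]), w[2:]
    elif first == "q":
        if w[1] == "u" and len(w) > 2:
            digit, rest = lookup(AFTER_QU, w[2]), w[3:]
        else:
            digit, rest = 2, w[1:]  # simplified reading of "Qa-Qt: use 2-29"
    else:
        digit, rest = lookup(AFTER_CONSONANT, w[1]), w[2:]
    # Any further digits come from the expansion table, one letter each.
    digits = [str(digit)] + [str(lookup(EXPANSION, ch)) for ch in rest[: n_digits - 1]]
    return first.upper() + "".join(digits)


print(lc_cutter("Nai-ying"))  # N3 (AACR I forename)
print(lc_cutter("Hung"))      # H8 (Wade-Giles forename)
print(lc_cutter("Hong"))      # H6 (Pinyin forename)
```

Even this rough model shows that a one-letter spelling change in the second position (u to o) moves the number, which is exactly why few Wade-Giles-based numbers would survive conversion unchanged.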
Two areas that will be greatly affected by the conversion are:
In addition to the DS and PL schedules, other schedules will also be affected when names and vernacular terms are used for Cuttering purposes.
Further, LC needs to replace the old Wade-Giles-based numbers and captions with new Pinyin-based numbers and captions in all affected areas, and to have them printed in the classification schedules before Day 1. Otherwise, outside libraries will have nothing to work with except blank A-Z captions. Only LC can determine the class numbers and subject Cutter numbers that replace the old ones. Subject Cutter numbers differ from book Cutter numbers: the former are part of the class number, while the latter can be assigned by outside libraries.
For literary authors, LC's practice has been to link AACR II names to the numbers derived from AACR I names. For example, Hsiao, Hung, 1911-1942 (the AACR II name) is classed in PL2740.N3, a number derived from the AACR I name, Chang, Nai-ying, 1911-1942. In Pinyin, the author's name becomes Xiao, Hong. Should the author be retained in PL2740.N3, or reclassed according to the Pinyin form?
Commentaries on individual persons or works are supposed to be classed with the original call numbers (F570), which in most cases are based on Wade-Giles. If we freeze the old numbers, we must establish new Pinyin-based numbers for authors and works before we can catalog any commentaries.
We understand that LC has been looking for a new shelflisting procedure that will differ greatly from current practice. We would like to know to what extent it will affect call numbers. If we carry out Pinyin conversion ahead of the new shelflisting procedure, the Pinyin-based call numbers will certainly suffer another major setback. We cannot afford to keep abandoning and changing our call numbers every so often.
At a time when library budgets and cataloging staff face deep cuts throughout U.S. libraries (LC included), we certainly cannot afford to do everything we want at once. We must sort out our priorities and plan carefully and wisely how to allocate our limited resources.
In an April 3 e-mail to Philip Melzer, Chair of the CEAL Technical Processing Committee, Eugene Wu, Librarian of the Harvard-Yenching Library, listed several top priorities facing U.S. East Asian libraries today: retrospective conversion, a CJK local OPAC system, and CJK characters in the Name Authority File. I would add another: an integrated online library system capable of handling CJK scripts.
The National Library of Australia is to be complimented for the commitment and professionalism evident in its fine statements on Pinyin syllable linkage and in its conversion of all its Chinese bibliographic records to Pinyin. If there is a lesson to be learned, I believe it is this: say only what you can deliver, and deliver what you have already said.