Bibliographic Control of Chinese Material in the United Kingdom*

Sandra Gilkes

Libraries that hold material in Chinese need to provide access to their collections both for those learning the language and for those who can already read and understand it. For this to be effective, Chinese language libraries in the United Kingdom need to assess who their readers are and what their requirements might be. The majority of users of Chinese material in U.K. libraries are those studying Chinese language and literature, who therefore have some level of competence in Chinese. Given this need to provide access, the main consideration for a library holding Chinese material is whether to adopt a romanisation scheme that the majority of users can use and recognise, so that even readers with a limited ability to read Chinese (because they are still learning the language) can use the romanised form to locate the required item. Several schemes can be used for the romanisation of Chinese, which is in itself a form of script conversion. At present, Pinyin seems to be the scheme most widely being brought into practice in libraries: it was created in China and is increasingly being used outside China as well, and in 1997 the Library of Congress decided to switch all its records from Wade-Giles to Pinyin romanisation. It would therefore seem appropriate that those libraries that need to use a romanisation scheme should, for reasons of standardisation and ease of access, opt for Pinyin.

For material in non-roman scripts such as Chinese, bibliographic control, defined as the way in which a library organises access to its collection, includes the activity of romanisation. According to Spalding, romanisation is "the representation of roman letters of names and words originally written in some other writing system"[1]. There are several romanisation schemes for Chinese. Since Pinyin is the scheme suggested here for use in libraries, it is explained in greater detail. It is a romanisation scheme based on the 26 letters of the roman alphabet except v, four modified roman letters and the four digraphs zh, ch, sh and ng, as well as the letter u with umlaut (ü) to represent y. The Pinyin system also employs four diacritical marks for the tones of Chinese pronunciation. In this instance, Chinese refers to the standard pronunciation of Mandarin, or Putonghua, and its four tones (high, high rising, falling and rising, falling) are each indicated with a diacritical mark. When using Pinyin, each word is romanised as a single linguistic unit. There are intrinsic advantages to the use of Pinyin over other schemes, and these are given below:

  1. Pinyin is the officially recognised romanisation scheme of the People's Republic of China and therefore lacks the colonial connotations of older forms of romanisation such as Wade-Giles. It is the method of romanisation used in China and is taught in schools there, as part of a long-standing Chinese attempt to simplify the notoriously difficult Chinese language. Pinyin has also been accepted as the ISO standard for the romanisation of Chinese.

  2. From a linguistic point of view, it is phonetically closer to the spoken Chinese language than Wade-Giles. Wade-Giles is based on the way its creators, the British diplomat and military envoy Wade and the Professor of Chinese at Cambridge, Giles, who produced the Chinese-English Dictionary, heard the Chinese language spoken at the time.

  3. Pinyin is part of the teaching of the Chinese language world-wide, and as such has the added benefit that it is the scheme that the majority of library users, regardless of nationality, can recognise and identify in a library wherever in the world that library might be.

Examples of how Pinyin is expressed are included below in the visual and phonetic comparison between Wade-Giles and Pinyin romanisation schemes and in appendix 1.

Pinyin               Wade-Giles                        中文
Bingju Fenxi         Ping Chu Fen Hsi                  病句分析
Putonghua Zhengyin   Pu Tung Hua Cheng Yin Shou Tse    普通话正音
Beijing              Peking[2]                         北京
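
The correspondence between the two schemes in the comparison above is regular enough to be expressed as a syllable lookup. The following is a minimal Python sketch under stated assumptions: the table WADE_GILES_TO_PINYIN and the function wade_giles_to_pinyin are hypothetical names introduced for illustration, the table covers only a handful of syllables, and the Wade-Giles input is assumed to carry its standard apostrophes and umlauts (p'u, t'ung, chü), which the simplified forms in the comparison above omit. Output is given syllable by syllable, so the question of word division is left open.

```python
# Illustrative subset of a Wade-Giles -> Pinyin syllable table.
# Assumption: a real conversion table covers every Mandarin syllable;
# only the syllables needed for the examples are listed here.
WADE_GILES_TO_PINYIN = {
    "ping": "bing",
    "chü": "ju",
    "fen": "fen",
    "hsi": "xi",
    "p'u": "pu",
    "t'ung": "tong",
    "hua": "hua",
    "cheng": "zheng",
    "yin": "yin",
}


def wade_giles_to_pinyin(text):
    """Convert space- or hyphen-separated Wade-Giles syllables to Pinyin.

    Returns a list of Pinyin syllables. A syllable that is missing from
    the illustrative table raises KeyError rather than being guessed.
    """
    syllables = text.lower().replace("-", " ").split()
    return [WADE_GILES_TO_PINYIN[s] for s in syllables]


print(" ".join(wade_giles_to_pinyin("Ping chü fen hsi")))
print(" ".join(wade_giles_to_pinyin("P'u-t'ung hua cheng yin")))
```

Without the apostrophe, Wade-Giles pu and p'u (Pinyin bu and pu) cannot be told apart, which is one reason why conversion of records that dropped the apostrophes may need human checking.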

Given the developments in automation and the subsequent attempts to use the vernacular Chinese script for creating catalogue records and presenting them to readers, it is interesting to consider what the two libraries in the United Kingdom with the most significant Chinese collections do to ensure bibliographic control of those collections. Having chosen to romanise using the Pinyin scheme, a library must then find a way to create cataloguing records and enable public access to the catalogue. In the United Kingdom, the libraries with the largest and most significant collections of Chinese literature are the British Library and the Library of the School of Oriental and African Studies.

1. The British Library:
The British Library opened its Oriental and India Office collection in its new library in St. Pancras on August 12th 1998. Its collection of Chinese material can be accessed through the OPAC (on-line public access catalogue) and via the catalogue on the Internet. For cataloguing purposes, records are created using an automated system known as Allegro C, developed at the University of Braunschweig, Germany. Records are currently created in romanised Pinyin form; the aim is increasingly to use the same system to create records in the Chinese vernacular, although this process is not intended to begin, let alone be completed, until after the year 2000. The on-line catalogue and the card catalogue use a mixture of Wade-Giles and Pinyin romanisation:

  1. For records for items acquired before 1966, the library uses Wade-Giles;
  2. For items acquired since 1966, the library uses Pinyin;
  3. For items acquired since 1983, the library uses the Allegro C automated cataloguing system, which can be searched in Chinese script or in romanisation by author, title, keyword and subject. This is the method of cataloguing for all languages in the Oriental and India Office collection, including Chinese, Japanese and Korean (CJK).

This indicates that the British Library switched from Wade-Giles to Pinyin romanisation early in the development and simplification of the Chinese language, even before the National Library of China. However, having an automated facility for CJK does not remove the problem that, when using Pinyin, word division has to be settled. The Chinese Section of the British Library follows the practice of word aggregation, which results in syllables being joined up as much as possible: for example, tushuguanxue, not tushuguan xue or tu shu guan xue. This difficulty is also recognised by the Library of Congress, which acknowledges "as yet, there seems to be no generally accepted international standard for Pinyin word division. Although the government of China has issued standards for word division, publishers and authors often do not conform to its guidelines."[3]

The Allegro C system also creates MARC records. Allegro C is used for cataloguing by the University of Leeds and Oxford University Libraries together with the British Library in a cooperative cataloguing venture; the three databases hold about 50,000 Chinese records in total.

2. Library of the School of Oriental and African Studies:
The School of Oriental and African Studies (SOAS) Library uses a Chinese language card catalogue and a printed book catalogue. The Chinese card catalogue and the subject headings used in the library are transliterated using the Wade-Giles romanisation system. The OPAC, covering accessions since October 1989, however, uses Pinyin. Wade-Giles and Pinyin conversion tables are displayed on the Chinese card catalogue cabinet. Because the SOAS OPAC uses Pinyin, the instructions to users state, "Chinese words formed from two or more characters and romanised on the on-line system are always treated as separate characters. For example, Fo jiao ci dian not Fojiao cidan (Dictionary of Buddhism)"[4]. Advice to library users emphasises that, because romanised Chinese contains many homophones, title searches may not be effective and a subject enquiry using key words might produce better results.
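
The two word-division practices described above (aggregation at the British Library, separate syllables at SOAS) mean that the same title can be keyed in several ways. One possible way for a catalogue to tolerate this is to compare titles on a key from which spaces, hyphens and case have been removed. The Python sketch below is illustrative only: the function name search_key is hypothetical, and neither library is known to use this approach.

```python
def search_key(romanised_title):
    """Build a comparison key that ignores word division.

    Lower-cases the title and keeps only letters and digits, so that
    spaces, hyphens and apostrophes no longer affect matching.
    """
    return "".join(ch for ch in romanised_title.lower() if ch.isalnum())


# Three divisions of the same word now compare as equal.
variants = ["tushuguanxue", "tushuguan xue", "tu shu guan xue"]
print({search_key(v) for v in variants})

# An aggregated and a syllable-by-syllable form of one title also match.
print(search_key("Fojiao cidian") == search_key("Fo jiao ci dian"))
```

Such a key removes the effect of word division only. It does nothing about homophones, and it can create new ambiguities where a syllable boundary matters (for example, xian and xi an produce the same key).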

The use of technology in libraries, including those libraries that hold material in Chinese, has increased significantly in recent years. There are two strands to this development:

1. The development of computers and the Internet for the Chinese language.
2. The use of computers in libraries for cataloguing Chinese language material.

Automation for libraries with Chinese collections is now at the stage where vernacular Chinese characters can be input, and this extra facility has prompted a reconsideration of whether to use a romanisation scheme at all. In the United Kingdom, Allegro C and Innopac are the most widely used computer systems for this purpose. They enable cataloguers to create records using the vernacular script; assuming that readers want to be able to use a catalogue in the vernacular Chinese script, this is ideal. However, as stated above, there are also valid reasons for using the Pinyin romanisation scheme. The best solution would therefore seem to be to catalogue, and to provide access to the catalogue, in both the vernacular Chinese script and Pinyin. (I draw the reader's attention to a forthcoming publication on this precise subject: "Information Processing for CJKV" by Ken Lunde, to be released by O'Reilly & Associates, Inc. in October 1998 in the United States of America.)
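
As a rough illustration of cataloguing in both forms, a record could carry the title in the vernacular script and in Pinyin, with a search that accepts either. The Python sketch below is a simplified assumption made for this article, not a description of how Allegro C or Innopac store records; the names CatalogueRecord, title_vernacular, title_pinyin and matches are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogueRecord:
    """A deliberately minimal record holding one title in two forms."""
    title_vernacular: str  # title in Chinese characters
    title_pinyin: str      # the same title romanised in Pinyin


def matches(record, query):
    """Return True if the query occurs in either form of the title."""
    q = query.strip().lower()
    return q in record.title_vernacular or q in record.title_pinyin.lower()


records = [
    CatalogueRecord("病句分析", "Bingju fenxi"),
    CatalogueRecord("普通话正音", "Putonghua zhengyin"),
]

# A reader of Chinese and a learner using Pinyin can find the same items.
print([r.title_pinyin for r in records if matches(r, "病句")])
print([r.title_vernacular for r in records if matches(r, "putonghua")])
```

Keeping both forms in one record means that neither group of readers depends on the other form being present in the index.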

The automation facilities for Chinese (i.e. non-roman) characters also depend on machine-readable character coding and the standards that exist for it. Some of the standards used for the exchange of bibliographic information in Chinese are listed below:

  1. Taiwan CCCII (Chinese Character Code for Information Interchange)
  2. GB 2312-80
  3. ANSI Z39.64-1989 (East Asian Character Code, EACC)
  4. ISO 10646

A significant number of library catalogues are now also accessible via the Internet. The Internet must, as a consequence, also be able to cater for material in different scripts. The means to do this is appropriate software for encoding and decoding the Chinese script. Without such software, vernacular Chinese text appears, as no doubt the majority of Internet users have experienced, illegible. In order to read such web pages, a character code such as one of the following must be used:

  1. Big5
  2. GB
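
The reason the correct code must be chosen can be shown in a few lines. The Python sketch below uses the gb2312 and big5 codecs from the Python standard library; the sample word 北京 is chosen here only because it is written identically in simplified and traditional characters and so can be encoded in both codes.

```python
text = "北京"  # written the same way in simplified and traditional Chinese

# The same two characters are stored as different bytes in each code.
gb_bytes = text.encode("gb2312")
big5_bytes = text.encode("big5")
print(gb_bytes.hex(), big5_bytes.hex())

# Decoding with the code that was used for encoding gives readable text.
print(gb_bytes.decode("gb2312"))
print(big5_bytes.decode("big5"))

# Decoding GB bytes as Big5 yields unrelated characters or replacement
# marks; errors="replace" prevents an exception on invalid byte pairs.
garbled = gb_bytes.decode("big5", errors="replace")
print(garbled)
```

This is the effect a reader sees when a browser without Chinese support, or with the wrong code selected, displays a Chinese web page.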

Given that the Internet is increasing in significance for libraries as they seek to provide on-line access to their collections, an overview of the current facilities of the Internet is relevant. Examples of some of the software that can be used for Chinese are given below:

  1. TwinBridge Apple Macintosh Chinese language
  2. NJStar

Using the Internet for Chinese can best be achieved through those Internet services, such as search engines and portals, that offer Chinese facilities, as listed:

  1. Yahoo
  2. Sinanet
  3. Alta Vista

Cinet-L news items of 5/7/98 and 5/14/98 indicate that Yahoo Chinese has also been created. This is a 10,000-site index created in both simplified and traditional Chinese, with search results displayed in both styles. (According to Digital, 30% of documents on the web are in languages other than English.) In addition, Netscape has voiced its intention to launch a Chinese-language guide to global computer networks, given that industry analysts believe the number of Internet users in China will increase to 6 million by the year 2000. When using such sites on the Internet, notices appear stating that the GB code, the Big5 code or EACC must be used, or that the relevant software must be installed.

An analysis of romanisation for Chinese in libraries should come to a conclusion as to whether there should be romanisation at all, and whether it is the most appropriate means of bibliographic control for Chinese collections in libraries in the United Kingdom. The arguments in favour of romanisation in libraries, which should consider the position of the cataloguer and of the library user at the same time, are very different from the arguments in favour of romanisation per se, for example in the case of geographical place names as considered in the work of Aurousseau (The Rendering of Geographical Names, 1957). However, once a library has opted for Pinyin, it should be noted that Pinyin can be used with on-line catalogue systems such as Allegro C and Innopac, and that these systems can also be used for the input and creation of cataloguing records in the vernacular Chinese. It is suggested that this is the best means of attaining bibliographic control for libraries. Given the increasing use of computers in libraries, it would seem sensible to suggest that libraries establishing CJK collections should provide both the Chinese vernacular and the romanised form in Pinyin. It should also be noted that, whatever the scheme chosen and whether or not cataloguing is carried out using a romanisation scheme, a library that holds material in non-roman scripts such as Chinese also needs to provide appropriate library personnel who, regardless of nationality, are able to instruct users in the use of the OPAC for Chinese and in the romanisation scheme being used in the library.


This is a brief overview of current practice in libraries in the United Kingdom. It is interesting to consider whether it is in fact appropriate for a U.K. library to provide catalogue records in a different language, whether in a form that uses roman letters or, increasingly, in the vernacular Chinese script. Should any British library, funded by the U.K. population, actually provide access via a different language (although this is also done for other languages such as French or German), let alone a different script, when the majority of its readers, despite learning the Chinese language, are actually of English or other national origin? Why not consider cataloguing Chinese material in English by transcribing it into the English phonetic alphabet? To answer this, the proportion of Chinese nationals and of other groups among readers would need to be assessed, as would the percentage of library users who reside in the U.K., in order to establish where the largest user group comes from and what its preferences are. By contrast, the National Library of China catalogues English material by sinisation into Pinyin. In this instance, the majority of readers are of Chinese origin, although there are others of differing nationality, and although those seeking English-language material are learning the English language, access to those books is provided by using a vernacular Chinese Pinyin catalogue record. The National Library of China does not, for example, catalogue its English material in English, even if its readers, who are mainly Chinese, might be learning English, which to them is a foreign language. On the other hand, remembering that this national library is in China, it seems appropriate for it to catalogue using Pinyin and not English. It is important to remember that bibliographic control, despite the implications, should not be established in order to complicate access but in order to facilitate access for readers.


1. Chinese Romanisation Table for Pinyin - UCLA
Note: this table is made available on the Internet by the library for library users who wish to access the catalogue and thereby the library's CJK collection. It shows the different schemes that the library offers: Wade-Giles and Pinyin, as well as Bopomofo, the phonetic notation scheme used in Taiwan. It is interesting to note that the table is provided on the Internet in a form that does not require readers to install another software package in order to make the Chinese script legible.

2. List of URLs used in the research:

Allegro GmbH
AsianDoc Newsletter
British Library
Chinese Character Dictionary Genealogy
Chinese Librarianship: an International Electronic Journal http://library.fgcu-edu/iclc/cliej/
CEAL Task Force
Durham Library
East Asian Library
IFLA 64/058-86e.html
National Library of Australia
SOAS Library
Unicode Consortium
University of Hawaii, Manoa Library

3. Notes:

[1] Spalding, Summer C. (1977). "Romanisation re-examined". In Library Resources and Technical Services, 77, p.3.
[2] Lau, Shuk-fong and Wang, Vicky. (1991). "Chinese Personal Names and Titles: Problems in Cataloging and Retrieval". In Cataloging and Classification Quarterly 13(2), p.57.
[3] Melzer, Philip. (1998). "Pinyin romanization: word division recommendation". Available at URL:
[4] SOAS Library. (1998). "SOAS Library: China Section". Available at URL:

This article is based on an M.A. report.
Copyright © 1998 Sandra Gilkes.
Submitted to CLIEJ on 18 September 1998.