China Data Retrieval: A Method for Computer-Assisted Indexing of Translated Mainland Chinese Material

Item

Title
China Data Retrieval: A Method for Computer-Assisted Indexing of Translated Mainland Chinese Material
Description
An acute need exists for a detailed index to the heavy and continuing flow of materials currently translated from Mainland China sources. As with many foreign sources, these materials come in many small fragments, and their timeliness makes it important to arrange and index them quickly in order to be of maximum use to the analyst. Only in the last few years have computer hardware and programs been brought together for the preparation of such a social science index at less than prohibitive cost. This Memorandum reports on a pilot project to apply these computer techniques to the construction of such an index and to estimate costs of production on a regular, large-scale basis.
A useful scheme for indexing translation series must have the following attributes: high user acceptability; low cost in each phase of production; an easily learned and efficient procedure; a sufficiently sophisticated program; and cost-effective equipment. Before the pilot project began, a number of operational choices were made. (1) A predefined set of index categories was rejected in favor of an open-ended subject list drawn directly from the text. (2) To maximize efficiency of off-line inputs and to reduce costs, the IBM MT/ST (Magnetic Tape/Selectric Typewriter) was deemed the best available input device. (3) An internal Rand program, "Quester," was chosen and modified to handle the data. (4) The IBM 360/65 computer was selected because it was immediately available at Rand and is a widely available, large-capacity, high-speed machine. (5) A computer-driven phototypesetting capability was built into the system.
The project was divided into two parts. The first introduced enough data into the system to provide material for initial checking of all operations. It was necessary to convert the Quester program to the desired format, make it compatible with the 360/65 computer, write various peripheral programs, and gain experience with the MT/ST input equipment and the RCA Spectra 70 phototypesetting output program. The second was devoted to a trial run of one month of data from the translation series, Foreign Broadcast Information Service -- Communist China, Daily Report (FBIS-CC) (the 21 issues for January 1969), to ascertain costs, provide an adequate presentation, and work out standard procedures for more than one indexer. The results demonstrated the feasibility and desirability of the integrated scheme, and the costs for large-volume daily output were estimated to fall within the desired range.
The original 1064 pages of raw material were indexed in 173 pages of printout, including all input data and an alphabetical index of subject categories. Thus, one page of the index covers about six pages of original translation. These figures demonstrate that an index of either the FBIS-CC or a larger set of translations can be produced in a volume acceptable to users.
Conclusions, based on production experience, include the following. (1) If he has had previous experience working with translated Chinese Communist materials, a potential indexer can perform well after about one week's practice. Similarly, in a week, a trained MT/ST typist reduced keyboarding errors to a satisfactory level. (2) A small number of general groupings emerged as potential subject-category divisions for the index. Modifying the format to incorporate these general headings might increase the index's convenience to users. (3) The success of an expanded version of the pilot program will depend on the sources and level of funding. It would probably require several months of lead time before the program could be fully operational, i.e., produce an index on a daily basis. (4) Once this point has been reached, the scope of service could be broadened. Quester can do Boolean searches, can accept extracts and abstracts, and -- with additional modifications -- can perform logical operations. These capabilities, if exploited, could make the system an active and powerful research tool in its own right. The more distant future might bring a remote access retrieval and display capability.
Yearly costs are estimated for the indexing of three publications: FBIS-CC alone ($79,000); all six FBIS Daily Reports ($588,000); and all English-language translation series from Mainland China except the New China Agency's daily English output ($288,000). In all three cases, figures derived from the pilot project were extrapolated in a straight line to estimate the yearly costs. The annual cost for a single subscription, on the basis of a 1000-copy print run, would thus be roughly $79, $588, and $288, respectively, for the three publication alternatives. This probably makes the index a library item. A variable subscription rate is recommended, shifting some of the cost from individual to institutional subscribers, to government subsidy, or to other outside financial support.
Four possible institutional locations of the indexing service are compared: U.S. government agency, university, not-for-profit research corporation, and profit-making research corporation. Each has built-in advantages and drawbacks. However, for economic reasons, the government locus might be considered first.
While this experiment demonstrated the general validity of computer-indexing of Chinese materials translated into English, the results should not be viewed as suggesting that the IBM 360/65 or the Quester program should necessarily be used in a full-scale implementation. Technological growth in computer systems and related programs is such that more advanced techniques should be examined before any system is chosen for implementation.
Subject
Subject Indexing
China
English Language
Computers
Information Retrieval
Feasibility Studies
Indexes
Identifier
AD0718403
Creator
Robinson, Thomas W.
Publisher
Santa Monica, CA : The Rand Corporation
Date
1970
Format
xi, 214 pages ; 28 cm.
Type
report
Date Issued
1970-12
Corporate Author
The Rand Corporation
Report Number
RM-6332-PR
Contract
F44620-67-C-0045
NTRL Accession Number
AD718403
Distribution Conflict
No
Access Rights
THIS DOCUMENT HAS BEEN APPROVED FOR PUBLIC RELEASE AND SALE; ITS DISTRIBUTION IS UNLIMITED
Index Abstract
Contrails and DTIC
Photo Quality
Not Needed
Distribution Classification
1
DTIC Record Exists
Yes
Report Availability
Full text available by request