CIA HISTORICAL REVIEW PROGRAM RELEASE AS SANITIZED
TITLE: On Processing Intelligence Information
AUTHOR: Paul A. Borel
VOLUME: 3 ISSUE: Winter
A collection of articles on the historical, operational, doctrinal, and theoretical aspects of intelligence.
All statements of fact, opinion or analysis expressed in Studies in Intelligence are those of the authors. They do not necessarily reflect official positions or views of the Central Intelligence Agency or any other US Government entity, past or present. Nothing in the contents should be construed as asserting or implying US Government endorsement of an article's factual statements and interpretations.
ON PROCESSING INTELLIGENCE INFORMATION
Paul A. Borel
The cycle of organizational activity for intelligence extends from the collection of selected information to its direct use in reports prepared for policy makers. Between these beginning and end activities there are a number of functions which can be grouped under the term information processing. These functions include the identification, organization, storage, recall, conversion into more useful forms, synthesis and dissemination of the intellectual content of the information collected. The ever-mounting volume of information produced and promptly wanted and the high cost of performing these manifold operations are forcing a critical review of current practices in the processing field.
Storing and Retrieving Information
Efficient and economical storage and retrieval of information is by all odds the toughest of the processing problems. Millions are being spent on it by the research libraries of [...], of industry, and of government. Even as we meet here today, an international conference is under way in [...] at which new means of storing and searching for scientific information are being discussed.
For intelligence, storing and retrieving information is a vexing problem. Our Document Division alone receives daily an average of [...] different intelligence documents, received in an average of [...] copies per document. This is exclusive of special source materials, cables, newspapers, press summaries, periodicals, books, and maps. Since these reports come from scores of different major sources, the daily volume fluctuates and shows lack of uniformity in format, in reproduction media, in length and quality of presentation, and in security classification. As they come in they must be read
with an eye to identifying material of interest to some [...] customer offices or [...]
We have a general library of books and periodicals, whose operations approximate those of the conventional library. We have several registers (in effect special libraries) through which we handle special source materials, biographic data on scientists and technicians, films and ground photographs, and data on industrial installations. Most of these materials are subject to control through indexes of IBM punched cards.
We have a collection of two million intelligence reports recorded by microphotography. Short strips of film are mounted in apertures on IBM punched cards filed in numerical sequence. Access to these cards, from which photocopies can be made, is obtained through an organized index of IBM cards now numbering eight million. Thus access to the document itself is indirect, through codes punched into the index cards to indicate subject, area, source, classification, date and number of each document. The data on index cards retrieved in response to a particular request is reproduced on facsimile tape and constitutes the bibliography given the requester. This system, which seeks to match a given request with the relevant "intelligence facts" on file, we call the Intellofax system.
These then are our assets. I'll say no more at this time about problems in connection with the general library, or those of operating our registers, since they are in many ways variations on the theme of our concern with the operation of the Intellofax system.
Demands made on our document collection stem from three types of requests:
Requests for a specific document to which the analyst has a reference or citation;
Requests for a specific bit of information in answer to a specific question;
Requests for all information relevant to a subject which may or may not be well defined.
Our major difficulties are almost all connected with the last of these three, the one which requires a literature search. In searching unclassified literature we rely on commercially available reference aids, but in searching classified materials we
use the Intellofax punched card index. This index we would use to retrieve, for example, information responsive to a request for "anything you have on the movement of iron ore from Hainan to Japan [...], classified through Secret, and exclusive of CIA source material."
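For a present-day reader, the indirect search described above (coded index cards that point to document numbers, with the retrieved numbers forming the bibliography) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the ordering of classification levels, the subject and area codes, and the sample cards are all invented here and do not reproduce the actual Intellofax code scheme.

```python
from dataclasses import dataclass

# Assumed ordering of classification levels, lowest to highest (illustrative).
CLASSIFICATION_ORDER = ["Unclassified", "Confidential", "Secret", "Top Secret"]


@dataclass(frozen=True)
class IndexCard:
    """One index card: codes that point to a document, not the document itself."""
    document_number: int
    subject: str
    area: str
    source: str
    classification: str
    year: int


def search_index(cards, subject, area, max_classification, exclude_sources=()):
    """Return sorted document numbers whose index cards match the coded request."""
    ceiling = CLASSIFICATION_ORDER.index(max_classification)
    hits = {
        card.document_number
        for card in cards
        if card.subject == subject
        and card.area == area
        and CLASSIFICATION_ORDER.index(card.classification) <= ceiling
        and card.source not in exclude_sources
    }
    return sorted(hits)


# Invented sample cards; every code and number below is hypothetical.
SAMPLE_CARDS = [
    IndexCard(1001, "iron ore shipments", "Hainan-Japan", "Army", "Secret", 1956),
    IndexCard(1002, "iron ore shipments", "Hainan-Japan", "CIA", "Secret", 1956),
    IndexCard(1003, "iron ore shipments", "Hainan-Japan", "State", "Top Secret", 1957),
    IndexCard(1004, "coal production", "Manchuria", "Army", "Confidential", 1955),
    IndexCard(1001, "shipping", "Hainan-Japan", "Army", "Secret", 1956),
]

# "Classified through Secret, and exclusive of CIA source material."
bibliography = search_index(
    SAMPLE_CARDS,
    subject="iron ore shipments",
    area="Hainan-Japan",
    max_classification="Secret",
    exclude_sources=("CIA",),
)
print(bibliography)  # [1001]
```

Note that one document may carry several index cards (document 1001 above has two), which is why the result is built as a set of document numbers rather than a list of cards.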
Intellofax is a high-cost operation. Only [...] per cent of the questions put to the information section of our Library are answered by literature search; yet some [...] people are used in the necessary coding, and another [...] on IBM and [...] operations exclusively in support of Intellofax. On the other hand, some portion of this cost would be incurred in operating any alternative system even at minimum level; and Intellofax makes possible the organization of bibliographic data in various forms and at speeds which would not be possible with a manual system.
Search results, however, are not uniformly accurate. We recently tested the accuracy of the Intellofax system by having a task team of three analysts from a research office run a controlled experiment. Five subjects, corresponding to common types of reports produced by that office, were selected. The test indicated quite conclusively that the system does an efficient job of retrieving documents referring to specific objects or categories (trucks, factories, serial numbers), but that it is less satisfactory in handling more general subjects, such as industrial investments in [...]. Comparison with the analysts' own files showed very satisfactory performance in retrieving documents placed in the system, but some documents in the analysts' files were not retrieved. Reruns with the same code patterns yielded consistent results.
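A comparison of this kind, machine search against an analyst's own file, can be expressed as a small calculation. The sketch below is an assumption about how such a check might be tallied today, not a record of how the task team scored its experiment; the function name, the three input sets, and the document numbers are hypothetical.

```python
def retrieval_check(retrieved, analyst_file, in_system):
    """Compare a machine search with an analyst's own file on one subject.

    retrieved    -- document numbers returned by the index search
    analyst_file -- document numbers the analyst holds as relevant
    in_system    -- document numbers that were actually placed in the system
    """
    retrieved = set(retrieved)
    analyst_file = set(analyst_file)
    in_system = set(in_system)

    # Only documents that were indexed can fairly be expected back.
    findable = analyst_file & in_system
    found = findable & retrieved
    return {
        "recall_of_indexed": len(found) / len(findable) if findable else None,
        "missed_though_indexed": sorted(findable - retrieved),
        "never_indexed": sorted(analyst_file - in_system),
    }


# Invented figures for illustration only.
result = retrieval_check(
    retrieved=[1001, 1002, 1003, 1005],
    analyst_file=[1001, 1002, 1003, 1004, 1006],
    in_system=[1001, 1002, 1003, 1004, 1005],
)
print(result["recall_of_indexed"])      # 0.75
print(result["missed_though_indexed"])  # [1004]
print(result["never_indexed"])          # [1006]
```

Separating documents that were indexed but missed from documents that never entered the system keeps a coding failure from being confused with a gap in the collection.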
The inaccuracies of the Intellofax system reflected in the above and other tests can be reduced by revising procedures and improving supervision, but they cannot be eliminated. In literature searching, a set of symbols assigned to documents is used to provide the searcher with a clue to the pertinence of any document to the request he is servicing. This set of symbols is in the nature of an index, but different people viewing these symbols may give them different interpretations. This makes the problem complex, for the determination that there is a meaningful relation between even two pieces of information depends on many different, often subtle criteria which elude unequivocal definition.
The solution of the accuracy problem would appear to turn on the ability to devise a master set of symbols, or code, large enough to cover an extremely wide variety of subjects and areas and small enough to be contained on an index card, one applicable to diverse documents containing fragmentary and often seemingly unrelated information, and at the same time conducive to uniform application initially by those coding incoming documents and later by those seeking to retrieve them. To prepare such a code is a tough assignment today. The job is not likely to be easier for some time.
It is relevant at this point to invite your attention to the views on this subject of the Working Party organised last [...]
[...] books of reference and finalized intelligence reports. It would be impracticable to try and include the welter of documents from which such finished reports are built up; even if it were practicable, it would be an immense task beyond our resources.
I disagree. Not as to the difficulty of the task or its high cost, but as to its practicability. I believe the solution lies in a) selectivity in identifying those documents to be held by the Center, and b) the organization of those documents into discrete collections, each controlled by an index suitable to its particular requirements. This is the approach we have taken, more by accident than by design. Such an approach makes it possible to cope with small problems, even though the big problem may still be unmanageable.
Reference Service and the Research Function
Where central reference services have been organized independently of research offices, it soon becomes evident that the functional line of demarcation between them and the research units is not clear. This becomes important when it results in
duplication of effort or, worse, in non-use of reference materials by the researcher laboring under the misimpression that he has all relevant documents in his possession. Today's analyst, like his predecessor, feels insecure without files which he can call his own. In such a situation we must have a proper regard for tradition, but sometimes it is difficult to distinguish tradition from inertia. Recently our Biographic Register, reviewing a report published by a research office, found that failure on the part of the author to check the Register files had resulted in some one hundred errors or omissions.
It must be decided whether a reference service is to be active or passive, dynamic or static. To take a simple case, a passive approach to reference service would mean that reference personnel would merely keep the stacks of the library in order, leaving it to research analysts to exploit the collection. Under the active approach, on the other hand, reference personnel would discuss the researcher's problem with him and then proceed, as appropriate, to prepare a bibliography, gather apparently pertinent documents, screen them, check with colleagues in other departments for supplementary materials, make abstracts, have retention copies made of popular items in short supply, initiate a requirement for supplementary field service, or prepare reference aids. In CIA we aim at active rather than passive reference service. How active we are in a particular case is a function of the customer's knowledge of our services, his confidence in us, and how pressed he is to get the job done.
Once a separate facility has been set up to provide reference services it is not long before it publishes. This comes about for several reasons, the least controversial of which is that a customer has made a specific request. Thus our science analysts may call for a compilation of biographic data on the individuals most likely to represent the Soviet Union at an international conference on the peaceful uses of atomic energy. We call this type of publication a research or reference aid. Some are quite specific; others are more general, being prepared in response to a need generally expressed. A number of different customers may, for example, make known that it would be very helpful to have a periodic compilation of all finished intelligence reports and estimates for ready reference. Or the need may be implied rather than expressed: the reference analyst may note that over a period of time the demand on him for biographic data about Soviet scientists is heavy, many requests calling for much the same information furnished earlier to others. The result: the production of a major reference aid along the lines of our "Soviet Men of Science." And naturally it isn't long before a revised edition is called for.
Criteria for determining when and when not to summarize information holdings in a general reference aid are elusive. It is similarly difficult to define the proper scope of the general reference aid. How far can it go before the researcher considers it an infringement on the research activity for which he is responsible? This question has implications beyond those readily apparent. Quite basic is the feeling among research personnel that they and their mission are a cut above the reference officer and his. A manifestation of this attitude is the steady flow of competent people out of reference into research, with only a trickle coming the other way. I doubt whether the inconsistency of this position is appreciated in view of the joint effort required by research and reference personnel to provide the soundest base possible for the research effort.
In my view the legitimate limits of the reference aid can best be arrived at in terms of the highest level of service expected of the reference officer. Stated simply it is this: to make known the availability of services and information the existence of which may be unknown to the researcher; and, given a task, to make the preliminary selection of materials to meet the particular need of a particular user. This may involve bulk-reduction operations (such as abstracting) to produce a smaller quantity of material containing everything pertinent to the user's problem, or conversion operations (such as translation) to get information in usable form. I would even say that the reference function includes evaluation of the reliability of information. To the researcher must be left the determination of its significance for the present; to the estimator its significance for the future; and to the policy-maker the indicated course of action.
Machine Application to Documentation Problems
In processing intelligence information, increases in efficiency may depend upon the adoption of techniques [...]. This is especially the case when savings of [...] are sought. But as soon as you consider automation, that is, the inclusion in your processing system of a machine as an integral part of it, you are faced with the need to make decisions different in nature from those made with respect to the desirability of expanding staff or [...]. It is a difficult problem to achieve an optimum balance between man and machine. Among the many considerations there are two important ones which ought to be, but seldom are, fully explored before you commit yourself to automation: you should accurately determine the gain or loss in terms of time, space, manpower, and money; and you should be fully aware of the limitations of the machine and of its use by man. It is often more important to know what cannot be done with the machine than to know what can.
I would again incline to disagree [...]
In view of the great initial investment needed to launch a mechanized reference system, the very large and [...] requirement for coding, maintenance and other [...] skill and the inevitable limitations of machinery when applied to intelligence processes, we do not think the [...] of such a system merits further examination.
No one would argue that large investments should be made in schemes unless they hold promise of relieving major problems. And the demands of a mechanized reference system for special skills are admittedly both high and persistent. However, these factors should be weighed in terms of the relative costs, not only the cost of alternative ways to solve the particular documentation problem, but also the cost of not solving it at all. We take exception to the conclusion that the limitations of machinery when applied to intelligence processes are inevitable. We also believe it unwise to categorically dismiss the introduction of machinery as not meriting
further examination. Limitations there are today and will continue to be. But those which are inevitable are fewer than is generally supposed. Only by daring and risking will we come to know how few are the real limitations of a mechanized approach to documentation. This philosophy is yielding developments in the fields of microphotographic storage, automatic dissemination, abstracting, and translation, all fields of particular concern today.
Microphotography. Both Air Intelligence and CIA are testing a system developed by Eastman Kodak known as Minicard. This system in essence substitutes a [...] mm film strip for the present CIA system of IBM punched index cards keyed to hard copy or film in the document storage file. Self-indexing Minicard document images are read electronically, not mechanically as IBM cards are. The characteristics of Minicard make possible a reduction of space requirements by a factor of [...] and an increase in speed of handling by a factor of [...]. The new system is capable of a level of information manipulation and a degree of coding sophistication which gives promise of radically augmenting the contribution of the information fragment to the solution of reference problems through search of the literature. And, contrary to present practice, the integrity of the file is maintained at all times.
Automatic Dissemination. Air Intelligence is testing a [...] Data Processing Set designed by Magnavox. This is a general-purpose computer especially designed for problems requiring close correlation. Requests for information form the reference file against which incoming documents must be compared. Up to [...] words specifying the subjects and areas of interest, other qualifying data (such as evaluation or type of copy), and user identifications are stored to represent the requirements of [...] users. When a document is to be disseminated, its subject and area coverage, previously coded and punched into paper tape, is fed into the machine. The machine searches its file of requirements and prints out a list of those who have requested such a document, the total number of copies needed, and the form in which it is wanted. Speed and uniformity of performance rather than financial economy is what the Air Force is after in this case.
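The matching step just described, in which a coded document is compared against a stored file of user requirements, lends itself to a short modern sketch. The one below is hypothetical throughout: the `Requirement` fields, the rule that a document matches when it shares at least one subject and at least one area with a requirement, and the sample offices are assumptions made for illustration, not details of the Magnavox equipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One user's standing request, as it might be held in the reference file."""
    user: str
    subjects: frozenset
    areas: frozenset
    copies: int
    form: str  # for example "hard copy" or "film"


def disseminate(doc_subjects, doc_areas, requirements):
    """List the users whose requirements a coded document satisfies.

    Assumed rule: a requirement matches when it shares at least one
    subject code and at least one area code with the document.
    """
    doc_subjects, doc_areas = set(doc_subjects), set(doc_areas)
    recipients = [
        r for r in requirements
        if r.subjects & doc_subjects and r.areas & doc_areas
    ]
    return {
        "recipients": [(r.user, r.copies, r.form) for r in recipients],
        "total_copies": sum(r.copies for r in recipients),
    }


# Invented requirements file.
REQUIREMENTS = [
    Requirement("Office A", frozenset({"atomic energy"}), frozenset({"USSR"}), 2, "hard copy"),
    Requirement("Office B", frozenset({"transportation", "atomic energy"}), frozenset({"China"}), 1, "film"),
    Requirement("Office C", frozenset({"atomic energy", "biography"}), frozenset({"USSR", "Poland"}), 3, "film"),
]

plan = disseminate({"atomic energy"}, {"USSR"}, REQUIREMENTS)
print(plan["recipients"])    # [('Office A', 2, 'hard copy'), ('Office C', 3, 'film')]
print(plan["total_copies"])  # 5
```

The output mirrors the three things the machine is said to print: who wants the document, how many copies in all, and in what form.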
Automatic Abstracting. Army Intelligence and IBM are working on means for producing, entirely by automatic means, excerpts of Army field reports that will serve the purposes of conventional abstracts. In a recent demonstration the complete text of a report, in machine-readable form, was scanned by an [...] data-processing machine and analyzed by a standard program. Statistical information derived from word frequency and distribution was used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance were extracted and printed out to become the auto-abstract. Adoption of this method of producing abstracts of overseas reporting would require the use of a Flexowriter in the field. When the original report is typed on it, a Flexowriter tape would be a byproduct and would accompany the report to headquarters. There tapes in sequence would be fed into a computer and auto-abstracts printed out.
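The general idea of scoring words by frequency, then scoring sentences by the words they contain, can be illustrated with a deliberately simplified sketch. It is not the program used in the demonstration: it ignores word distribution within the sentence, uses a small invented stop-word list, does no stemming, and scores a sentence as the average frequency of its remaining words. The function name and sample report are hypothetical.

```python
import re
from collections import Counter

# Small illustrative list of common words to ignore (an assumption).
STOP_WORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the to was were with".split()
)


def auto_abstract(text, sentence_count=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    # Word significance: how often each word occurs in the whole report.
    freq = Counter(w for words in tokenized for w in words)

    def score(words):
        # Sentence significance: average frequency of its significant words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:sentence_count])
    return [sentences[i] for i in chosen]


REPORT = (
    "The plant produces trucks. "
    "Truck output at the plant rose last year. "
    "Weather in the region was mild. "
    "The plant will expand truck output next year."
)

for sentence in auto_abstract(REPORT):
    print(sentence)
# Truck output at the plant rose last year.
# The plant will expand truck output next year.
```

Averaging rather than summing keeps long sentences from winning on length alone; that is a design choice of this sketch, not a claim about the original method.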
Mechanical Translation. The only successful Free World demonstration of machine translation to date took place on [...]. A continuous passage of [...] sentences taken from Russian chemical literature was translated by the Georgetown University research group, under CIA and National Science Foundation sponsorship. An [...] computer was programmed with the appropriate grammatical, syntagmatic and syntactic rules, and a Russian-English vocabulary was introduced into its memory system. The machine read the text, determined the lexical equivalents of the words, reconstructed the text, performed the necessary logical operations, and printed out the English translation. Only minor stylistic editing was required to make the product compare favorably with a translation made by a linguist. The rate of translation was [...] words per hour. With improved input equipment (reading [...]), rates up to [...] per hour are foreseen as possible. Research has also started on mechanical translation from Polish, Czech, Serbo-Croatian, French, Arabic, and Chinese. Soviet research in this field is considerably ahead of ours.
In closing this general review of aspects of the intelligence documentation problem, we should look briefly at certain trends which affect us all. First, channels for procuring publications and techniques for storing and retrieving the [...] document are extensive and well developed. The outlook is for no basic change in ways and means in this field, but rather an expansion and intensification of present methods.
Second, the type of reference or information service coming to be required will demand action primarily in preparing reference personnel to give assistance of higher quality than is given today. Reference tools will need to be improved also, but this is likely to follow where there is a more sophisticated reference officer to show a demonstrable need for them. The increase in amount and kinds of material available will call for more intense exploitation of it by the research analyst; he in turn will by necessity rely increasingly on the reference officer for first-cut selection and evaluation. Reference personnel will therefore need greater subject competence, more language ability, and wider training and experience in all aspects of intelligence documentation. A number of American corporations are using information specialists as members of research teams. This approach deserves testing in intelligence.
Third, in the field of literature searching, specialized schemes will be developed to fit the needs of specialized users. While general theory will continue to be developed, pragmatic approaches to problems based on an analysis of the way users employ services and exploit materials will play an increasingly important role. Proved systems employed by reference personnel will be simplified and adapted for use by the individual analyst to enable him to control the literature he requires in his immediate possession. The analyst in turn will provide the central system with the means of subject retrieval in his field as a by-product of the way he controls his files. In this field, machines will long continue to play a secondary role.
Fourth, the present and future demands for reference services will lead to increased use of machines where these can be
introduced without jeopardizing the performance of essential intellectual operations. This fact and the increasing volume of information which must be processed will bring about more centralization. The problem then becomes one of insuring that central reference is at least as responsive to research needs as the reference facility which is an integral part of the research area. The solution is to be found in an approach which integrates the information-processing activities, wherever performed, into a single system within which collection, processing, and user components operate along well-defined lines.