From Data Processing to Information Processing

The Move from Data Processing to Information Processing


P. Trehin


Data processing applications have moved from a batch environment to an interactive end-user environment. The new users are demanding different services from the system: they want more than data; they want information, that is, a pertinent answer to the questions they ask. Systems (hardware and software) should be designed accordingly.

The increase in available computing power and the multiplication of computer interfaces are not in themselves going to help end users; if anything, they will confuse them even more.

Artificial intelligence methods are rooted in the desire to create systems that behave more intelligently in response to end users. On the other hand, research in cognitive science has revealed features of human perception and memorisation that were unknown just a few years ago.

The recent developments in artificial intelligence and in cognitive science should be integrated in the design of future computer systems.


Reduce Uncertainty

Most people associate the word INFORMATION with NEWS or possibly with COMPUTER APPLICATIONS. But these are limited and very imperfect understandings of the concept.

Information started to have a richer meaning when C. Shannon and W. Weaver published The Mathematical Theory of Communication (1) in 1948. But it was L. Brillouin (2), with his interpretation of information as the opposite of entropy, who gave information theory the status of a general theory.

The concept evolved from an engineering content, essentially based on a mechanical approach, to a broader meaning encompassing the semantic aspects of the communication process as well. Information became synonymous with the reduction of a receiver's uncertainty after the reception of a message. That generalisation of the concept of information is consistent with a realistic representation of the interactions of man with his environment (A. Moles) (3).
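The idea of information as a reduction of uncertainty can be made concrete with Shannon's entropy measure. The following is a minimal sketch in Python, assuming a receiver whose uncertainty is described by a probability distribution over possible answers before and after a message. The function names and the example probabilities are illustrative assumptions, not taken from the cited works.

```python
import math


def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gained(before, after):
    """Reduction of the receiver's uncertainty, in bits, caused by a message."""
    return entropy(before) - entropy(after)


# Hypothetical example: four equally likely answers before the message,
# only two equally likely answers left after it.
before = [0.25, 0.25, 0.25, 0.25]
after = [0.5, 0.5, 0.0, 0.0]
gain = information_gained(before, after)
```

In this sketch a message that halves the number of equally likely answers carries one bit of information; a message that leaves the distribution unchanged carries none, however much data it contains.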

These interactions can be described as a quest for information, leading to a better control of actions in an uncertain universe. This information enables a plan of action through the construction of models of the environment (even if unconsciously). This ability to build a model (descriptive or normative) is the one that differentiates man from other living species.

In the enterprise world (whatever the type of human enterprise: business, research and development, education, medicine, law and even leisure) the individuals involved have to face an uncertain environment:

       Is this product going to have a market?

       Is this experiment leading to some positive results?

       Are the studies preparing these teenagers for tomorrow's environment?

       Is that patient going to respond to this new treatment?

       Is this law applicable to this case?

       What type of equipment should I take for that mountain hike on Sunday?

All are situations of uncertainty.

I will not expand on this subject any further: information in the sense set out above, the reduction of uncertainty, is a fundamental need of human beings, whether individuals or organisations.


The Data Processing Syndrome

New technologies naturally tend to capture the interest of the people who have to use them. These people often become experts at the new tool and tend to forget the primary reason that drove them to use it.

The pervasiveness of data processing applications has amplified this attitude. One must admit that mastering a complex tool is in itself a source of great satisfaction. Hence many physicists, chemists, economists or other specialists who needed the power of a computer for their work became data processing experts.

Data processing evolved from a means of problem solving to a goal in itself. The production of data from various inputs only partially contributed to the information of the end users (Mavor, Kidd, Vaughan) (4). One's uncertainty is reduced little, if at all, by the quantity of data produced. Information requires pertinent answers to the end users' questions.

The system complexity resulting from this approach acted as positive reinforcement for the designers of data processing systems, who found a quasi-aesthetic dimension in the application of their computer knowledge. The evolution of APL/360 to APLSV to VSAPL under CMS illustrates this phenomenon. APL evolved from a problem-solving language to a data processing system, extremely rich in functions but at the same time less user-friendly (5, 6, 7).


The End User Itch

These same technological advances gave a different population of end users access to computing power, either through interactive terminal systems (TSO, VSPC, CMS...) or through the use of personal computers. This evolution has profoundly modified attitudes toward data processing.

End users, most of whom are not data processing experts, have a goal very different from the creation of data. They are basically concerned with the acquisition and creation of information. This is exemplified by the reactions expressed at end-users' meetings: complaints about reliability (a major cause of uncertainty in getting a job done), calls for end-user support groups, calls for tools.

The end-users see the system in its totality, from the computer to the terminal. Hence ease-of-use encompasses all aspects of the system: terminal hardware, connection to the computer (networking), data entry, storage and retrieval, and the printing of text and graphics.

This requirement for ease-of-use is often the subject of controversy with data processing professionals, since it inevitably uses more computer resources and so reduces pure data processing efficiency.


The Pertinent Answer

The concept of a pertinent answer bears fundamental consequences for the design of an information processing system. Considering that "people communicate via, not with, computer systems" (Branscomb & Thomas) (8) is a giant step toward this goal. In order to communicate efficiently, the sender and the receiver must have an adequate channel and a common repertoire. Computer systems need to be able to meet these conditions.

The repertoire concept can be viewed in a narrow sense, as in the OSI definitions of Document Interchange Architecture / Document Content Architecture, but also in the more general sense of

The channel constituted by the system can equally be considered in a very technical sense, as subject only to physical noise and judged by a bits-in / bits-out measure of quality. Here again, however, it is necessary to consider the computer system not as a transparent channel but as an active one.

Indeed, interactions with the computer, in the present state of development, are a long way from natural language, where the acquisition of information can be either voluntary or incidental and where the transmission of information does not require learning techniques totally foreign to the subject being communicated.

These remarks lead to a few further ones:

1.     The system must be designed with the advice of experts in the end-users' various fields, and hence use their languages.

2.     The system must be able to learn from interactions with the end users, evolving with them in such a way that the answers given by the system keep their pertinence.

3.     Interactions with the computer should require very little learning of programming, system commands, and other hardware-dependent knowledge.


Cognitive Processes and Information Processing

The recent progress in artificial intelligence has come as much from technological advances in computer science as from progress in the understanding of the human brain (9). The latter, conversely, was by and large made possible by advanced computer techniques (computer analysis of EEG graphs, scanner outputs...) (10).

In fact we can observe a convergence of research in psychology, neurology, neuro-biology, linguistics, neuro-linguistics and of course artificial intelligence, around the field of knowledge acquisition and representation.

Cognitive process studies will be key to the development of future computer systems. They will permit the design of systems adapted to the peculiarities of the human knowledge acquisition process, and they will also allow some of the findings of the cognitive sciences to be used in the computer architecture itself (11), (12).

Advances in the understanding of perception (visual, auditory, tactile), information processing in the brain (13) and memory processes (short and long term) (14) will have a direct impact on the design of computer systems, from the workstation hardware to the data base architecture and the communication networks.

Knowledge acquisition and representation research may also lead to an easier and more efficient creation of INFORMATION BASES (15, 16), with incidental or programmed acquisition of knowledge by the system.

Progress in this type of research evidently depends on fundamental new discoveries, which are by nature very difficult to predict. It is not necessary, however, to wait for new advances in AI or cognitive science to start taking advantage of some of the techniques and concepts used in these fields (17).

Data base inquiry languages and architecture would benefit from answers to questions like: "What does this specific end-user want to know?", meaning not any end-user but the one using the data base now.

This would require the system to learn from its interactions with the end-user and to establish a dynamically updated user profile, similar to the one used in automatic documentation (ITIRC), where a fixed profile is used to periodically extract papers from a documentation data base. The system should be able to make inferences based on the direct question asked and on the knowledge that it has of this user.
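A dynamically updated user profile of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name UserProfile, the decay factor, the scoring rule and the sample documents are all hypothetical and do not describe ITIRC or any existing system.

```python
from collections import Counter


class UserProfile:
    """Hypothetical profile: decayed counts of the terms a user has asked about."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.weights = Counter()

    def update(self, query_terms):
        # Age the older interests, then reinforce the terms of the latest question.
        for term in self.weights:
            self.weights[term] *= self.decay
        for term in query_terms:
            self.weights[term] += 1.0

    def score(self, query_terms, document_terms, profile_weight=0.5):
        # Direct match with the question asked, plus a bonus inferred
        # from what the system already knows about this user.
        doc = set(document_terms)
        direct = sum(1.0 for term in set(query_terms) if term in doc)
        inferred = sum(self.weights[term] for term in doc if term not in query_terms)
        return direct + profile_weight * inferred


# Hypothetical session: two earlier questions, then a new and ambiguous one.
profile = UserProfile()
profile.update(["cardiology", "treatment"])
profile.update(["cardiology", "trial"])

documents = {
    "A": ["trial", "cardiology", "results"],
    "B": ["trial", "agriculture", "results"],
}
query = ["trial"]
ranked = sorted(documents, key=lambda name: profile.score(query, documents[name]), reverse=True)
```

Both documents match the direct question equally well; the profile built from earlier interactions is what places document A ahead of document B for this particular user.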

Another potential use of AI and cognitive techniques would be in the development of better communication between end-users and computers, in a language as close as possible to a natural language.


Natural Language Versus Natural Languages

The difficulty of the problem has limited the results achieved in developing natural languages as a means of interaction with computers (18).

Beyond the trivial observation of the multiplicity of national languages, one must admit that there is no natural language but several natural languages. Different social groups have, over time, invented various languages. Furthermore, group membership is time dependent. The same words used by an engineer at work may convey a different meaning when used at his bowling club (19).

In fact the intersection of the various repertoires is a very narrow body of basic concepts. Communication between different social groups is nevertheless established at a level that a statistical analysis of the common parts of their languages could not explain. This high level of communication is possible only because human beings have the capacity to learn languages: their repertoire is neither infinite nor frozen. The participants in a conversation can learn or recall the elements of language that are outside their usual repertoire but are used in the ongoing conversation, usually without having to ask a question.

We get back to one of the fundamental aspects of AI: in order to understand a natural language, the computer system will have to be able to learn the specifics of that language from interactions with the end-user.
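The idea of a repertoire that is neither infinite nor frozen can be illustrated with a toy vocabulary that grows through interaction. The sketch below is an assumption-laden illustration in Python: the class Repertoire, its methods and the bowling vocabulary are hypothetical, and real language understanding is of course far harder than a dictionary lookup.

```python
class Repertoire:
    """Hypothetical vocabulary mapping a user's words to concepts the system knows."""

    def __init__(self, known):
        self.known = dict(known)

    def interpret(self, words):
        # Split a message into the concepts understood and the words still unknown.
        understood = [self.known[w] for w in words if w in self.known]
        unknown = [w for w in words if w not in self.known]
        return understood, unknown

    def learn(self, word, concept):
        # Extend the repertoire once the meaning of a word has been clarified.
        self.known[word] = concept


# Hypothetical exchange in the language of a bowling club.
bowling = Repertoire({"strike": "all_pins_down"})
understood, unknown = bowling.interpret(["strike", "spare"])
bowling.learn("spare", "all_pins_down_in_two_throws")
understood_after, unknown_after = bowling.interpret(["strike", "spare"])
```

A separate Repertoire instance per field or social group would let the same word, such as "strike", map to a different concept at work and at the bowling club.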

This idea of the multiplicity of natural languages is present in the concept of expert systems, where very specialised knowledge areas are addressed in the specific language of a particular field. I believe that expert systems are not an interim solution driven by existing technology: even with more powerful computer systems, there will be a fundamental need for them in order to cope with the diversity of languages.

It is possible to envision the integration of many expert systems in a comprehensive system, but that, in the present state of development of AI, is still science fiction.

Natural languages have another characteristic: they are multimodal, that is, more than one sensory mode is used in the communication process. We talk and listen but we also act (make gestures, show drawings, objects, people...) and look. In fact all our senses can be used for communication. Combining senses makes the communication process easier, hence the preference for "face to face" over telephone conversation. But even on the telephone communication is multimodal: voice intonation can change the meanings of words.

The closer we get to a single modality, the harder the communication process becomes. We have to introduce a higher level of redundancy in our messages in order to compensate for the loss of other information normally transferred by the other perceptual modalities.

The behaviour of a person during a phone conversation illustrates my point: the person keeps gesturing, makes faces, and points to the problem on his graph as if his correspondent were able to see him...

In the case of written messages some of the redundancy can be eliminated, as the receiver will be able to look back a few lines or pages if there is a point that he does not understand. In a voice-only message, the receiver will have to rely much more on short-term memory if the same situation occurs.

So, surprisingly, a voice-recorded message may be further from a natural language than a written message, especially if the written message makes use of graphical representation.

In all cases, perception is an integrative process; both past and present context are combined with the direct stimulus received by our senses from the environment (20).

The development of computer languages in the direction of more natural languages may have to use multimodal communication procedures. Communication with and through computer systems would be enhanced by the use of several simultaneous modes of input/output, processing and storage.

Perhaps, some day, computers will be able to react like Proust with his famous "Madeleine".


The Challenges of Information Processing

The basic question remains: will we be able to design computer systems that give end-users the INFORMATION they are looking for? (21) Will we be able to minimise the intellectual detour that communication with and through computers requires today?

These are challenges that the designers of expert systems, natural languages, voice recognition subsystems, and advanced robotics will have to face (22). I do not believe that these challenges will be met by technological breakthroughs alone. To a large extent, the solutions will depend on tedious work. Files, records, and fields will have to be redefined one after another to create a better knowledge representation of our environment. Better access methods will have to be designed, with ease of use as a constant preoccupation.

The need for programmers will be phenomenal, but pure knowledge of programming languages will be vastly insufficient. As with foreign languages, one has to be able to speak about a specific subject in the language in order to communicate. Programming knowledge will be valuable only if it is used to convey specific knowledge: scientific, technical, medical, economic or financial, legal, sociological or psychological. We will need programmers with good knowledge of at least one of these fields.

Technological advances and fundamental research will of course have a crucial role to play as AI machines will need several orders of magnitude more memory and speed than the present technology provides. We will also need the development of new high technology interfaces for input-output.

However, the laborious development of techniques based on state-of-the-art technologies and the discovery of new ones will not by themselves solve the problems of human communication through computer-based systems. Only through cooperation with researchers in cognitive science will progress be achieved (22).

This will require extensive multi-disciplinary research built around efficient communication between the various scientific communities, in order to avoid building, with AI, a modern Tower of Babel.



1         C. Shannon & W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1949 (1969 ed.)

2         L. Brillouin, Science and Information Theory, Academic Press, New York, 1962

3         A. Moles, Théorie Informationnelle de la Perception, in Le Concept d'Information dans la Science Contemporaine, Les Cahiers de Royaumont, Gauthier-Villars, 1965

4         O. Mavor, J. Kidd & W. Vaughan, Cognitive Models of Scientific Work and Their Implications for the Design of Knowledge Delivery Systems, Technical Report Report Associates, Annapolis, MD, Oct 1981

6         APL 360 Users Manual, IBM Ref GH20-0906-1

7         APLSV / VSAPL User Manuals IBM Ref SH20-1460-0, GE09-0124-0

8         L. Branscomb & J. Thomas, Ease of Use: A System Design Challenge, IBM Systems Journal, Vol. 23 No 3, 1984

9         D. Waltz, Artificial Intelligence, Scientific American, 1982, Vol 241

10     Saltzberg, Brain Spiking: EEG Spectral Coherence, Ph.D. Dissertation, Texas Research Institute of Mental Sciences

11     Editorial on: Artificial Intelligence, Psychological Medicine, 1981, 11, 449-453

12     Y. Wilks, Programs and Texts, Computers and the Humanities, Vol 11, 259-263, 1978 (Pergamon Press)

13     W. Herbert and J. Miller, Remembrance of Things Partly, Science News, Vol 124, Dec 10, Dec 17 1983

14     The Biochemistry of Memory: a New and Specific Hypothesis, Science, Vol 224, June 1984

15     M. Colombetti, Knowledge and Thought in a Cognitive System, Cybernetica, Vol XXVI, 1, 1983

16     A. Newell, The Knowledge Level, Artificial Intelligence, 18, 1982, 87-127

17    P. Strurdza, Data Dictionary Design with An AI Model, Proc. of the 16th An. Hawaii Conf. on Sys. Sciences, 1983

18     T. Winograd, Computer Software for Working with Language, Scientific American, 9, 1984 (Computer Software Issue)

19     J. Morton, Le Lexique Interne, La Recherche, n143, 4, 1983

20     J. Gerrissen, Theory of the Human Global Analysis of Visual Structure, IEEE Transactions on Systems, Man, and Cybernetics, Vol SMC12, N6, Nov - Dec 1982

21    J. Ralloff, The Fifth Generation, Science News, Vol 125, May 26, June 2, June 16 1984

22    B. Kursunoglu, Interdisciplinary Study on AI, Pres. at Workshop on Biological Dimension of AI

23     K. Nakamura, A. Sage, S. Iwai, An AI Data-Base Using Psychological Similarity between Data, IEEE Vol SMC13, n4, Jul-Aug 1983