We have already mentioned information retrieval, but there are other
information seeking processes for which the decomposition in
figure 1 is appropriate. One of the most familiar is
the process of retrieving information from a database. The
distinguishing feature of the database retrieval process is that the
output is the information itself, while in information filtering (or
retrieval) the output is a set of entities (e.g., documents) that
contain the information being sought [3]. For
example, using a library catalog to find the title of a book
would be a database access process. Using the same system to discover
whether any new books about a particular topic have been added
to the collection would be an information filtering process. As this
example shows, database systems can be applied to information
filtering processes, and we will present examples of this in
section 4.
Another process that can be described using figure 1 is information extraction. The information extraction process is similar to database access in that the goal is to provide information to the user, rather than entities which contain information. It is distinguished from the database access process by the nature of the sources from which that information is obtained. In the database access process the information is obtained from some type of database (e.g., a relational database), while in information extraction it is obtained from sources that are less well structured (e.g., the body of an electronic mail message). Information extraction techniques are sometimes found in the selection module of a text filtering process, where they help to represent texts in a way that facilitates selection.
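To make the contrast between the two kinds of source concrete, the following is a minimal sketch rather than a description of any particular system. The catalog contents, the call number, the function names, and the choice of a clock time as the item to extract are all hypothetical and are introduced only for illustration. Database access is shown as a keyed lookup in a structured store, while information extraction is shown as a pattern match over free text.

```python
import re

# Hypothetical structured source: a tiny catalog keyed by call number.
CATALOG = {"QA76.9": "An Example Book Title"}

def database_access(call_number):
    """Database access: the answer is read directly from a structured record."""
    return CATALOG.get(call_number)

# Hypothetical extraction target: a 24-hour clock time such as "14:30".
TIME_PATTERN = re.compile(r"\b(\d{1,2}:\d{2})\b")

def extract_meeting_time(body):
    """Information extraction: the answer is pulled out of unstructured text."""
    match = TIME_PATTERN.search(body)
    return match.group(1) if match else None
```

In both cases the result returned to the user is a piece of information (a title or a time) rather than the document that contains it. Real extraction systems use far richer techniques than a single regular expression; the pattern here only stands in for that step.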
One interesting variation on the information extraction and database
access processes is what is commonly referred to as ``alerting.'' In
the alerting process the information need is assumed to be relatively
stable with respect to the rate at which the information itself is
changing. Monitoring an electronic mailbox and
alerting the user whenever mail from a specific user arrives is one
example of an information alerting process. Presenting mail from that
user first in a sorted list would be an example of information
filtering.
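The mailbox example above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a description of an actual mail system: the `Message` type, the two function names, and the addresses used in the usage note are hypothetical. Alerting produces a notification only for matching arrivals, while filtering returns the whole collection reordered so that the preferred sender comes first.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    subject: str

def alert_on_sender(messages, watched_sender):
    """Alerting: yield one notification per arriving message from the watched sender."""
    for msg in messages:
        if msg.sender == watched_sender:
            yield f"New mail from {msg.sender}: {msg.subject}"

def rank_sender_first(messages, watched_sender):
    """Filtering: return every message, with the watched sender's mail listed first.

    sorted() is stable, so arrival order is preserved within each group.
    """
    return sorted(messages, key=lambda m: m.sender != watched_sender)
```

For instance, given an inbox containing mail from `bob@example.org`, `alice@example.org`, and `carol@example.org`, calling `alert_on_sender(inbox, "alice@example.org")` yields notifications only for the messages from that address, whereas `rank_sender_first(inbox, "alice@example.org")` returns all of the messages with those from that address at the top.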
Database retrieval, information extraction, and alerting techniques all inform text filtering practice, and all three benefit from advances in text filtering research. We do not intend to review those research areas comprehensively, but we do occasionally mention how technologies developed to support those processes can be applied to the information filtering process.
Finally, ``browsing'' is another information seeking process for which the decomposition shown in figure 1 is appropriate. Since browsing can be performed on either static or dynamic information sources, it has aspects similar to both information filtering and information retrieval. ``Surfing the World Wide Web'' is an example of browsing relatively static information, while reading an online newspaper would be an example of browsing dynamic information. The distinguishing feature of browsing is that the user's interests are assumed to be broader than in the information filtering or retrieval processes. Precisely what is meant by ``broader'' is difficult to define, however, and the distinction is often simply a matter of judgement. In order to sharpen the distinction for the purpose of this report, we propose an operational definition of browsing. When an interest is so broad that it cannot be represented effectively in an information filtering (or retrieval) system, we will refer to the information seeking process as browsing rather than as filtering or retrieval. In other words, we propose that researchers seek to characterize the broadest interests for which their information filtering systems are useful, and then treat the limit they discover as the dividing line between filtering and browsing for their system.