This report describes the design of systems to support the text filtering process with particular emphasis on the information selection component. Because such an emphasis might leave the reader with the mistaken impression that collection and display are lesser challenges, we pause briefly to describe the relationship between selection and the other two components depicted in figure 1.
Dynamic information can be collected actively (e.g., with autonomous agents over the World Wide Web), collected passively (e.g., from a newswire feed), or collected by some combination of the two. Early descriptions of the information filtering problem implicitly assumed passive collection (cf. [7,18]). As the amount of electronically accessible information has exploded, active collection has become increasingly important (cf. ). Active collection techniques can benefit from a close coupling between the collection and selection modules because they exploit both user and network models to perform information seeking actions in a network on behalf of the user. In a fully integrated information filtering system, some aspects of user model design are likely to be common to the two modules. That commonality would provide a basis for sharing information about user needs across the inter-module interface. The two user models are unlikely to be identical, however, because the modules serve different purposes: the collection module chooses whether to obtain information before its content is known, while the selection module chooses which of the collected information to retain for display to the user. In the succeeding sections we will generally limit the discussion to systems that use passive collection techniques, both because this choice allows us to concentrate on the selection component and because little has been reported on how the two components can be integrated.
Such a clean division is not possible for the interface between the selection and the display components, however. The goal of an information filtering system is to enhance the user's ability to identify useful information sources. While this can be accomplished by automatically choosing which sources of information to display, experience has shown that user satisfaction can be enhanced in interactive applications by using techniques which exploit the strengths of both humans and machines.
A personalized electronic conference system that lists submissions in order of decreasing likelihood of user interest is one example of such an approach. The automatic system can use computationally efficient techniques to place documents that are likely to be interesting near the top of the list, and users can then rapidly apply sophisticated heuristics (such as word sense interpretation and source authority evaluation) to select those documents most likely to meet their information need. If the system has produced a good rank ordering, the density of useful documents should be greatest near the top of the list. As the user proceeds down the list, selecting interesting documents to review, he or she should thus observe that the density of useful documents is decreasing. By allowing users to adaptively choose when to terminate their information seeking activity, based in part on the observed density of useful documents, human and machine synergistically achieve better performance than either could achieve alone.
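The adaptive termination described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `review_ranked_list`, the sliding-window estimate of density, and the `window` and `min_density` parameters are hypothetical choices made for illustration and do not come from any system discussed in this report. A real user applies far richer heuristics than a fixed threshold.

```python
from collections import deque


def review_ranked_list(usefulness_in_rank_order, window=5, min_density=0.25):
    """Simulate a user reading down a ranked list (hypothetical model).

    usefulness_in_rank_order: booleans in rank order, True where the user
        judges the document useful.
    window: assumed number of most recent documents the user keeps in mind.
    min_density: assumed fraction of useful documents below which the user
        stops reading.

    Returns (documents_reviewed, useful_documents_found).
    """
    recent = deque(maxlen=window)  # judgments for the last `window` documents
    reviewed = 0
    useful = 0
    for is_useful in usefulness_in_rank_order:
        reviewed += 1
        useful += int(is_useful)
        recent.append(is_useful)
        # Stop only once a full window has been observed and the observed
        # density of useful documents in that window has dropped too low.
        if len(recent) == window and sum(recent) / window < min_density:
            break
    return reviewed, useful


if __name__ == "__main__":
    # A good ranking: useful documents are concentrated near the top.
    judgments = [True, True, True, False, True, False,
                 False, False, False, False, True, False]
    print(review_ranked_list(judgments))
```

In this toy run the simulated user stops partway down the list and never sees the useful document at rank 11, which shows the trade-off the user is making: reading effort saved against useful documents missed.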
In other words, in interactive applications an imperfectly ranked list (referred to as ``ranked output'') can be superior to an imperfectly selected set of documents (referred to as ``exact match'' selection) because humans are able to adaptively choose the set size based on the same heuristics that they use to choose which documents to read. The choice of a ranked output display design imposes requirements on the selection module, however. Because the display module must rank the documents, the selection module must provide some basis (e.g., a numeric ``status value'') from which the ranking can be constructed. Display design is a rich research area in its own right, but our discussion of the issue is focused solely on aspects of the display design that impose requirements on the selection module.
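The contrast between the two display designs, and the requirement that ranked output places on the selection module, can be sketched as follows. The names `ScoredDocument`, `exact_match_selection`, and `ranked_output`, the use of a single floating-point status value per document, and the fixed threshold are assumptions made for illustration rather than a description of any particular system.

```python
from typing import List, NamedTuple


class ScoredDocument(NamedTuple):
    """A document identifier paired with the selection module's status value."""
    doc_id: str
    status_value: float  # assumed convention: higher means more likely useful


def exact_match_selection(docs: List[ScoredDocument],
                          threshold: float) -> List[ScoredDocument]:
    """Return the set the system selects; the system alone fixes the set size."""
    return [d for d in docs if d.status_value >= threshold]


def ranked_output(docs: List[ScoredDocument]) -> List[ScoredDocument]:
    """Return every document, ordered by decreasing status value.

    The user, not the system, decides how far down the list to read.
    """
    return sorted(docs, key=lambda d: d.status_value, reverse=True)


if __name__ == "__main__":
    docs = [
        ScoredDocument("a", 0.2),
        ScoredDocument("b", 0.9),
        ScoredDocument("c", 0.5),
    ]
    print([d.doc_id for d in exact_match_selection(docs, threshold=0.5)])
    print([d.doc_id for d in ranked_output(docs)])
```

Both functions consume the same status values, so in this sketch a selection module that supports ranked output can also support exact match selection by thresholding, while the reverse does not hold if the module only emits accept or reject decisions.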