Next: Case Studies Up: CAR-TR-830 CLIS-TR-96-02 CS-TR-3643 Previous: Terminology

Historical Development


Luhn introduced the idea of a ``Business Intelligence System'' in 1958 [25]. In Luhn's concept, library workers would create profiles for individual users, and then those profiles would be used in an exact-match text selection system to produce lists of new documents for each user. Orders for specific documents would be recorded and used to automatically update the requester's profile. Foreshadowing later concerns about privacy, he also observed that a set of profiles could be used to identify which users had expertise in specific areas.

Luhn's early work identifies every aspect of a modern information filtering system, although the microfilm and printer technology of the day resulted in significantly different implementation details. In describing the function of the selection module as ``selective dissemination of new information'' he coined the term which described this field for nearly a quarter century.

A decade later, widespread interest in Selective Dissemination of Information (SDI) resulted in creation of the Special Interest Group on SDI (SIG-SDI) of the American Society for Information Science. Houseman's 1969 survey for that organization identified 60 operational systems, nine of which served over 1,000 users each [18]. These systems generally followed Luhn's model, although only four of the 60 implemented automatic profile updating, with the rest about evenly split between manual maintenance of the profiles by professional support staff and manual maintenance by the users themselves. Two factors had led organizations to make this investment in SDI: the availability of timely information in electronic form, and the affordability of sufficient computing capability to match those documents with user profiles. These are the same factors motivating information filtering today, although distribution of scientific abstracts on magnetic tape (the dominant source of external information at the time) has been replaced by nearly instantaneous communications across large networks of interconnected computers.

Denning coined the term ``information filtering'' in his ACM President's Letter that appeared in the Communications of the ACM in March of 1982 [7]. Introducing the new ACM Transactions on Office Information Systems, Denning's objective was to broaden a discussion which had traditionally focused on generation of information to include reception of information as well. He described a need to filter information arriving by electronic mail in order to separate urgent messages from routine ones, and to restrict the display of routine messages in a way that matches the personal mental bandwidth of the user. Among the possible approaches he identified was a ``content filter.'' The remaining six techniques (hierarchical organization of mailboxes, separate private mailboxes, special forms of delivery, importance numbers, threshold reception, and quality certification) all required the cooperation of the other users, and hence would better be studied from a more global perspective than the receiver's local scope of action represented by the information seeking model in figure 1. We shall have more to say on Denning's other approaches in section 5.3.2.

Over the subsequent decade, occasional papers on information filtering applications appeared in the literature. While electronic mail was the original domain about which Denning had written, subsequent papers have addressed newswire articles, Internet ``News'' articles, and broader network resources [9,19,30,43]. The most influential paper of this period was published in the Communications of the ACM by Malone and others in 1987 [26]. There they introduced three paradigms for information selection (cognitive, economic, and social) based on their work with a system they called the ``Information Lens.'' Their definition of cognitive filtering, the approach actually implemented by the Information Lens, is equivalent to the ``content filter'' defined earlier by Denning, and this approach is now commonly referred to as ``content-based'' filtering. They also described an economic approach to information filtering, a generalization of Denning's ``threshold reception'' idea, that had implications beyond the scope of the information seeking system model in figure 1. We describe the economic issues related to information filtering briefly in section 5.3.3.

The most important contribution of Malone and his colleagues was to introduce an alternative approach which they called social (now also called ``collaborative'') filtering. In social filtering, the representation of a document is based on annotations to that document made by prior readers of the document. They speculated that by exchanging this sort of information, communities of shared interest could be automatically identified. If practical, social filtering would provide a basis for selection of information items, regardless of whether their content could be represented in a way that was useful for selection. The balance between content-based and collaborative filtering is an important unresolved issue, and we will have much more to say on the relative merits of the two approaches in the sections that follow.

Large-scale government-sponsored research on information filtering also began in this period. In 1989 the United States Defense Advanced Research Projects Agency (DARPA) sponsored the first of an ongoing series of Message Understanding Conferences (MUC) [23,17]. The principal thrust of those conferences has been use of information extraction techniques to support the selection of messages. In 1990, DARPA launched the TIPSTER project to fund the research efforts of several of the MUC participants [12]. TIPSTER added an emphasis on the use of statistical techniques to preselect messages that could then be subjected to more sophisticated natural language processing. In TIPSTER, this preselection process is known as ``document detection.'' In 1992 the National Institute of Standards and Technology (NIST) capitalized on this research by co-sponsoring (with DARPA) an annual Text REtrieval Conference (TREC) focused specifically on text filtering and retrieval [13].

So for the first decade after Denning identified networked information as an important application for filtering technology, information filtering was either addressed episodically or included as part of a broader research effort. Finally, in November of 1991, Bellcore and the ACM Special Interest Group on Office Information Systems (SIGOIS) jointly sponsored a workshop on ``High Performance Information Filtering'' that brought together a substantial quantity of research to establish a basis for the explosive growth the field has experienced in the past five years. Forty contributors examined the area from a wide variety of perspectives, including user modeling, information selection, application domains, hardware and software architectures, privacy, and case studies. A year later, in December of 1992, expanded versions of nine papers from that workshop appeared in a special issue of the Communications of the ACM [1,2,4,10,11,24,31,36,37].

Douglas W. Oard
Sun Apr 27 13:18:52 EDT 1997
