In a field as diverse as information filtering it is inevitable that a rich and sometimes conflicting set of terminology would emerge. Sometimes this is simply the result of differing perspectives; at other times new terminology is needed to convey subtly different meanings. For example, ``information retrieval'' is sometimes used expansively to include information filtering, but it is also commonly used in the more restricted sense that we have defined. Information filtering is alternatively referred to as ``routing'' (with a heritage in message processing), as ``Selective Dissemination of Information'' or ``SDI'' (with a heritage in library science), as ``current awareness,'' and as ``data mining.'' Sometimes ``routing'' is used to indicate that every document goes to some (and perhaps exactly one) user. ``Information filtering'' is sometimes associated with passive collection of information, and is sometimes meant to imply that an all-or-nothing (i.e., unranked) selection is required. ``SDI'' is sometimes used to imply that the profiles which describe the information need are constructed manually. The use of ``current awareness'' is sometimes meant to imply selection of new information based solely on the title of a journal, magazine, or other serial publication. And ``data mining'' is sometimes taken to imply that vast quantities of information are available simultaneously. All of these interpretations have a historical basis, but it is not uncommon to find these terms used to describe systems which lack the distinguishing characteristics of their historical antecedents. We shall avoid this problem by referring to all of these variations as ``information filtering.''
Taylor defined four types of information need (visceral, conscious, formalized, and compromised) that reflected the process of moving from the actual (but perhaps unrecognized) need for information to an expression of the need which could be represented in an information system. In common use, however, application of the terminology is unfortunately not nearly so precise. The visceral information need is often referred to as an ``interest'' or simply as an ``information need.'' But it is occasionally referred to as a ``topic,'' a term that is sometimes (e.g., in the TREC evaluation we describe in section 4) used to describe the formalized information need (i.e., the human expression of the need). And in some experimental work, the visceral information need is referred to as a ``query'' even though ``query'' is the traditional term for Taylor's concept of a compromised information need that could be submitted to an information retrieval system. In this report, we use ``interest'' and ``information need'' interchangeably to refer to the visceral information need, and reserve the use of the terms ``topic'' and ``query'' for their more specific meanings.
In an information filtering system, the system's representation of the information need (i.e., the compromised information need) is commonly referred to as a ``profile.'' Because the profile fills the same role as what is commonly called a ``query'' in information retrieval and database systems, the term ``query'' is sometimes used instead of ``profile'' in information filtering as well. It would not be technically correct to call the profile a ``user model,'' because a user model consists of both a representation of the user's interests and a method for interpreting that representation to make predictions. But that usage occasionally appears as well. We shall avoid confusion on this subject by using only the term ``profile'' when referring to the compromised information need in the context of information filtering.
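The distinction between a profile and a user model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a description of any particular system: the profile is taken to be a set of weighted terms, the interpretation method is a simple sum of the weights of matching terms, and the names `Profile`, `overlap_score`, `UserModel`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed representation: a profile is only data (term -> weight).
Profile = Dict[str, float]


def overlap_score(profile: Profile, document: str) -> float:
    """Sum the profile weights of the distinct terms that occur in the document."""
    terms = set(document.lower().split())
    return sum(weight for term, weight in profile.items() if term in terms)


@dataclass
class UserModel:
    """A profile paired with a method for interpreting it to make predictions."""
    profile: Profile
    interpret: Callable[[Profile, str], float]
    threshold: float = 1.0  # hypothetical cutoff for an all-or-nothing selection

    def predicts_interest(self, document: str) -> bool:
        # The prediction comes from the interpretation method, not the profile alone.
        return self.interpret(self.profile, document) >= self.threshold


profile = {"filtering": 1.0, "retrieval": 0.5}
model = UserModel(profile, overlap_score)
```

In this sketch the dictionary `profile` alone makes no predictions; only the combination held in `model` does, which is the sense in which a profile by itself falls short of a user model.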