Before describing how machine learning techniques have been applied to text filtering, it is useful to consider more carefully how information about the user can be acquired. Rich drew a distinction between ``explicit'' models, which are ``constructed explicitly by the user,'' and ``implicit'' models, which are ``abstracted by the system on the basis of the user's behavior.'' Both implicit and explicit user models are found in text filtering systems (SIFT, for example, uses an explicit model). The machine learning techniques we describe in section can be used to create what Rich called implicit models.
In order to construct an implicit user model, we must be able to observe both the user's behavior and the salient features of the environment in which that behavior is exhibited. In the case of text filtering, the salient elements of the environment are the documents that have been examined by the user. Section 5.1 described how information about those documents can be acquired, either from their contents or from annotations made by others.
In section 4 we presented several examples of how representations of previously seen documents can be combined with evidence of the user's interest in those documents to predict interest in future documents. With the exception of InfoScope, every system we have described requires the user to explicitly evaluate documents, a technique we refer to as ``explicit feedback.'' Explicit feedback has the advantage of simplicity. Furthermore, in experimental systems explicit feedback has the added advantage of minimizing one potential source of experimental error, inference of the user's true reaction. But in practical applications explicit feedback has two serious drawbacks. The first is that a requirement to provide explicit feedback increases the cognitive load on the user. This added effort works against one of the principal benefits of a text filtering system, the reduced cognitive load that results from an information space more closely aligned with the user's perspective. This problem is compounded by the observation that numeric scales may not be well suited to describing the reactions humans have to documents. For example, is a document which addresses the information need well but contains little expository text better or worse than a document that is easily understood but less complete? These difficulties motivate the study of implicit feedback mechanisms.
In his InfoScope system, Stevens observed three sources of implicit evidence about the user's interest in each message: whether the message was read or ignored, whether it was saved or deleted, and whether it was replied to or not. Because the user's decision to read or ignore the message was necessarily based on a summary of the same message header information that InfoScope used to construct feature vectors, it would be reasonable to assume that the ``read or ignore'' decision would be nearly as useful as explicit feedback. InfoScope did, however, allow explicit feedback as well.
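The three binary signals Stevens observed can be encoded directly as a per-message feature vector. The following sketch illustrates this encoding; the class and function names are our own, not InfoScope's:

```python
from dataclasses import dataclass

@dataclass
class MessageBehavior:
    """Implicit evidence observed for one message (illustrative names)."""
    read: bool     # message was read rather than ignored
    saved: bool    # message was saved rather than deleted
    replied: bool  # a reply was sent

def implicit_feature_vector(behavior: MessageBehavior) -> list[int]:
    """Encode the three binary behavioral signals as a 0/1 feature vector."""
    return [int(behavior.read), int(behavior.saved), int(behavior.replied)]

# Example: a message that was read and saved but not replied to
print(implicit_feature_vector(MessageBehavior(True, True, False)))  # [1, 1, 0]
```

A learning algorithm could then treat this vector the same way it treats explicit ratings: as evidence of interest associated with a particular document.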
Morita and Shinoda also investigated implicit feedback for filtering Internet News articles, using both save and reply evidence but substituting reading duration for InfoScope's ``read or ignore'' evidence. In a six-week study of eight users, they found a strong positive correlation between reading time and explicit feedback provided by the user on a four-level scale. Furthermore, they discovered that interpreting articles which the reader spent more than 20 seconds reading as ``interesting'' produced better recall and precision in a text filtering experiment than using documents explicitly rated by the user as interesting. This surprising result reinforces our observation that users sometimes have difficulty expressing their interest explicitly on a single numeric scale.
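Morita and Shinoda's thresholding rule amounts to a very simple labeling function. In the sketch below, the 20-second cutoff comes from their study; the function name and data layout are assumed for illustration:

```python
READING_TIME_THRESHOLD = 20.0  # seconds, per Morita and Shinoda's study

def implicit_label(reading_seconds: float) -> bool:
    """Label an article 'interesting' iff the user spent more than
    the threshold duration reading it (implicit positive feedback)."""
    return reading_seconds > READING_TIME_THRESHOLD

# Hypothetical reading times (in seconds) for a batch of articles
times = [3.2, 45.0, 21.5, 8.0]
labels = [implicit_label(t) for t in times]
print(labels)  # [False, True, True, False]
```

Articles labeled in this way can then serve as positive training examples in place of explicitly rated documents.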
Since the experimental subjects were asked to read articles without interruption, it is not clear whether such useful relationships can be found in environments where reading behavior is more episodic. But Morita and Shinoda's results, coupled with the anecdotal evidence reported by Stevens, suggest that implicit feedback may be a practical source of features to which machine learning algorithms can be applied. Both implicit and explicit feedback produce features that are associated with documents. But unlike the feature vectors which describe the document's contents, feature vectors based on implicit or explicit feedback describe the user's reaction to the document.