
Social Filtering

The Tapestry text filtering system, developed by Nichols and others at the Xerox Palo Alto Research Center (PARC), was the first to include social filtering [11,40]. Designed to filter personal electronic mail, messages received from mailing lists, Internet News articles, and newswire stories, Tapestry allowed users to manually construct profiles based both on document content and on annotations that other users had made on those documents. Those annotations were explicit binary judgements ("like it" or "hate it") that each user could optionally make on any message they read.

Like InfoScope, Tapestry profiles consisted of rules that specified the conditions under which a document should be selected. One important difference was that Tapestry allowed users to associate a score with each rule. Tapestry then generated ranked output by comparing the scores assigned by multiple rules. Tapestry implemented this sophisticated processing efficiently by dividing the filtering process into two stages using a client-server model. In the first stage, a central server with access to all of the documents applied a set of simple rules, similar to those used by SIFT, to determine whether each document might be of interest to each user. In the second stage, the more sophisticated rules in each profile were executed on each user's workstation (the client) to develop the ranked list.
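The two-stage arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tapestry's actual rule language: the names `Document`, `Rule`, `server_prefilter`, and `client_rank` are hypothetical, the server stage is reduced to a keyword match, and a document's score is assumed to be the highest score among the rules it satisfies (the source says only that scores from multiple rules were compared).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    doc_id: str
    text: str
    # Annotations by other users: user name -> True ("like it") or False ("hate it").
    annotations: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Rule:
    predicate: Callable[[Document], bool]  # condition under which the rule selects a document
    score: float                           # user-assigned score for this rule


def server_prefilter(docs: List[Document], keywords: List[str]) -> List[Document]:
    """Stage 1 (server): keep documents that contain any of the user's keywords."""
    wanted = {k.lower() for k in keywords}
    return [d for d in docs if wanted & set(d.text.lower().split())]


def client_rank(docs: List[Document], rules: List[Rule]) -> List[Tuple[float, str]]:
    """Stage 2 (client): score candidates with the richer rules and rank them.

    Assumption: a document's score is the maximum score of its matching rules;
    documents matched by no rule are dropped.
    """
    scored = []
    for d in docs:
        matched = [r.score for r in rules if r.predicate(d)]
        if matched:
            scored.append((max(matched), d.doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


docs = [
    Document("m1", "New filtering paper posted", {"alice": True}),
    Document("m2", "Lunch menu for Friday"),
    Document("m3", "Filtering workshop call for papers"),
    Document("m4", "filtering demo cancelled", {"alice": False}),
]
rules = [
    # Social rule: select anything a trusted colleague liked.
    Rule(lambda d: d.annotations.get("alice") is True, 0.9),
    # Content rule: select anything mentioning a workshop.
    Rule(lambda d: "workshop" in d.text.lower(), 0.5),
]

candidates = server_prefilter(docs, ["filtering"])
ranked = client_rank(candidates, rules)
```

Here the server passes m1, m3, and m4 to the client, and the client ranks m1 (endorsed by alice) ahead of m3 (content match only) while dropping m4. Keeping the cheap test on the server and the expensive rules on the client is the division of labor the paragraph describes.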

Experience with several small-scale trials of social filtering suggests that a critical mass of users with overlapping interests is needed for social filtering to be effective. Tapestry was restricted to a single site because both the content and the software were subject to proprietary restrictions, so only limited anecdotal evidence of the social filtering aspects of Tapestry's performance is available. The GroupLens project of Miller and others at the University of Minnesota is presently the most ambitious attempt to reach a critical mass on a dynamic information source [32].

GroupLens is designed to filter Internet News, a freely redistributable text source. Like Tapestry, GroupLens is built on a client-server model. GroupLens uses two types of servers: content servers (which are simply standard Internet News servers) and annotation servers (which have been developed for the project). The design permits both the content and annotation servers to be replicated so that each server can efficiently service a limited user population. Modified versions of some popular (and freely redistributable) Internet News client software are made available in order to encourage the development of a large user population, and implementers of other client software are permitted to incorporate the GroupLens protocol in their products.

GroupLens annotations are explicit judgements on a five-valued integer scale. Unlike Tapestry, however, the annotations need not be assigned an a priori interpretation. Users may register annotations with their annotation server using whatever semantics for the five values they wish. The annotation servers collect annotations from their user population, use correlation information to predict their users' evaluations of unseen articles, and provide those predictions to client programs on request. The initial GroupLens trial began in 1996 using a limited number of newsgroups and a single annotation server. Results are not yet available, but the project's important contributions (distributed annotation servers, profile learning for social filtering, and a design that encourages development of a large user base) provide an excellent prototype for future work on social filtering.
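To make the idea of correlation-based prediction concrete, the following Python sketch assumes Pearson correlation over the articles two users have both rated, and a prediction formed as the user's own mean plus a correlation-weighted average of other users' deviations from their means. The source states only that correlation information is used, so this particular formula, the function names `pearson` and `predict`, and the sample ratings are all illustrative assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two users over the articles both have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, article):
    """Predict a user's rating of an unseen article from other users' ratings.

    Assumed formula: own mean + sum(w * (their rating - their mean)) / sum(|w|),
    where w is the Pearson correlation with each other user who rated the article.
    The result is not clamped to the 1..5 scale.
    """
    mine = ratings[user]
    my_mean = sum(mine.values()) / len(mine)
    numerator = 0.0
    denominator = 0.0
    for other, theirs in ratings.items():
        if other == user or article not in theirs:
            continue
        w = pearson(mine, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        numerator += w * (theirs[article] - their_mean)
        denominator += abs(w)
    if denominator == 0:
        return my_mean  # no usable neighbors: fall back to the user's own mean
    return my_mean + numerator / denominator


# Hypothetical ratings on a five-valued scale. "cat" uses the scale with the
# opposite meaning to "ann" and "bob" (1 is best rather than 5).
ratings = {
    "ann": {"a1": 5, "a2": 1, "a3": 4},
    "bob": {"a1": 4, "a2": 2, "a3": 5, "a4": 5},
    "cat": {"a1": 1, "a2": 5, "a3": 2, "a4": 1},
}
ann_mean = sum(ratings["ann"].values()) / len(ratings["ann"])
prediction = predict(ratings, "ann", "a4")
```

In this example cat's ratings are perfectly anti-correlated with ann's, so cat's low rating of a4 pushes ann's prediction up rather than down. That behavior is one way a correlation-based predictor can tolerate users who attach different semantics to the five values, as the paragraph above allows.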

One limitation of the existing experimental work on social filtering is user motivation. In GroupLens, users annotate documents so that their filter can learn from other users who have annotated the same documents. This creates a bit of a "chicken and the egg" problem, though, since the first user has no incentive to annotate anything. If content-based and social filtering are integrated in the same system, however, then a synergy between the two techniques can develop. Tapestry demonstrated one way in which the two approaches can be combined when manually constructed profiles are used. The URN system, developed by Brewer at the University of Hawaii, illustrates a more automatic method by which such synergy can be achieved.

URN was an Internet News filtering system in which users could provide two types of information to support profile learning. The first was explicit binary judgements about the utility of a document. Those judgements were then used as the basis for a typical content-based ranked output system. What made URN unique was that users could also collaboratively improve the system's initial representation of a document by adding words that they felt represented its content or deleting words that they felt misrepresented it. In URN these changes were propagated to all other users, allowing the user community to collaboratively define the structure of the information space. Since user-specified words were given preference by URN when developing representations for new documents, users had an incentive to improve the set of words that described existing documents.
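The shared document representation described above can be sketched as a small data structure. This is a hedged illustration, not URN's implementation: the class `SharedIndex`, its methods, and the `max_terms` parameter are hypothetical, and the initial choice of words (first occurrences in the text) is a placeholder for whatever term selection URN actually used. The only behaviors taken from the description are that edits are visible to every user and that user-specified words are preferred when new documents are represented.

```python
class SharedIndex:
    """A single, community-edited set of descriptive words per document."""

    def __init__(self):
        self.terms = {}          # doc_id -> set of words representing the document
        self.user_words = set()  # every word any user has added to any document

    def add_document(self, doc_id, text, max_terms=3):
        """Build an initial representation, preferring user-specified words.

        Placeholder selection: distinct words in order of first occurrence.
        """
        unique = list(dict.fromkeys(text.lower().split()))
        preferred = [w for w in unique if w in self.user_words]
        rest = [w for w in unique if w not in self.user_words]
        self.terms[doc_id] = set((preferred + rest)[:max_terms])

    def add_word(self, doc_id, word):
        """A user adds a word that represents the document; all users see it."""
        word = word.lower()
        self.terms[doc_id].add(word)
        self.user_words.add(word)

    def delete_word(self, doc_id, word):
        """A user removes a word that misrepresents the document."""
        self.terms[doc_id].discard(word.lower())


index = SharedIndex()
index.add_document("d1", "neural nets for text")
index.delete_word("d1", "for")        # uninformative word removed
index.add_word("d1", "filtering")     # better descriptor supplied by a user
index.add_document("d2", "a survey of filtering methods")
```

After these calls, d1 is described by neural, nets, and filtering, and the new document d2 picks up filtering ahead of words that occur earlier in its text because a user had already supplied that word. Each user's separate content-based profile would then be learned over these shared representations.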

In URN each user maintains a separate content-based user model, while the annotation server effectively maintains a single collaboratively-developed model of the document space. This approach lacks the sophistication of the separate user models based on shared annotations found in GroupLens, but URN's integration of content-based and social filtering techniques illustrates one way in which these two paradigms can be combined.


Douglas W. Oard
Sun Apr 27 13:18:52 EDT 1997
