Abstract:
P2P deployments are a natural infrastructure for building distributed
search networks. Proposed systems support locating and retrieving all
results, but lack the information necessary to rank them. Users,
however, are primarily interested in the most relevant results, not
necessarily all possible results.
Using random sampling, we extend a class of well-known information
retrieval ranking algorithms such that they can be applied in this
decentralized setting. We analyze the overhead of our approach, and
quantify how our system scales with an increasing number of documents,
system size, document-to-node mapping (uniform versus non-uniform),
and types of queries (rare versus popular terms). Our analysis and
simulations show that a) these extensions are efficient and scale
with little overhead to large systems, and b) the accuracy of the
results obtained using distributed ranking is comparable to that of a
centralized implementation.
@InProceedings{vijay-hipc07,
  author    = {Vijay Gopalakrishnan and Ruggero Morselli and Bobby Bhattacharjee and Pete Keleher and Aravind Srinivasan},
  title     = {Distributed Ranked Search},
  booktitle = {14th IEEE International Conference on High Performance Computing},
  pages     = {7--20},
  year      = {2007},
  address   = {Goa, India},
  month     = {December},
}