

Controlling the Robots of Web Search Engines

Authors

J. Talim <talim@math.usask.ca>
Dept. of Mathematics and Statistics, University of Saskatchewan, Canada
Z. Liu <zhenl@us.ibm.com>
IBM Research, Hawthorne
Ph. Nain <nain@sophia.inria.fr>
INRIA, France
E. G. Coffman, Jr. <egc@ee.columbia.edu>
Electrical Engineering Dept., Columbia University

Abstract

Robots are deployed by a Web search engine for collecting information
from different Web servers in order to maintain the currency of its
data base of Web pages. In this paper, we investigate the number of
robots to be used by a search engine so as to maximize the currency
of the data base without putting an unnecessary load on the network.
We adopt a finite-buffer queueing model to represent the system. The
arrivals to the queueing system are Web pages brought by the robots;
service corresponds to the indexing of these pages. Good performance
requires that the number of robots, and thus the arrival rate of the
queueing system, be chosen so that the indexing queue is rarely starved
or saturated. Thus, we formulate a multicriteria stochastic
optimization problem with the loss rate and empty-buffer probability
being the criteria. We take the common approach of reducing
the problem to one with a single objective that is a linear function
of the given criteria. Both static and dynamic policies can be
considered. In the static setting the number of robots is held fixed;
in the dynamic setting robots may be reactivated/deactivated as a
function of the state. Under the assumption that arrivals form a
Poisson process and that service times are independent and exponentially
distributed random variables, we determine an optimal decision rule
for the dynamic setting, i.e., a rule that varies the number of robots
in such a way as to minimize a given linear function of the loss rate
and empty-buffer probability. Our results are compared with known
results for the static case. A numerical study indicates that
substantial gains can be achieved by dynamically controlling the
activity of the robots.
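
As a rough illustration of the static setting, the sketch below
evaluates the two criteria for a single-server queue with Poisson
arrivals, exponential service times and a finite buffer (an M/M/1/K
queue), and picks the fixed number of robots that minimizes a linear
combination of them. It is not the model or the decision rule of the
paper. In particular, the assumption that each robot contributes the
same arrival rate (so that N robots give N times that rate), the
weights, the parameter values and all function names are hypothetical
and chosen only for the example.

```python
def mm1k_stationary(lam, mu, K):
    """Stationary distribution of an M/M/1/K queue.

    lam: arrival rate, mu: service rate, K: maximum number of pages
    in the system. Returns [pi_0, ..., pi_K].
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        # When rho == 1 all K + 1 states are equally likely.
        return [1.0 / (K + 1)] * (K + 1)
    norm = (1.0 - rho ** (K + 1)) / (1.0 - rho)
    return [rho ** n / norm for n in range(K + 1)]


def static_cost(n_robots, lam_per_robot, mu, K, w_loss, w_empty):
    """Linear cost for a fixed number of robots.

    Assumption (not from the paper): the arrival rate is
    n_robots * lam_per_robot.
    Returns (cost, loss_rate, p_empty).
    """
    lam = n_robots * lam_per_robot
    pi = mm1k_stationary(lam, mu, K)
    loss_rate = lam * pi[K]   # pages arriving while the buffer is full
    p_empty = pi[0]           # probability that the indexer is starved
    cost = w_loss * loss_rate + w_empty * p_empty
    return cost, loss_rate, p_empty


def best_static(max_robots, lam_per_robot, mu, K, w_loss, w_empty):
    """Number of robots in 1..max_robots with the smallest linear cost."""
    return min(
        range(1, max_robots + 1),
        key=lambda n: static_cost(n, lam_per_robot, mu, K, w_loss, w_empty)[0],
    )


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    n = best_static(max_robots=20, lam_per_robot=0.5, mu=4.0, K=10,
                    w_loss=1.0, w_empty=5.0)
    print(n, static_cost(n, 0.5, 4.0, 10, 1.0, 5.0))
```

Too few robots leave the indexer idle (large empty-buffer
probability), while too many overflow the buffer (large loss rate);
the weights w_loss and w_empty set the trade-off between the two.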

