User-Centric Operational Decision-Making in Distributed Information Retrieval
Information Systems Research, Forthcoming
43 Pages Posted: 30 Aug 2006 Last revised: 25 Feb 2015
Date Written: December 1, 2008
Information specialists in enterprises and consumers on the Internet regularly use Distributed Information Retrieval (DIR) systems that query a large number of Information Retrieval (IR) systems, merge the retrieved results and display them to users. There can be considerable heterogeneity in the quality of results returned by different IR servers. Further, since different servers handle collections of different sizes, have different processing and bandwidth capacities, there can be considerable heterogeneity in their response times. The broker in the distributed IR system thus has to decide which servers to query, how long to wait for responses and which retrieved results to display based on the benefits and costs imposed on users. The benefit of querying more servers and waiting longer is the ability to retrieve more documents. The costs may be in the form of access fees charged by IR servers or user's cost associated with waiting for the servers to respond. We formulate the broker's decision problem as a stochastic mixed integer program. We present closed-form results for the optimal query set and wait time in the special case when the relevance scores and response times of the IR servers are independent and identically distributed. When servers are heterogeneous, we present a simulations-based optimization technique and demonstrate how the optimal query set and wait time may be determined. The technique is computationally efficient and can be used to generate decision rules for source selection and query termination that are relatively easy to implement. We use data gathered from two different contexts - a DIR system that queries IR engines of several US federal agencies and a comparison shopping engine that queries multiple stores for price and product information - to validate our technique. Our research demonstrates that user satisfaction can be considerably improved by modeling user utility and incorporating historical information on performance of the IR servers.
Keywords: Distributed IR, metasearch, Patent search, Optimal operational decisions, Utility theory, Source selection, Query termination
Suggested Citation: Suggested Citation