Distributed Information Retrieval, Multi-Agent Systems and the role of Logic Programming

Keith Clark and Vasilios S. Lazarou

Logic Programming Section

Department of Computing

Imperial College

180 Queen's Gate

London SW7 2BZ

{klc,vl3}@doc.ic.ac.uk



General description

The advent of large wide-area networks, of which the Internet is the most characteristic example, has caused a vast increase both in the availability of information and in the number of information sources. This evolution offers great promise for obtaining and sharing diverse information conveniently. However, the multitude, diversity and dynamic nature of on-line information sources make accessing any specific piece of information an extremely difficult task.

One way to address these issues is to use information agents. These Distributed Information Retrieval (DIR) agents should be able to:

A. accept a request from a human or agent client,

B. translate this request into a language understood by the information sources,

C. identify the information sources that contain information relevant to the request,

D. pose the request to these sources,

E. collect the corresponding results,

F. process the returned results and

G. present the results to the client.
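Activities A to G can be sketched as a single request-handling pipeline. The following is only an illustrative sketch, not the system's implementation: the `InformationAgent` class, its `sources` and `topics` tables, and the trivial lower-casing "translation" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: a source maps a translated query to a list of answers.
Source = Callable[[str], List[str]]


@dataclass
class InformationAgent:
    sources: Dict[str, Source]      # source name -> query function (assumed)
    topics: Dict[str, List[str]]    # source name -> topics it covers (assumed)

    def translate(self, request: str) -> str:
        # Step B: here "translation" is just normalisation to lower case.
        return request.strip().lower()

    def relevant_sources(self, query: str) -> List[str]:
        # Step C: keep the sources that list a topic occurring in the query.
        return [name for name, ts in self.topics.items()
                if any(t in query for t in ts)]

    def handle(self, request: str) -> List[str]:
        query = self.translate(request)           # steps A and B
        names = self.relevant_sources(query)      # step C
        results: List[str] = []
        for name in names:                        # steps D and E
            results.extend(self.sources[name](query))
        return sorted(set(results))               # steps F and G: dedupe, order
```

In this sketch, processing the results (step F) amounts to removing duplicates and sorting; a real agent would do considerably more at each step.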

We have followed this approach in developing our information retrieval system for the WWW. The overall agent architecture is as follows. Our system supports a collection of information sites. An information site is a logical entity that contains a set of information sources; it is a logical clustering of actual physical WWW sites. In each information site, we find the information server agents: the extractor agent and the information source interface agent.

The extractor scans through all the information sources, which are identified to it as URLs. These are the URLs of the top-level web pages of various research groups. From each such web page it extracts summary information about all the technically relevant local documents (documents belonging to that research group) that are accessible via a chain of links from the page. The summary information is extracted as a set of attribute/value pairs, which is passed to the information source agent.

The information source agent handles the query answering process. It accepts retrieval enquiries and attempts to evaluate them against the attribute-based information. It also registers an abstract of the attribute-based information it contains with an information access facilitator for the general topic about which it has information. For example, a site agent for the web site of a research group in DAI would register summary information about the technical papers on its site with an access facilitator agent for the general topic of AI. This facilitator will generally receive queries to do with any area of AI and will route each query to the appropriate information source agents that have registered with it. Metadata caching and query-answering planning are among the other activities of an information access facilitator. Finally, each information access facilitator, in turn, advertises its wares with a central matchmaker, the cornerstone of the whole distributed retrieval system.
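The registration and routing behaviour of an information access facilitator can be illustrated with a small sketch. The `AccessFacilitator` class, its method names and the representation of an abstract as a set of subtopic strings are assumptions for illustration only; the actual abstract is derived from the attribute-based information held by each source agent.

```python
from typing import Dict, List, Set


class AccessFacilitator:
    """Hypothetical facilitator for one general topic (e.g. "AI")."""

    def __init__(self, topic: str) -> None:
        self.topic = topic
        # source agent name -> abstract of what it holds (assumed: subtopics)
        self.registry: Dict[str, Set[str]] = {}

    def register(self, source_agent: str, subtopics: Set[str]) -> None:
        # An information source agent registers an abstract of its content.
        self.registry[source_agent] = set(subtopics)

    def route(self, subtopic: str) -> List[str]:
        # Return the registered source agents whose abstract covers the query.
        return sorted(agent for agent, subs in self.registry.items()
                      if subtopic in subs)
```

For example, a facilitator for AI with a registered DAI group agent would route a query about "DAI" to that agent only, and return an empty list for a subtopic that no registered agent covers.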

In contrast to the above, the user agent deals with client-side activities. This is the agent that the end user (the human agent) interacts with. It translates the user's query, input via a Netscape form, into the appropriate query message. It also displays the answers it receives using Netscape. (The answers invariably contain URLs that can then be accessed using just the standard Netscape facilities.) The user agent makes use of the services of a local query facilitator. The query facilitator accepts requests from user agents. Its role is to identify which information access facilitators have the potential to satisfy a request through the information sites that they manage. It initially finds out about these access facilitators via queries to the central matchmaker. Thereafter, it maintains direct information about these facilitators, and about the individual information sources for which they are gateways, based on the queries that have been routed to them and successfully answered.
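The way a query facilitator first consults the matchmaker and afterwards relies on its own knowledge can be sketched as follows. This is a simplification under stated assumptions: the `Matchmaker` and `QueryFacilitator` classes and their methods are hypothetical, and the sketch remembers the matchmaker's reply directly, whereas the system described above builds up its knowledge from queries that were routed and successfully answered.

```python
from typing import Dict, List, Set


class Matchmaker:
    """Hypothetical central matchmaker holding facilitator advertisements."""

    def __init__(self) -> None:
        self.adverts: Dict[str, Set[str]] = {}  # facilitator -> advertised topics
        self.lookups = 0                        # counts how often it is consulted

    def advertise(self, facilitator: str, topics: Set[str]) -> None:
        self.adverts[facilitator] = set(topics)

    def match(self, topic: str) -> List[str]:
        self.lookups += 1
        return sorted(f for f, ts in self.adverts.items() if topic in ts)


class QueryFacilitator:
    """Hypothetical local query facilitator used by user agents."""

    def __init__(self, matchmaker: Matchmaker) -> None:
        self.matchmaker = matchmaker
        self.known: Dict[str, List[str]] = {}   # topic -> facilitators learnt

    def facilitators_for(self, topic: str) -> List[str]:
        # Use local knowledge when available; otherwise ask the matchmaker.
        if topic in self.known:
            return self.known[topic]
        found = self.matchmaker.match(topic)
        if found:                               # remember only useful answers
            self.known[topic] = found
        return found
```

Only non-empty answers are remembered, so a topic that no facilitator has advertised yet is looked up again on the next request.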

To conclude the description of the overall agent architecture, there are two other agents: the matchmaker and the descriptor. As already mentioned, the matchmaker serves as an advisor agent that directs requests to agents that have expressed an ability to handle them, by matching requests from query facilitators against advertisements from the access facilitators. The descriptor agent contains terminological knowledge that is exploited both by the extractor, during the elicitation of the key attributes, and by the information access facilitator, for potential query reformulation.

Our system adopts the query-based model, a goal-driven approach to information seeking. The user just poses a request; it is the task of the DIR system to process the request transparently by executing activities A to G presented above.

Concerning the type of information to be handled, the system we have implemented to date supports unstructured hypertext information sources in the form of HTML documents. The underlying platform of the system is the World Wide Web, which incorporates hypermedia information managed by diverse systems without imposing any standard structure. As a result, although structured sources such as relational databases have attractive implementation characteristics, supporting only them would have restricted us to a subset of the available information.
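Because the sources are unstructured HTML, summary information has to be derived from the markup itself. The sketch below shows one simple way to turn a page into attribute/value pairs using Python's standard `html.parser` module. The choice of attributes (`title` and `link`) and the `SummaryExtractor` and `extract` names are assumptions for illustration; the attributes actually extracted by the system are not specified here.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class SummaryExtractor(HTMLParser):
    """Collects (attribute, value) pairs from one HTML page (illustrative)."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:                      # anchors without href are ignored
                self.pairs.append(("link", href))

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            title = "".join(self._title_parts).strip()
            self.pairs.append(("title", title))


def extract(html: str) -> List[Tuple[str, str]]:
    """Return the attribute/value pairs found in the given HTML text."""
    parser = SummaryExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

The `link` pairs are what a crawler would follow to reach the local documents that are accessible from a top-level page; following them is left out of this sketch.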

Finally, there are two other important aspects worth mentioning. The first is that our architecture supports the activation of mobile agents. Currently, these agents have a monitoring and answer-collecting role: they monitor sources that are known to contain query-relevant information and propagate new answers back to the user agent. The second is that all the agents of our architecture support persistent information storage, which enhances the security and recovery capabilities of the system.

The role of Logic Programming

In order to enhance the expressiveness and functionality of our system, several LP techniques have been employed. First, the query language used to express the user's requests is a Prolog variant. There is a set of predicates that the user has to use in order to express a request. These predicates correspond to attributes that characterise aspects of the supported information. Some of these are standard, predefined for a specific application (e.g. author and title for technical document retrieval). For both the standard and the extra predicates, the user, through the user agent, can browse the interface to get information about their usage. This textual information includes the intended meaning of the predicate, its arity, the use of its arguments and, if needed, predicate-specific guidelines.
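The browsable usage information described above can be pictured as a small predicate catalogue. The sketch below is hypothetical: only the predicate names `author` and `title` come from the text, while their arities, argument roles and descriptions, as well as the `PredicateInfo`, `describe` and `check_goal` names, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PredicateInfo:
    arity: int                    # number of arguments
    meaning: str                  # intended meaning of the predicate
    arguments: Tuple[str, ...]    # role of each argument, in order


# Hypothetical catalogue: arities and argument roles are assumed.
CATALOGUE: Dict[str, PredicateInfo] = {
    "author": PredicateInfo(
        2, "Document was written by the given author.",
        ("document", "author name")),
    "title": PredicateInfo(
        2, "Document has the given title.",
        ("document", "title text")),
}


def describe(name: str) -> str:
    """Return the textual usage information for one predicate."""
    info = CATALOGUE[name]
    roles = ", ".join(info.arguments)
    return f"{name}/{info.arity}: {info.meaning} Arguments: {roles}."


def check_goal(name: str, args: Tuple[str, ...]) -> bool:
    """Accept a goal only if the predicate is known and the arity matches."""
    info = CATALOGUE.get(name)
    return info is not None and len(args) == info.arity
```

A user agent could use a check like `check_goal` to reject a malformed request before it is sent on to a query facilitator.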

In keeping with the use of a query language based on first-order logic, the information structures in our system are specified in a similar manner, using a predicate-attribute format. Both the knowledge that the agents possess and the information stored in the various sites have this declarative form. It is the task of the extractor agent to extract the relevant information and transform it from an unstructured format into an attribute-based one. As a result, our information sites are managed as local deductive databases. This integration between the query language and the representation of the site information enables the information source agent to provide efficient query answering, since the mapping between the user's query and the query to be posed to the deductive database is straightforward.
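To illustrate why this mapping is straightforward, the following sketch evaluates a conjunctive query directly against attribute-based facts, binding variables from left to right much as a Prolog engine would. It is written in Python rather than the system's Prolog variant, and the fact layout `(attribute, document, value)`, the `Var` class and the `solve` function are assumptions for illustration. It handles ground facts only, not rules.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str                         # a query variable, e.g. Var("D")


Term = Union[Var, str]
Fact = Tuple[str, str, str]           # (attribute, document URL, value)
Goal = Tuple[str, Term, Term]         # same shape, but may contain variables
Bindings = Dict[str, str]


def match_term(term: Term, value: str, env: Bindings) -> Optional[Bindings]:
    """Match one goal argument against a fact value, extending env if needed."""
    if isinstance(term, Var):
        if term.name in env:
            return env if env[term.name] == value else None
        return {**env, term.name: value}
    return env if term == value else None


def solve(goals: List[Goal], facts: List[Fact],
          env: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every set of bindings that satisfies all goals, left to right."""
    env = {} if env is None else env
    if not goals:
        yield env
        return
    (attr, doc, val), rest = goals[0], goals[1:]
    for f_attr, f_doc, f_val in facts:
        if f_attr != attr:
            continue
        env1 = match_term(doc, f_doc, env)
        if env1 is None:
            continue
        env2 = match_term(val, f_val, env1)
        if env2 is not None:
            yield from solve(rest, facts, env2)
```

With made-up facts, a query for the titles of documents by a given author becomes the goal list `[("author", Var("D"), name), ("title", Var("D"), Var("T"))]`, and each solution binds `D` to a document URL and `T` to its title.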