\documentclass[a4paper]{article} \usepackage[english]{babel} \usepackage{graphicx} \selectlanguage{english} \date{} \author{Michiel Hildebrand, Anton Eli\"{e}ns, Zhisheng Huang and Cees Visser\\ \\ Intelligent Multimedia Group\\ Vrije Universiteit, Amsterdam, Netherlands\\ \{mhildeb,eliens,huang,ctv\}@cs.vu.nl } \title{Interactive Agents Learning their Environment\footnote{http://www.cs.vu.nl/\textasciitilde eliens/research/media/title-interactive.html}} \begin{document} \maketitle
\begin{abstract} In this paper we describe the implementation of interactive agents capable of gathering and extending their knowledge. Interactive agents are designed to perform tasks requested by a user in natural language. Using simple sentences the agent can answer questions, and in case a task cannot be fulfilled the agent communicates with the user. In particular, an interactive agent can tell when necessary information for a task is missing, giving the user a chance to supply this information, which may in effect result in teaching the agent. The interactive agent platform is implemented in DLP, a tool for the implementation of 3D web agents. In this paper we discuss the motivation for interactive agents, the learning mechanisms and their realization in the DLP platform. \paragraph{Keywords:} \emph{Interactive Agents, Virtual Environments, Natural Language, Learning, DLP, VRML, Grail.} \end{abstract}
\section{Introduction} Research done in the combined fields of computational linguistics, computer graphics and autonomous agents has led to the development of autonomous virtual characters. These characters will make interaction with our computer systems more natural, serving as an interface to existing systems and as a basis for new applications. Autonomous characters in virtual or physical museums, shops, information kiosks or public buildings are just a few of the possibilities.
\par Autonomous agents with a humanoid appearance and autonomous behavior provide a user-friendly alternative to traditional interfaces. Agents may perform actions and display information, but they may also gather information. They use interaction to learn about their environment by adding or modifying their knowledge. The benefit of this approach is that agents need not be given all information in advance but can instead build up their knowledge during the process.
\par Natural language provides an easy and flexible way to communicate. We can obtain information with questions and add information with declarative sentences. In response the agent uses simple sentences to answer questions and to communicate about the information he is missing. The embodiment of agents allows movements and gestures. How interaction takes place will be shown in more detail in section \ref{examples}, where we describe a sample scenario.
\par Our system is implemented in DLP \cite{DLP}, a distributed logic programming language suited for the implementation of 3D intelligent agents \cite{DLPplatform}. Avatars for the agents are built in the Virtual Reality Modeling Language (VRML) or X3D, the next generation of VRML. These avatars have a humanoid appearance based on the H-anim specification\footnote{http://h-anim.org}. The gestures for the agent are made in the STEP scripting language \cite{STEP}. Natural language processing is done by \emph{grail}\footnote{http://www.let.uu.nl/\textasciitilde Richard.Moot/personal/grail.html}, a parser based on categorial grammar \cite{Moot}.
\par The structure of this paper is as follows: In section 2 we demonstrate how users and agents can interact with each other. Section 3 provides a formal description of interaction and learning. The realization of the platform is described in section 4. In section 5 we describe other work related to interactive agents. Section 6 ends the paper with a conclusion and suggestions for further research.
\section{A Sample Scenario of Interaction} \label{examples} \begin{figure}[htb] \begin{center} \includegraphics[width=12cm]{screenshot.jpg} \caption{Interactive agent situated in a virtual house} \label{screenshot} \end{center} \end{figure} Figure \ref{screenshot} shows the application as it is run in a web browser. The main screen contains the virtual world including the agent. The interface is created with standard HTML forms. At the bottom it contains a field for language input, two selection menus for command shortcuts and a status screen with four buttons to display the agent's knowledge. On the right side there are various shortcuts to demo actions and predefined viewpoints, and a selection list of active modules.
\par A user can give natural language input to an interactive agent by typing English sentences. Three types of sentences are possible: commands, questions and declarative sentences. The type of a sentence is indicated by its closing mark: an exclamation mark for a command, a question mark for a question and a period for a declarative sentence. The following sentences present a sample of possible inputs: \begin{quote} \emph{Switch on the TV!\\ Sit on the blue couch!\\ Sit on the table!\\ Where is your bed?\\ Can you give me a book?\\ Yes, you can sit on the table.\\ There is a book inside the studyroom.} \end{quote}
\par Given a command the agent will try to perform the corresponding action. In response to the command \emph{Switch on the TV!} the agent will walk to the TV and raise his hand to press the power button. The command \emph{Sit on the couch!} is less straightforward, because there is more than one couch in the living room. If a specific couch was part of the interaction before, the agent will continue using the familiar couch. Otherwise the user has to be more specific and the agent will prompt him to tell which couch is meant. The user can describe the object by adding extra properties, and for instance say: \emph{Sit on the blue couch!}
\par Besides there being multiple candidates, it could also be that the whereabouts of an object are unknown. To locate an object the agent first looks around. If no object is found he will ask the user for help. The user can in response give directions to explain where the object is located. If for example the agent asks where to find a book, the user can respond that \emph{there is a book inside the studyroom.} The information exchange can also be reversed. A user can for example ask \emph{Where is your bed?} and the agent will respond that it is inside the bedroom.
\par Interactive agents contain knowledge about their environment. In situations where common knowledge is not sufficient to make a judgement the agent will communicate with the user. For example, an agent will not immediately sit on a table when requested. Instead he will tell the user that he is not sure sitting on tables is allowed. The user can confirm, for example if all seats on the couch are taken, and say: \emph{Yes, you can sit on the table.} The agent adds this new belief to his knowledge and sits down on the table. In this case the user takes the role of an instructor teaching the agent.
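\par To make the role of the closing marks concrete, the fragment below sketches in DLP-style notation how typed input could be dispatched. It is a minimal sketch only; the predicate names (\texttt{handle\_input}, \texttt{ends\_with}, \texttt{perform\_command}, \texttt{answer\_question}, \texttt{learn\_fact}) are illustrative and do not refer to the actual implementation.
\begin{verbatim}
% Illustrative sketch, not the actual code:
% dispatch user input on its closing mark (!, ? or .).
handle_input(Text) :-
    ends_with(Text, '!'),          % command
    parse(Text, Term),
    perform_command(Term).
handle_input(Text) :-
    ends_with(Text, '?'),          % question: answered, or acted
    parse(Text, Term),             % upon if it requests an action
    answer_question(Term).
handle_input(Text) :-
    ends_with(Text, '.'),          % declarative sentence
    parse(Text, Term),
    learn_fact(Term).
\end{verbatim}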
\section{Formal Description of Interaction} Input can result in three types of actions, according to the types of sentences we distinguished. A command will result in a basic action, a physical action in the environment. Questions can be used for two different purposes: to ask for information, \emph{Is there a book on the table?}, or to request an action, \emph{Can you give me the book?}. The former results in an answer action while the latter results in a basic action. Declarative sentences are used to give information that agents can use to modify their knowledge, a learning action.
\par To control their performance, actions are embedded in conditions and effects, in a similar way as the \emph{capabilities} used in \cite{Cap}, \cite{3APL}. The conditions check whether the actions are possible in the current state. The actions are performed only if all conditions are satisfied. The effects update the agent's knowledge according to the changes made in the environment. The actions themselves consist of calculations, physical actions, text outputs or references to other actions.
\par A state, which we will denote with the letter \emph{S}, is determined by the agent's knowledge. The knowledge of interactive agents consists of objects and their properties. We characterize a state as a mapping from objects to (property, value) pairs. Given a state \emph{S} an agent can perform an action if the corresponding conditions \emph{C} can be satisfied in that state. Formally we write: \[S \stackrel{a}{\rightarrow} S',\] where \emph{a} is the observable behavior of the agent in his environment. The new state \emph{S'} is obtained by modifying the agent's knowledge in \emph{S} according to the effects. \paragraph{Learning.} A learning action has no observable behavior. Learning affects the actions an agent can perform. In other words, we can describe a learning action as a transition from one state to another that changes the set of possible actions. For example, the permission to sit on a table extends the actions the agent can perform, by adding the possibility to sit on tables.
\section{Realization} The agent is built from components as depicted in figure \ref{architecture}. The components are implemented in the distributed logic programming language DLP \cite{DLP}. The avatar and the virtual world are constructed in VRML. Interaction between the agent and his environment is possible because DLP has programmatic control over VRML through the External Authoring Interface (EAI) \cite{DLP}. In particular, DLP can get and set values of objects through handles. \begin{figure}[htb] \begin{center} \includegraphics[width=8cm]{architectuur1.png} \caption{Agent components} \label{architecture} \end{center} \end{figure}
\par All objects in the VRML world are created according to a prototypical structure. This structure guarantees that each object has fields for \emph{position, rotation, center, scale, size} and \emph{name}. The sensor component checks the values of these fields and stores them in the knowledge base. Perception is restricted in the sense that the agent can only get information about objects located in the same room and positioned in his field of vision.
\par The agent has a humanoid appearance based on the H-anim standard. The H-anim standard describes a humanoid as consisting of a large number of joints. In our system we use a simplified version with only seven joints. The agent's body movements are defined with STEP, a scripting language for embodied agents based on dynamic logic \cite{STEP}.
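For instance, a gesture such as raising the right arm can be scripted roughly as follows; this is a sketch in the style of the examples in \cite{STEP}, and the exact joint, direction and speed parameters are assumptions rather than the definitive STEP vocabulary.
\begin{verbatim}
% Sketch of a STEP script: raise the right arm.
% seq/par compose actions sequentially or in parallel;
% turn(Agent, Joint, Direction, Speed) rotates an H-anim joint.
script(raise_right_arm(Agent), Action) :-
    Action = seq([turn(Agent, r_shoulder, front_up, fast),
                  turn(Agent, r_elbow, front, fast)]).
\end{verbatim}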
The actuator component contains the STEP kernel to execute these scripts.
\par The knowledge component contains all data that may differ from one agent to another. Therefore, to create an interactive agent only this knowledge has to be defined, leaving the other components untouched. The actions defined by STEP scripts, the beliefs and the lexicon are all treated as knowledge. Currently only beliefs can be learned. However, the system is constructed in such a way that it can be extended to learn the other types of knowledge as well.
\par Beliefs gathered through perception give an objective view of the world, since their values are determined by the properties of objects. Subjective beliefs are gathered or modified through interaction with the user, in the form of learning actions. Beliefs can also be given in advance to minimize the need for irrelevant interaction.
\par Given an input the controller creates the appropriate behavior based on the agent's knowledge. Input from the command shortcuts can be performed directly by a basic action. Input given in natural language is first parsed, as described at the end of this section. The term that results from parsing may contain quantified or free variables. The quantified variables are substituted according to the agent's beliefs. Further processing is done according to the three types of actions.
\par An answer action creates an appropriate answer by matching the term with the agent's beliefs, substituting the free variables. A learning action updates the agent's beliefs by modifying existing beliefs or adding new ones. A basic action is a collection of conditions, actions and effects. As an example, a basic action to switch an object on or off (a TV or a lamp) can be described as\footnote{For clarity the fail actions of the conditions are omitted. The fail action of the position property is a try action to find the object; the fail action of the status property reports that the object already has the desired status.}:
\begin{verbatim}
action(switch(Object, Status), Conditions, Actions, Effects) :-
  % preconditions from perception data and beliefs
  Conditions = [ property(Object, [position,XO,YO,ZO]),
                 property(Object, [rotation,RO]),
                 property(belief, [switch,Object,Button]),
                 property(belief, not([status, Status])),
                 property(Button, [position,Xswitch,Yswitch,Zswitch]) ],
  % physical actions in the virtual environment
  Actions = [ get_in_room(Object),
              get_in_reach(XO,YO,ZO, Xswitch,Yswitch,Zswitch,
                           RO, rightHand),
              arm_rotation(YO, Yswitch, Rarm),
              action([switch, Button, Rarm]) ],
  % belief update after the action has been performed
  Effects = [ change(Object, [status,Status], [status,not(Status)]) ].
\end{verbatim}
The position and rotation properties are looked up in the object's perception data. A lookup in the beliefs determines which button can be used and whether the object does not already have the desired status. The actions make the agent move to the same room as the object and place him in reach of the switch button. After the necessary rotation of the agent's arm is calculated, the physical switch action can be performed. Finally, the effects modify the agent's beliefs by changing the status of the switched object.
\par The natural language parser, \emph{grail}, is based on categorial grammar \cite{Moot}, which can be characterized by the slogan ``parsing as deduction''. A categorial grammar is given by an assignment of types to the words in the lexicon. The types are constructed from atoms and logical connectives. An expression is well-formed if a derivation in the logic for these connectives is possible. The meaning of an expression is then assembled according to the derivation \cite{CG}.
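As a minimal illustration of such type assignments (simplified for exposition and not taken from the actual \emph{grail} lexicon), the command \emph{Switch on the TV!} can be derived from a lexicon that assigns the type $s/np$ to the imperative \emph{switch on}, $np/n$ to the determiner \emph{the} and $n$ to the noun \emph{TV}. Two applications of the elimination rule for $/$ yield \[ np/n \cdot n \Rightarrow np, \qquad s/np \cdot np \Rightarrow s, \] so the command as a whole is assigned the type $s$ and is therefore well-formed.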
\section{Related Work} The Gesture and Narrative Language group has created several embodied conversational agents \cite{Rea,Mack}. They focus on controlled dialog with multi-modal input and output. In addition, their system uses cameras and speech recognition.
\par In \cite{tuby} Virtual Teletubbies are developed to create believable agents. These agents combine long-term goals with low-level reactive behavior. Virtual physics is used to create realistic environments.
\par At the Synthetic Creatures group virtual agents are used to simulate animal behavior \cite{Creatures}. Their implementation of a sheepdog uses reinforcement learning to associate actions with utterances (single words or clicks). Both their reasoning and their learning behavior are well suited for modelling animal intelligence.
\par Jacob, an animated instruction agent, is a project from the University of Twente \cite{Jacob}. Jacob instructs and assists a user learning to perform a task in a virtual environment. The user can communicate with Jacob through natural language.
\par Just-Talk is an application to train law enforcement personnel in a virtual role-playing environment \cite{Just-Talk}. The virtual personality uses a combination of facial gestures, body movements and spoken language. The students can converse with that personality using spoken natural language.
\par In comparison with the work just mentioned, we may characterize our project as focusing on natural language interaction between an agent and the user. The agent uses logic-based learning to adapt to his environment.
\section{Conclusions and Future Research} In this paper we have described a system where users interact with agents using natural language. In response to user input the agent can perform physical actions, give answers to questions or learn new information. If none of the three actions is possible, the agent generates an output message to indicate what information is missing. This approach makes it possible for the agent to learn to act in unfamiliar environments.
\par Currently the agent can learn by modifying his beliefs. In further research the system can be extended to learn other kinds of knowledge as well. For example, actions can be learned by combining existing actions, or by defining simple STEP scripts.
\par An altogether different challenge is to apply this approach to another application domain. The INCCA (International Network for the Conservation of Contemporary Art) has developed a multimedia repository about contemporary art. In such an application an agent can aid the user in navigating the information space and create presentations about the INCCA repository on demand of the user.
\begin{thebibliography}{xxxxxx} \bibitem[1]{CG}Moortgat, M. (2002). \emph{Categorial Grammar and Formal Semantics}. Encyclopedia of Cognitive Science, Nature Publishing Group, Macmillan Publishers Ltd. \bibitem[2]{DLP}Eli\"ens, A. (1992). \emph{DLP, A Language for Distributed Logic Programming}. Wiley. \bibitem[3]{DLPplatform}Eli\"ens, A., Huang, Z., Visser, C. (2002). \emph{A Platform for Embodied Conversational Agents based on Distributed Logic Programming}. AAMAS Workshop -- Embodied conversational agents - let's specify and evaluate them!, Bologna, 16 July 2002. \bibitem[4]{STEP}Huang, Z., Eli\"ens, A., Visser, C. (2002). \emph{STEP: A Scripting Language for Embodied Agents}. PRICAI-02 Workshop -- Lifelike Animated Agents: Tools, Affective Functions, and Applications, Tokyo, 19 August 2002.
\bibitem[5]{Rea}Cassell, J., Bickmore, T., Campbell, L., Vilhjalmsson, H., Yan, H. (2001). \emph{Human Conversation as a System Framework: Designing Embodied Conversational Agents}. Embodied Conversational Agents, pp. 29-63. MIT Press. \bibitem[6]{Mack}Cassell, J., Stocky, T., Bickmore, T., Gao, Y., Nakano, Y., Ryokai, K., Tversky, D., Vaucelle, C., Vilhj\'almsson, H. (2002). \emph{MACK: Media Lab Autonomous Conversational Kiosk}. In Proc. of Imagina '02, pp. 12-15, Monte Carlo. \bibitem[7]{Creatures}Isla, D., Burke, R., Downie, M., Blumberg, B. (2001). \emph{A Layered Brain Architecture for Synthetic Creatures}. In Proc. of the Int. Joint Conf. on Artificial Intelligence (IJCAI), pp. 1051-1058, Seattle. \bibitem[8]{3APL}Hindriks, K., de Boer, F., van der Hoek, W., Meyer, J-J. (1999). \emph{Agent Programming in 3APL}. Autonomous Agents and Multi-Agent Systems, 2(4), pp. 357-401. \bibitem[9]{Cap}Panayiotopoulos, T., Anastassakis, G. (1999). \emph{Towards a Virtual Reality Intelligent Agent Language}. 7th Hellenic Conf. on Informatics, September 1999, Ioannina. \bibitem[10]{tuby}Ballin, D., Aylett, R., Delgado, C. (2002). \emph{Towards Autonomous Characters for Interactive Media}. Intelligent Agents for Mobile and Virtual Media, Ch. 5, pp. 55-76. Springer. \bibitem[11]{Moot}Moot, R. (2002). \emph{Proof Nets for Linguistic Analysis}. Ph.D. thesis, Utrecht Institute of Linguistics OTS, Utrecht University. \bibitem[12]{Jacob}Evers, M., Nijholt, A. (2000). \emph{Jacob -- An Animated Instruction Agent in Virtual Reality}. In Proc. of the 3rd Int. Conf. on Multimodal Interaction (ICMI 2000). \bibitem[13]{Just-Talk}Frank, G., Hubal, R. (2002). \emph{An Application of Responsive Virtual Human Technology}. In Proc. of the 24th Interservice/Industry Training, Simulation and Education Conf. 2002. \end{thebibliography} \end{document}