The ACOI framework


The ACOI framework is intended to accommodate a broad spectrum of classification schemes, manual as well as (semi-)automatic, for the indexing and retrieval of multimedia objects. What is stored is not the multimedia objects themselves, but structural descriptions of these objects (including their location) that may be used for retrieval.

The ACOI model is based on the assumption that indexing an arbitrary multimedia object is equivalent to deriving a grammatical structure that provides a namespace to reason about the object and to access its components. However, there is an important difference with ordinary parsing: the lexical and grammatical items corresponding to the components of the multimedia object must be created dynamically by inspecting the actual object. Moreover, in general, there is no fixed sequence of lexicals as there is for natural or formal languages. To allow for the dynamic creation of lexical and grammatical items, the ACOI framework supports both black-box and white-box (feature) detectors. Black-box detectors are algorithms, usually developed by a specialist in the media domain, that extract properties from the media object by some form of analysis. White-box detectors, on the other hand, are created by defining logical or mathematical expressions over the grammar itself. In this paper we will focus on black-box detectors only.
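The distinction between the two kinds of detectors can be illustrated with a minimal Python sketch. It is not ACOI code: the note representation, the feature names and both functions (note_stats_detector, is_narrow_melody) are hypothetical and chosen only for illustration. The first function inspects the media object itself, as a black-box detector would; the second is merely an expression over values already bound to grammar symbols, in the manner of a white-box detector.

```python
from typing import Dict, List, Tuple

# Illustrative media object: a list of (MIDI pitch number, duration in beats).
# This representation is an assumption, not the ACOI input format.
Note = Tuple[int, float]
# Values bound to (hypothetical) grammar symbols.
Features = Dict[str, float]


def note_stats_detector(notes: List[Note]) -> Features:
    """Black-box style detector: analyses the media object itself."""
    pitches = [pitch for pitch, _ in notes]
    return {
        "note_count": float(len(notes)),
        "pitch_range": float(max(pitches) - min(pitches)) if pitches else 0.0,
    }


def is_narrow_melody(features: Features) -> bool:
    """White-box style detector: an expression over symbols already detected."""
    return features["note_count"] >= 4 and features["pitch_range"] <= 12


notes = [(60, 1.0), (62, 1.0), (64, 0.5), (65, 0.5), (67, 2.0)]
features = note_stats_detector(notes)
narrow = is_narrow_melody(features)
```

In this sketch the white-box function never touches the notes; it depends only on features that a black-box detector has already produced.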

The information obtained from parsing a multimedia object is stored in the Monet database. Applying a feature grammar and its associated detectors may, in addition, update the data schemas stored in the (Monet) database. The Monet database, which underlies the ACOI framework, is a customizable, high-performance, main-memory database developed at the CWI and the University of Amsterdam [Monet].

At the user end, a feature grammar is related to View, Query and Report components, which respectively allow for inspecting a feature grammar, expressing a query, and delivering a response to a query. Some examples of these components are currently implemented as applets in Java 1.1 with Swing. See [ACOI].

The processing that occurs for a MIDI file, using the grammar and associated detectors described in section Detector, is depicted in slide midi-processing.

slide: Processing MIDI file

The input is a MIDI file. As indicated in the top line, the MIDI file itself may be generated from a score. As indicated in the bottom line, processing a MIDI file results in a collection of features as well as a (simplified) MIDI file and corresponding score. In the current prototype, a collection of Prolog facts is used as an intermediate representation, from which higher-level features are derived by an appropriate collection of rules.
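The idea of deriving higher-level features from an intermediate collection of facts can be sketched as follows. The prototype uses Prolog; the sketch below uses Python tuples instead, and the fact layout ("note", track, onset, pitch) as well as the rules melody and contour are assumptions made for illustration, not the facts or rules of the actual prototype.

```python
from typing import List, Tuple

# Hypothetical fact layout: ("note", track, onset_in_beats, midi_pitch).
Fact = Tuple[str, int, float, int]

facts: List[Fact] = [
    ("note", 1, 0.0, 60),
    ("note", 1, 1.0, 64),
    ("note", 1, 2.0, 62),
    ("note", 2, 0.0, 48),
]


def melody(facts: List[Fact], track: int) -> List[int]:
    """Rule: the melody of a track is its note pitches ordered by onset."""
    notes = sorted((f[2], f[3]) for f in facts if f[0] == "note" and f[1] == track)
    return [pitch for _, pitch in notes]


def contour(facts: List[Fact], track: int) -> List[str]:
    """Rule: a higher-level feature derived from the melody,
    giving the direction of each step between successive pitches."""
    pitches = melody(facts, track)
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("up" if cur > prev else "down" if cur < prev else "same")
    return steps
```

For the facts above, contour(facts, 1) yields ["up", "down"]: the rule works only on the facts, not on the MIDI file from which they were extracted.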

The (result) MIDI file contains an extract of the original (input) MIDI file that may be presented to the (end) user as the result of a query. This setup allows us to verify whether our extract or abstraction of the original musical structure is effective, simply by comparing the input musical structure with the output (MIDI) extract.

Formal specification

Formally, a feature grammar $G$ may be defined as $G = (V, T, P, S)$, where $V$ is a collection of variables or non-terminals, $T$ a collection of terminals, $P$ a collection of productions of the form $V \rightarrow (V \cup T)$, and $S$ a start symbol. A token sequence $ts$ belongs to the language $L(G)$ if $S \stackrel{*}{\rightarrow} ts$. Sentential token sequences, those belonging to $L(G)$ or its sublanguages $L(G_v) = (V_v, T_v, P_v, v)$ for $v \in (T \cup V)$, correspond to a complex object $C_v$, which is the object corresponding to the parse tree for $v$. The parse tree defines a hierarchical structure that may be used to access and manipulate the components of the multimedia object subjected to the detector. See [Features] for further details.
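As an illustration of the definition, the following Python sketch represents a grammar as a four-tuple and computes the restriction to the symbols reachable from a chosen symbol v, with v as the new start symbol. Several points are assumptions of the sketch rather than part of the specification: right-hand sides are modelled as sequences of symbols, the restriction is taken to be the set of reachable symbols, and the example grammar (song, header, track, tempo, note) is invented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Grammar:
    variables: FrozenSet[str]                      # V
    terminals: FrozenSet[str]                      # T
    productions: Dict[str, List[Tuple[str, ...]]]  # P: variable -> right-hand sides
    start: str                                     # S


def subgrammar(g: Grammar, v: str) -> Grammar:
    """Restrict g to the symbols reachable from v, with v as start symbol."""
    reachable, todo = set(), [v]
    while todo:
        sym = todo.pop()
        if sym in reachable:
            continue
        reachable.add(sym)
        # Terminals have no productions, so the search stops there.
        for rhs in g.productions.get(sym, []):
            todo.extend(rhs)
    return Grammar(
        variables=frozenset(reachable & g.variables),
        terminals=frozenset(reachable & g.terminals),
        productions={a: rhs for a, rhs in g.productions.items() if a in reachable},
        start=v,
    )


# Invented example grammar, for illustration only.
g = Grammar(
    variables=frozenset({"song", "header", "track"}),
    terminals=frozenset({"tempo", "note"}),
    productions={
        "song": [("header", "track")],
        "header": [("tempo",)],
        "track": [("note", "track"), ("note",)],
    },
    start="song",
)
g_track = subgrammar(g, "track")
```

Here g_track retains only the variable track and the terminal note, so a parse tree rooted at track can be treated as a component object in its own right.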