Swiss Informatics Society

Special Interest Group

on Information Systems


DBTA Workshop on Information Retrieval:
Algorithms and Systems for Text and Multimedia Retrieval

7 November 2008

Universität Basel
Petersplatz 1, CH-4056 Basel

The information space we have to deal with in our daily life is continuously growing, both in size and in complexity. In addition to textual documents, large collections of other media types, such as audiovisual objects that may even be annotated with novel types of metadata (e.g., geotags), are gaining importance in a broad variety of applications. This demands novel systems that can manage large-scale information spaces, and it requires powerful algorithms for (content-based) retrieval of relevant information in distributed multimedia collections and for the combination of text and multimedia queries.

The aim of this workshop is to bring together researchers and practitioners from the field of text and multimedia information retrieval. The presentations will survey and analyze novel applications and introduce recent developments and research from leading-edge industry and academic institutions. The workshop programme will be complemented by demonstrations of innovative systems and prototypes for text and multimedia retrieval.



Welcome and Introduction
Prof. Heiko Schuldt, University of Basel

10:15 - 11:00  

Harvesting, Searching and Ranking Knowledge from the Web
Prof. Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

11:00 - 11:15 Coffee Break

11:15 - 11:45

Faces and Places
Prof. Thomas Hofmann, Google, Zurich

11:45 - 12:15

PLSA-based Approaches to Image Annotation
Dr. Florent Monay, Idiap Research Institute, Martigny

To go beyond the query-by-example paradigm in image retrieval, there is a need for textual indexing of image collections. In this context, different models have been proposed to learn statistical dependencies between the text captions associated with a set of images and their respective visual features. These dependencies are then used to predict words for an unseen image based on its visual content. Here, we propose to model an annotated image as a mixture of latent aspects that generated both its text caption and its visual features. By choosing to represent an image as a set of quantized patches, this assumption translates into a formulation based on the Probabilistic Latent Semantic Analysis (PLSA) model, where aspects are defined as multinomial distributions over patches and words. We investigate different possibilities to learn these aspects and evaluate their performance on an image annotation task.
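As an illustration only, the following minimal sketch fits a PLSA model by EM on a toy collection. It assumes the simplest possible setup, in which quantized patches and caption words share one joint vocabulary; this is just one conceivable way to learn the aspects and is not necessarily a variant studied in the talk. The function names, the toy counts and all parameters are hypothetical.

```python
import random

def normalize(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]

def plsa(counts, n_aspects, n_iter=50, seed=0):
    """Fit PLSA by EM on an images x terms count matrix (list of lists).

    Assumption: terms are patches and words in one joint vocabulary.
    Returns P(z|d) per image and P(t|z) per aspect.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    p_t_z = [normalize([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_aspects)]
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        new_t_z = [[0.0] * n_terms for _ in range(n_aspects)]
        for d in range(n_docs):
            for t in range(n_terms):
                n = counts[d][t]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,t) is proportional to P(z|d) * P(t|z).
                post = [p_z_d[d][z] * p_t_z[z][t] for z in range(n_aspects)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_aspects):
                    weight = n * post[z] / norm
                    new_z_d[d][z] += weight
                    new_t_z[z][t] += weight
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_t_z = [normalize(row) for row in new_t_z]
    return p_z_d, p_t_z

def term_distribution(p_z_given_d, p_t_z):
    """P(t|d) = sum over aspects z of P(z|d) * P(t|z)."""
    n_terms = len(p_t_z[0])
    return [sum(p_z_given_d[z] * p_t_z[z][t] for z in range(len(p_t_z)))
            for t in range(n_terms)]

# Hypothetical toy data: columns are four patch types, then the words "sky", "grass".
counts = [
    [3, 2, 0, 0, 1, 0],
    [2, 3, 0, 0, 1, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 0, 2, 3, 0, 1],
]
p_z_d, p_t_z = plsa(counts, n_aspects=2)
probs_image0 = term_distribution(p_z_d[0], p_t_z)
```

Annotating an unseen image would additionally require estimating its aspect mixture from its patches alone (often called folding-in) and then ranking the caption words by their predicted probability; that step is omitted here.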

12:15 - 12:45

Search Strategies by an Automated Object and Concept Graph
Dr. Jörg Wurzer, iQser

Current search strategies are based on keywords and return long lists of results. The user must choose the right keyword combination and actively start a query. A graph of content objects can instead be used to deliver information automatically, taking the user's context into account, so that no active search is needed. The user can also use this graph to describe the required content precisely through its relations and obtain a correspondingly precise query result. A concept graph gives an overview of the topics and their relations and enables a precise selection of information.

12:45 - 14:00 Lunch Break

14:00 - 14:30

Distributed Information Retrieval: an Approach Based on Metadata Harvesting
Prof. Fabio Crestani, USI, Lugano

The talk describes an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralisation of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), the approach occupies the middle ground between content crawling and distributed retrieval.
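The idea of periodic, incremental centralisation can be illustrated with a small in-memory simulation. This is a sketch under strong assumptions, not the system presented in the talk: each source indexes its own documents and exposes datestamped index entries, and a central harvester pulls only the entries changed since its previous visit. The classes `Source` and `CentralIndex`, their methods and the integer datestamps are all hypothetical, and deletions are not handled.

```python
from collections import defaultdict

class Source:
    """Hypothetical autonomous source that indexes its own documents."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # doc_id -> (datestamp, set of index terms)

    def put(self, doc_id, datestamp, text):
        # The source builds the index entry locally; only terms leave the source.
        self.entries[doc_id] = (datestamp, set(text.lower().split()))

    def changed_since(self, datestamp):
        # Selective harvesting: expose only entries newer than the datestamp.
        return {doc_id: terms
                for doc_id, (stamp, terms) in self.entries.items()
                if stamp > datestamp}

class CentralIndex:
    """Hypothetical central index that merges harvested entries."""

    def __init__(self):
        self.postings = defaultdict(set)      # term -> {(source, doc_id)}
        self.doc_terms = {}                   # (source, doc_id) -> terms
        self.last_harvest = defaultdict(int)  # source name -> datestamp

    def harvest(self, source, now):
        changed = source.changed_since(self.last_harvest[source.name])
        for doc_id, terms in changed.items():
            key = (source.name, doc_id)
            # Drop postings of the outdated version before adding the new one.
            for term in self.doc_terms.get(key, ()):
                self.postings[term].discard(key)
            self.doc_terms[key] = set(terms)
            for term in terms:
                self.postings[term].add(key)
        self.last_harvest[source.name] = now
        return len(changed)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

# Two toy sources, harvested twice; only the changed entry is pulled again.
lugano, basel = Source("lugano"), Source("basel")
lugano.put("a", 1, "distributed retrieval")
basel.put("b", 1, "image retrieval")
index = CentralIndex()
first_round = index.harvest(lugano, now=2) + index.harvest(basel, now=2)
lugano.put("a", 3, "metadata harvesting")
second_round = index.harvest(lugano, now=4) + index.harvest(basel, now=4)
```

Queries are then answered from the central index alone, without contacting the sources at query time, which is what distinguishes this scheme from broker-based distributed retrieval.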

14:30 - 15:00

Special Purpose Retrieval Applications
Dr. Peter Schäuble, Eurospider, Zurich

Information retrieval is evolving quickly into many different and highly specialized applications. We try to give an overview of this interesting and dynamic field. We also discuss selected retrieval applications and show where they are located in the landscape of special purpose retrieval applications.

15:00 - 15:15 Coffee Break

15:15 - 15:45

Discovery of Protein Interactions from Scientific Literature: OntoGene and the BioCreative experience
Dr. Fabio Rinaldi, University of Zurich

Text mining is increasingly seen as a helpful technology in the biomedical sciences, supporting the process of hypothesis formulation based on evidence that is available in the literature but scattered across a multitude of publications. In this talk we describe activities performed within the scope of the OntoGene project (http://www.ontogene.org/), which aims at supporting the semi-automatic extraction of semantic relations among specific biological entities (such as proteins, genes, diseases) from scientific literature. We will characterize in detail our participation in the second BioCreative competitive evaluation of biomedical text mining systems, describing the nature of the challenge, our own contribution, and the results obtained.

15:45 - 16:15

Medical Visual Information Retrieval
Prof. Henning Müller, University Hospital of Geneva (HUG) and University of Applied Sciences Western Switzerland (HES-SO)

Medical institutions produce enormous amounts of data every day, and this includes an increasing number of images (Geneva radiology: >80'000 images per day!). With the increasing variety of exams it becomes hard to interpret all the images fully. At the same time, the computerized patient record makes all the data available to all clinicians, including non-specialists, whereas previously the reading was performed by the specialists themselves. All this creates a need for new tools to manage the visual data and to give clinicians access to everything they require to interpret the data of their cases. Content-based image retrieval uses visual descriptors instead of text to search for images, generally for images visually similar to an example image. This technique can make visually similar images from past cases or from the peer-reviewed literature accessible to a clinician, helping to interpret the data. On the other hand, purely visual retrieval cannot capture the semantics of a case, and the visual information cannot be taken out of its context, the clinical situation of the patient. Thus structured clinical data, free-text information and visual information have to be analysed together for best results.
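Query by example with visual descriptors can be sketched in a few lines. The example below is a deliberately simple illustration and not the retrieval system discussed in the talk: it assumes that each image is a flat list of grey values (0 to 255), uses a normalized grey-level histogram as the only descriptor, and ranks the collection by histogram intersection. The image names and pixel values are hypothetical; real medical retrieval systems use far richer descriptors.

```python
def histogram(pixels, bins=4):
    """Normalized grey-level histogram for pixel values in 0..255."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank(query_pixels, collection, bins=4):
    """Return image names ordered from most to least similar to the query."""
    query_hist = histogram(query_pixels, bins)
    scored = [(intersection(query_hist, histogram(pixels, bins)), name)
              for name, pixels in collection.items()]
    # Sort by descending similarity, breaking ties by name for a stable result.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical toy collection of three tiny "images".
collection = {
    "dark_scan": [10, 20, 30, 40],
    "bright_scan": [200, 220, 240, 250],
    "mixed_scan": [10, 20, 220, 240],
}
ranking = rank([15, 25, 35, 60], collection)
```

A purely visual ranking like this one ignores the clinical context, which is why the abstract argues for combining it with structured data and free text.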

16:15 -

Demos and Apéro

- Lookaround feature on Panoramio (Google)
- Demos of geographic information / Google Maps
- iQser
- iGesture (Globis, ETH Zürich)
- Query by Sketch (Globis, ETH Zürich / DBIS, Uni Basel)
- DelosDLMS (DBIS, Uni Basel + DELOS partners)


