DBTA Workshop on Information Retrieval:
from Web 1.0 to Web 2.0

6 October 2008

Web 2.0 is a term describing changing trends in the use of World Wide Web technology and web design that aim to enhance creativity, information sharing, and collaboration among users. These concepts have led to the development and evolution of web-based communities and hosted services, such as social-networking sites, video-sharing sites, wikis, blogs, and folksonomies. Information retrieval (IR) is the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching in Digital Libraries and on the World Wide Web. Naturally, the changes from Web 1.0 to Web 2.0 are also affecting the way people search for information on the World Wide Web and in other information repositories. How is IR reacting to these changes?

The goal of this workshop is to provide a general introduction to the topic of Web 2.0, to the issues that it raises in information access, and in particular to emerging areas of research in IR and related fields. Presentations will be given by leading international and Swiss researchers working in IR, Digital Libraries, Web Search Engines, and Personal Information Management.



Web 2.0 Research at Yahoo!
Prof. Ricardo Baeza-Yates, Yahoo! Research, Barcelona, Spain

There are several semantic sources on the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web data. Most of them are related to Web 2.0 or to what is called user-generated content (UGC). In this talk we show several applications of mining the wisdom of the crowds behind these data to assess their quality, to improve image search, or to generate new semantic resources. Our final goal is to produce a virtuous feedback circuit, based on machine learning, for leveraging the data itself.


Web Reputation Manager: a monitoring and reporting tool for brands, products and people
Paolo Mosconi, ActValue Consulting & Solutions, Rho, Milano, Italy

Reputation Manager is a commercial tool that enables companies to monitor, on a daily basis, the web reputation of their brands, products and C-level officers. The system is based upon a multi-stage concurrent architecture and is able to analyse, interpret and report on a variety of web channel inputs, ranging from sites to blogs to videos. This presentation describes some of the key system components and the architecture, with particular reference to the integration of Web 2.0 concepts in a standard enterprise application.


Topical Opinion Retrieval: a Dictionary-based Approach
Dr. Gianni Amati, Fondazione Ugo Bordoni, Roma, Italy

We present a method for automatically constructing dictionaries relative to a specific context, and show how to retrieve information with such background knowledge. The methodology is applied to the case study of sentiment analysis (topical opinion retrieval). In general, the contextual retrieval problem can be represented as a pair of queries, the topic and the context, where the context is represented by a weighted dictionary. I will describe the strategy for performing contextual document retrieval. The derived contextual ranking formula is shown to be very robust and contains no parameters to tune or learn. Then I will show how to reduce the size of the dictionary while maintaining good system performance. Because we are able to reduce the size of the dictionaries, we can boost the retrieval of opinionated and relevant documents in real time at a negligible computational cost.
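The idea of a query pair (a topic plus a weighted context dictionary) can be illustrated with a small Python sketch. This is not the ranking formula presented in the talk: the scoring functions, the product used to combine them, and all names (`topic_score`, `context_score`, `prune`, `rank`) are assumptions made purely for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def topic_score(query, doc_tokens):
    """Length-normalized count of topic query terms in the document."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in tokenize(query)) / max(len(doc_tokens), 1)


def context_score(dictionary, doc_tokens):
    """Length-normalized sum of dictionary weights over the document's terms."""
    return sum(dictionary.get(t, 0.0) for t in doc_tokens) / max(len(doc_tokens), 1)


def prune(dictionary, k):
    """Keep only the k highest-weighted dictionary entries."""
    top = sorted(dictionary.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


def rank(query, dictionary, docs):
    """Rank documents by the product of topic and context scores.

    The product is an illustrative choice: a document must match both
    the topic and the context to receive a non-zero score.
    """
    scored = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        score = topic_score(query, tokens) * context_score(dictionary, tokens)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank("camera", prune(opinion_dictionary, 2), docs)` with a hypothetical `opinion_dictionary` would score documents against the reduced dictionary only, which is the kind of size reduction the abstract refers to.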


DelosDLMS: a Novel Infrastructure for Web-based Digital Libraries
Prof. Heiko Schuldt, University of Basel, Basel

DelosDLMS is an innovative Digital Library Management System that has been developed as an integration effort within the DELOS Network of Excellence. A key aspect of DelosDLMS is its novel generic infrastructure, which makes it easy to generate Digital Library Systems out of a set of Web-based Digital Library (DL) services in a modular and extensible way. It is the result of integrating into the OSIRIS platform various specialized DL services, such as feature extraction, visualization, intelligent browsing, media-type-specific indexing, relevance feedback and many others, provided as Web services by partners of the DELOS network. Based on these services, DelosDLMS provides support for content-based retrieval in image, audio, video, and 3D collections, and for combining any of these media types with keyword queries. It allows annotating retrieved information, provides a rich set of advanced graphical user interfaces to browse and explore large collections, and supports users in interacting with the system using a speech interface and interactive paper. Thus, DelosDLMS showcases a great variety of the functionality outlined in the DELOS vision for future Digital Library Systems.


Tag Data and Personalized Information Retrieval
Dr. Mark Carman, USI, Lugano

Tag data from Social Bookmarking sites has been shown to be a useful source of information for improving Web Search. In this talk I will discuss the use of this data for personalizing Web Search. In particular, I will answer the questions: Are social bookmarking data and query logs comparable and if so, how similar are they? Do we really need query logs, or can we just use public tag data as an initial testbed for evaluating personalized Information Retrieval systems?


Harvesting Adjacent Metadata in Large-Scale Tagging Systems
Gleb Skobeltsyn and Adriana Budura, EPFL, Lausanne

In this talk we consider the problem of tag prediction in collaborative tagging systems, where users share and annotate resources on the Web. We put forward HAMLET, an algorithm that automatically propagates tags from one document to similar documents in Web 2.0 tagging applications. We present the core principles underlying tag propagation, for which we derive suitable scoring models. We will conclude the talk by presenting experiments on real-world data sets.
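The general notion of propagating tags between similar documents can be sketched in Python as follows. This is not HAMLET and does not reproduce its scoring models: the bag-of-words cosine similarity, the `min_sim` threshold, and the function names are all assumptions chosen to keep the illustration short.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def propagate_tags(target_text, tagged_docs, min_sim=0.1):
    """Suggest tags for an untagged document from similar tagged ones.

    tagged_docs is a list of (text, tags) pairs. Each tag is scored by
    the summed similarity of the tagged documents that carry it;
    documents below the hypothetical min_sim threshold contribute nothing.
    """
    target = Counter(target_text.lower().split())
    scores = defaultdict(float)
    for text, tags in tagged_docs:
        sim = cosine(target, Counter(text.lower().split()))
        if sim >= min_sim:
            for tag in tags:
                scores[tag] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch a tag shared by several similar documents accumulates a higher score than a tag carried by a single neighbour, which is one simple way to rank candidate tags.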


Web 2.0 and Personal Information Management
Prof. Moira Norrie and Stefania Leone, ETH, Zurich

Web 2.0 applications are increasingly being used, not just to share personal information, but also to manage it. As a result, personal data and its management become fragmented, not only across desktop applications, but also between desktop applications and various Web 2.0 applications. We will discuss some issues of personal information management in the realm of Web 2.0 and then present a data management architecture designed to support a separation of concerns between the management and the sharing of personal information.


