
The First International Workshop on Scalable Stream Processing Systems (SSPS'07)

April 16-20, 2007, The Marmara Hotel, Istanbul, Turkey

Co-located with ICDE'07

Keynote Speaker

Prof. Alfons Kemper
Dean of the Faculty of Informatics
Technische Universität München

Title:
StreamGlobe: Scalable Distributed Data Stream Management for eScience Communities by Load Balancing, Stream Sharing and Multi-Query Optimization

Abstract: 
The field of e-science currently faces many challenges. Among the most important are the analysis of huge and still exponentially growing volumes of scientific data, and the connection of different sciences and communities so that scientists can share scientific interests, data, and research results. These issues can be addressed by processing both large persistent data volumes and newly generated data on the fly in the form of data streams, by combining multiple data sources, and by making the results available in a network. Together with the advent of peer-to-peer (P2P) networks and grid computing, this creates a need for new techniques for distributing and processing continuous queries over data streams in such networks.

In this paper, we present a novel approach for optimizing the integration, distribution, and execution of newly registered continuous queries over data streams in grid-based P2P networks. We introduce Windowed XQuery (WXQuery), our XQuery-based subscription language for continuous queries over XML data streams, which supports window-based operators. Concentrating on filtering and window-based aggregation, we present our stream sharing algorithms.

We then introduce and discuss methods for matching and evaluating disjunctive predicates, which occur frequently in the context of data stream sharing in a data stream management system (DSMS). Data stream sharing uses one data stream to satisfy multiple similar continuous queries in a network. Sharing an existing stream to answer a new query requires, among other things, that the selection predicates of the new query be matched with the predicates describing the contents of the shared stream. Predicate matching combines predicate implication checking and predicate relaxation. If no match is found, sharing can still be enabled by widening the stream, e.g., by relaxing a selection predicate, which can introduce additional disjunctions into the stream predicates.
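To make predicate matching and relaxation concrete, here is a minimal Python sketch for the simplest case mentioned: interval-based disjunctive predicates on a single numeric attribute. The list-of-closed-intervals representation and the helper names `normalize`, `implies`, and `relax` are hypothetical, not StreamGlobe's API. The attribute is assumed to be real-valued, and relaxation is shown as a plain union of the stream and query predicates, which is only one possible widening strategy.

```python
from typing import List, Tuple

# Hypothetical representation: a disjunctive predicate on one real-valued
# attribute is a list of closed intervals [lo, hi]; e.g. [(0, 10), (20, 30)]
# means (0 <= x <= 10) OR (20 <= x <= 30).
Interval = Tuple[float, float]
Disjunction = List[Interval]


def normalize(pred: Disjunction) -> Disjunction:
    """Sort intervals and merge overlapping or touching ones."""
    merged: Disjunction = []
    for lo, hi in sorted(pred):
        if merged and lo <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def implies(query: Disjunction, stream: Disjunction) -> bool:
    """Implication check: every value satisfying `query` also satisfies `stream`."""
    covered = normalize(stream)
    # After merging, the stream intervals are separated by real gaps, so each
    # query interval must lie entirely inside one merged stream interval.
    return all(
        any(s_lo <= q_lo and q_hi <= s_hi for s_lo, s_hi in covered)
        for q_lo, q_hi in query
    )


def relax(stream: Disjunction, query: Disjunction) -> Disjunction:
    """Widen the shared stream (assumed strategy: union) so it also serves `query`."""
    return normalize(stream + query)


if __name__ == "__main__":
    stream = [(0, 10), (20, 30)]
    print(implies([(2, 5), (22, 28)], stream))  # True: the stream can be shared
    print(implies([(5, 15)], stream))           # False: 10 < x <= 15 is missing
    print(relax(stream, [(5, 15)]))             # [(0, 15), (20, 30)]
```

For example, a stream filtered on `(0, 10)` cannot answer a query on `(20, 30)`; relaxing it yields `[(0, 10), (20, 30)]`, which serves both queries but carries one more disjunct, the effect described in the abstract.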
We propose heuristics as well as an exact algorithm for solving the predicate matching problem and discuss the use of multi-dimensional indexing for speeding up the matching and evaluation processes for interval-based disjunctive predicates.
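The following sketch illustrates the difference between a heuristic and an exact implication check for multi-dimensional interval predicates, where each disjunct is an axis-aligned box. Both functions are illustrative assumptions, not the heuristics or the exact algorithm from the talk. The heuristic accepts a match only if each query box fits inside a single stream box, which is sound but incomplete. The exact check cuts each query box along all stream box boundaries and tests every resulting cell. Plain linear scans stand in for the multi-dimensional index mentioned above.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical representation: a box is a tuple of closed per-attribute
# intervals ((lo1, hi1), (lo2, hi2), ...); a predicate is a disjunction
# (list) of boxes over the same attributes.
Box = Tuple[Tuple[float, float], ...]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies within `outer` in every dimension."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def implies_heuristic(query: List[Box], stream: List[Box]) -> bool:
    """Sound but incomplete: each query box must fit inside ONE stream box."""
    return all(any(contains(s, q) for s in stream) for q in query)


def implies_exact(query: List[Box], stream: List[Box]) -> bool:
    """Exact check: split each query box into grid cells along every stream
    boundary that crosses it, then require each cell to fit in some stream box."""
    for q in query:
        axes = []
        for d, (q_lo, q_hi) in enumerate(q):
            cuts = {q_lo, q_hi}
            for s in stream:
                for v in s[d]:
                    if q_lo < v < q_hi:
                        cuts.add(v)
            points = sorted(cuts)
            # Elementary intervals in this dimension (a point if q_lo == q_hi).
            axes.append(list(zip(points, points[1:])) or [(q_lo, q_hi)])
        # No stream boundary crosses a cell's interior, so each cell is either
        # entirely inside a stream box or not covered by that box at all.
        for cell in product(*axes):
            if not any(contains(s, cell) for s in stream):
                return False
    return True


if __name__ == "__main__":
    stream = [((0, 5), (0, 10)), ((5, 10), (0, 10))]
    query = [((2, 8), (1, 9))]
    print(implies_heuristic(query, stream))  # False
    print(implies_exact(query, stream))      # True
```

With two adjacent stream boxes covering `x` in [0, 5] and [5, 10], a query box over `x` in [2, 8] is rejected by the heuristic but accepted by the exact check. The number of cells in the exact check can grow exponentially with the number of attributes, which suggests why a cheap heuristic, or an index that prunes irrelevant stream boxes, is attractive in practice.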