Keynote Speaker
Title:
StreamGlobe: Scalable Distributed Data Stream Management for eScience
Communities by Load Balancing, Stream Sharing and Multi-Query
Optimization
Abstract:
The field of e-science currently faces many challenges. Among the most
important ones are the analysis of huge, still exponentially increasing
volumes of scientific data and the connection of various sciences and
communities, thus enabling scientists to share scientific interests,
data, and research results. These issues can be addressed by processing
large persistent data volumes as well as newly generated data
on-the-fly in the form of data streams and by combining multiple data
sources and making the results available in a network. Together with
the advent of peer-to-peer (P2P) networks and grid computing, this
leads to the necessity of developing new techniques for distributing
and processing continuous queries over data streams in such networks.
In this paper, we present a novel approach for optimizing the
integration, distribution, and execution of newly registered continuous
queries over data streams in grid-based P2P networks. We introduce
Windowed XQuery (WXQuery), our XQuery-based subscription language for
continuous queries over XML data streams supporting window-based
operators. Concentrating on filtering and window-based aggregation, we
present our stream sharing algorithms. We also introduce and discuss
methods for matching and evaluating disjunctive predicates, which occur
frequently in the context of data stream sharing in a data stream
management system (DSMS).
Data stream sharing uses one data stream for satisfying multiple
similar continuous queries in a network. Sharing an existing stream for
answering a new query requires, among other things, the selection
predicates of the new query to be matched with the predicates
describing the contents of the shared stream. Predicate matching is a
combination of predicate implication checking and predicate relaxation.
If no match is found, sharing can be enabled by widening the stream,
e.g., by relaxing a selection predicate, which can introduce additional
disjunctions in the stream predicates. We propose heuristics as well as
an exact algorithm for solving the predicate matching problem and
discuss the use of multi-dimensional indexing for speeding up the
matching and evaluation processes for interval-based disjunctive
predicates.
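
To make the idea of predicate matching concrete, the following is a
minimal illustrative sketch, not the algorithm presented in the talk. It
assumes a strongly simplified setting: a single numeric attribute, and
predicates given as disjunctions of closed intervals. The names
normalize, implies, and relax are hypothetical and chosen only for this
example. Implication checking tests whether an existing stream already
covers a new query; relaxation widens the stream predicate so that it
does, which may add further disjuncts.

```python
from typing import List, Tuple

# Assumption: a predicate over one numeric attribute is a disjunction of
# closed intervals [lo, hi]; this is a simplification for illustration.
Interval = Tuple[float, float]


def normalize(disjuncts: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for lo, hi in sorted(disjuncts):
        if merged and lo <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def implies(query: List[Interval], stream: List[Interval]) -> bool:
    """Return True if every value satisfying `query` also satisfies `stream`."""
    stream_n = normalize(stream)
    # After normalization, a query interval is covered by the stream
    # exactly when it lies inside one single merged stream interval.
    return all(
        any(s_lo <= q_lo and q_hi <= s_hi for s_lo, s_hi in stream_n)
        for q_lo, q_hi in normalize(query)
    )


def relax(stream: List[Interval], query: List[Interval]) -> List[Interval]:
    """Widen the stream predicate so that it also covers `query`."""
    return normalize(list(stream) + list(query))


stream = [(0.0, 10.0)]
query_a = [(2.0, 5.0)]                 # already covered by the stream
query_b = [(8.0, 12.0), (20.0, 25.0)]  # only partially covered

print(implies(query_a, stream))   # the stream can be shared as is
print(implies(query_b, stream))   # no match: the stream must be widened
widened = relax(stream, query_b)
print(widened)                    # widened predicate with a new disjunct
print(implies(query_b, widened))
```

In this sketch, widening the stream to serve query_b introduces a second
disjunct, which mirrors how relaxation can add disjunctions to the
stream predicates. Predicates over several attributes would need a
multi-dimensional treatment that this one-dimensional example does not
attempt.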