Original Article Information
Abstract
English
Recent years have witnessed increasing interest in data stream processing for applications such as network monitoring, e-business, and advertising systems. Joins are used to explore correlations among tuples from multiple streams. In this paper, we present a general method named Distributed Streams Join (DSJ) for processing multi-way windowed stream θ-joins on a shared-nothing cluster. DSJ consists of a distribution method named Time-Slice Distribution Method (TDM) and a join method named Transfer Join Method (TJM). Unlike previous work, DSJ can (1) process multi-way θ-joins under arbitrary predicates; (2) preserve the integrity of results and maintain load balance while distributing tuples to different nodes for parallel joining; and (3) carry out the join operation in a locally optimal order according to histograms maintained in real time. We have built DSJ on our own stream processing cluster to handle multi-way stream joins, and the experiments demonstrate that DSJ not only guarantees load balance among all the computing nodes but also improves throughput effectively.
Table of Contents
1. Introduction
2. Model and Definitions
3. The DSJ Execution Model
3.1 Streams Distribution in Cluster
4. Experiments
4.1 Load Balance
4.2 Throughput
4.3 Influence Factors on Performance
5. Related Work
6. Conclusion and Future Work
Acknowledgements
References