Source Information
Abstract
English
This paper targets string similarity search in cloud systems. Existing work focuses on query processing within a single server, which runs into main-memory and external-memory overflow when dealing with big data. To address these problems, the paper proposes a distributed index that supports string similarity search in cloud environments. To provide efficient searching on a single node, an external-memory index is designed that adopts multiple filtering techniques and optimization strategies. The external-memory-resident index supports length filtering and positional filtering on disk, and a construction method for the index is given. During query processing, asymmetric q-grams are used to reduce the number of inverted lists read from disk. An adaptive algorithm chooses which inverted lists to read, balancing the tradeoff between two aspects of query cost. The global index partitions the entire string dataset according to the content of the strings, using a proposed char vector space partition method. Under this method, similar strings are assigned to the same computing nodes, which reduces the number of computing nodes involved in a single query. The same partition method is also used to determine the set of computing nodes that a query must access. Simulation results validate the efficiency and effectiveness of the proposed index.
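The length and positional filters named in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's disk-resident index or its asymmetric q-gram scheme, neither of which the abstract specifies. It is the standard q-gram inverted index with length, positional, and count filtering under an edit-distance threshold, and the edit-distance measure itself is an assumption. All function names and the sample data are hypothetical.

```python
from collections import defaultdict

def qgrams(s, q):
    """Return (position, gram) pairs for every overlapping q-gram of s."""
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def build_index(strings, q):
    """Inverted index: gram -> list of (string id, string length, position)."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for pos, gram in qgrams(s, q):
            index[gram].append((sid, len(s), pos))
    return index

def search(query, strings, index, q, tau):
    """Return all strings within edit distance tau of query."""
    # Strings within distance tau share at least
    # max(|query|, |s|) - q + 1 - q * tau q-grams (count filter).
    weak_bound = len(query) - q + 1 - q * tau
    if weak_bound <= 0:
        # The count filter cannot prune here; fall back to the length filter.
        candidates = [sid for sid, s in enumerate(strings)
                      if abs(len(s) - len(query)) <= tau]
    else:
        counts = defaultdict(int)
        for qpos, gram in qgrams(query, q):
            for sid, slen, spos in index.get(gram, ()):
                if abs(slen - len(query)) > tau:   # length filter
                    continue
                if abs(spos - qpos) > tau:         # positional filter
                    continue
                counts[sid] += 1
        candidates = sorted(
            sid for sid, c in counts.items()
            if c >= max(len(query), len(strings[sid])) - q + 1 - q * tau)
    # Verify the surviving candidates with the exact distance.
    return [strings[sid] for sid in candidates
            if edit_distance(query, strings[sid]) <= tau]

# Hypothetical sample data for illustration only.
strings = ["similarity", "similarly", "simulation", "cloud", "clouds"]
index = build_index(strings, q=2)
```

Calling `search("similarity", strings, index, q=2, tau=2)` scans only the inverted lists of the query's 2-grams and verifies the candidates that survive all three filters. Counting repeated gram matches can over-count, which admits extra candidates but never drops a true answer.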
Table of Contents
1. Introduction
2. System Framework
3. Local Query Processing
3.1. LPA-index
3.2. Local Query Processing
4. Global Query Processing
4. Experiment Design and Discussion
4.1. Local Query Performance
4.2. Global Query Performance
5. Conclusion
References