Article Information
Abstract
English
Entity resolution techniques are used to recognize duplicate tuples that refer to the same real-world entity. Existing resolution techniques cannot cope with a high level of heterogeneity or with continual data alteration. When working with such databases, the integrity of the data must be quantified. A new approach is introduced here that processes complex queries over probabilistic databases while leaving duplicates unmerged. For efficient access to entity resolution data over a large collection of possible resolution worlds, a new indexing technique is presented; this indexing structure also reduces the computation required for query processing. The focus is on set similarity joins over very large probabilistic databases using MapReduce, a popular paradigm for processing large volumes of data efficiently. In this paper, different approaches are proposed that use MapReduce for this task: (1) merging the data set with MapReduce versus merging it without MapReduce, and (2) merging the data set with MapReduce using Hadoop. These approaches are implemented on Windows and on the Hadoop framework, comparative experiments are performed on their performance, and the speedup ratio of both is measured.
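The Jaccard similarity mentioned above (and in Section 2.1) compares two tuples as token sets: the size of their intersection divided by the size of their union. A minimal sketch of how such a score could flag candidate duplicates is shown below; the sample records, tokenization, and threshold are illustrative assumptions, not the paper's actual method.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical records that may describe the same real-world entity.
records = [
    ("John", "Smith", "NY"),
    ("Jon", "Smith", "NY"),
]

# Tokenize each tuple into a lowercase word set.
tokens = [set(" ".join(r).lower().split()) for r in records]

# Pairs scoring above a chosen threshold become candidate duplicates.
sim = jaccard(tokens[0], tokens[1])  # intersection {smith, ny}, union of 4 tokens → 0.5
is_candidate_duplicate = sim >= 0.4
```

In a MapReduce setting, such pairwise comparisons would be distributed across workers, with mappers emitting candidate pairs (e.g. grouped by shared tokens) and reducers computing the similarity scores.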
Table of Contents
1. Introduction
1.1. Key Technical Challenges and Goals
2. Motivating Example
2.1. Calculating Probability Using Jaccard Similarity
3. Review Area
3.1. Objectives
4. Proposed System
4.1. Probabilistic Database
4.2. Construction of Indexing Structure
4.3. MapReduce
4.4. Reduced Data Set
4.5. Basic Operations
5. Experiments and Results
5.1. Data Set
5.2. Methodology
5.3. Results
5.4. Contributions
6. Conclusions and Future Work
References
