Source Information
Abstract
English
Finding nearest neighbors in large multi-dimensional data sets has long been a research interest in the data mining field. In this paper, we present our continuing research on similarity search problems. In earlier work, PanKNN [20], we explored the meaning of K nearest neighbors from a new perspective. PanKNN redefines the distances between data points and a given query point Q, and efficiently and effectively selects the data points closest to Q. It can be applied in various data mining fields. Many real data sets contain irrelevant or obstacle information, which greatly affects the effectiveness and efficiency of finding nearest neighbors for a given query point. In this paper, we present our approach to solving the similarity search problem in the presence of obstacles. We apply the concept of obstacle points and process similarity search problems in a different way. This approach can help improve the performance of existing data analysis approaches.
Table of Contents
1. Introduction
2. Related work
3. Fuzzy Concept
4. Solving Similarity Problem
5. Searching Nearest Neighbors in the Presence of Obstacles
5.1. Definition
5.2. Segments on Each Dimension
5.3. Distance Calculation
5.4. Finding Nearest Neighbors
5.5. Time and Space Analysis
6. Experiment
6.1. Experiments on High-Dimensional Data Set
6.2. Experiments of PanKNN vs. OPanKNN
6.3. Experiments on More Real Data Set
7. Conclusion
References
