원문정보
초록
영어
It is an importance step for near-duplication detection to perform file classification in the data mining field, in this paper an improved classification course is proposed which consists of training and test course corresponding to its algorithm respectively. It utilizes the MapReduce computing model created by Google to conduct the classification calculation. Specially, the Sogou news data with various data amounts which simulated the massive data set was used for testing effectiveness and a comparative evaluation on execution time and speedup was accomplished on the experimental circumstance. The results obtained shows that the classification course obviously reduces the execution times greatly and gains the ideal speedup ratio when increasing data amounts, achieves the better performance.
목차
1. Introduction
2. Relevant Work
3. Classification Course
4. Experimental Test
5. Conclusions
Acknowledgment
References
