An Improved Classification Course Based on Mapreduce

Abstract

English

File classification is an important step in near-duplicate detection in the data mining field. This paper proposes an improved classification course consisting of a training course and a test course, each with its corresponding algorithm. It uses the MapReduce computing model created by Google to carry out the classification calculation. Specifically, Sogou news data of various sizes, simulating a massive data set, was used to test effectiveness, and a comparative evaluation of execution time and speedup was carried out in the experimental environment. The results show that the classification course greatly reduces execution time and attains an ideal speedup ratio as the data amount increases, achieving better performance.

Table of Contents

Abstract
 1. Introduction
 2. Relevant Work
 3. Classification Course
 4. Experimental Test
 5. Conclusions
 Acknowledgment
 References

Author Information

  • Haitao Wang: School of Computer Science and Technology, Jilin University, QianJin Street, ChangChun, JiLin, China; Henan Polytechnic University, Shiji Street, Jiaozuo, Henan, China
  • Shunfeng Liu: School of Computer Science and Technology, Jilin University, QianJin Street, ChangChun, JiLin, China
  • Zongpu Jia: Henan Polytechnic University, Shiji Street, Jiaozuo, Henan, China
