

On Objective Keywords Extraction: Tf-Idf based Forward Words Pruning Algorithm for Keywords Extraction on YouTube



Discovery and subsequent effective retrieval of useful user-generated content depend on proper metadata annotation of an object, such as its title and keywords. In this study, a simpler, unsupervised, non-graph-based algorithm for extracting keywords is proposed. A novel key-phrase chunking approach was adopted, which uses word sequences as they appear in the original document. The simple but effective term frequency-inverse document frequency (tf-idf) weighting scheme was used to rank the newly created key phrases. Compared with a similar algorithm that uses a three-metric weighting scheme, tf-idf yielded a precision of 89%. Thus, applying the tf-idf algorithm to keywords derived from YouTube metadata appears to be a useful approach in terms of objectivity.
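The general idea of ranking in-order word sequences by tf-idf can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the forward words pruning step is not specified in this summary and is omitted. The function names `candidate_phrases` and `rank_phrases`, the maximum phrase length of three words, the tokenization rule, and the smoothed idf formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def candidate_phrases(text, max_len=3):
    """Return contiguous word sequences of 1..max_len words, in document order.

    Assumption: lowercase alphanumeric tokenization; the paper's own chunking
    and pruning rules are not reproduced here.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases


def rank_phrases(doc, corpus, max_len=3, top_k=5):
    """Rank the candidate phrases of `doc` by tf-idf against `corpus`.

    `corpus` is a list of texts and is expected to include `doc`.
    """
    doc_phrases = candidate_phrases(doc, max_len)
    tf = Counter(doc_phrases)
    corpus_sets = [set(candidate_phrases(d, max_len)) for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for phrase, count in tf.items():
        # Document frequency: number of corpus texts containing the phrase.
        df = sum(1 for s in corpus_sets if phrase in s)
        # Smoothed idf (an assumed variant) keeps the weight positive.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[phrase] = (count / len(doc_phrases)) * idf

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    corpus = [
        "keyword extraction ranks keyword phrases",
        "video title about cats",
        "video title about dogs",
    ]
    for phrase, score in rank_phrases(corpus[0], corpus):
        print(f"{score:.3f}  {phrase}")
```

In this sketch a phrase scores highly when it is frequent in the target text and rare in the rest of the collection, which is the property the tf-idf ranking relies on.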


 1. Introduction
 2. Related Work
  2.1. Machine Learning Approaches
  2.2. Key Techniques
  2.3. TF-IDF for Single Document
 3. The Algorithm Description
 4. Experimental Result
 5. Extension to YouTube Videos
 6. Discussion
 7. Conclusion


  • Ambele Robert Mtafya, Central South University, Changsha, Hunan, China; Dar es Salaam Institute of Technology, Tanzania.
  • Dongjun Huang, Central South University, Changsha, Hunan, China; Dar es Salaam Institute of Technology, Tanzania.
  • Gaudence Uwamahoro, Central South University, Changsha, Hunan, China; Dar es Salaam Institute of Technology, Tanzania.


