

An Approach of XML-ifying the Crude Corpus in the Field of Opinion Mining



This paper presents a simple approach to XML-ifying a crude corpus in the field of Opinion Mining. The XMLification is based on regular expressions. A corpus (plural ‘corpora’) is a collection of linguistic data. In this work, the corpus consists of reviews posted on web sites, specifically product reviews. The reviews, or opinions, are contained in HTML files collected from sites such as Cnet.com, Epinions.com, Amazon.com, and ebay.com. The crude corpus of HTML files is first polished to keep only the required review details from each web page and discard the rest. This corpus is then processed again to yield the final output: XML files that contain only the important parts of the review details from the raw HTML pages. These XML files are ready for further steps of Opinion Mining, such as part-of-speech (POS) tagging or other language processing for machine learning.
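
The regular-expression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not give. The HTML structure, the class names (`review`, `author`, `rating`, `text`), the XML element names, and the function `html_to_xml` are all assumptions made for illustration; real review sites each use their own markup and would need their own patterns.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical review markup; real sites each use their own HTML structure.
SAMPLE_HTML = """
<html><body>
<div id="nav">Home | Products</div>
<div class="review">
  <span class="author">alice</span>
  <span class="rating">4</span>
  <p class="text">Great battery life, but the screen is <b>dim</b>.</p>
</div>
<div class="review">
  <span class="author">bob</span>
  <span class="rating">2</span>
  <p class="text">Stopped working after a week.</p>
</div>
<div id="footer">Copyright</div>
</body></html>
"""

# One pattern per review block; named groups capture the fields to keep.
REVIEW_RE = re.compile(
    r'<div class="review">\s*'
    r'<span class="author">(?P<author>.*?)</span>\s*'
    r'<span class="rating">(?P<rating>.*?)</span>\s*'
    r'<p class="text">(?P<text>.*?)</p>',
    re.DOTALL,
)

# Strips any inline tags (such as <b>) left inside the review text.
TAG_RE = re.compile(r"<[^>]+>")


def html_to_xml(html: str) -> str:
    """Keep only the review details from a raw HTML page and emit them as XML."""
    root = ET.Element("reviews")
    for match in REVIEW_RE.finditer(html):
        review = ET.SubElement(root, "review")
        ET.SubElement(review, "author").text = match.group("author").strip()
        ET.SubElement(review, "rating").text = match.group("rating").strip()
        ET.SubElement(review, "text").text = TAG_RE.sub("", match.group("text")).strip()
    return ET.tostring(root, encoding="unicode")


xml_output = html_to_xml(SAMPLE_HTML)
print(xml_output)
```

Everything outside the matched review blocks (navigation, footer) is dropped, and the resulting XML can be passed to later stages such as POS tagging. Regular expressions of this kind are brittle against markup changes, so each source site would likely need its own pattern.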


 1. Introduction
 2. Terminology
  2.1. Corpus
  2.2. Natural Language Processing
  2.3. Linguistics
  2.4. Regular Expression
  2.5. Parsing
  2.6. Opinion Mining
 3. Related Works
 4. Our Work
 5. Result and Discussion
  5.1. Complexity analysis of the stated algorithm
  5.2. Test Result
 6. Conclusion


  • Debnath Bhattacharyya, Computer Science and Engineering Department, Heritage Institute of Technology
  • Kheyali Mitra, Computer Science and Engineering Department, Heritage Institute of Technology
  • Minkyu Choi, Hannam University
  • Rosslin J. Robles, Hannam University
  • Debashis Ganguly, Computer Science and Engineering Department, Heritage Institute of Technology


Source: Naver Academic Information
