Source Information
Abstract
English
Text clustering is the task of grouping texts or documents so that similar items fall into the same cluster and dissimilar items fall into different clusters. Text mining is used in several tasks, such as information extraction, concept/entity extraction, document summarization, entity-relation modeling, categorization/classification, and clustering. It is a form of data mining that operates only on digital documents or text. Text clustering organizes text documents into categories and is applied in areas such as information retrieval and web or corporate information systems. Clustering is also called unsupervised learning because, unlike document classification, it is given no labeled documents. Hierarchical Agglomerative Clustering (HAC) is a method that manages clusters as a tree-like structure, which makes browsing possible. In HAC, the nodes of the tree can be viewed as parent-child relationships, i.e., topic-subtopic relationships in a hierarchy. HAC starts with each example in its own cluster and iteratively merges clusters to form larger and larger ones. The main focus of this work is to present a performance analysis of various techniques available for document clustering.
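The bottom-up merging that the abstract describes for HAC can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state a text representation, distance measure, or linkage rule, so the bag-of-words vectors, cosine distance, and average linkage used here are assumptions, as are the function names `bag_of_words`, `cosine_distance`, and `hac`.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term-frequency vector for a document (assumed representation)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hac(docs, num_clusters=1):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    Returns (clusters, merges). Each cluster is a list of document indices;
    each merge is recorded as (left_members, right_members, distance), which
    is enough to rebuild the parent-child tree.
    """
    vectors = [bag_of_words(d) for d in docs]
    # Start with every document in its own cluster.
    clusters = [[i] for i in range(len(docs))]
    merges = []
    while len(clusters) > max(num_clusters, 1):
        best = None
        # Find the pair of clusters with the smallest average distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [
                    cosine_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                ]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges
```

Calling `hac(docs, num_clusters=1)` runs the merging to a single root, and the recorded `merges` list then describes the whole hierarchy from leaves to root. The exhaustive pair search is kept for readability and scales poorly, so it is only suitable for small collections.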
Table of Contents
1. Introduction
2. Literature survey
3. Methodology
3.1 K-Means Clustering
3.2 EM Algorithm
3.3 TCFS Method
4. Experimental Result
5. Conclusion
References