Source Information
Abstract
English
This paper focuses on clustering the lines of Shakespeare's Sonnets. Sonnet Line Clustering (SLC) is the task of grouping a set of lines so that lines in the same cluster are more similar to each other than to lines in other clusters. K-Means is an effective clustering technique, well known for its observed speed and simplicity. Its aim is to find the best division of N lines into K groups (clusters) so that the total distance between each group's members and the corresponding centroid is minimized. A new algorithm, Sonnet Line Clustering with Random Feature Selection (SLCRFS), is proposed. A clustering process can be validated by external or internal validation. Since internal validation has no considerable impact on this research, this work concentrates on external validation measures. Entropy and purity are frequently used external validation measures for K-Means. The proposed approach uses entropy as its performance measure. The clusters formed are evaluated and interpreted according to the Euclidean distance between the data points and the center of each cluster. The paper concludes with an analysis of the results, using the proposed measure to present the sonnets clustered by the K-Means algorithm with minimum entropy for different feature sets.
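The two quantities named in the abstract, the K-Means objective and entropy as an external validation measure, can be illustrated with a minimal sketch. It is not the paper's SLCRFS implementation. The function names, the toy data, and the reference class labels are all hypothetical. The objective here follows the standard K-Means formulation, which sums squared Euclidean distances, and entropy is taken as the size-weighted average of per-cluster label entropy (base 2).

```python
import math
from collections import Counter


def kmeans_objective(points, centroids, assignments):
    """Sum of squared Euclidean distances from each point to its assigned centroid.

    Assumption: the standard K-Means objective (squared distances) is used.
    """
    return sum(
        math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments)
    )


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy (base 2) of reference labels per cluster.

    Assumption: class_labels are hypothetical reference labels; lower entropy
    means each cluster is dominated by a single label.
    """
    n = len(cluster_ids)
    members = {}
    for c, y in zip(cluster_ids, class_labels):
        members.setdefault(c, []).append(y)

    total = 0.0
    for labels in members.values():
        size = len(labels)
        h = -sum(
            (k / size) * math.log2(k / size) for k in Counter(labels).values()
        )
        total += (size / n) * h
    return total


if __name__ == "__main__":
    # Toy two-dimensional feature vectors standing in for sonnet lines.
    points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
    centroids = [(1.0, 0.0), (11.0, 0.0)]
    assignments = [0, 0, 1, 1]
    print(kmeans_objective(points, centroids, assignments))
    print(clustering_entropy(assignments, ["a", "a", "b", "b"]))
```

Under this reading, a feature set would be preferred when its clustering yields the lowest entropy value, which matches the abstract's "minimum entropy for different feature sets".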
Table of Contents
1. Introduction
2. Literature Review
3. Methodology
3.1. Procedure for Sonnet Line Clustering
3.2. K-Means Clustering Algorithm
3.3. Performance Measure
4. Dataset Used
5. Used Environment and Libraries
6. Experimental Results and Discussion
7. Conclusion
Acknowledgements
References