Source Information
The Topical Classification of Essays by College Student English Learners Using Hierarchical Clustering.
Abstract
English
Choe, Jae-Woong & Song, Ji-Young. 2013. The Topical Classification of Essays by College Student English Learners Using Hierarchical Clustering. Language Information. Volume 17. 93-115. This study reports a series of experiments on, and the successful completion of, the automatic topic classification of 3,286 English essays (YELC) written by college-level English learners in Korea. We adopted hierarchical agglomerative clustering for this purpose. To find the best combination of distance measure and clustering method, we first selected 100 essays and calculated the precision rate on that subset for each of the 15 combinations of 5 distance measures and 3 methods provided by the R implementations of ‘Dist’ and ‘hclust’. The combination of the ‘correlation’ distance and the ‘ward’ method proved optimal for our corpus and was then applied to ten sets of 100 randomly selected essays for further validation. As a final step, the ‘correlation’-‘ward’ combination was applied to classify the whole corpus into six topics. The precision rate was estimated at 98.7%, which is adequate for our purpose. We then conducted a keyword analysis of the six topic groups, showing some distributional characteristics of the words used in each group.
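The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' R code: it is a pure-Python stand-in for the ‘correlation’ distance and a Ward-style agglomeration, run on six invented toy essays. The raw term-frequency vectors, the Lance-Williams form of the Ward update applied directly to the dissimilarities, and the definition of the precision rate (share of essays that carry the majority topic of their cluster) are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def term_vector(text, vocab):
    """Raw term-frequency vector over a fixed vocabulary (assumed weighting)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def correlation_distance(x, y):
    """1 minus the Pearson correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a constant vector has no defined correlation
    return 1.0 - cov / (sx * sy)


def ward_clusters(vectors, k):
    """Agglomerate until k clusters remain, using a Ward-style update."""
    clusters = {i: [i] for i in range(len(vectors))}
    dist = {
        frozenset((i, j)): correlation_distance(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }
    next_id = len(vectors)
    while len(clusters) > max(k, 1):
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        i, j = sorted(pair)
        ni, nj, d_ij = len(clusters[i]), len(clusters[j]), dist[pair]
        updated = {}
        for m, members in clusters.items():
            if m in (i, j):
                continue
            nm = len(members)
            d_im = dist[frozenset((i, m))]
            d_jm = dist[frozenset((j, m))]
            # Lance-Williams update for Ward's method (assumed variant)
            updated[m] = (
                (ni + nm) * d_im + (nj + nm) * d_jm - nm * d_ij
            ) / (ni + nj + nm)
        # Drop distances to the two merged clusters, then add the new cluster
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        for m, d in updated.items():
            dist[frozenset((next_id, m))] = d
        next_id += 1
    return list(clusters.values())


def precision_rate(clusters, labels):
    """Share of items whose label is the majority label of their cluster."""
    correct = sum(
        Counter(labels[m] for m in members).most_common(1)[0][1]
        for members in clusters
    )
    return correct / len(labels)


# Invented toy essays on two topics (not taken from YELC)
essays = [
    "smoking ban public places smoking health",
    "public smoking ban health risk",
    "smoking health risk ban",
    "military service duty army korea",
    "army service military duty",
    "korea military army service duty",
]
labels = ["smoking"] * 3 + ["military"] * 3

vocab = sorted({w for e in essays for w in e.lower().split()})
vectors = [term_vector(e, vocab) for e in essays]
clusters = ward_clusters(vectors, k=2)

print(clusters)
print(precision_rate(clusters, labels))
```

Comparing the 15 combinations mentioned above would amount to swapping `correlation_distance` and the update rule for other candidates and keeping the pair with the highest precision rate on the 100-essay subset.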
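Table of Contents
1. Introduction
2. Target Corpus: YELC 2011
3. Analysis Method and Procedure: Unsupervised Hierarchical Clustering
3.1. Corpus Preprocessing
3.2. Data Transformation
3.3. Clustering
4. Validation and Application
4.1. Repeated Validation with Randomly Selected File Sets
4.2. Analysis of the Full Data Set
5. Keyword Analysis by Cluster
6. Conclusion
References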