Topical Classification of a Corpus of Essays Written by College Student English Learners: Hierarchical Clustering

Original Article Information

The Topical Classification of Essays by College Student English Learners Using Hierarchical Clustering.

Choe, Jae-Woong; Song, Ji-Young

Abstract

English

Choe, Jae-Woong & Song, Ji-Young. 2013. The Topical Classification of Essays by College Student English Learners Using Hierarchical Clustering. Language Information. Volume 17. 93-115.

In this study, we report on a series of experiments leading to the successful automatic topic classification of 3286 English essays (YELC) written by college-level English learners in Korea. We adopted hierarchical agglomerative clustering for this purpose. To find the best combination of distance measure and clustering method, we first selected 100 essays and then calculated the precision rate on this subset for each of the 15 combinations of 5 distance measures and 3 methods provided by the R functions ‘Dist’ and ‘hclust’. The combination of the ‘correlation’ distance and the ‘ward’ method proved optimal for our corpus, and it was then applied to ten sets of 100 randomly selected essays for further validation. As a final step, the ‘correlation’-‘ward’ combination was applied to classify the whole corpus into six topics. The precision rate was estimated to be 98.7%, which is adequate for our purpose. We then conducted a keyword analysis of the six topic groups, showing some distributional characteristics of the words used in each group.
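The clustering setup described in the abstract can be illustrated with a small sketch. This is a pure-Python illustration under stated assumptions, not the authors' R code: the function names (`term_vectors`, `correlation_distance`, `ward_clusters`), the whitespace tokenization, and the four toy essays are all hypothetical, and the Ward step is written as the Lance-Williams update applied directly to the correlation distances, which may differ in detail from what the R ‘ward’ option computes.

```python
from collections import Counter
from math import sqrt


def term_vectors(docs):
    """Turn raw texts into aligned term-frequency vectors (assumed preprocessing)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return [[c[word] for word in vocab] for c in counts]


def correlation_distance(x, y):
    """Return 1 minus the Pearson correlation of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # a constant vector has no defined correlation
    return 1.0 - cov / (sd_x * sd_y)


def ward_clusters(vectors, k):
    """Merge clusters bottom-up until k remain, using the Ward update."""
    clusters = {i: [i] for i in range(len(vectors))}
    dist = {
        (i, j): correlation_distance(vectors[i], vectors[j])
        for i in clusters for j in clusters if i < j
    }
    next_id = len(vectors)
    while len(clusters) > k:
        # Pick the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        n_a, n_b = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            n_c = len(clusters[c])
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # Lance-Williams formula with Ward coefficients.
            updated[(c, next_id)] = (
                (n_a + n_c) * d_ac + (n_b + n_c) * d_bc - n_c * d_ab
            ) / (n_a + n_b + n_c)
        # Drop distances involving the merged clusters, then add the new ones.
        dist = {key: d for key, d in dist.items() if a not in key and b not in key}
        dist.update(updated)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Toy essays on two topics (invented for illustration only).
docs = [
    "smoking should be banned in public places",
    "public smoking harms health and should be banned",
    "military service should be required for all citizens",
    "all citizens benefit from required military service",
]
result = ward_clusters(term_vectors(docs), k=2)
print(result)
```

Each inner list in `result` holds the indices of the essays assigned to one cluster. Applying the same idea to a real corpus would mean replacing the toy tokenization with proper preprocessing and cutting the hierarchy at the desired number of topics.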

Table of Contents

Abstract
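 1. Introduction
 2. Target Corpus: YELC 2011
 3. Analysis Method and Procedure: Unsupervised Hierarchical Clustering
  3.1. Corpus Preprocessing
  3.2. Data Transformation
  3.3. Clustering
 4. Validation and Application
  4.1. Repeated Validation Using Randomly Selected File Sets
  4.2. Analysis of the Full Data Set
 5. Keyword Analysis by Cluster
 6. Conclusion
 References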

Author Information

  • Choe, Jae-Woong (최재웅). Department of Linguistics, Korea University
  • Song, Ji-Young (송지영). Korea University
