A Study on Computational Tools for Large-Scale Corpora

김동성

Publication information

A Study on the Computing Tools for the Large Scale Corpus.

Korean Association for Corpus Linguistics (한국코퍼스언어학회), Corpus Linguistics Research, Vol. 6, No. 1 (June 2020), pp. 45-63. KCI candidate journal (KCI 등재후보).

Abstract

Because a corpus is a collection of everyday language use, computing tools are essential for collecting, sifting, mining, and using meaningful data from massive amounts of text. In this paper, we introduce two tools for handling large-scale corpora: IMS Corpus Workbench (CWB) and Sketch Engine. Both tools are built on an inverted index model, a type of reference database, which gives corpus users speed and extensibility. A limitation of CWB is its reliance on Western-language character encodings (ISO-8859), which makes full-scale handling of Korean unsatisfactory. For large-scale Korean corpora, a more suitable architectural design for searching, storage, and a user-friendly interface needs to be considered.
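The following is a minimal, hypothetical sketch of the two ideas the abstract relies on; it is not CWB's or Sketch Engine's actual storage format or API. It assumes a corpus that has already been split into sentences and tokens (the Korean segmentation in the usage note is illustrative only). `build_inverted_index` maps each token to the (sentence, position) pairs where it occurs, so `concordance` can produce keyword-in-context lines by lookup instead of scanning the whole corpus. `fits_latin1` uses ISO-8859-1 (Latin-1) as a representative member of the ISO-8859 family to show why such single-byte Western encodings cannot store Hangul.

```python
from collections import defaultdict


def build_inverted_index(corpus: list[list[str]]) -> dict[str, list[tuple[int, int]]]:
    """Map each token to the (sentence_id, position) pairs where it occurs.

    Illustrative in-memory index only; real corpus tools use compressed
    on-disk structures.
    """
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for sent_id, tokens in enumerate(corpus):
        for pos, token in enumerate(tokens):
            index[token].append((sent_id, pos))
    return dict(index)


def concordance(
    index: dict[str, list[tuple[int, int]]],
    corpus: list[list[str]],
    word: str,
    window: int = 2,
) -> list[str]:
    """Return KWIC-style lines for `word`, looked up via the index (no full scan)."""
    lines = []
    for sent_id, pos in index.get(word, []):
        tokens = corpus[sent_id]
        left = " ".join(tokens[max(0, pos - window):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + window])
        lines.append(f"{left} [{word}] {right}".strip())
    return lines


def fits_latin1(text: str) -> bool:
    """True if `text` can be encoded in ISO-8859-1 (Latin-1); Hangul cannot."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False
```

For a toy corpus such as `[["코퍼스", "는", "언어"], ["대용량", "코퍼스", "를", "검색", "한다"]]`, `concordance(idx, corpus, "코퍼스")` returns `["[코퍼스] 는 언어", "대용량 [코퍼스] 를 검색"]`, while `fits_latin1("코퍼스")` is `False`, which is the practical obstacle the abstract points to when a tool assumes ISO-8859 text. The cost of an inverted index is paid once at indexing time, after which each query touches only the positions of the searched word; this is the speed advantage the abstract attributes to the architecture.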

Author information

김동성, Ewha Womans University (이화여자대학교)