Abstract
This article discusses two major problems in compiling early twentieth-century corpora: word spacing and sampling. We first propose a word-spacing guideline that takes into account mixed-script texts written in Chinese characters and Hangul. We then describe a methodology for stratified random sampling, which has attracted scant attention in the construction of Korean historical corpora.
Keywords
historical corpus, word spacing, sampling, mixed script