A Data Compilation of Multiple Case-marking Constructions: Using the Sejong Spoken Corpus
The present study builds a language dataset of multiple case-marking constructions in Korean. From the Sejong Spoken Corpus, we extracted 1,021 sentences in which the nominative marker ‘-i/ka’ or the accusative marker ‘-ul/lul’ occurs twice or more. These sentences were annotated with respect to 47 linguistic parameters that previous studies assume to interact with multiple case-marking constructions. The parameters are divided into five subgroups: (i) distribution, (ii) semantic relation, (iii) nominal category, (iv) predication, and (v) discourse. The constructed data are analyzed numerically, and their content characteristics are also examined. The numerical analysis looks into the proportion of each parameter and the correlation between pairs of parameters. The content analysis focuses on how multiple case-marking constructions are realized in naturally occurring conversations. The whole dataset constructed in this study will be made readily available so that other linguists can use it for their own research purposes.
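1. Introduction
2. Background
2.1. Target of Data Construction
2.2. Main Issues
2.3. Use of a Spoken Corpus
3. Data Construction
3.1. Data Extraction
3.2. Annotation
3.3. Construction Results
3.4. Release of the Results
4. Quantitative Characteristics of the Data
4.1. Proportion Analysis
4.2. Correlation Analysis
5. Content Characteristics of the Data
5.1. Spoken-Language Character of the Data
5.2. Data with Special Forms
5.3. Classification of Types
6. Summary and Future Work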