Abstract (English)
This study examined how corpus closure is affected by three variables: corpus size, the generality of linguistic features, and genre balance. Three linguistic features that differ markedly in generality (nouns, the past tense, and conditional clauses) were selected. Their frequencies were measured in COCA (Corpus of Contemporary American English) and in downsized and less balanced samples drawn from it, and correlation coefficients between COCA and each sample were calculated. The statistical analyses revealed that larger corpus size, greater generality of a linguistic feature, and better genre balance each increase the similarity between COCA and the samples. Since higher similarity indicates a higher degree of corpus closure, it is postulated that less specific linguistic features and well-balanced texts tend to reach closure earlier than more specific features and less balanced texts. These results represent an overall tendency rather than an absolute rule, particularly with regard to the generality of linguistic features. Nevertheless, the study suggests that the primary requirement of a sound corpus is to mirror the variety of features in its population as closely as possible, so as to ensure the reliability of any study based on it. Before planning data collection, researchers should therefore estimate the required size of the target texts, taking into account their genre balance and the generality of the lexical and grammatical features under study, to ensure that the study is feasible.
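The correlation step summarized above can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes the comparison is a Pearson correlation between normalized frequencies of one feature across the same genre sections in COCA and in a sample. The abstract does not specify the unit of comparison, and the function names `per_million` and `pearson_r` and all counts and section sizes below are hypothetical toy values, not data from the study.

```python
from math import sqrt


def per_million(counts, sizes):
    """Normalize raw counts to frequencies per million words, section by section."""
    if len(counts) != len(sizes):
        raise ValueError("counts and sizes must have the same length")
    return [c / s * 1_000_000 for c, s in zip(counts, sizes)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical toy numbers for one feature (e.g. conditional clauses),
    # one value per genre section: spoken, fiction, magazine, newspaper, academic.
    full_counts = [52_000, 61_000, 40_000, 38_000, 29_000]
    full_sizes = [95_000_000, 90_000_000, 95_000_000, 92_000_000, 91_000_000]
    sample_counts = [510, 640, 380, 410, 270]
    sample_sizes = [950_000, 900_000, 950_000, 920_000, 910_000]

    r = pearson_r(per_million(full_counts, full_sizes),
                  per_million(sample_counts, sample_sizes))
    # A value close to 1 would suggest the sample mirrors the full corpus closely.
    print(f"r = {r:.3f}")
```

Pearson's r is used here only because it is the most common correlation measure; a rank-based measure such as Spearman's rho would be an equally plausible reading of the abstract.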
Table of Contents
2. Background
2.1. Linguistic Features
2.2. Genre Balance
2.3. Cyclical Process of Corpus Design
3. Corpora and Programs
4. Analyses
5. Results
5.1. Comparison between downsized samples
5.2. Comparison between balanced and less balanced samples
6. Discussion and Conclusion
References
Appendix
[Abstract]
