

Clusters and Key Clusters in the Maritime English Corpus


Se-Eun Jhang, Sung-Min Lee

The purpose of this study is twofold. The first aim is to investigate how many 4-gram clusters should be considered in Maritime English and what proportion of these clusters is associated with general English. Using WordSmith Tools, we extracted clusters on the basis of frequency and text-dispersion thresholds and identified key clusters with a log-likelihood test, comparing the Maritime English Corpus (MECO) II with the British National Corpus (BNC) Baby. The second aim is to examine the statistical relationship between the syntactic categories and semantic functions of 4-gram key clusters using a chi-square test, and to explore those semantic functions (stance bundles, discourse organisers, referential expressions, and special bundles) by comparing the collocates of the key clusters with those in the BNC Baby. Furthermore, we present dispersion plots showing where 4-gram clusters with each semantic function occur across the texts, and we compare the plots for the MECO II and the BNC Baby to discover whether these functions are used differently in Maritime English than in general English.
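The extraction procedure summarised above can be illustrated with a minimal Python sketch. It is not the WordSmith Tools implementation used in the study; the function names, the toy texts, and the threshold values (`min_freq`, `min_texts`) are assumptions chosen for illustration only. The keyness score follows the standard two-corpus log-likelihood formula, in which observed frequencies are compared with the frequencies expected if the cluster were equally likely in both corpora.

```python
import math
from collections import Counter


def four_grams(tokens):
    """Return all contiguous 4-word sequences in a token list."""
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def count_clusters(texts, min_freq=2, min_texts=2):
    """Count 4-grams over tokenised texts and keep those meeting both
    thresholds. The threshold values are hypothetical, not the study's."""
    freq = Counter()        # total occurrences of each 4-gram
    text_range = Counter()  # number of texts in which each 4-gram occurs
    for tokens in texts:
        grams = four_grams(tokens)
        freq.update(grams)
        text_range.update(set(grams))
    return {g: n for g, n in freq.items()
            if n >= min_freq and text_range[g] >= min_texts}


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of an item occurring a times in a study
    corpus of size c and b times in a reference corpus of size d."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Toy data, invented for illustration.
texts = [
    "in the event of fire".split(),
    "in the event of collision".split(),
]
clusters = count_clusters(texts)
```

Under this sketch, a cluster would be treated as key when its log-likelihood score exceeds a chosen critical value (for example 3.84 for p < 0.05 with one degree of freedom) and its relative frequency is higher in the study corpus than in the reference corpus.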


 1. Introduction
 2. Literature review
 3. Data and methodology
  3.1. Statistical information
  3.2. Methodology for clusters and key clusters extraction
 4. Results and discussion
  4.1. Comparison of clusters between the two corpora
  4.2. Statistical relationship between structure and function and collocational patterns of key clusters
  4.3. Dispersion plots
 5. Conclusion


  • Se-Eun Jhang 장세은. Korea Maritime and Ocean University
  • Sung-Min Lee 이성민. Korea Maritime and Ocean University


