The main purpose of this study is to understand the effects of task complexity on test-takers’ linguistic output and on raters’ evaluation of test-takers’ performance. One hundred and fifty-six audio clips produced by 52 non-native-English-speaking international teaching assistants (ITAs) at a U.S. university were graded by two raters using four different rating scales: a holistic scale, a pronunciation scale, a vocabulary and grammar scale, and a pace and fluency scale. The clips were then transcribed and coded for linguistic complexity, accuracy, and fluency measures per AS-unit. Task complexity was evaluated against the resource-directing task complexity criterion of Robinson’s (2001b) Triadic Componential Framework. The grades and linguistic measures were analyzed statistically with multiple Friedman tests and follow-up pairwise comparisons using Wilcoxon signed-rank tests. Between the high- and low-complexity tasks, only one fluency measure (phonation-time ratio) and the fluency scores, but not the holistic scores, differed significantly, with medium effect sizes. In follow-up interviews, raters reported that they tended to adjust the severity of their holistic ratings according to task complexity. Implications of the findings for rater training and evaluation rubric design are discussed.
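The analysis pipeline described above (a Friedman test across repeated-measures conditions, followed by Wilcoxon signed-rank pairwise comparisons) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the scores, the number of raters and test-takers, and the three complexity conditions are hypothetical stand-ins.

```python
# Sketch of the reported analysis: Friedman test over three hypothetical
# task-complexity conditions, with Wilcoxon signed-rank follow-ups.
# All data here are simulated, not the study's actual scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 52  # hypothetical number of test-takers (matching the study's 52 ITAs)
low = rng.normal(3.0, 0.5, n)   # fluency scores, low-complexity task
mid = rng.normal(2.8, 0.5, n)   # fluency scores, mid-complexity task
high = rng.normal(2.5, 0.5, n)  # fluency scores, high-complexity task

# Omnibus test: do the three related samples differ?
stat, p = stats.friedmanchisquare(low, mid, high)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")

# Follow-up pairwise comparisons; a Bonferroni-adjusted alpha over the
# three pairs (0.05 / 3) is one common correction choice.
pairs = {"low vs mid": (low, mid),
         "low vs high": (low, high),
         "mid vs high": (mid, high)}
for name, (a, b) in pairs.items():
    w, pw = stats.wilcoxon(a, b)
    print(f"{name}: W = {w:.1f}, p = {pw:.4f} (adjusted alpha = {0.05/3:.4f})")
```

A nonparametric design like this is a natural fit when rating-scale scores cannot be assumed to be normally distributed, which is presumably why the study chose Friedman over a repeated-measures ANOVA.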