Development of a Text Processing Automation Method for Using Modern Korean-Chinese Mixed Text in General Education

Kim, Sun-young

교양교육연구 (Korean Journal of General Education)


2023, vol. 17, no. 5 (cumulative issue no. 75), pp. 41-52 (12 pages)

Publisher: 한국교양교육학회 (Korean Association of General Education)


Research field:

Interdisciplinary Studies


Interdisciplinary Research


김선영 / Kim, Sun-young ¹

¹ Seoul National University

Abstract

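The need to use primary sources in general-education Korean history courses has long been recognized, but using such sources directly has been difficult. In particular, having students search and examine sources themselves required sources for which a Korean translation was available. This study examines the design of an automated text processing method for using modern Korean-Chinese mixed-script newspaper materials in general education, along with the tasks that remain. Modern Korean newspapers are valuable material not only for history but also for fields such as political science and sociology, and this study focuses on batch processing of texts downloaded in bulk. When such newspapers are used in general education, most nominals are written in Chinese characters and the text contains many Old Hangul elements such as arae-a (ㆍ), so students may find the text hard to read unless the sources are processed beforehand. If the processing can be automated, instructors can extract articles on a specific topic, provide versions with improved readability, and read them together with students; the tool itself can also be given to students so that they can find and use the sources they need on their own. The automated method devised in this study works as follows. First, the character codes of the Old Hangul portions of the source data are standardized, and old-style particles and similar forms are replaced with modern spellings. Next, the Hangul reading of each Chinese-character portion is added in parentheses. The steps run in sequence and the output is saved to separate files; a review of the results shows a considerable improvement in readability. Limitations remain, however. For example, when a Chinese character has more than one reading, the Unicode representative reading is attached, and if that reading does not fit the word it can hinder rather than help comprehension. Such cases must be corrected before the text is used in class. In addition, when a mixed-script text contains a high proportion of classical Chinese expressions, students with no knowledge of classical Chinese may still struggle to read it even with readings attached. Converting such texts smoothly will require further progress in technology for classical Chinese morphological analysis.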
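The first two steps described above (standardizing Old Hangul character codes, then replacing old-style forms with modern spellings) might look roughly like the following sketch. It is not the author's implementation: the table names `PUA_TO_JAMO` and `ARCHAIC_TO_MODERN`, the two PUA code points, and the replacement rules are all hypothetical placeholders, and the replacement is a naive substring match at the jamo level.

```python
import unicodedata

# Hypothetical PUA-to-standard table. The code points on the left are
# placeholders, not entries from any real PUA mapping table.
PUA_TO_JAMO = {
    "\uE000": "\u1112\u119E",        # assumed: HIEUH + ARAEA
    "\uE001": "\u1112\u119E\u11AB",  # assumed: HIEUH + ARAEA + final NIEUN
}

# Illustrative archaic-to-modern rules, applied in order.
ARCHAIC_TO_MODERN = [
    ("\u1112\u119E\uC57C", "하여"),  # old-style form written with arae-a -> "하여"
    ("\u119E", "\u1161"),            # crude fallback: arae-a vowel -> modern "a" vowel
]


def standardize(text: str) -> str:
    """Replace PUA code points with standard conjoining jamo, then decompose."""
    for pua, jamo in PUA_TO_JAMO.items():
        text = text.replace(pua, jamo)
    return unicodedata.normalize("NFD", text)


def modernize(text: str) -> str:
    """Apply the replacement rules on decomposed text, then recompose syllables."""
    text = unicodedata.normalize("NFD", text)
    for old, new in ARCHAIC_TO_MODERN:
        text = text.replace(
            unicodedata.normalize("NFD", old),
            unicodedata.normalize("NFD", new),
        )
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(modernize(standardize("\uE000\uC57C")))
    print(modernize(standardize("\uE001")))
```

Working on decomposed (NFD) text lets a single rule cover a syllable with or without a final consonant, and the closing NFC pass recomposes modern syllables. One caveat: NFD and NFC also fold CJK compatibility ideographs into their unified forms, so a real pipeline would need care if the source relies on that distinction.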

This study aims to develop an automated text processing method for using modern Korean-Chinese mixed-script newspaper materials in liberal arts education. This article describes the process, its results, and the tasks that remain. The focus is on batch processing of texts downloaded in large quantities. The problem with the mixed-script texts that have already been digitized and made available is that most Old Hangul text was encoded with Private Use Area (PUA) codes rather than with standard Unicode characters. To process or analyze these texts programmatically, the characters must first be converted to standard codes. Working from the standardized data, old word forms were then replaced with current Korean spellings. Finally, the Korean (Hangul) reading of each run of Chinese characters was added in parentheses. A review of the results shows a significant improvement in readability. Without such preprocessing, modern Korean-Chinese newspaper materials are difficult to use in liberal arts education because of the way they are written. With automated processing, instructors can extract articles related to specific topics and read the more readable versions together with students. As things stand, however, there are limits. For example, if a Chinese character has multiple readings, the reading that is attached may not fit the word in which the character appears, and it may then interfere with comprehension rather than aid it. Such texts should be corrected before they are used for teaching. In addition, even when a text is in mixed script and readings are provided, a student without knowledge of classical Chinese grammar will find it difficult to read if it contains many classical Chinese expressions. This problem can ultimately be solved only by the development of a classical Chinese morphological analyzer. Progress on resources such as a classical Chinese corpus would greatly help the development of teaching tools for history and the liberal arts.
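The final step, adding Hangul readings in parentheses, and the multiple-reading limitation noted above can be illustrated with a minimal sketch. This is an assumption-laden illustration rather than the paper's tool: `CHAR_READINGS` stands in for a per-character default reading table, `WORD_OVERRIDES` is a hypothetical word-level correction list, and the default reading chosen for 樂 is assumed for the sake of the example.

```python
import re

# Runs of Chinese characters: CJK Extension A, the main unified block,
# and the compatibility ideographs block.
HANJA_RUN = re.compile(r"[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF]+")

# Hypothetical per-character default readings (a real table would be far larger).
CHAR_READINGS = {"新": "신", "聞": "문", "娛": "오", "樂": "악"}

# Hypothetical word-level corrections for characters with several readings.
WORD_OVERRIDES = {"娛樂": "오락"}


def annotate(text: str, overrides=WORD_OVERRIDES) -> str:
    """Append the Hangul reading in parentheses after each run of Chinese characters."""

    def add_reading(match: re.Match) -> str:
        word = match.group(0)
        reading = overrides.get(word)
        if reading is None:
            # Fall back to one default reading per character; "?" marks a gap.
            reading = "".join(CHAR_READINGS.get(ch, "?") for ch in word)
        return f"{word}({reading})"

    return HANJA_RUN.sub(add_reading, text)


if __name__ == "__main__":
    print(annotate("新聞을 읽다"))
    print(annotate("娛樂"))                # word-level override applied
    print(annotate("娛樂", overrides={}))  # per-character defaults only
```

Calling `annotate` with an empty override table shows the failure mode described in the abstract: the per-character default produces a reading that does not fit the word, which is why such cases need manual correction or a word-level lookup.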


Original title: 교양교육에서의 근대 국한문 혼용 신문자료 활용을 위한 텍스트 처리(text processing) 자동화 방식의 고안과 추가적인 과제
