Korean Natural Language Generation Using LSTM-based Language Model for Task-Oriented Spoken Dialogue System

허윤석; 강상우; 서정연

Article information

Korean Natural Language Generation Using LSTM-based Language Model for Task-Oriented Spoken Dialogue System

허윤석, 강상우, 서정연

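Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회), The Journal of Korean Institute of Next Generation Computing, Vol. 16, No. 3, 2020.06, pp. 35-50, KCI-indexed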


Abstract

English

Natural language generation in a dialogue system is the task of transforming the semantic frame of a system utterance, determined in the dialogue management phase, into natural language that humans can understand. Existing approaches have two limitations: they generate only a very limited range of utterances from a semantic frame, or they generate grammatically incomplete ones. To address both issues simultaneously, we propose a Korean natural language generation model using a long short-term memory (LSTM)-based language model. In particular, we apply beam search decoding to obtain system utterances that are structurally diverse and grammatically correct. Experiments were conducted separately on word, morpheme, and syllable units, and the generated utterances were evaluated both quantitatively and qualitatively. The morpheme-based model with beam search decoding achieved the best results: in the quantitative evaluation of the generated sentences, the BLEU-4 score was 0.86 and the slot error rate (SER) was 0.03, and the qualitative evaluation confirmed that the sentences were grammatically correct and contextually natural.

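Korean (translated)

In a dialogue system, natural language generation produces, from the semantic representation of a system utterance determined in the dialogue management stage, natural language that a person can understand. Existing natural language generation studies have the problem that they generate only a very limited variety of utterances for a given semantic representation, or generate grammatically incomplete utterances. This paper therefore proposes a Korean natural language generation model using a Long Short Term Memory based language model to address these problems simultaneously. In particular, we apply beam search decoding to increase the diversity and grammatical accuracy of system utterances. Experiments were run separately for eojeol (space-delimited word), morpheme, and syllable units, and the generated sentences were evaluated both quantitatively and qualitatively. The proposed model trained on morpheme units with beam search decoding showed the best performance. In the quantitative evaluation, the generated sentences scored 0.86 on BLEU and 0.03 on Slot Error Rate, and the qualitative evaluation likewise confirmed that they were grammatically correct and sufficiently natural in context.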
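
The beam search decoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy probability table `TOY_LM`, the helper `toy_next`, the end-of-sentence token `</s>`, and all parameter names are hypothetical stand-ins for an LSTM language model conditioned on a semantic frame. The sketch only shows how keeping several hypotheses per step can find a higher-probability sequence than greedy decoding.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "language model": prefix -> next-token probabilities.
# In the paper's setting this role would be played by an LSTM language model.
TOY_LM: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"hotel": 0.5, "restaurant": 0.5},
    ("a",): {"hotel": 0.9, "restaurant": 0.1},
}


def toy_next(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not in the table ends the sentence with probability 1.
    return TOY_LM.get(prefix, {"</s>": 1.0})


def beam_search(
    next_token_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 2,
    max_len: int = 10,
    eos: str = "</s>",
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return hypotheses as (log-probability, tokens), best first."""
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    finished: List[Tuple[float, Tuple[str, ...]]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for logp, seq in beams:
            for tok, p in next_token_probs(seq).items():
                if p > 0.0:
                    candidates.append((logp + math.log(p), seq + (tok,)))
        if not candidates:
            break
        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, seq in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((logp, seq))
            else:
                beams.append((logp, seq))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses as fallbacks
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


if __name__ == "__main__":
    for width in (1, 2):
        logp, tokens = beam_search(toy_next, beam_width=width)[0]
        print(width, " ".join(tokens), round(math.exp(logp), 2))
```

With a beam width of 1 the search is greedy and commits to the locally most probable first token; with a width of 2 the toy example recovers a sequence whose overall probability is higher. Returning the full ranked list rather than only the top hypothesis is a design choice that also exposes alternative utterances for the same input.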

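Summary
Abstract
1. Introduction
2. Related Work
2.1 Rule/Template-Based Natural Language Generation Models
2.2 Corpus-Based Statistical Natural Language Generation Models
3. Introduction to a Large-Scale Corpus for Korean System Utterance Generation
4. LSTM Language Model for Korean Natural Language Generation
4.1 Recurrent Neural Network-Based Language Model
4.2 Language Model Using Long Short Term Memory
4.3 LSTM-Based Language Model for System Utterance Generation in Dialogue Systems
5. Experimental Setup and Result Analysis
5.1 Experimental Setup
5.2 Evaluation Metrics
5.3 Experimental Results and Analysis
6. Conclusion
References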
