

Convergence of Internet, Broadcasting and Communication

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model



A smart tourism chatbot is needed as a user interface that efficiently provides tourists with smart tourism services such as recommended travel products, tourist information, "my travel itinerary", and a tour guide service. We have developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intent classification, and a tourism information knowledge base built on the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required to build an NER model based on pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets in two steps. First, we collect source data from the smart tourism app, the visitJeju website of the Jeju Tourism Organization (JTO), and web search. Second, we preprocess these data using Korean and English tourism information NE dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weight-averaged precision, recall, and F1 scores on the Korean and English tourism information NER datasets are 0.94, 0.92, and 0.94, respectively.
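The abstract states that the datasets are produced by preprocessing source text with NE dictionaries, and the outline refers to a BIO tagging dictionary. A minimal sketch of how such dictionary-driven BIO labelling could work follows. It is not the authors' code. The dictionary entries, the entity type names (`SPOT`, `FOOD`), the `bio_tag` helper, and the greedy longest-match strategy are all illustrative assumptions. It also assumes the text is already tokenized, for example by whitespace or by a morpheme analyzer such as khaiii.

```python
from typing import Dict, List, Tuple

# Hypothetical NE dictionary: entity phrase (as a token tuple) -> entity type.
# Entries and type names are illustrative assumptions, not the paper's tag set.
NE_DICT: Dict[Tuple[str, ...], str] = {
    ("Seongsan", "Ilchulbong"): "SPOT",
    ("Hallasan",): "SPOT",
    ("black", "pork"): "FOOD",
}


def bio_tag(tokens: List[str], ne_dict: Dict[Tuple[str, ...], str]) -> List[str]:
    """Label tokens with BIO tags by greedy longest match against ne_dict (hypothetical helper)."""
    max_len = max((len(k) for k in ne_dict), default=0)
    tags = ["O"] * len(tokens)  # tokens outside every dictionary entry stay "O"
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-token entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            ne_type = ne_dict.get(tuple(tokens[i:i + n]))
            if ne_type is not None:
                tags[i] = f"B-{ne_type}"          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{ne_type}"      # remaining tokens of the entity
                matched = n
                break
        i += matched if matched else 1  # skip past the entity, or advance one token
    return tags


if __name__ == "__main__":
    sentence = "Try black pork near Seongsan Ilchulbong".split()
    for token, tag in zip(sentence, bio_tag(sentence, NE_DICT)):
        print(token, tag)
```

For the example sentence, the tags are `O B-FOOD I-FOOD O B-SPOT I-SPOT`. Greedy longest match is one simple choice for resolving overlapping dictionary entries. A real pipeline would also need to align these word-level tags with KoBERT's subword tokens before fine-tuning.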


1. Introduction
2. The Korean and English Tourism Information NER Datasets
2.1 Source data of smart tourism NER datasets
2.2 Tourism information Named Entity BIO tagging dictionary
2.3 Pre-processing for tourism information NER data generation
3. Tourism Information NER Performance of the KoBERT-CRF NER model
4. Conclusions and Further Study


  • Myeong-Cheol Jwa Student, Korea University of Technology and Education, Cheonan-si, Korea
  • Jeong-Woo Jwa Professor, Department of Telecommunication Eng., Jeju National University, Jeju, Korea


