A Gender Identification of Korean Blog Writers through Machine Learning

Ji-Myoung Choi

[Articles]

The Korean Association for Corpus Linguistics (한국코퍼스언어학회), Corpus Linguistics Research, Vol. 7, No. 2, 2022.12, pp. 71-89

Abstract

Choi, J.M. (2023). A gender identification of Korean blog writers through machine learning. Gender identification of texts belongs to author profiling, a subfield of author analysis. This study is a preliminary experiment on an automatic gender detection model, using 1,162 posts written by 13 blog owners. As linguistic features, four types of n-grams (word, function word, character, and POS), phoneme frequency, and four lexical sets were chosen, and a support vector machine was adopted as the classifier. Classification accuracy ranged from 54% to 99% depending on the feature type. The best-performing model, however, was obtained when all features except word n-grams were combined. The most salient features distinguishing female from male writers were first-person pronouns (‘나(I, me)’ and ‘내(+*)’ for females vs. ‘저(-*)’ and ‘제(-)’ for males) and sentence endings (‘다’, ‘ᄂ다’, and ‘었다’ for females vs. ‘습니다’, ‘ᄇ니다’, ‘습니다’, and ‘네요’ for males). This preliminary study could lead to further research into gendered language variation and contribute to the development of a stable and robust author profiling system.
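
As a rough illustration of one of the feature types named in the abstract, character n-grams, the following is a minimal sketch and not the study's actual pipeline. The function names and the two toy sentences are hypothetical, counting is done over whole Hangul syllables rather than the morpheme-level units the abstract reports, and the classifier stage (a support vector machine in the study) is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in text (spaces are kept)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Convert raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Toy sentences invented for illustration; they are NOT from the study's corpus.
plain = "나는 밥을 먹었다"        # plain-style ending
polite = "저는 밥을 먹었습니다"   # polite-style ending

plain_bigrams = char_ngrams(plain, 2)
polite_bigrams = char_ngrams(polite, 2)

# Bigrams present in one toy sentence but absent from the other.
only_plain = set(plain_bigrams) - set(polite_bigrams)
only_polite = set(polite_bigrams) - set(plain_bigrams)
```

In a fuller setup, frequency vectors like these would be one of several feature blocks passed to a classifier; how the study weighted or combined its feature sets is not stated in the abstract.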

Author information

Ji-Myoung Choi, Yonsei University
