Predicting Employee Job Satisfaction through Explainable Machine Learning - A Text-Analytic Study of Glassdoor Reviews -

Ziyi Wang; Hanjun Lee

Article Information

Korean title: 설명가능한 머신러닝을 통한 직원 직무만족도 예측 - 글래스도어 리뷰의 텍스트 분석 연구 -

대한경영정보학회, 경영과 정보연구, Vol. 44, No. 4, December 2025, pp. 251-270 (KCI-indexed)

Abstract

English

[Purpose] This study proposes a novel explainable AI framework for predicting employee job satisfaction from online reviews. Its novelty lies in integrating topic-level semantics (Latent Dirichlet Allocation, LDA) with LLM-based linguistic diversity indicators to reveal how the complexity of employees’ narratives reflects their underlying job satisfaction. This approach moves beyond traditional feature engineering to capture subtle linguistic-psychological patterns. [Methodology] A total of 141,854 Glassdoor reviews from 120 firms were analyzed in four stages: data collection, preprocessing, modeling, and evaluation. Ratings of 4–5 were labeled as satisfied and ratings of 3 or below as dissatisfied. LDA topic modeling identified three themes each from the pros and cons sections, while linguistic diversity indicators such as cons_entropy, cons_sem_div, cons_pos_ttr, and cons_mtld were extracted using Sentence-BERT and MPNet embeddings. Random Forest and XGBoost models were trained with Synthetic Minority Over-sampling Technique (SMOTE)-based balancing, and SHapley Additive exPlanations (SHAP) analysis was applied for interpretability. [Findings] Random Forest achieved the best performance (accuracy and F1 = 0.85). Key predictors included employees’ company recommendations, CEO evaluations, business outlook perceptions, and linguistic diversity measures capturing complex or coherent expression patterns. [Implications] The results support the Job Demands-Resources (JD-R) model, showing that language reflecting workload and conflict lowers satisfaction, while positive references to pay and support enhance it. The explainable AI framework offers theoretical and practical insights for transparent HR analytics.
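
The labeling rule and the idea of a linguistic diversity indicator can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how cons_entropy or cons_pos_ttr are computed, and the study's indicators rely on Sentence-BERT and MPNet embeddings. The whitespace tokenizer, the function names, and the `cons_ttr` feature name below are assumptions made for illustration; only the 4–5 versus 3-or-below labeling rule comes from the abstract.

```python
import math
from collections import Counter


def label_satisfaction(rating: int) -> int:
    """Labeling rule from the abstract: 4-5 -> satisfied (1), 3 or below -> dissatisfied (0)."""
    return 1 if rating >= 4 else 0


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (assumption; the paper's preprocessing is not given)."""
    return text.lower().split()


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct tokens divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def shannon_entropy(tokens: list[str]) -> float:
    """Shannon entropy in bits of the token frequency distribution; 0.0 for empty input."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical "cons" text for one review with an overall rating of 3.
cons = "long hours long commute low pay"
tokens = tokenize(cons)
features = {
    "label": label_satisfaction(3),
    "cons_ttr": type_token_ratio(tokens),        # hypothetical lexical analogue
    "cons_entropy": shannon_entropy(tokens),     # word-level entropy, assumed definition
}
```

Higher values of either measure indicate more varied wording in the text; a model such as Random Forest could take these values as input columns alongside the topic and rating features.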

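Korean (translated into English)

[Purpose] This study proposes a new explainable artificial intelligence (XAI) framework for predicting employee job satisfaction from online review data. Its novelty lies in integrating topic-level semantic information (Latent Dirichlet Allocation, LDA) with large language model (LLM)-based linguistic diversity indicators to show how the complexity of employees' narratives reflects their latent job satisfaction. This approach goes beyond traditional feature engineering and captures subtle linguistic and psychological patterns. [Methodology] A total of 141,854 Glassdoor reviews from 120 companies were collected and analyzed in four stages: data collection, preprocessing, modeling, and evaluation. Ratings of 4–5 were coded as satisfied (1) and ratings of 3 or below as dissatisfied (0). LDA topic modeling yielded three topics each from the pros and cons sections, and linguistic diversity indicators such as cons_entropy, cons_sem_div, cons_pos_ttr, and cons_mtld were computed using Sentence-BERT and MPNet embeddings. Random Forest and XGBoost models were trained after applying the Synthetic Minority Over-sampling Technique (SMOTE), and SHapley Additive exPlanations (SHAP) analysis was performed for interpretability. [Findings] The Random Forest model showed the highest predictive performance (accuracy and F1 = 0.85). Key predictors included whether the employee recommends the company, CEO evaluation, and business outlook, as well as linguistic diversity indicators that capture complex or coherent expression patterns. [Implications] The results support the Job Demands-Resources (JD-R) model: language reflecting long working hours and conflict lowers satisfaction, while positive expressions mentioning compensation and support raise it. The proposed explainable AI framework offers theoretical and practical implications that can contribute to the transparency and practical use of HR analytics.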
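
The SMOTE balancing step mentioned in the abstract can be sketched in plain Python as follows. This is an illustration of the general technique only: the abstract does not say which implementation or parameters the study used, and the function name `smote_sketch`, the neighbour count `k`, and the toy data are assumptions. The core idea is that each synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (SMOTE-style oversampling).

    Requires at least two minority points; function name and defaults are illustrative.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority-class feature vectors (hypothetical, not data from the study).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=5)
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the evaluation data.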

Ⅰ. Introduction
Ⅱ. Related Research
2.1 Classic Determinants of Job Satisfaction
2.2 Online Reviews and Job Satisfaction
Ⅲ. Analysis Methods
Ⅳ. Results
Ⅴ. Discussion and Conclusion
5.1 Theoretical Implications
5.2 Practical Implications
5.3 Limitations and Future Work
References
< Korean Abstract >

Author Information

Ziyi Wang (왕자예), PhD Candidate, Myongji University, Department of Management Information Systems
Hanjun Lee (이한준), Associate Professor, Myongji University, Department of Management Information Systems
