LightGBM-Based Prediction of Multiple Risk Levels and Factor Analysis for Import Cargo RBI

Original title (Korean): LightGBM 기반 수입 화물 RBI를 위한 다중 위험 등급 예측 및 영향 요인 분석

Jooho Cha (차주호)

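Intelligent Software Education Research Institute, Jeju National University. 지능정보융합과 미래교육 (Intelligent Information Convergence and Future Education), Vol. 4, No. 22, October 2025, pp. 1-7. KCI candidate journal.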

Abstract

English

This paper aims to develop a machine learning-based system to predict the risk levels of import cargo for efficient risk management. To this end, we transformed a publicly available binary classification dataset of import declarations into a multi-class (Red, Yellow, Green) dataset suitable for a real-world Risk-Based Inspection (RBI) environment. Based on this data, a model was trained using LightGBM (Light Gradient Boosting Machine), an ensemble model proven for its performance and efficiency. Specifically, to enhance the capability of identifying high-risk cargo, a core objective of the RBI system, we set the Recall for the Red class as the key performance indicator. The model was optimized by adjusting class weights and the prediction threshold. The performance evaluation demonstrates the model achieved a high Red class Recall of 0.8700 while maintaining a stable overall performance with a Macro F1 Score of 0.8259. Furthermore, we secured the model’s reliability by applying an Explainable AI (XAI) technique, Permutation Importance, which confirmed that features such as the country of origin and HS Code are key variables in risk classification. This study is significant in that it presents an empirical methodology for developing a practical and effective machine learning-based RBI system, incorporating data transformation and model tuning processes that reflect real-world business requirements.
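The abstract says the model was tuned through class weights and the prediction threshold, with Red class Recall as the key indicator and Macro F1 as the overall measure. The following is a minimal pure-Python sketch of the threshold part only. The decision rule (assign Red whenever its predicted probability reaches a cutoff), the cutoff values, and the toy probabilities are all assumptions made for illustration; the abstract does not state how the threshold was applied, and none of this reproduces the paper's pipeline or results.

```python
# Hypothetical sketch: the decision rule, thresholds, and toy data are
# illustrative assumptions, not the paper's actual method or data.
LABELS = ("Red", "Yellow", "Green")


def predict_with_red_threshold(probs, red_threshold):
    """Assign Red when P(Red) >= red_threshold, else the likelier of Yellow/Green."""
    predictions = []
    for p_red, p_yellow, p_green in probs:
        if p_red >= red_threshold:
            predictions.append("Red")
        else:
            predictions.append("Yellow" if p_yellow >= p_green else "Green")
    return predictions


def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class, using 0.0 when undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for label in LABELS:
        precision, recall = precision_recall(y_true, y_pred, label)
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy class probabilities in the order (P(Red), P(Yellow), P(Green)).
y_true = ["Red", "Red", "Yellow", "Yellow", "Green", "Green"]
probs = [
    (0.45, 0.35, 0.20),
    (0.30, 0.50, 0.20),
    (0.28, 0.40, 0.32),
    (0.10, 0.60, 0.30),
    (0.05, 0.25, 0.70),
    (0.20, 0.30, 0.50),
]

for threshold in (0.40, 0.25):
    y_pred = predict_with_red_threshold(probs, threshold)
    _, red_recall = precision_recall(y_true, y_pred, "Red")
    print(f"threshold={threshold:.2f} red_recall={red_recall:.2f} "
          f"macro_f1={macro_f1(y_true, y_pred):.4f}")
```

On this toy data, lowering the cutoff from 0.40 to 0.25 raises Red recall from 0.50 to 1.00 while one Yellow item is flagged as Red, which is the recall-versus-precision trade-off that threshold tuning of this kind has to balance.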

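Korean abstract (English translation)

This study aims to develop a machine learning-based risk level prediction system for efficient risk management of import cargo. To this end, import declaration data published for binary classification was converted into a multi-class (Red, Yellow, Green) dataset suited to a real-world Risk-Based Inspection (RBI) environment. The model was then trained with LightGBM (Light Gradient Boosting Machine), an ensemble model with proven performance and efficiency. In particular, to improve the ability to identify high-risk cargo, the core goal of an RBI system, the Recall of the Red class was set as the target metric, and the model was optimized by adjusting class weights and the decision threshold. In the performance evaluation, the model achieved a high Red class Recall of 0.8700 while keeping the Macro F1 Score, which reflects overall performance, at 0.8259. In addition, Permutation Importance analysis, an Explainable AI (XAI) technique, confirmed that variables such as country of origin and HS Code are the main factors in risk level classification, which supports the model's reliability. The significance of this study is that it presents an empirical methodology for building a practical and effective machine learning-based RBI system through data transformation and model tuning that reflect real-world business requirements.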
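Permutation Importance, the XAI technique named in the abstract, measures how much a model's score drops when one feature's values are shuffled across rows. The sketch below shows the idea in pure Python. The rule-based `toy_model`, the three features, the use of accuracy as the score, and the repeat count are hypothetical choices for illustration; the abstract does not specify the implementation or scoring metric used in the study.

```python
import random

# Hypothetical sketch: toy_model, the feature set, and the accuracy metric
# are illustrative assumptions, not the model or data from the paper.


def toy_model(row):
    """Rule-based stand-in for a trained classifier; it ignores the weight feature."""
    origin, hs_code, _weight = row
    score = (origin == "A") + (hs_code == "85")  # 0, 1, or 2 risk signals
    return ("Green", "Yellow", "Red")[score]


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Features: (origin country, HS code prefix, declared weight).
rows = [
    ("A", "85", 1.0), ("A", "85", 2.0), ("A", "61", 3.0), ("B", "85", 4.0),
    ("B", "61", 5.0), ("B", "61", 6.0), ("A", "61", 7.0), ("B", "85", 8.0),
]
labels = [toy_model(r) for r in rows]  # the toy model is perfect on its own labels

importances = permutation_importance(toy_model, rows, labels)
for name, value in zip(("origin", "hs_code", "weight"), importances):
    print(f"{name}: {value:.3f}")
```

Because the toy model never reads the weight, shuffling that column leaves accuracy unchanged and its importance is exactly zero, while shuffling origin or HS code changes predictions and produces a positive importance.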

Author Information

Jooho Cha (차주호). Professor, Graduate School of Smart City and Department of Multimedia, College of Engineering, Chungwoon University
