Abstract
English
Credit scoring is essential for assessing financial soundness and serves as a fundamental tool for loan screening, capital allocation, and risk management in financial institutions. Because the accuracy and reliability of credit scoring models are directly linked to financial system stability, these models must be continuously improved. Traditional models rely primarily on Generalized Linear Models (GLMs), particularly logistic regression. While these models provide interpretable relationships between financial variables and default risk, they are constrained by their linear functional form and their reliance on a limited set of features. This restricts their adaptability to evolving financial markets and to the growing availability of unstructured data sources. Advances in machine learning (ML) and artificial intelligence (AI) have introduced a range of models that improve predictive accuracy and address the limitations of conventional credit scoring models. ML-based approaches such as Random Forest, Support Vector Machines (SVM), XGBoost, and LightGBM, along with deep learning techniques, have been widely applied to credit risk modeling. These methods process large volumes of financial and transactional data and capture complex patterns in credit risk. However, their adoption requires further validation with respect to interpretability and regulatory compliance.
This study makes four key contributions to credit scoring research. First, unlike previous studies that relied on subjectively selected financial variables, we incorporate all financial features collected by credit agencies and adopt a data-driven selection approach, which minimizes researcher bias and improves objectivity. This allows us to identify the most relevant predictors from empirical evidence rather than predetermined assumptions. Second, we address class imbalance, a common challenge in credit risk modeling. Because default cases are rare, traditional logistic regression models often produce biased estimates in which defaulting firms are underweighted. To mitigate this, we apply the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset before applying ML techniques. Third, we combine multiple ML techniques to obtain a comprehensive interpretation of feature importance. Specifically, we compare classification performance across Random Forest, Extreme Gradient Boosting (XGBoost), and Category Boosting (CatBoost). Unlike prior studies that analyze a single ML model in isolation, we integrate feature importance rankings across multiple models, yielding a more robust estimate of the importance of financial variables in credit risk. Fourth, although ML models improve predictive accuracy, their complexity can hinder interpretability, which makes adoption difficult for financial institutions. This study therefore emphasizes explainable AI (XAI) in credit scoring. By applying Shapley Additive Explanations (SHAP), we show how key financial variables influence credit risk and default probability, offering practical guidance on the appropriateness of the financial variables and thresholds used in credit scoring.
This study analyzes credit scoring data for manufacturing firms evaluated by Korea Enterprise Assessment from 2010 to 2024. By applying multiple ML techniques, we identify the key financial variables that influence credit risk and integrate the results into a comprehensive interpretation. Our analysis highlights differences between realized credit risk, which reflects actual defaults and missed payments, and implied credit risk, which is assessed by the current credit scoring model. Realized credit risk is driven primarily by short-term liquidity and profitability indicators such as the inventory turnover period, current ratio, return on equity, and return on capital employed. In contrast, implied credit risk is largely influenced by firm size and long-term financial stability, with key variables including EBITDA, the cost-to-sales ratio, pre-tax income from continuing operations, total sales, and total liabilities. These findings suggest that while current credit scoring models emphasize long-term financial health, actual credit events are driven more by short-term financial constraints. This discrepancy underscores the need to supplement credit scoring models with additional financial variables, particularly those related to short-term liquidity, for high-risk firms.
Further analysis shows that the importance of financial variables varies across rating levels. For A-level firms, short-term financial stability and debt repayment capacity are critical, underscoring the importance of liquidity management. B-level firms, by contrast, are more affected by structural financial indicators such as the debt-to-equity ratio and capital adequacy ratio, highlighting the significance of long-term solvency and debt management. These differences point to the need to tailor credit scoring criteria to risk levels. SHAP results indicate that although higher debt-to-equity and capital growth ratios generally reduce the likelihood of default, their effect on credit risk is nonlinear. This suggests that simple threshold-based classification may be insufficient for credit scoring; a more nuanced approach is needed that accounts for interactions among financial indicators and their varying effects across credit risk levels.
Beyond feature importance, we examine credit rating transitions, since credit scores evolve with firms' financial conditions. Although most firms maintain stable credit scores, downgrades occur more frequently than upgrades, particularly within the B-level category between 2022 and 2023. While some A-level firms were upgraded between 2019 and 2022, the trend shifted toward downgrades from 2022 to 2023. These patterns highlight the need for dynamic credit transition models that account for temporal changes in creditworthiness.
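Rating transitions like those described above are commonly summarized as an empirical transition matrix: for each starting grade, the share of firms that end up in each grade one period later. The sketch below is a minimal plain-Python illustration under stated assumptions; the three-grade scale (`A`, `B`, `C`), the helper names `transition_matrix` and `upgrade_downgrade_counts`, and the toy ratings are hypothetical and do not reflect the study's rating scale, data, or code.

```python
from collections import Counter

# Hypothetical rating scale, ordered from best to worst grade.
GRADES = ("A", "B", "C")


def transition_matrix(ratings_start, ratings_end, grades=GRADES):
    """Row-normalized transition frequencies between two rating dates.

    Both arguments map a firm ID to its grade. Firms missing from either
    date are ignored; a grade with no starting firms gets an all-zero row.
    """
    counts = Counter(
        (ratings_start[firm], ratings_end[firm])
        for firm in ratings_start
        if firm in ratings_end
    )
    matrix = {}
    for g_from in grades:
        row_total = sum(counts[(g_from, g_to)] for g_to in grades)
        matrix[g_from] = {
            g_to: (counts[(g_from, g_to)] / row_total if row_total else 0.0)
            for g_to in grades
        }
    return matrix


def upgrade_downgrade_counts(ratings_start, ratings_end, grades=GRADES):
    """Count moves to a better grade (upgrades) and to a worse grade (downgrades)."""
    rank = {g: i for i, g in enumerate(grades)}  # lower index = better grade
    ups = downs = 0
    for firm, g0 in ratings_start.items():
        g1 = ratings_end.get(firm)
        if g1 is None:
            continue  # firm not rated at the end date
        if rank[g1] < rank[g0]:
            ups += 1
        elif rank[g1] > rank[g0]:
            downs += 1
    return ups, downs


if __name__ == "__main__":
    # Toy, made-up ratings for five firms in two consecutive years.
    year1 = {"f1": "A", "f2": "B", "f3": "B", "f4": "B", "f5": "C"}
    year2 = {"f1": "A", "f2": "B", "f3": "C", "f4": "C", "f5": "C"}
    print(transition_matrix(year1, year2)["B"])  # B row: 1/3 stay in B, 2/3 move to C
    print(upgrade_downgrade_counts(year1, year2))  # (0, 2)
```

Each row estimates the one-period probability of moving from that grade to every other grade. Computing separate matrices for different year pairs (for example, 2019 to 2022 versus 2022 to 2023) is one simple way to check whether downgrades are becoming more frequent; a dynamic transition model would go further by letting these probabilities depend on time or firm characteristics.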
Korean
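In modern financial markets, credit scoring is essential not only for assessing financial soundness but also for loan screening and risk management at financial institutions. However, as the financial environment changes rapidly and machine learning technology advances, the limitations of existing credit scoring models are becoming apparent. Using the credit ratings that Korea Enterprise Assessment assigned to manufacturing firms from 2010 to 2024, this paper discusses directions for improving credit scoring models. Through data exploration, we identify problems with the existing credit scoring model and apply a variety of machine learning techniques to improve it. We focus on using Random Forest, XGBoost, and CatBoost to analyze the importance of key financial variables and to improve the prediction of credit risk. In addition, we apply SMOTE to address data imbalance and use SHAP, an XAI technique, to evaluate the appropriateness of the financial variables and threshold settings used in assigning credit ratings. The results show that the key financial variables explaining realized credit risk differ from those explaining the implied credit risk determined by the existing rating approach. This suggests a need to redefine the evaluation criteria, particularly for high-credit-risk firms. This study demonstrates the potential for improving credit scoring models with machine learning and can help financial institutions develop more sophisticated credit risk management strategies.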
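SMOTE, which the study applies to address class imbalance, creates synthetic minority-class (default) examples by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The plain-Python sketch below illustrates only that core interpolation step. The function names (`smote`, `balance_with_smote`), the toy two-ratio data, and the choice to oversample until both classes are equal in size are hypothetical assumptions for illustration, not the study's configuration; a real pipeline would more likely use a library implementation such as `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Core SMOTE step: interpolate between minority samples and one of
    their k nearest minority-class neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)
    # Precompute the indices of each sample's k nearest minority neighbors.
    neighbors = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbors.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        x, nb = minority[i], minority[j]
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority samples until both classes have equal counts
    (hypothetical balancing target for this sketch)."""
    minority = [tuple(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    new_points = smote(minority, max(0, n_majority - len(minority)), k, seed)
    X_out = [tuple(x) for x in X] + new_points
    y_out = list(y) + [minority_label] * len(new_points)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, made-up data: 6 non-default firms (label 0) and 2 defaults (label 1),
    # each described by two hypothetical financial ratios.
    X = [(1.2, 0.3), (1.5, 0.4), (1.1, 0.2), (1.8, 0.5), (1.4, 0.35), (1.6, 0.45),
         (0.4, 0.9), (0.5, 1.1)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = balance_with_smote(X, y, k=1)
    print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE densifies the minority region rather than duplicating records. Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.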
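Table of Contents
Abstract
Ⅰ. Introduction
Ⅱ. Data Exploration
1. Financial Feature Variables
2. Data for All Industries
Ⅲ. Credit Risk Model and Application of Machine Learning Methods
1. Model Specification
2. Application of Machine Learning Methods
Ⅳ. Results of the Feature Importance Comparison
1. Comparison of Feature Importance
2. SHAP
Ⅴ. Conclusion and Recommendations
References
<Appendix>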
