Machine Learning-based Phishing Website Detection Model

Sumin Oh; Minseo Park

Technology Convergence (TC)

Article information

Machine Learning-based Phishing Website Detection Model

Sumin Oh, Minseo Park

International Promotion Agency of Culture Technology (국제문화기술진흥원), The Journal of the Convergence on Culture Technology (JCCT), Vol.10 No.4, 2024.07, pp.575-580, KCI-indexed

Abstract

English

Defending against increasingly sophisticated phishing attacks requires determining whether a website is legitimate or phishing. We propose a machine learning-based classification model that predicts this status. First, we collect URL information, convert it into numerical data, and remove outliers. Second, we apply the variance inflation factor (VIF) to assess correlation and independence among the variables. Finally, we build a phishing website detection model from machine learning classifiers and use it to predict website status. On the test data, Random Forest performed best, with a precision of 93.74%, a recall of 92.26%, and an accuracy of 93.14%. We expect the model to be applicable to detecting a range of phishing crimes.
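The first step in the abstract, converting URL information into numerical data, can be illustrated with a minimal sketch. The feature names below (`url_length`, `num_dots`, and so on) are assumptions chosen for illustration; the abstract does not list the variables the authors actually used.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Turn a URL string into a few numeric features.

    The feature set is hypothetical; it is not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                          # total characters
        "host_length": len(host),                        # characters in the hostname
        "num_dots": url.count("."),                      # '.' occurrences
        "num_hyphens": url.count("-"),                   # '-' occurrences
        "num_digits": sum(ch.isdigit() for ch in url),   # digit characters
        "has_at_symbol": int("@" in url),                # 1 if '@' appears
        "uses_https": int(parsed.scheme == "https"),     # 1 if scheme is https
    }


if __name__ == "__main__":
    # Example URL is made up for demonstration.
    print(url_features("http://login-secure.example.com/a1"))
```

Each URL becomes one numeric row, which is the form that outlier removal, VIF screening, and the classifiers named in the abstract would operate on.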

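Korean (translated)

With the spread of social media, phishing attacks have grown more sophisticated, and defending against them requires determining whether a site a user is about to visit is legitimate or phishing. This study proposes a model that predicts whether a site is legitimate or phishing using machine learning-based classifiers. First, information about the 'URL' is collected and converted into numerical data, and outliers are removed. Second, the variance inflation factor (VIF) is applied to assess correlation and independence among the variables. Third, a phishing site detection model is developed from machine learning-based classifiers and used to predict the status of a site. Among the classifiers, Random Forest performed best, achieving a precision of 93.74%, a recall of 92.26%, and an accuracy of 93.14% on the test data. This work is expected to be applicable to detecting phishing crimes in a variety of areas.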
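The three reported metrics follow from a confusion matrix in the standard way. The sketch below assumes phishing is treated as the positive class, which the abstract does not state, and uses made-up counts that are not the paper's results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, and accuracy from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives. Positive class = phishing (an assumption).
    """
    total = tp + fp + fn + tn
    return {
        # Of the sites flagged as phishing, the share that really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real phishing sites, the share that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all sites classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
```

In a phishing detector, recall reflects how many phishing sites slip through undetected, while precision reflects how often a legitimate site is wrongly blocked, so reporting both alongside accuracy gives a fuller picture than accuracy alone.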

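Table of contents

Korean abstract (요약)
Abstract
Ⅰ. Introduction
Ⅱ. Machine Learning-based Classification Models
1. Logistic Regression
2. Decision Tree
3. Random Forest
Ⅲ. Research Method
1. Data Collection
2. Data Preprocessing
3. Variable Selection
4. Modeling
Ⅳ. Experiments and Analysis of Results
Ⅴ. Conclusion
References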

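Author information

Sumin Oh (오수민). Associate member; undergraduate student, Department of Data Science, Seoul Women's University
Minseo Park (박민서). Regular member; professor, Department of Data Science, Seoul Women's University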
