Article Information
Abstract
English
According to statistics from the National Forensic Service (NFS), the number of telephone financial fraud cases more than doubled after the onset of COVID-19 compared with the previous year. Telephone financial fraud, including false emergency calls to the numbers 112 or 119, involves using telecommunication networks to threaten victims or extort money and valuables. The amount lost to voice phishing, in particular, has increased steadily every year, posing a significant social problem. In such cases of telephone financial fraud, voice recordings serve as a crucial piece of evidence. This paper examines the forensic application of the latest deep learning-based speaker recognition technology for analyzing the speaker in voice evidence. It explains the operational principles and usage of VOCALISE, a commercial speaker recognition engine developed by Oxford Wave Research. Notably, VOCALISE, which is widely used in forensic analysis organizations worldwide, is adept at creating sample groups from speaker datasets, measuring their statistical characteristics, and analyzing specific voice signals. To evaluate performance on Korean speech data, we used 736 voice files collected from 159 individuals involved in actual voice crime cases to calculate the Equal Error Rate (EER) and the log-likelihood ratio cost (CLLR). Based on these real audio recordings, the paper explains how to use VOCALISE to express results in the globally accepted likelihood ratio-based verbal format when investigative agencies submit actual voice evidence.
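The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that each comparison yields a score interpreted as a natural-log likelihood ratio, and that the trials are already split into same-speaker and different-speaker lists. The function names and the toy scores are hypothetical; this is not the VOCALISE API and does not use the paper's data.

```python
import math


def cllr(same_speaker_llrs, diff_speaker_llrs):
    """Log-likelihood-ratio cost for natural-log LRs (lower is better)."""
    # Same-speaker trials are penalized when the LR is small: log2(1 + 1/LR).
    cost_same = sum(math.log2(1 + math.exp(-llr)) for llr in same_speaker_llrs)
    cost_same /= len(same_speaker_llrs)
    # Different-speaker trials are penalized when the LR is large: log2(1 + LR).
    cost_diff = sum(math.log2(1 + math.exp(llr)) for llr in diff_speaker_llrs)
    cost_diff /= len(diff_speaker_llrs)
    return 0.5 * (cost_same + cost_diff)


def equal_error_rate(same_speaker_scores, diff_speaker_scores):
    """Approximate EER by sweeping a threshold over the observed scores."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(same_speaker_scores) | set(diff_speaker_scores)):
        # Miss rate: same-speaker trials scoring below the threshold.
        miss = sum(s < threshold for s in same_speaker_scores) / len(same_speaker_scores)
        # False-alarm rate: different-speaker trials at or above the threshold.
        false_alarm = sum(s >= threshold for s in diff_speaker_scores) / len(diff_speaker_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Toy scores for illustration only (assumed values, not from the paper).
same = [0.5, 2.0, 3.0, 4.0]
diff = [-2.0, -1.0, 0.0, 1.0]
print("EER :", equal_error_rate(same, diff))
print("Cllr:", cllr(same, diff))
```

A system whose likelihood ratios are always 1 (log LR of 0) carries no information and gets a cost of exactly 1 under this definition, so values well below 1 indicate informative, reasonably calibrated output. The threshold sweep is a simple approximation; interpolating the error curves would give a smoother EER estimate.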
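Table of Contents
Ⅰ. Introduction
Ⅱ. x-vector Neural Network Architecture of VOCALISE
1. Training Process
2. Inference Process
Ⅲ. x-vector-Based Speaker Recognition Process in VOCALISE
1. Preprocessing
2. Feature Vector Extraction from Voice Files
3. Scoring Between Extracted Voice Feature Vectors
Ⅳ. Establishing a Forensic Speaker Recognition Procedure Using VOCALISE
1. Evaluation Data Collection
2. Performance Evaluation (i-vector vs. x-vector)
3. Interpretation of Speaker Recognition Results
Ⅴ. Discussion and Future Plans
Ⅵ. Acknowledgments
Ⅶ. References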
