Article Information
Abstract
In recent years, audio-based anomaly recognition has attracted the attention of the research community, owing to the steadily increasing number of abnormal situations. In the past, researchers mainly focused on video-based anomaly recognition; however, occlusion is one of the most important factors that can render an anomalous object unidentifiable. Therefore, in this paper, we propose a modified vision transformer that utilizes Shifted Patch Tokenization (SPT) and a Local Self-Attention (LSA) mechanism, and reduces the number of multilayer perceptron layers in the head, enabling the model to capture rich spatial information within the spectrogram of anomalous data. The proposed model is evaluated on the Sound Events for Surveillance Applications (SESA) dataset and achieves 87% test accuracy. Thus, the proposed model is an efficient and effective solution for audio-based anomaly recognition.
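The two mechanisms named in the abstract can be illustrated compactly. The following is a minimal NumPy sketch, not the authors' implementation: SPT concatenates the input with four half-patch diagonal shifts before patch splitting, and LSA replaces the usual sqrt(d) scaling with a learnable temperature while masking each token's attention to itself. All dimensions (32x32 spectrogram, patch size 4) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def shifted_patch_tokenize(img, patch=4):
    """Shifted Patch Tokenization (SPT), sketched: stack the input with its
    four half-patch diagonal shifts, then split into flattened patch tokens.
    img: (H, W, C) array; returns (num_patches, patch*patch*5*C)."""
    H, W, C = img.shape
    s = patch // 2
    pad = np.pad(img, ((s, s), (s, s), (0, 0)))
    # four diagonal shifts: up-left, up-right, down-left, down-right
    shifts = [pad[0:H, 0:W], pad[0:H, 2*s:2*s + W],
              pad[2*s:2*s + H, 0:W], pad[2*s:2*s + H, 2*s:2*s + W]]
    x = np.concatenate([img] + shifts, axis=-1)          # (H, W, 5C)
    return (x.reshape(H // patch, patch, W // patch, patch, 5 * C)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch * patch * 5 * C))

def local_self_attention(q, k, v, tau):
    """Local Self-Attention (LSA), sketched: a learnable temperature tau
    replaces the fixed sqrt(d) scaling, and diagonal (token-to-itself)
    scores are masked out to sharpen attention on other tokens."""
    scores = q @ k.T / tau
    np.fill_diagonal(scores, -1e9)                       # mask self-tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Example: tokenize a dummy 32x32 single-channel spectrogram
spec = np.zeros((32, 32, 1), dtype=np.float32)
tokens = shifted_patch_tokenize(spec)                    # (64, 80) tokens
```

In a full model, the SPT tokens would be linearly projected to the embedding dimension and fed through transformer blocks whose attention layers use the LSA scoring above.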
Table of Contents
1. Introduction
2. Research methodology
3. Results and discussion
3.1. Dataset
3.2. Experiment setup
3.3. Experiment results
4. Conclusions
Acknowledgment
References
