Abstract
Recognizing anomalies in surveillance, i.e., events that deviate from normal patterns, is crucial for public safety. Visual information is essential for effective anomaly recognition; however, audio data can improve recognition accuracy by providing additional context. Despite this, existing systems rely solely on visual information, overlooking the potential of the audio modality for anomaly recognition. This paper introduces a multi-modal framework for anomaly recognition through active learning, integrating audio and visual modalities to enhance anomaly prediction. The framework extracts features from both the visual and audio data using a pretrained ResNet-50 convolutional neural network (CNN). The extracted features are then forwarded to a Bi-Directional Long Short-Term Memory (Bi-LSTM) network for temporal feature learning. These features are subsequently fused and fed into a classification layer for the final prediction. The proposed framework is evaluated on a benchmark dataset and yields promising results.
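The abstract gives no implementation details, but the described pipeline (per-modality ResNet-50 features, Bi-LSTM temporal modeling, fusion, classification layer) can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' configuration: the class name AudioVisualAnomalyNet, the hidden size, fusion by simple concatenation, last-time-step pooling, the 14-class output, and the treatment of audio as 3-channel spectrogram "images" fed to a second ResNet-50 are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights


class AudioVisualAnomalyNet(nn.Module):
    """Hypothetical sketch: per-modality ResNet-50 features ->
    Bi-LSTM temporal learning -> concatenation fusion -> classifier."""

    def __init__(self, num_classes=14, hidden=256):  # num_classes is assumed
        super().__init__()
        # One pretrained ResNet-50 per modality; dropping the final fc
        # layer leaves a 2048-d feature vector per frame.
        vis = resnet50(weights=ResNet50_Weights.DEFAULT)
        aud = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.visual_cnn = nn.Sequential(*list(vis.children())[:-1])
        self.audio_cnn = nn.Sequential(*list(aud.children())[:-1])

        # One Bi-LSTM per modality for temporal feature learning.
        self.visual_lstm = nn.LSTM(2048, hidden, batch_first=True,
                                   bidirectional=True)
        self.audio_lstm = nn.LSTM(2048, hidden, batch_first=True,
                                  bidirectional=True)

        # Fused (concatenated) features go to the classification layer.
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def _encode(self, cnn, lstm, clip):
        # clip: (batch, time, 3, H, W) -> per-frame CNN features -> LSTM.
        b, t = clip.shape[:2]
        feats = cnn(clip.flatten(0, 1)).flatten(1)      # (b*t, 2048)
        out, _ = lstm(feats.reshape(b, t, -1))          # (b, t, 2*hidden)
        return out[:, -1]                               # last time step

    def forward(self, visual_clip, audio_clip):
        v = self._encode(self.visual_cnn, self.visual_lstm, visual_clip)
        a = self._encode(self.audio_cnn, self.audio_lstm, audio_clip)
        return self.classifier(torch.cat([v, a], dim=1))


# Usage: two 8-frame clips; audio is assumed pre-rendered as spectrograms.
model = AudioVisualAnomalyNet()
logits = model(torch.randn(2, 8, 3, 224, 224),
               torch.randn(2, 8, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 14])
```

Concatenation is only one plausible fusion choice; the paper does not specify whether the modalities are fused by concatenation, averaging, or a learned attention mechanism.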
Table of Contents
1. INTRODUCTION
2. METHODOLOGY
2.1 Feature Extraction
2.2 Sequential Models
2.3 Fusion Mechanism and Classification Layer
3. EXPERIMENTAL RESULTS
3.1 Comparative Analysis
4. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
