Abstract
Video surveillance is widely used for public safety, but anomalous behaviors often exhibit patterns similar to normal ones, making detection difficult. Conventional approaches reconstruct full frames in 3D to learn global structure; however, they suffer from greatly increased computation due to redundant information in adjacent frames. This paper proposes a method that reduces the number of frames by powers of two and compares its performance and training efficiency with the full-frame approach. We trained a Video Vision Transformer (ViViT) on the UCF-Crime trimmed dataset; compared to the full-frame baseline, accuracy changed by between −0.74% and +1.27%, while training time was shortened by up to 3.8×. These results suggest that, within the range that preserves global structure, frame reduction can serve as an efficient alternative for video anomaly detection.
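The frame-reduction scheme described above can be sketched as simple temporal subsampling. This is an illustrative example, not the authors' implementation: the function name and the stride-based selection strategy are assumptions, chosen only to show how halving the frame count in powers of two might look in code.

```python
def subsample_frames(frames, k):
    """Keep every 2**k-th frame, reducing the clip length by roughly 2**k.

    frames: sequence of video frames (here, any indexable sequence).
    k: reduction exponent; k=0 is the full-frame baseline.
    """
    stride = 2 ** k
    return frames[::stride]

# Stand-in for a 32-frame clip; real inputs would be image tensors.
clip = list(range(32))
print(len(subsample_frames(clip, 0)))  # 32 frames (full-frame baseline)
print(len(subsample_frames(clip, 2)))  # 8 frames (reduced by a factor of 4)
```

A uniform stride is only one possible selection rule; random or keyframe-based sampling would be alternatives, and the paper does not specify which variant is used here.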
Table of Contents
I. INTRODUCTION
II. RELATED WORK
A. Dataset Preprocessing
B. Model Training
C. Evaluation Metrics
IV. EXPERIMENTAL RESULTS
V. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
