Session Ⅳ : Artificial Intelligence

Swin Transformer-based multi-scale crowd localization method

Abstract

English

In this paper, we propose a new framework that enables an object detector trained with only point-level annotations to estimate the centroids and sizes of objects in dense scenes. Specifically, the framework is based on the Swin Transformer and introduces a self-designed resolution feature fusion module into its hierarchical structure. Object centroids are estimated directly under point supervision, object pseudo-sizes are initialized under the assumption of a locally uniform distribution, and the regression of object size is guided by an improved congestion-aware loss function. On the NWPU-Crowd dataset, our method outperforms existing state-of-the-art detection-based counting methods in terms of F1-measure, precision, and MSE.
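The abstract does not spell out how pseudo-sizes are initialized. One common reading of a locally uniform distribution assumption is that neighboring objects are about as far apart as they are large, so the distance from each annotated point to its nearest neighbor can serve as an initial size. The sketch below illustrates that reading only. The function name `init_pseudo_sizes` and the `max_size` cap are hypothetical and are not taken from the paper.

```python
import numpy as np

def init_pseudo_sizes(points, max_size=None):
    """Assumed initialization: pseudo-size = distance to the nearest other point.

    points   : iterable of (x, y) point annotations for one image
    max_size : optional cap (hypothetical parameter) for isolated points
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        # With fewer than two points there is no neighbor; fall back to the cap.
        fallback = float(max_size) if max_size is not None else 0.0
        return np.full(n, fallback)

    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Ignore each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    sizes = dist.min(axis=1)

    if max_size is not None:
        sizes = np.minimum(sizes, max_size)
    return sizes

sizes = init_pseudo_sizes([(0, 0), (3, 4), (100, 100)], max_size=50)
```

In a point-supervised pipeline, sizes like these would only be starting values that a size-regression loss then refines. The pairwise distance matrix is quadratic in the number of points, so a k-d tree would be a more practical choice for very dense images.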

Table of Contents

Abstract
I. INTRODUCTION
II. METHOD
A. Swin Transformer
B. Resolution feature fusion module
C. Congestion-aware loss function
III. EXPERIMENTS
A. Evaluation Criteria
B. Dataset
C. Parameter Setting
D. Ablation experiments
E. Experiment results
IV. CONCLUSION
REFERENCES

Author Information

  • Yi Ren, Computer Science, Chongqing University of Posts and Telecommunications, Chongqing, China
  • Xin He, Computer Science, Chongqing University of Posts and Telecommunications, Chongqing, China
