Source Information
Abstract
English
In this paper, we propose a new framework that enables an object detector trained with only point-level annotations to estimate the centroids and sizes of objects in dense scenes. Specifically, the framework is built on the Swin Transformer and introduces a self-designed resolution feature fusion module into its hierarchical structure. Object centroids are estimated directly under point supervision, object pseudo-sizes are initialized under the assumption of a locally uniform distribution, and size regression is guided by an improved congestion-aware loss function. On the NWPU-Crowd dataset, our method outperforms existing state-of-the-art detection-based counting methods in terms of F1-measure, precision, and MSE.
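The abstract does not spell out how pseudo-sizes are initialized. As a rough illustration only, the sketch below assumes one common reading of a locally uniform distribution: neighboring objects are similarly sized and evenly spaced, so the distance from a point annotation to its nearest neighbor can serve as an initial size. The function name `init_pseudo_sizes`, the `default_size` fallback, and the nearest-neighbor rule itself are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def init_pseudo_sizes(points, default_size=32.0):
    """Hypothetical pseudo-size initializer (not the paper's exact rule).

    Assumes that, locally, objects are uniformly distributed, so the
    distance to the nearest annotated point approximates object size.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        # No neighbor to measure against: fall back to an assumed default.
        return np.full(n, default_size)
    # Pairwise Euclidean distances between all annotated points.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ignore each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    # Nearest-neighbor distance is used as the initial pseudo-size.
    return dist.min(axis=1)

# Three illustrative point annotations given as (x, y) pixel coordinates.
heads = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
sizes = init_pseudo_sizes(heads)
```

In a training pipeline, values like these would only seed the size targets; the regression stage described in the abstract would then refine them.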
Table of Contents
I. INTRODUCTION
II. METHOD
A. Swin Transformer
B. Resolution feature fusion module
C. Congestion-aware loss function
III. EXPERIMENTS
A. Evaluation Criteria
B. Dataset
C. Parameter Setting
D. Ablation experiments
E. Experiment results
IV. CONCLUSION
REFERENCES