Abstract
Recent advancements in object detection increasingly leverage end-to-end transformer architectures. However, many studies in this domain apply transformer structures originally designed for natural language processing directly to object detection models. This direct application can lead to issues such as the first decoder layer skipping self-attention and the prediction of duplicate objects during training. In this study, we propose a novel approach that addresses these challenges by reversing the attention order in the transformer decoder from a self-cross to a cross-self structure. This modification structurally prevents the initial attention skip and, by delaying self-attention, mitigates the issue of predicting the same object multiple times. Experimental results demonstrate that reversing the attention order in the decoder improves both the training loss and test performance across all stages of the learning process.
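
The core change described above, running cross-attention before self-attention in each decoder layer, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the class name CrossSelfDecoderLayer, the dimensions, the post-norm residual layout, and the use of plain multi-head attention are assumptions for illustration (RT-DETR's actual decoder uses deformable cross-attention and positional embeddings, which are omitted here).

# Minimal sketch of a cross-self decoder layer (illustrative, not the paper's code).
# In a standard DETR-style layer, self-attention over object queries runs before
# cross-attention to the encoder memory; here the order is reversed so queries
# first gather image features, then interact with each other.
import torch
import torch.nn as nn

class CrossSelfDecoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=8, d_ffn=1024, dropout=0.1):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn), nn.ReLU(), nn.Dropout(dropout), nn.Linear(d_ffn, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, queries, memory):
        # 1) Cross-attention first: object queries attend to the encoder memory,
        #    so even the first layer produces image-conditioned queries instead
        #    of skipping a (useless) self-attention over uninformed queries.
        attn_out, _ = self.cross_attn(queries, memory, memory)
        queries = self.norm1(queries + self.dropout(attn_out))
        # 2) Self-attention second: queries now carry image evidence and can
        #    communicate to suppress duplicate predictions of the same object.
        attn_out, _ = self.self_attn(queries, queries, queries)
        queries = self.norm2(queries + self.dropout(attn_out))
        # 3) Position-wise feed-forward network.
        queries = self.norm3(queries + self.dropout(self.ffn(queries)))
        return queries

# Example: 300 object queries attending to 625 encoder tokens of width 256.
queries = torch.randn(2, 300, 256)
memory = torch.randn(2, 625, 256)
out = CrossSelfDecoderLayer()(queries, memory)
print(out.shape)  # torch.Size([2, 300, 256])

Stacking such layers in place of the usual self-cross layers is the only structural change implied by the abstract; the loss, matching, and the rest of the RT-DETR pipeline stay unchanged.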
Table of Contents
I. INTRODUCTION
II. METHODOLOGY
A. RT-DETR
B. Reversing Attention Mechanisms
C. Experimental Results
III. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
