Source Information
Abstract
English
3D object detection is widely applied in robotics and autonomous driving. Because 3D scenes in autonomous driving are typically outdoor environments, where most of the space is empty, current methods that apply convolution directly in the backbone network waste substantial computation and incur significant latency. This paper proposes a backbone network built from sparse convolutional spatial-semantic fusion modules to address this problem. High-level semantic features and low-level spatial features, extracted through sub-manifold sparse convolution and sparse convolution, are fused to strengthen the feature representation. The proposed backbone network achieves excellent performance on the KITTI dataset.
Table of Contents
I. INTRODUCTION
II. PROBLEM FORMULATION
A. Definition
B. Problem
III. THE PROPOSED MODEL
A. Pillar Encoding module
B. Backbone Part
C. Neck module, detection head and loss function
IV. EXPERIMENTAL RESULT
A. KITTI
B. Experiment details
C. KITTI evaluation results
D. Efficiency analysis
V. CONCLUSION
REFERENCES
