Source Information
Abstract (English)
Recently, hand gesture recognition based on deep 3D convolutional neural networks has made great progress. However, the large number of weight parameters that must be optimized makes these networks computationally expensive. We introduce a transformer-based framework for hand gesture recognition built on a fully self-attentional architecture. The framework abandons conventional methods that rely on 3D convolution and instead classifies actions by attending to the entire video sequence. In addition, we use a lightweight hand detector so that the video is sampled only when a hand is detected in the sequence, reducing the computational cost of the system. Experiments on two hand gesture recognition benchmark datasets demonstrate the superiority of the proposed method over existing state-of-the-art methods.
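A minimal PyTorch sketch of the pipeline the abstract describes is given below: a self-attention-only classifier over per-frame embeddings, gated by a lightweight hand detector. The paper's implementation details are not reproduced here, so every module name, hyperparameter, and the hand_detector and embed_frame callables are illustrative assumptions, not the authors' actual design.

import torch
import torch.nn as nn

class TransformerGestureClassifier(nn.Module):
    """Self-attention-only classifier over per-frame embeddings (no 3D convolution)."""
    def __init__(self, embed_dim=256, num_heads=8, num_layers=4,
                 num_classes=25, max_len=64):
        super().__init__()
        # Learnable classification token and positional embeddings (assumed design choices).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, frame_embeddings):
        # frame_embeddings: (batch, time, embed_dim)
        b, t, _ = frame_embeddings.shape
        x = torch.cat([self.cls_token.expand(b, -1, -1), frame_embeddings], dim=1)
        x = x + self.pos_embed[:, : t + 1]
        x = self.encoder(x)               # attend over the whole sequence at once
        return self.head(x[:, 0])         # classify from the [CLS] token

def classify_when_hand_present(frames, hand_detector, embed_frame, model):
    """Embed and classify only frames where the lightweight detector
    reports a hand, skipping the rest to save computation."""
    kept = [embed_frame(f) for f in frames if hand_detector(f)]
    if not kept:
        return None                       # no hand detected anywhere in the clip
    seq = torch.stack(kept).unsqueeze(0)  # (1, time, embed_dim)
    with torch.no_grad():
        return model(seq).argmax(dim=-1).item()

Gating on the detector means the transformer only ever processes frames that contain a hand, which is where the abstract's claimed reduction in computational consumption would come from.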
Table of Contents
I. INTRODUCTION
II. METHODOLOGY
III. EXPERIMENTS
ACKNOWLEDGMENT
REFERENCES
