Article Information
Abstract
This study evaluates the impact of intermediate features on facial expression recognition (FER) performance. Intermediate features were extracted from the input images at specific layers (FM1 to FM4) of a pre-trained network (ResNet-18). The extracted intermediate features and the original images were then used as inputs to a vision transformer (ViT), and the resulting FER performance was compared. With a single image as input, the intermediate features extracted from FM2 yielded the best performance (training accuracy: 94.35%, testing accuracy: 75.51%), compared with a training accuracy of 91.32% and a testing accuracy of 74.68% for the original image. When the original image was combined with intermediate features as input, the best FER performance was achieved by combining the original image with FM2, FM3, and FM4 (training accuracy: 97.88%, testing accuracy: 79.21%). These results suggest that incorporating intermediate features alongside the original image can improve FER performance. The findings can inform the design of the preprocessing stages of deep learning models for FER and help practitioners make informed decisions when seeking to enhance FER systems.
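To make the extraction points more concrete, the following minimal sketch computes the shape of the feature map produced at each of the four residual stages of a standard ResNet-18. It rests on two assumptions that the abstract does not confirm: that FM1 to FM4 correspond to the outputs of the four residual stages, and that the input is a 224×224 image. The function names are hypothetical and used only for illustration.

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def resnet18_stage_shapes(input_size=224):
    """Return {name: (channels, height, width)} for the four residual stages.

    Assumption: FM1..FM4 are the outputs of the four residual stages of a
    standard ResNet-18; the abstract does not state the exact layers.
    """
    # Stem: 7x7 convolution with stride 2, then 3x3 max-pooling with stride 2.
    n = conv_out(input_size, kernel=7, stride=2, padding=3)
    n = conv_out(n, kernel=3, stride=2, padding=1)

    # (assumed name, output channels, stride of the first 3x3 convolution)
    stages = [("FM1", 64, 1), ("FM2", 128, 2), ("FM3", 256, 2), ("FM4", 512, 2)]
    shapes = {}
    for name, channels, stride in stages:
        # Only the first convolution of a stage changes the spatial size;
        # the remaining 3x3, stride-1, padding-1 convolutions preserve it.
        n = conv_out(n, kernel=3, stride=stride, padding=1)
        shapes[name] = (channels, n, n)
    return shapes


if __name__ == "__main__":
    for name, shape in resnet18_stage_shapes(224).items():
        print(name, shape)
```

Under these assumptions, deeper stages trade spatial resolution for channel depth, so any such feature map would need to be resized or otherwise converted before it could be passed to a ViT in place of an image. How the study performed that conversion is not described in the abstract.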
Table of Contents
1. Introduction
2. Methods
2.1 Dataset
2.2 Intermediate Feature Extraction
2.3 Implementation Details
3. Results
3.1 Comparison between the original images and the extracted intermediate feature maps
3.2 Comparison of FER accuracy according to different input images
3.3 Comparison of FER accuracy for different combinations of feature maps
3.4 Comparison of FER accuracy when using both feature maps and the original image
4. Conclusion
Acknowledgement
References