Article Information
Implementing Cost-Effective CNNs through INT8 Quantization Aware Training on Embedded Systems
Abstract
English
The rising popularity of intelligent embedded systems, coupled with the substantial computational and memory requirements of convolutional neural networks (CNNs), necessitates cost-effective on-device model inference. Various post-training optimization techniques are used to reduce model size and precision bits; however, these techniques often cause a significant drop in accuracy. To address this, we propose a quantization-aware training (QAT) strategy that optimizes CNNs to low-bit integers, yielding faster inference and lower memory usage. We inject fake-quantization modules into the original architecture, train the model in full precision, and then convert it to 8-bit integer (INT8) form. The resulting QAT model performs all computations of the convolution, activation, and batch-normalization layers in INT8. Our method reduces the size of ResNet50 and ResNet101 by a factor of 3.9x and improves inference speed by more than 2x. We evaluate the models on the CIFAR-10 and CIFAR-100 datasets.
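The workflow summarized above (inject fake-quantization modules, train in full precision, then convert to INT8) can be sketched with PyTorch's eager-mode QAT API. This is a minimal illustration under assumed tooling (torchvision's quantizable ResNet50 and the fbgemm backend, recent PyTorch/torchvision versions), not the authors' exact implementation.

```python
import torch
from torch.ao import quantization
from torchvision.models.quantization import resnet50

# Quantizable ResNet50 wraps the network with QuantStub/DeQuantStub.
# (num_classes=10 is an assumption matching CIFAR-10.)
model = resnet50(weights=None, quantize=False, num_classes=10)
model.train()

# Fuse Conv+BN(+ReLU) so each folded block is simulated as one INT8 op.
model.fuse_model(is_qat=True)

# Attach fake-quantization observers for 8-bit weights and activations.
model.qconfig = quantization.get_default_qat_qconfig("fbgemm")
quantization.prepare_qat(model, inplace=True)

# ... ordinary full-precision training loop goes here; the fake-quant
#     modules learn per-tensor scales and zero-points during training ...

# After training, convert the fake-quantized graph into a true INT8 model.
model.eval()
model_int8 = quantization.convert(model)

# INT8 inference on a dummy input.
with torch.no_grad():
    out = model_int8(torch.randn(1, 3, 224, 224))
```

In this sketch the conversion step replaces the fake-quantization wrappers with actual INT8 kernels, which is where the reported model-size reduction and inference speedup would come from.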
Table of Contents
1. Introduction
2. Methods
2.1. Dataset
2.2. Experiment Setup
3. Experiment Results
3.1. Analysis and Future Refinement
4. Conclusions
Acknowledgement
References
