Source Information
Abstract
Very complex tasks in deep learning, such as image classification, must be solved with the help of neural networks and activation functions. As the backpropagation algorithm advances backward from the output layer towards the input layer, the gradients often get smaller and smaller and approach zero, which eventually leaves the weights of the initial or lower layers nearly unchanged; as a result, gradient descent never converges to the optimum. We propose a two-factor non-saturating activation function, known as Bea-Mish, for machine learning applications in deep neural networks. Our method uses two factors, beta (β) and alpha (α), to normalize the area below the boundary in the Mish activation function; we refer to these elements as Bea. Bea-Mish provides a clear understanding of the behaviors and conditions governing this regularization term, which can lead to a more principled approach for constructing better-performing activation functions. We evaluate Bea-Mish against the Mish and Swish activation functions on various models and data sets. Empirical results show that our approach (Bea-Mish) outperforms native Mish by 2.51% in average precision (AP50val) using a SqueezeNet backbone on CIFAR-10, and by 1.20% in top-1 accuracy using ResNet-50 on ImageNet-1k.
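For context, the native Mish activation is defined as f(x) = x · tanh(softplus(x)). The abstract does not give the exact Bea-Mish formula (that is deferred to Section 2 of the paper), so the sketch below only illustrates one plausible way two factors β and α could rescale Mish; the parameter placement here is an assumption for illustration, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Native Mish (Misra, 2019): x * tanh(softplus(x)).
    return x * torch.tanh(F.softplus(x))

def bea_mish(x: torch.Tensor, alpha: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    # Hypothetical two-factor variant: beta scales the input to softplus and
    # alpha rescales its output, adjusting the area under the activation curve.
    # The actual Bea-Mish formulation is defined in the paper's Theory section.
    return x * torch.tanh(alpha * F.softplus(beta * x))

# Sanity check: with alpha = beta = 1, the variant reduces to native Mish.
x = torch.linspace(-5.0, 5.0, steps=11)
assert torch.allclose(bea_mish(x), mish(x))
```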
Table of Contents
1. Introduction
2. Theory
3. Experiment Settings
4. Results and Discussion
4.1 CIFAR-10 Dataset on Various Baseline Activation Functions
4.2 CIFAR-10 Dataset on Various Standard Neural Network Architectures
4.3 ImageNet-1k Dataset on Various Standard Neural Network Architectures
4.4 Ablation Study of α and β on CIFAR-10
5. Conclusion
Acknowledgement
References