Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

Daegun Yoon; Sangyoon Oh

Session Ⅱ : Artificial Intelligence

Source information

Korean Institute of Next Generation Computing, Proceedings of the Korean Institute of Next Generation Computing Conference, The 8th International Conference on Next Generation Computing 2022, 2022.10, pp.72-75

Abstract

To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth remains a major bottleneck for training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them, such as Top-k gradient sparsification (Top-k SGD), require gradient sorting to select meaningful gradients. However, Top-k SGD is limited in how much it can speed up overall training because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
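As a rough illustration of the selection step the abstract refers to, the following is a minimal pure-Python sketch of Top-k gradient sparsification: each worker keeps only the k largest-magnitude gradient entries and would communicate those (index, value) pairs instead of the dense gradient. The function names, the toy gradient, and the use of `heapq.nlargest` in place of a GPU sort are assumptions made for illustration; the residual accumulation of unsent gradients is a commonly used companion technique that the abstract does not describe, and none of this is the authors' implementation.

```python
import heapq


def top_k_sparsify(grad, k):
    """Return (indices, values) of the k largest-magnitude entries of grad."""
    # Select the k indices with the largest absolute gradient value.
    # On a GPU this selection step is the costly part discussed in the abstract.
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    idx.sort()  # keep indices in ascending order for a stable sparse format
    return idx, [grad[i] for i in idx]


def sparsify_with_residual(grad, residual, k):
    """Hypothetical helper: add back unsent gradients, then apply Top-k."""
    # Assumption: gradients not selected in earlier iterations are accumulated
    # locally and reconsidered in the next iteration.
    acc = [g + r for g, r in zip(grad, residual)]
    idx, vals = top_k_sparsify(acc, k)
    selected = set(idx)
    # Entries that were sent are cleared; the rest stay in the residual.
    new_residual = [0.0 if i in selected else a for i, a in enumerate(acc)]
    return idx, vals, new_residual


# Toy example: a 6-element gradient, keeping k = 2 entries.
grad = [0.1, -2.0, 0.05, 1.5, -0.3, 0.0]
residual = [0.0] * len(grad)
idx, vals, residual = sparsify_with_residual(grad, residual, k=2)
```

In this toy run only two of the six entries would be communicated, which is where the traffic reduction comes from; the cost of finding those entries is what the paper examines.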

Author information

Daegun Yoon, Department of Artificial Intelligence, Ajou University
Sangyoon Oh, Department of Artificial Intelligence, Ajou University
