Yet another Hybrid Strategy for Auto-tuning SpMV on GPUs

Zhaohui Wang; Xiaojie Qiu; Aimin Zhang; Yichao Cheng; Yi Peng; Sun Sun

Yet another Hybrid Strategy for Auto-tuning SpMV on GPUs

원문정보

Zhaohui Wang, Xiaojie Qiu, Aimin Zhang, Yichao Cheng, Yi Peng, Sun Sun

보안공학연구지원센터(IJSEIA) International Journal of Software Engineering and Its Applications Vol.9 No.5 2015.05 pp.97-106 SCOPUS

피인용수 : 0건 (자료제공 : 네이버학술정보)

초록

영어

Sparse matrix-vector multiplication (SpMV) is a key linear algebra algorithm and is widely used in many application domains. Besides multi-core architecture, there is also extensive research focusing on accelerating SpMV on many-core Graphics Processing Units (GPUs). SpMV computations have many indirect and irregular memory accesses, and load imbalance could occur while mapping computations onto single-instruction, multiple-data (SIMD) GPUs. SpMV is highly memory bandwidth-bound, though GPUs have massive computational resources, the performance of SpMV on GPUs is still unsatisfying. In this paper, we present a new hybrid strategy for auto-tuning SpMV on GPUs. Our strategy combines the advantages of row-major storage and column-major storage. Like many other strategies, we reordered a given sparse matrix according to row lengths in decreasing order. In order to be more adaptive and efficient, we proposed a new hybrid Blocked CSR and JDS (BCJ) format based on original CSR and JDS. BCJ splits a sparse matrix into a denser part and a sparser part after reordering and uses different kernels to process the corresponding part. And we proposed corresponding auto-tuning framework to help transforming matrix and launching kernels according to the sparsity characteristics of the matrix. A CUDA implementation of BCJ outperforms the original formats significantly on a broad range of unstructured sparse matrices.

Abstract
1. Introduction
2. Background
  2.1. GPUs and CUDA Programming Model
  2.2. Formats for Sparse Matrix
3. Hybrid auto-tuning Framework
  3.1. Blocked and Padded CSR (B-CSR)
  3.2. Blocked and Padded JDS (B-JDS)
  3.3. Parameters to be Modeled
4. Experimental Results
5. Related Work
6. Conclusions
References

키워드

저자정보

Zhaohui Wang University of Science and Technology of China
Xiaojie Qiu University of Science and Technology of China
Aimin Zhang University of Science and Technology of China
Yichao Cheng University of Science and Technology of China
Yi Peng University of Science and Technology of China
Sun Sun University of Science and Technology of China

참고문헌

자료제공 : 네이버학술정보

함께 이용한 논문

※ 원문제공기관과의 협약기간이 종료되어 열람이 제한될 수 있습니다.

0개의 논문이 장바구니에 담겼습니다.

earticle