
Optimization of GEMV on Intel AVX Processor

Abstract (English)

To improve the performance of the BLAS Level 2 GEMV subroutine on Intel's latest instruction set, AVX, this paper presents an approach to analyzing the new-generation instruction set and improving the efficiency of existing data-intensive math subroutines. The optimization process covers memory access optimization, SIMD optimization, and parallel optimization. The paper also compares the traditional SSE instruction set with the AVX instruction set. Experiments show that the optimized GEMV function achieves a considerable performance increase: compared with Intel MKL, GotoBLAS, and ATLAS, it outperforms these BLAS implementations by 5% to 10%.
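
The abstract only names the optimization stages; as a rough illustration of the SIMD stage, the following is a minimal sketch of a single-precision GEMV inner kernel (y = alpha*A*x + beta*y) written with first-generation AVX intrinsics, which provide 256-bit multiply and add but no fused multiply-add. The function name, the row-major layout, and the assumption that n is a multiple of 8 are illustrative choices, not the kernel described in the paper, which additionally applies memory access blocking and multi-threading.

    /* Illustrative AVX single-precision GEMV inner kernel:
     * y = alpha * A * x + beta * y, row-major A, n a multiple of 8.
     * A sketch only; it does not reproduce the paper's blocking
     * or threading scheme. */
    #include <immintrin.h>

    void sgemv_avx(int m, int n, float alpha, const float *A,
                   const float *x, float beta, float *y)
    {
        for (int i = 0; i < m; ++i) {
            const float *row = A + (long)i * n;
            __m256 acc = _mm256_setzero_ps();
            /* 8-wide multiply and add over one matrix row
             * (separate mul + add: first-generation AVX has no FMA). */
            for (int j = 0; j < n; j += 8) {
                __m256 a = _mm256_loadu_ps(row + j);
                __m256 v = _mm256_loadu_ps(x + j);
                acc = _mm256_add_ps(acc, _mm256_mul_ps(a, v));
            }
            /* Horizontal sum of the 8 partial sums. */
            float tmp[8];
            _mm256_storeu_ps(tmp, acc);
            float dot = tmp[0] + tmp[1] + tmp[2] + tmp[3]
                      + tmp[4] + tmp[5] + tmp[6] + tmp[7];
            y[i] = alpha * dot + beta * y[i];
        }
    }

A production kernel would additionally block rows for cache reuse and split the row loop across threads, which is where the memory access and parallel optimizations mentioned in the abstract would apply.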

Table of Contents

Abstract
 1. Introduction
 2. Analysis of GEMV Subroutine
  2.1. BLAS Level 2 Subroutines
  2.2. Analysis of GEMV Implementation
 3. Optimization
  3.1. Hardware Platform
  3.2. Memory Access Optimization
  3.3. SIMD Optimization for GEMV Kernel
  3.4. Parallel Optimization
 4. Performance Evaluation
  4.1. Experiment Platform
  4.2. Single-Core Result
  4.3. Multi-Core Result
 5. Related Works
 6. Summary
 Acknowledgment
 References

Author Information

  • Jun Liang, Training Center of Electronic Information, Beijing Union University, Beijing, China
  • Yunquan Zhang, State Key Lab of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China
