Article Information
Abstract
English
To improve the performance of the BLAS Level 2 GEMV subroutine under Intel's latest instruction set, AVX, this paper presents a new approach to analyzing the new-generation instruction set and enhancing the efficiency of existing data-oriented math subroutines. The overall optimization process involves memory access optimization, SIMD optimization, and parallel optimization. The paper also compares the traditional SSE instruction set with the AVX instruction set. Experiments show that the optimized GEMV function achieves a considerable performance increase. Compared with Intel MKL, GotoBLAS, and ATLAS, the optimized GEMV outperforms these BLAS implementations by 5% to 10%.
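The paper's actual kernel is not reproduced on this page; as a rough illustration of the kind of SIMD optimization the abstract describes, below is a minimal sketch of a single-precision GEMV inner loop (y = alpha*A*x + beta*y, row-major A) vectorized with 256-bit AVX intrinsics. The function name sgemv_avx, the row-major layout, and the assumption that n is a multiple of 8 are illustrative choices, not taken from the paper.

    /* Illustrative AVX GEMV sketch, not the authors' kernel.
       Assumes row-major A and n divisible by 8; compile with -mavx. */
    #include <stddef.h>
    #include <immintrin.h>

    void sgemv_avx(int m, int n, float alpha, const float *A,
                   const float *x, float beta, float *y)
    {
        for (int i = 0; i < m; ++i) {
            __m256 acc = _mm256_setzero_ps();          /* 8-wide partial sums */
            const float *row = A + (size_t)i * n;
            for (int j = 0; j < n; j += 8) {
                __m256 a = _mm256_loadu_ps(row + j);   /* 8 elements of row i */
                __m256 v = _mm256_loadu_ps(x + j);     /* 8 elements of x     */
                acc = _mm256_add_ps(acc, _mm256_mul_ps(a, v));
            }
            /* horizontal reduction of the 8 partial sums */
            __m128 lo = _mm256_castps256_ps128(acc);
            __m128 hi = _mm256_extractf128_ps(acc, 1);
            __m128 s  = _mm_add_ps(lo, hi);
            s = _mm_hadd_ps(s, s);
            s = _mm_hadd_ps(s, s);
            float dot = _mm_cvtss_f32(s);
            y[i] = alpha * dot + beta * y[i];
        }
    }

Plain AVX provides no fused multiply-add (that arrives with AVX2/FMA), so the sketch uses separate multiply and add; a production kernel would additionally block for cache and unroll over several rows, as the paper's memory access and parallel optimizations suggest.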
Table of Contents
1. Introduction
2. Analysis of GEMV Subroutine
2.1. BLAS Level 2 Subroutines
2.2. Analysis of GEMV Implementation
3. Optimization
3.1. Hardware Platform
3.2. Memory Access Optimization
3.3. SIMD Optimization for GEMV Kernel
3.4. Parallel Optimization
4. Performance Evaluation
4.1. Experiment Platform
4.2. Single-Core Result
4.3. Multi-Core Result
5. Related Works
6. Summary
Acknowledgment
References