Analysis of GPGPU Performance and Power Efficiency According to the Number of CTAs Assigned to GPU Cores

최홍준; 손동오; 김종면; 김철홍

Analysis of Performance and Energy-efficiency for GPGPU Architecture according to Number of Cooperative Thread Arrays on GPU Cores

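Korean Institute of Next Generation Computing, Journal of the Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회 논문지), Vol. 10, No. 6, December 2014, pp. 46-58, KCI-indexed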

Cited by: 0 (data provided by Naver Academic)

Abstract

English

Recently, many research groups have focused on GPGPU, which exploits the GPU, a processor originally developed for graphics-related operations, to improve the performance of computing systems. A GPGPU architecture improves performance by increasing parallelism so that computational resources are utilized as fully as possible. To achieve this, thousands of threads are grouped into thread blocks, called CTAs (Cooperative Thread Arrays), and each CTA is assigned to one GPU core, called an SM (Streaming Multiprocessor). The CTA scheduling scheme, which assigns CTAs to SMs, significantly influences overall GPGPU performance. An ideal CTA scheduling scheme would consider the characteristics of the benchmark and assign an appropriate number of CTAs to each SM, leading to performance improvement. However, the current CTA scheduling scheme assigns the maximum number of CTAs to each SM in order to increase parallelism and resource utilization. Therefore, this paper analyzes GPGPU performance according to the number of CTAs assigned to each SM. Since the GPGPU accounts for a considerable portion of the total power consumption of a computing system, power efficiency as well as performance should be considered when designing a CTA scheduling scheme, in order to enhance the competitiveness of computing systems. For this reason, this paper also analyzes GPGPU power consumption using the GPUWattch simulator. Simulation results show that the current CTA scheduling scheme, which assigns the maximum number of CTAs to each SM, does not always guarantee better performance. In future work, we will investigate a high-performance, low-power CTA scheduling scheme that considers workload characteristics. These results can provide a guideline for research on efficient CTA scheduling schemes for GPGPU.

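Korean (English translation)

Recently, many researchers have taken an interest in GPGPU, which uses the GPU, a processor developed for graphics-related tasks, to improve the performance of computer systems. To improve performance in a GPGPU architecture, computational resources must be utilized as much as possible by increasing parallelism. To this end, GPGPU groups its thousands of threads into thread blocks, that is, CTAs, and assigns them to streaming multiprocessors, which are the GPU cores. The CTA scheduling scheme that assigns CTAs to streaming multiprocessors has a considerable effect on the performance of a GPGPU computing system. An ideal CTA scheduling scheme would reflect the characteristics of the benchmark program being executed and assign CTAs appropriately to each streaming multiprocessor, thereby improving GPGPU performance. However, the current CTA scheduling scheme assigns as many CTAs as possible to each streaming multiprocessor in order to increase GPGPU parallelism and resource utilization. This paper therefore evaluates the performance of a GPGPU computing system according to the number of CTAs assigned to each streaming multiprocessor. Because the GPGPU accounts for a considerable portion of the power consumed in a GPGPU computing system, developing a good CTA scheduling scheme requires considering power consumption as well as GPGPU performance. For this reason, this paper also quantitatively analyzes power consumption using the GPUWattch simulator. The experimental results show that the current CTA scheduling scheme, which assigns as many CTAs as possible to each streaming multiprocessor, does not always guarantee good GPGPU performance. Using these results, we plan to develop a high-performance, low-power CTA scheduling scheme that considers application characteristics. The analysis in this study is expected to serve as useful information for setting the direction of research on CTA scheduling schemes that are effective for GPGPU architectures.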
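
The abstract contrasts a baseline that assigns the maximum number of CTAs to each SM with schemes that assign fewer. The Python sketch below shows one way such a per-SM maximum could be derived from resource limits, and how a hypothetical cap could reduce it. It is not the paper's method: the limit values, the `SMLimits` and `Kernel` types, and the `cap` parameter are all illustrative assumptions, and real hardware also applies allocation granularities that are ignored here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SMLimits:
    # Illustrative per-SM limits (assumed values, not taken from the paper).
    max_ctas: int = 8
    max_threads: int = 1536
    registers: int = 32768
    shared_mem_bytes: int = 49152


@dataclass(frozen=True)
class Kernel:
    # Per-CTA resource demand of a hypothetical kernel.
    threads_per_cta: int
    regs_per_thread: int
    shared_mem_per_cta: int


def max_resident_ctas(sm: SMLimits, k: Kernel) -> int:
    """Largest CTA count that fits on one SM; the tightest resource wins."""
    limits = [
        sm.max_ctas,
        sm.max_threads // k.threads_per_cta,
        sm.registers // (k.regs_per_thread * k.threads_per_cta),
    ]
    if k.shared_mem_per_cta > 0:
        limits.append(sm.shared_mem_bytes // k.shared_mem_per_cta)
    return min(limits)


def ctas_to_assign(sm: SMLimits, k: Kernel, cap: Optional[int] = None) -> int:
    """Baseline (cap=None) assigns the maximum; a cap models a throttled scheme."""
    upper = max_resident_ctas(sm, k)
    if cap is None:
        return upper
    return min(upper, max(cap, 1))


if __name__ == "__main__":
    sm = SMLimits()
    kernel = Kernel(threads_per_cta=256, regs_per_thread=20, shared_mem_per_cta=4096)
    print("baseline CTAs per SM:", ctas_to_assign(sm, kernel))
    print("capped CTAs per SM:  ", ctas_to_assign(sm, kernel, cap=4))
```

Sweeping `cap` from 1 up to the baseline value mirrors the kind of experiment the abstract describes, where performance and power are measured for each CTA count rather than assumed to be best at the maximum.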
