Implementation of a SystemC-Based Simulator System for Storage, Buffers, and a Deep Learning Accelerator

이재빈; 김건명; 김진영; 임승호

Article information

Implementation of SystemC-based Deep Learning Accelerator Simulator with Storage Device and Buffer

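Korean Institute of Next Generation Computing, The Journal of Korean Institute of Next Generation Computing, Vol. 17, No. 6, December 2021, pp. 7-17 (KCI-indexed)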

Citations: 0 (data provided by Naver Academic)

Abstract

English

Recently, much research has been conducted on distributed data processing with embedded edge devices in IoT systems, and artificial intelligence inference is one such workload. Many studies are underway at both the software and hardware levels to perform artificial intelligence operations in embedded systems. In particular, hardware support for deep learning operations in embedded systems, such as GPUs, is limited, so adding a dedicated hardware deep learning accelerator to the architecture is being considered. Because such an accelerator stores and moves large amounts of data and performs iterative parallel operations internally to carry out complex neural network computation, its efficient design requires precise analysis and optimization of internal buffer management and data movement paths. In this paper, to model and analyze a deep learning accelerator on a RISC-V-based virtual platform, we design and implement a deep learning accelerator at the electronic system level (ESL) in SystemC, together with main memory and a NAND flash controller, and then analyze data movement to and from storage and the buffering effect on the developed accelerator. With the implemented simulator, the usability of the accelerator's internal buffer, the amount of data movement, and the buffering effect for a given deep learning operation can be analyzed.

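Korean (English translation)

Recently, a variety of research has been under way on data storage and distributed processing with edge devices in IoT systems. Artificial intelligence inference is one such workload, and much research is in progress at the software and hardware levels to run artificial intelligence operations on embedded devices. In particular, because computation on embedded processors or embedded GPUs is limited at the hardware level, the trend is to add an independent hardware deep learning accelerator. Such an accelerator must store and move large amounts of data to perform complex neural network operations independently in hardware, and it performs iterative parallel operations internally, so its internal storage system, buffer structure, and data movement paths need to be analyzed and optimized. To support optimized accelerator design through analysis of the accelerator's data usage, this paper provides a virtual edge device platform, consisting of a deep learning accelerator and a NAND flash memory system modeled in SystemC at the ESL level on a RISC-V-based virtual platform, together with an environment for running and analyzing application programs that use the accelerator on that platform. The implemented simulator lays the groundwork for analyzing the usability of the accelerator's storage device and internal buffer, as well as the amount of data movement and the buffering effect of deep learning operations.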
