Spark-Based Join Technique for Heterogeneous Big Data Analysis

Byung-Eun Choi; Jin-Young Park

Original article information

Spark-based join technique for big data analysis of heterogeneous

Korean Institute of Next Generation Computing, The Journal of Korean Institute of Next Generation Computing, Vol.11 No.6, 2015.12, pp.91-99 (KCI-indexed)

Abstract

This paper addresses data virtualization, which logically integrates distributed heterogeneous databases into a single DBMS, and discusses how to implement a data virtualization system for big data analysis. Such a system must let users run queries, according to the analysis purpose, against the tables of the heterogeneous DBMSs that store the big data, and must let them browse the schemas. The key challenge is join processing between tables that hold large numbers of records. To meet this goal, we adopted a Spark-based system configuration after a join performance comparison test between Spark and Hive, and implemented a schema browser using ace editor, tajo sql, a query converter, and related components. As a result, we secured the technology for a data virtualization system for big data analysis.

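Abstract (Korean)
Abstract
1. Introduction
2. Related Work
  2.1 Virtualization Technology
  2.2 Data Virtualization
3. Implementation Technique of a Data Virtualization System for Big Data Analysis
  3.1 Requirements for a Data Virtualization System for Big Data Analysis
  3.2 Design and Implementation of the Data Virtualization System
4. Results
  4.1 Implementation Results of the Schema Browser and Query Panel
  4.2 Join Processing Results Using Spark
5. Conclusion and Future Work
References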

Author information

Byung-Eun Choi, (주)나눔기술
Jin-Young Park, (주)나눔기술
