Development of Large-scale Spatio-temporal Data Join Query Processing Algorithm on Spark

Yong-Ki Kim (김용기)


차세대융합기술학회논문지 (Journal of Next-generation Convergence Technology), 국제차세대융합기술학회, Vol. 5, No. 4, August 2021, pp. 516-522. Indexed in KCI.


Abstract

Apache Spark is an open-source, distributed, in-memory computing framework. Because Spark runs on distributed clusters and provides data parallelism and fault tolerance, it can be applied in various fields, such as agriculture and the information and communication industries. Meanwhile, in the era of big data, large amounts of spatio-temporal data are being generated. However, Apache Spark does not provide built-in support for computation-heavy join operations, such as spatio-temporal joins, in a distributed computing environment, so it cannot process them efficiently. Therefore, in this paper, we propose join query processing algorithms for withindistance and contain joins over large-scale spatio-temporal data, based on a grid partitioning technique. In the performance evaluation, the proposed algorithm shows about 20% better performance than the existing algorithm in terms of query processing time.
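The abstract does not describe the algorithm's internals. As a rough, single-machine illustration of the general grid-partitioning idea behind distance and containment joins, the following Python sketch buckets records by uniform grid cell (the role a partition key would play on Spark), draws candidate pairs only from the same or neighboring cells, and then filters them with an exact check. The names `Point`, `Rect`, `cell_of`, `within_distance_join`, and `contain_join`, the rule that the cell side must be at least `d`, and the treatment of time as a tolerance `dt` (distance join) or an interval `[tmin, tmax]` (contain join) are all hypothetical choices made for this example, not the paper's definitions, and no Spark API is used.

```python
from collections import defaultdict
from math import floor, hypot
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Point(NamedTuple):
    pid: str
    x: float
    y: float
    t: float  # timestamp (hypothetical unit)


class Rect(NamedTuple):
    rid: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    tmin: float  # hypothetical: a region is valid during [tmin, tmax]
    tmax: float


Cell = Tuple[int, int]


def cell_of(x: float, y: float, size: float) -> Cell:
    """Map a coordinate to the ID of the uniform grid cell containing it."""
    return (floor(x / size), floor(y / size))


def within_distance_join(left: Iterable[Point], right: Iterable[Point],
                         d: float, dt: float,
                         cell_size: Optional[float] = None) -> List[Tuple[str, str]]:
    """Return (left.pid, right.pid) pairs with distance <= d and |t diff| <= dt."""
    size = d if cell_size is None else cell_size
    if size <= 0 or size < d:
        raise ValueError("cell_size must be positive and at least d")
    # Partitioning step: bucket the right-hand input by grid cell.
    grid: Dict[Cell, List[Point]] = defaultdict(list)
    for q in right:
        grid[cell_of(q.x, q.y, size)].append(q)
    result = []
    for p in left:
        cx, cy = cell_of(p.x, p.y, size)
        # Since size >= d, every match lies in the 3x3 block of cells around p.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in grid.get((cx + dx, cy + dy), ()):
                    # Refinement step: exact spatial and temporal predicate.
                    if hypot(p.x - q.x, p.y - q.y) <= d and abs(p.t - q.t) <= dt:
                        result.append((p.pid, q.pid))
    return sorted(result)


def contain_join(rects: Iterable[Rect], points: Iterable[Point],
                 cell_size: float) -> List[Tuple[str, str]]:
    """Return (rect.rid, point.pid) pairs where the rectangle contains the point."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Partitioning step: replicate each rectangle into every cell it overlaps.
    grid: Dict[Cell, List[Rect]] = defaultdict(list)
    for r in rects:
        x0, y0 = cell_of(r.xmin, r.ymin, cell_size)
        x1, y1 = cell_of(r.xmax, r.ymax, cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                grid[(cx, cy)].append(r)
    result = []
    for p in points:
        # A point lies in exactly one cell, so each pair is reported once.
        for r in grid.get(cell_of(p.x, p.y, cell_size), ()):
            if (r.xmin <= p.x <= r.xmax and r.ymin <= p.y <= r.ymax
                    and r.tmin <= p.t <= r.tmax):
                result.append((r.rid, p.pid))
    return sorted(result)
```

On a Spark cluster, one common way to express the same idea is to key each record by its cell ID (replicating records into neighboring cells for the distance join, or into every overlapped cell for rectangles), run an ordinary key-based join, and apply the exact predicate afterward; the abstract does not say whether the paper's algorithm works this way.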


Author Information

Yong-Ki Kim (김용기), Professor, Department of IT Convergence Systems, 전주비전대학교 (Jeonju Vision College)
