Source Information
Abstract
English
MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs that require low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network load imposed by shuffle-intensive applications. Designing new shuffling strategies is appealing for Hadoop clusters whose network interconnects become a performance bottleneck when the clusters are shared among a large number of applications; the interconnects are likely to become a scarce resource when many shuffle-intensive applications share a single Hadoop cluster. We implemented the push model along with the preshuffling scheme in the Hadoop system, incorporating a 2-stage pipeline into the preshuffling scheme. Using two Hadoop benchmarks running on a 10-node cluster, we conducted experiments showing that preshuffling-enabled Hadoop clusters are faster than native Hadoop clusters. For example, the push model and the preshuffling scheme powered by the 2-stage pipeline shorten the execution times of the WordCount and Sort Hadoop applications by an average of 10% and 14%, respectively.
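To make the push-model idea concrete, the following is a minimal, hypothetical Python sketch (not Hadoop code, and not the paper's implementation): each map task's output is hash-partitioned and pushed into per-reducer buffers as it is produced, instead of being pulled by reducers after the whole map phase completes. All names (`push_shuffle`, `partition`, buffer layout) are illustrative assumptions.

```python
from collections import defaultdict

NUM_REDUCERS = 2

def partition(key, num_reducers=NUM_REDUCERS):
    # Hash-partition a key to a reducer, analogous to Hadoop's default
    # hash partitioner. (Illustrative only; Python's hash differs.)
    return hash(key) % num_reducers

def map_word_count(record):
    # WordCount mapper: emit (word, 1) for each word in the record.
    for word in record.split():
        yield word, 1

def push_shuffle(records):
    # Push model: map output is pushed into per-reducer buffers as soon
    # as it is emitted, overlapping shuffle work with the map phase,
    # rather than having reducers pull after all map tasks finish.
    reducer_buffers = [defaultdict(list) for _ in range(NUM_REDUCERS)]
    for record in records:
        for key, value in map_word_count(record):
            reducer_buffers[partition(key)][key].append(value)
    return reducer_buffers

def reduce_sum(buffer):
    # WordCount reducer: sum the values collected for each key.
    return {key: sum(values) for key, values in buffer.items()}

records = ["the quick brown fox", "the lazy dog", "the fox"]
counts = {}
for buf in push_shuffle(records):
    counts.update(reduce_sum(buf))
```

In a real cluster the push targets are remote reduce nodes, so eagerly streaming partitions spreads the shuffle traffic over the map phase instead of concentrating it afterward, which is the network-load effect the preshuffling scheme targets.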
Table of Contents
1. Introduction
1.1. Shuffle-Intensive Hadoop Applications
1.2. Alleviate Network Load in the Shuffle Phase
1.3. Benefits and Challenges of the Preshuffling Scheme
1.4. Organization
2. Background
2.1. MapReduce Overview
2.2. Hadoop Distributed File System
3. Design Issues
3.1. Push Model of the Shuffle Phase
3.2. A Pipeline in Preshuffling
3.3. In-memory Buffer
4. Implementation
5. Performance Evaluation
5.1. Experimental Environment
5.2. In Cluster
5.3. Large Blocks vs. Small Blocks
6. Related Work
7. Conclusion
Acknowledgments
References