Adaptive Preshuffling in Hadoop Clusters

Abstract

English

MapReduce has become an important distributed processing model for large-scale data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications. Designing new shuffling strategies is very appealing for Hadoop clusters in which network interconnects become a performance bottleneck when the clusters are shared among a large number of applications. The network interconnects are likely to become a scarce resource when many shuffle-intensive applications share a Hadoop cluster. We implemented the push model along with the preshuffling scheme in the Hadoop system, incorporating a 2-stage pipeline into the preshuffling scheme. Using two Hadoop benchmarks running on a 10-node cluster, we conducted experiments to show that preshuffling-enabled Hadoop clusters are faster than native Hadoop clusters. For example, the push model and the preshuffling scheme powered by the 2-stage pipeline can shorten the execution times of the WordCount and Sort Hadoop applications by an average of 10% and 14%, respectively.

Table of Contents

Abstract
 1. Introduction
  1.1. Shuffle-Intensive Hadoop Applications
  1.2. Alleviate Network Load in the Shuffle Phase
  1.3. Benefits and Challenges of the Preshuffling Scheme
  1.4. Organization
 2. Background
  2.1. MapReduce Overview
  2.2. Hadoop Distributed File System
 3. Design Issues
  3.1. Push Model of the Shuffle Phase
  3.2. A Pipeline in Preshuffling
  3.3. In-memory Buffer
 4. Implementation
 5. Evaluation Performance
  5.1. Experimental Environment
  5.2. In Cluster
  5.3. Large Blocks vs. Small Blocks
 6. Related work
 7. Conclusion
 Acknowledgments
 References

Author Information

  • Jiong Xie (Inner Mongolia electric power information and communication center, China; Department of Computer Science and Software Engineering, Auburn University)
  • FanJun Meng (Computer & Information Engineering College, Inner Mongolia Normal University)
  • HaiLong Wang (Computer & Information Engineering College, Inner Mongolia Normal University)
  • JinHong Cheng (Inner Mongolia electric power information and communication center, China)
  • Hongfang Pan (Inner Mongolia electric power information and communication center, China)
  • Xiao Qin (Department of Computer Science and Software Engineering, Auburn University)
