Abstract
Hadoop, a popular open-source implementation of MapReduce, is widely used to analyze large datasets. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. In this paper, we evaluate the performance of the Hadoop platform and of Oracle for distributed parallel processing of large datasets. For the evaluation, we implement a prototype virtual datacenter using distributed and parallel computing technology. The goal of this work is to reduce the cost of building a datacenter by using commodity hardware while still providing high performance. Hadoop is installed on a commodity Linux cluster, where it performs distributed processing of large datasets across a cluster of computers using a distributed and parallel computing architecture. The paper also describes several new open-source technologies and frameworks that can readily be applied to complex analysis of structured, semi-structured, and unstructured data. We compare the performance of a distributed parallel computing system with that of a traditional single computing system by executing a set of queries on each. In the simulated infrastructure, a Hadoop cluster performs distributed parallel processing and Oracle 11g serves as the traditional single processing system. We prepare three virtual hosts for the Hadoop cluster and a high-end hardware machine for Oracle 11g.
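As background for the description of Hadoop as a MapReduce implementation, the following minimal Python sketch illustrates the generic map, shuffle, and reduce phases on a toy word-count job. It is not the authors' benchmark queries and not Hadoop's Java API; the function names (`map_split`, `shuffle`, `reduce_phase`, `run_job`), the sample input, and the choice of three workers are hypothetical, with a thread pool standing in for the separate cluster nodes that would each process one input split.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical helper names: a generic illustration of the MapReduce model, not Hadoop's API.


def map_split(split: List[str]) -> List[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token in one input split.
    return [(word.lower(), 1) for record in split for word in record.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # Shuffle phase: group intermediate values by key across every mapper's output.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: combine all values collected for one key (here, a sum).
    return key, sum(values)


def run_job(splits: List[List[str]], workers: int = 3) -> Dict[str, int]:
    # Threads stand in for cluster nodes; 3 workers is an assumption echoing the 3 virtual hosts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_split, splits))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [["Hadoop runs MapReduce", "Oracle runs SQL"], ["Hadoop scales out"]]
    print(run_job(splits))
```

In a real Hadoop cluster, map and reduce tasks run on different machines and the framework moves intermediate data between them over the network; here the shuffle runs in a single process, and CPython threads only model the division of work rather than providing true CPU parallelism.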
Contents
1. Introduction
2. Related Technologies and Architectures
2.1. Hardware Virtualization
3. Observations
4. Conclusion
References
