earticle

An Empirical Study for Handling Scientific Datasets

Abstract

Since the volume of data generated by scientific experiments has grown exponentially, new methods to analyze and organize the data are required. These methods in turn need an effective infrastructure of computing resources for pre-processing and post-processing the data. This demanding requirement has led to the development of techniques that reduce the size of a dataset and of new programming models and implementations such as MapReduce. In this paper, we describe an empirical study of handling the dataset of a scientific experiment to support data transformation, an essential phase in handling large-scale data in scientific experiments. In this experiment we show how to optimize a dataset written in netCDF through data reduction by sub-setting, and how to process a dataset on tornado outbreaks in the US with Hadoop, a MapReduce framework. These methods can be applied to pre-processing and post-processing in scientific data experiments.
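The MapReduce processing the abstract describes can be illustrated with a minimal in-memory sketch of the map, shuffle, and reduce phases. The record layout (year, state, EF-scale) and the counting task are hypothetical illustrations, not the paper's actual tornado dataset schema or its Hadoop implementation:

```python
from collections import defaultdict

# Hypothetical flattened tornado records: (year, state, EF-scale).
records = [
    (2011, "AL", 4),
    (2011, "AL", 2),
    (2011, "MO", 5),
    (2012, "OK", 3),
]

def map_phase(record):
    """Emit one (key, value) pair per record: one tornado per state."""
    year, state, ef = record
    yield (state, 1)

def reduce_phase(key, values):
    """Sum all counts grouped under one key."""
    return (key, sum(values))

# Shuffle: group intermediate pairs by key, as the framework would.
groups = defaultdict(list)
for rec in records:
    for k, v in map_phase(rec):
        groups[k].append(v)

counts = dict(reduce_phase(k, vs) for k, vs in groups.items())
print(counts)  # {'AL': 2, 'MO': 1, 'OK': 1}
```

In Hadoop, the same map and reduce functions would run in parallel over partitions of the dataset, with the framework performing the shuffle between them.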

Table of Contents

Abstract
 1. Introduction
 2. Related Works
  2.1 PolarGrid: Scientific Data Project
  2.2 MapReduce
 3. Scientific Data Experiment Framework
 4. Examples of Scientific Data Experiment
  4.1 Data Reduction of Dataset
  4.2 Data Transformation for MapReduce Application
 5. Conclusion
 References

Author Information

  • Yunhee Kang Baekseok University, Samsung Advanced Institute of Technology
  • Heeyoul Choi Baekseok University, Samsung Advanced Institute of Technology

Source: Naver Academic Information
