Design and Implementation of Multiple Filter Distributed Deduplication System Applying Cuckoo Filter Similarity
(쿠쿠 필터 유사도를 적용한 다중 필터 분산 중복 제거 시스템 설계 및 구현)

Yeong-A Kim, Gea-Hee Kim, Hyun-Ju Kim, Chang-Geun Kim

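Journal of Convergence for Information Technology (융합정보논문지; formerly 중소기업융합학회논문지), Convergence Society for SMB (중소기업융합학회), Vol. 10, No. 10, October 2020, pp. 1-8. KCI-indexed.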


Abstract

In recent years, technologies built on data generated by business activities have become key to business success, creating a need for techniques to store, manage, and retrieve alternative data. To process unstructured data, which is one form of alternative data, existing big data platforms must load large volumes of data generated in real time without delay and, when redundant data occurs, manage storage space efficiently by using deduplication across different storage systems. In this paper, we propose a multi-layer distributed data deduplication system that uses similarity based on the Cuckoo hashing filter technique, taking the characteristics of big data into account. By applying Cuckoo hashing to the similarity between virtual machines, individual storage nodes improve performance through more efficient deduplication, and applying multi-layer Cuckoo filters reduces processing time. Experimental results show that, compared with an existing deduplication technique based on Bloom filters, the proposed method shortens processing time by 8.9% and increases the deduplication rate by 10.3%.

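Contents

Abstract (Korean)
Abstract (English)
1. Introduction
2. Related Work
2.1 Data Deduplication (De-duplication)
2.2 Cuckoo Hashing Filter
3. Proposed Model
3.1 System Architecture
3.2 Duplicate Data Exclusion Processor
3.3 Clustering and Deduplication Technique
4. Experiments and Evaluation
4.1 Experimental Environment
4.2 Results and Analysis
5. Conclusion
REFERENCES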


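Author Information

Yeong-A Kim (김영아). Researcher, Data HRD Division, Encore (엔코아)
Gea-Hee Kim (김계희). Lecturer, Department of Computer Engineering, Gyeongnam National University of Science and Technology
Hyun-Ju Kim (김현주). Professor, Department of Computer Engineering, Gyeongnam National University of Science and Technology
Chang-Geun Kim (김창근). Professor, Department of Computer Engineering, Gyeongnam National University of Science and Technology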
