Content-Defined Chunking Algorithms in Data Deduplication: Performance, Trade-Offs and Future-Oriented Techniques

Authors

  • Safa Ali Abdo Hussein, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia
  • R. Badlishah Ahmad, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia
  • Naimah Yaakob, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia
  • Fathey Mohammed, Faculty of Engineering and Information Technology, Taiz University, Taiz 6803, Yemen
  • Abdul Ghani Khan, School of Computing and Communication, Lancaster University, Bailrigg, Lancaster LA1 4WA, United Kingdom

DOI:

https://doi.org/10.37934/araset.52.1.2134

Keywords:

Data deduplication, Chunking method, Content-defined chunking, Hashing-based algorithms, Hash-less algorithms

Abstract

In the digital era, the exponential growth of data presents significant challenges for storage efficiency and processing speed. This paper reviews Content-Defined Chunking (CDC), a cornerstone of data deduplication technology that helps address these challenges. We systematically examine CDC algorithms, categorise them into hashing-based and hash-less approaches, and evaluate their performance in deduplication. Through a critical analysis of the existing literature, we identify the trade-off between chunking speed and deduplication efficacy as a pivotal area for improvement. Our findings point to the need for new CDC algorithms that adapt to the evolving data landscape, and we propose future research directions for improving storage and processing solutions. This work contributes to the broader understanding of data deduplication techniques and offers a pathway towards more efficient data management systems.

Author Biographies

Safa Ali Abdo Hussein, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia

safaali@studentmail.unimap.edu.my

R. Badlishah Ahmad, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia

badli@unimap.edu.my

Naimah Yaakob, Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Pauh Putra Campus, 02600 Arau, Perlis, Malaysia

naimahyaakob@unimap.edu.my

Fathey Mohammed, Faculty of Engineering and Information Technology, Taiz University, Taiz 6803, Yemen

fatheym@sunway.edu.my

Abdul Ghani Khan, School of Computing and Communication, Lancaster University, Bailrigg, Lancaster LA1 4WA, United Kingdom

razighani97@gmail.com

Published

2024-10-01

How to Cite

Abdo Hussein, S. A., Ahmad, R. B., Yaakob, N., Mohammed, F., & Khan, A. G. (2024). Content-Defined Chunking Algorithms in Data Deduplication: Performance, Trade-Offs and Future-Oriented Techniques. Journal of Advanced Research in Applied Sciences and Engineering Technology, 52(1), 21–34. https://doi.org/10.37934/araset.52.1.2134

Section

Articles
