2014 Second International Symposium on Computing and Networking
978-1-4799-4152-0/14 $31.00 © 2014 IEEE
DOI 10.1109/CANDAR.2014.22

A Cluster-Based Data Deduplication Technology

Chuan-Mu Tseng
Department of Applied Digital Media
Jen-Teh Junior College of Medicine, Nursing and Management
Miaoli, Taiwan, R. O. C.
cmtjason@jente.edu.tw

Jheng-Rong Ciou, Tzong-Jye Liu
Department of Information Engineering and Computer Science
Feng Chia University
Taichung, Taiwan, R. O. C.
cioujimmy@gmail.com, tjliu@fcu.edu.tw

Abstract—Data deduplication systems usually identify redundant data quickly and correctly by using bloom filter technology. A bloom filter can determine whether data is redundant, but it can produce false positives. To avoid false positives, a new chunk must be compared with the chunks that have already been stored. To reduce the time needed to exclude the bloom filter's false positives, current research uses many small index tables to store chunk IDs. However, the target chunk ID is stored in only one index table, so searching for it in the other index tables wastes a great deal of time. In this paper, we cluster the stored chunks to reduce the time spent excluding the false positives induced by the bloom filter.

Keywords: Bloom filter; cluster; data deduplication

I. INTRODUCTION

With the development of cloud storage, increasingly more users store data in the cloud, and more than 57% of virtual data is produced annually [1]. From the cloud provider's perspective, efficient use of the limited storage space is of great interest. Data deduplication technology was therefore proposed: it compares uploaded chunks with those already saved. Avoiding repeated storage of the same chunk reduces redundant data, so the limited space can be used to provide more services for users.

In current deduplication technology, the main cost of the system is querying for redundant chunks, and this overhead increases with the amount of data. Bloom filter technology [2] has been used to reduce the cost of queries [3][4]. Although the bloom filter can quickly find redundant data, it suffers from false positives: the system may judge unique data to be redundant. When a user then downloads the file, some chunks may be missing, which makes the file irrecoverable. Such a result discourages users from using the cloud storage service.

To avoid the bloom filter's false positives, [4] proposed that, after the bloom filter's judgment, the new chunk should be compared with all saved chunks. To accelerate this comparison, the authors of [4] used an index table to store chunk identifications and searched the index table to find redundant data. However, the index table size is proportional to the amount of data already stored [5][6]; the ratio is about 1:100. In other words, when the system stores 10 PB of data, the index table will grow to 100 TB. Memory cannot cache the index table directly if the index table is larger than the memory. This is called the "disk bottleneck problem" [3][7].

Previous research [3] resolved the disk bottleneck problem by using many index tables of small size. However, there is a drawback: when the system excludes the bloom filter's false positives, it must check the index tables one by one until the target chunk ID is found. Since the target chunk ID is stored in only one index table, searching for it in the other index tables wastes a great deal of time.

To reduce the time needed to exclude the bloom filter's false positives, we propose a new module for excluding them. Our method chooses a single index table in which to exclude the bloom filter's false positives; after checking this one index table, we know whether the bloom filter's answer was a false positive.

The rest of this paper is organized as follows. Section II introduces the background on data deduplication and the bloom filter. Section III describes our proposed method. Section IV presents the results of our experiments. Lastly, conclusions are presented in Section V.

II. RELATED WORKS

The main purpose of deduplication technology is to effectively reduce duplicate data and increase the available storage space. A typical data deduplication system can be divided into two parts: data segmentation and duplicate data comparison. The system segments files into chunks and compares those chunks to find duplicate data. To enhance comparison efficiency, the system uses a hash function (e.g., SHA-1 or MD5) to calculate each chunk's ID and decides whether a chunk needs to be uploaded according to the comparison.

To speed up the comparison, [8][9][10][11][12][13] used an index table to store chunk IDs and found redundant data by searching the index table. As the amount of data grows, the index table becomes larger; if the index table is larger than the memory, this causes a "disk bottleneck problem" [3][7]. The concept of the hierarchical index table was used to prevent this [7][8][10][11][12][13]. A hierarchical index table uses a few chunk IDs to represent files and stores them in memory.
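The segment-and-hash step described above can be sketched as follows. This is an illustrative sketch, not code from the paper: the 8 KB fixed chunk size and SHA-1 are taken from the experimental setup in Section IV, while the function name `chunk_ids` is our own.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # 8 KB fixed-size chunks, as in Section IV

def chunk_ids(data: bytes):
    """Split data into fixed-size chunks and return their SHA-1 chunk IDs."""
    ids = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        ids.append(hashlib.sha1(chunk).hexdigest())  # 160-bit chunk ID
    return ids

# Identical chunks always produce identical IDs, which is what the
# duplicate-comparison step relies on.
ids = chunk_ids(b"A" * CHUNK_SIZE * 2 + b"B" * CHUNK_SIZE)
assert ids[0] == ids[1] and ids[0] != ids[2]
```

Only the 160-bit IDs need to be compared to detect a duplicate; the chunk contents themselves never have to be re-read.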
The other chunk IDs were stored on the disk. The system compared the representative chunk IDs in memory and then, according to the result, followed the link to the next layer to compare the complete chunk IDs. This method reduced disk bottleneck problems, but the index table size still grew with the amount of data. In other words, it reduced the probability of the disk bottleneck problem but could not eradicate disk bottlenecks.

The bloom filter [2] combined with an index table [3][4][5][9][14] has also been used in deduplication systems. Treating the already stored data as a collection and the uploaded chunks as elements, the system can quickly check through the bloom filter whether an element exists in the collection.

A typical bloom filter consists of a bit array and a set of hash functions. In the initialization step, all values of the bit array are 0. When a user uploads a file, the system calculates its identification, the file ID, and sends it to the bloom filter. The file ID is mapped by a number of different hash functions to positions in the bit array, and those positions are set to 1. When the system wants to check whether a file has been saved, the file ID is again mapped to the bit array by the hash functions, and the system inspects the resulting positions. If any of the values is 0, the file is new; if all of the values are 1, the file is considered saved. In this way, the system can quickly find saved files.

However, the bloom filter produces false positives because of hash collisions. When the bloom filter has stored n files using k hash functions and a bit array of size m, the false positive rate equals (1 - (1 - 1/m)^(kn))^k ≈ (1 - e^(-kn/m))^k [3].

The more files are saved, the more positions hold the value 1. When a new file is mapped to the bit array by the hash functions, the probability that every mapped position already holds 1 becomes larger. Therefore, the system may identify unique data as redundant. When a user downloads the file, part of it may be missing, which makes the file irrecoverable.

Research on the bloom filter's false positives takes two directions. (1) Limit false positives: keep the false positives within an acceptable range. The advantage of this approach is that no time is spent excluding false positives. However, it is not suitable for a cloud storage system: if users upload important files, the bloom filter's false positives will make them irrecoverable and cause serious problems for the users. (2) Exclude false positives: compare against all of the already stored chunks. This method takes a great deal of time to search the stored chunks. However, cloud services have to completely rule out false positives so that users can feel at ease about uploading files.

(1) Limit False Positives:
There are two methods to limit false positives. The first is to enlarge the capacity of the bloom filter, so that more elements can be accommodated without increasing the false positives. The second is to increase the number of bloom filters; as a result, all of the bloom filters are reconstructed with lower false positive rates.

(2) Exclude False Positives:
To solve false positives, previous research [4] proposed that if the bloom filter determines a chunk is stored, the chunk is compared with all of the already stored chunks. However, excluding false positives this way can still cause a disk bottleneck problem. A method that solves both false positives and disk bottleneck problems was proposed in [3]. That system used a number of fixed-size index tables to store chunk IDs. Each index table, called a "tablet", is fixed at 1.5 MB, which effectively prevents disk bottleneck problems. A tablet is a B+ tree based structure. The system maintained the list of tablets in least recently used (LRU) order: when a chunk ID was found in a certain tablet, the system moved that tablet to the head of the linked list. If the target chunk ID was not found in the current tablet, the next tablet was checked, until the target chunk ID was found or every tablet had been searched.

Although the solution in [3] solves disk bottleneck problems, while excluding false positives the system still has to check the already stored chunks until the target is found. When a false positive occurs, no tablet stores the chunk ID, so the system checks all of the tablets. As the amount of data grows, the number of tablets becomes larger, and a great deal of time is wasted searching for the chunk ID.

III. PROPOSED METHOD

A. Typical data deduplication

Fig. 1 shows the basic framework of a data deduplication system that uses the bloom filter. When a user uploads a file, the system first segments the file into chunks and then calculates the chunk IDs using the hash function. To determine whether to save a chunk, the system detects redundancy through the bloom filter module. If the chunk is not saved, the system asks the user to upload the chunk. On the contrary, if the chunk appears to be saved, the system compares it with the stored chunks: it excludes the bloom filter's false positives using the excluding false positives module and then decides whether the chunk should be saved.

The solutions of previous research [3][4] compare the uploaded chunk with all of the stored chunks to exclude the bloom filter's false positives. The way the index table is managed affects the running time to exclude false positives.
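The bloom filter module of this framework, as reviewed in Section II, can be sketched as follows. This is an illustrative sketch: the bit-array length (131072) and the seven hash functions match the setup in Section IV, but deriving the k positions by salting a single SHA-1 hash is our stand-in for k independent hash functions.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=131072, k=7):   # sizes used in Section IV
        self.m, self.k = m, k
        self.bits = bytearray(m)         # initialization: all bits 0

    def _positions(self, key: bytes):
        # Derive k bit-array positions by salting one SHA-1 hash
        # (an illustrative stand-in for k independent hash functions).
        for i in range(self.k):
            h = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p] = 1             # set the mapped locations to 1

    def might_contain(self, key: bytes) -> bool:
        # Any 0 -> definitely new; all 1s -> "saved", but possibly a
        # false positive that still needs verification.
        return all(self.bits[p] for p in self._positions(key))
```

A "yes" from `might_contain` still has to be verified against the index tables; that verification cost is exactly what this paper's clustering module is designed to cut.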
As the amount of data grows, the running time to exclude false positives becomes longer, and much time is wasted searching for chunk IDs.

Fig.1. The data deduplication system framework

B. The proposed method

To reduce the running time, the proposed solution replaces the excluding false positives module of Fig. 1 with a clustering module. Files are segmented, and the system calculates the chunk IDs through the chunking module. The bloom filter module then detects whether each chunk is saved. If the result shows that the chunk is already stored, the system uses the clustering module to exclude false positives. Our clustering module only needs to check one index table to determine whether the chunk is stored. Therefore, we can reduce the running time to exclude false positives.

C. The clustering module

The clustering module uses the chunk ID to perform clustering according to formula (1).

Cluster ID = Chunk ID mod 2^x    (1)

Every chunk finds its cluster ID by using formula (1). As shown in Fig. 2, we use a mapping table to record cluster IDs and an index table to record chunk IDs. Chunks with the same cluster ID are classified into the same cluster. Because all of the chunk IDs in the same cluster share an identical cluster ID, the system only needs to save (160 - x) bits of each chunk ID. If two chunks are stored in different clusters, it is impossible for them to be duplicates. Hence, the system only needs to check one index table to determine whether a chunk is saved.

Fig.2. Mapping table

When the excluding false positives module excludes the bloom filter's false positives, the system uses formula (1) to find the cluster ID. The uploaded chunk ID is then compared with the chunk IDs stored in the target cluster to exclude false positives, as shown in Fig. 3.

The x value of formula (1) affects the number of chunk IDs stored in a cluster; the maximum number of chunk IDs stored in a cluster is 2^(160-x). The experimental results are provided in Section IV.B.

Fig.3. Clustering module

IV. EXPERIMENT

We implemented a data deduplication system and examined the performance of the techniques developed in Section III. Before backing up a file, the chunking module segments the file into chunks and calculates the chunk IDs. After this, the system sends the chunk IDs to the bloom filter module, which determines whether each chunk is saved. According to the result of the bloom filter module, the clustering module is then used to exclude the false positives. Notice that we only save the chunk IDs, not complete chunks.

The following two experiments analyze the performance of our system:
1. Effect of the x value
2. The time to exclude the bloom filter's false positives

We use 8 KB fixed-size chunks and seven different hash functions, and the length of the bit array is 131072 [15]. The test data set consists of actual multimedia files and Linux kernel files, as shown in Table 1. Our experiments are performed on an Intel Core i7-3770 (3.40 GHz) with a 2 TB hard disk drive and 8 GB of RAM, running the 64-bit Windows 7 operating system.
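The clustering module of Section III.C can be sketched as follows, assuming 160-bit SHA-1 chunk IDs handled as Python integers; the class and method names are illustrative, not from the paper.

```python
X = 16  # the x of formula (1); Section IV.A finds x = 16 performs best

class ClusteringModule:
    def __init__(self, x=X):
        self.x = x
        self.clusters = {}  # mapping table: cluster ID -> index table of chunk IDs

    def cluster_id(self, chunk_id: int) -> int:
        return chunk_id % (2 ** self.x)          # formula (1)

    def add(self, chunk_id: int):
        # Only the top 160 - x bits need storing: the low x bits are
        # implied by the cluster the ID is filed under.
        cid = self.cluster_id(chunk_id)
        self.clusters.setdefault(cid, set()).add(chunk_id >> self.x)

    def is_stored(self, chunk_id: int) -> bool:
        # One lookup in one index table decides membership, so a bloom
        # filter false positive is excluded after a single probe.
        cid = self.cluster_id(chunk_id)
        return (chunk_id >> self.x) in self.clusters.get(cid, set())
```

A membership check touches exactly one cluster; storing only the top 160 - x bits mirrors the (160 - x)-bit chunk IDs described above.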
Table 1 Testing data set
File type      File numbers   File size   Chunk numbers
Multimedia     7              5.25 GB     5638963
Linux kernel   8              434 MB      574013

A. Effect of the x value

The purpose of this experiment is to find a suitable x value for formula (1) that improves system performance. We ran the clustering module with five different x values and backed up the test data sets of Table 1.

As shown in Figs. 4 and 5, the x value affects system performance. The number of clusters increases as x increases, and the cluster size decreases as x increases. For the clustering module, both the cluster size and the number of clusters affect performance. At low x values, more chunk IDs are stored in each cluster, which requires more time to exclude false positives. As x increases, many clusters are generated; if many clusters are saved in the mapping table, performance decreases because more time is needed to locate the target cluster.

Fig.4. Performance of clustering module (Linux kernel)

Fig.5. Performance of clustering module (Multimedia)

As shown in Tables 2 and 3, either a large cluster size or a large number of clusters decreases backup performance. When the cluster size and the number of clusters are balanced, system performance is optimized. When x is equal to 16, the system achieves the best backup speed, greatly reducing the time to exclude false positives.

Table 2 Multimedia file
x    Number of clusters   Average number of chunk IDs in each cluster   Backup speed (MB/sec)
8    256                  22027.2                                       269.2
12   4096                 1376.7                                        271.3
16   65536                86                                            279.3
20   1043731              5.4                                           269.6
24   4788784              1.1                                           243.5

Table 3 Linux kernel file
x    Number of clusters   Average number of chunk IDs in each cluster   Backup speed (MB/sec)
8    256                  2242.2                                        217.3
12   4096                 140.1                                         217.3
16   65526                8.8                                           219.1
20   442423               1.3                                           204.5
24   564461               1                                             198.7

B. The time to exclude the bloom filter's false positives

The purpose of this experiment is to compare the execution time of excluding false positives between our clustering module and the LRU-based index partition [3]. In the experiment, we back up the test data sets and observe the running time. According to the result of the previous experiment, the value of x is set to 16. The tablet size of the LRU-based index partition is set to 1.5 MB.

As shown in Fig. 6, the clustering module significantly reduces the execution time; the larger the test files are, the more obvious the effect is. As shown in Tables 4 and 5, when the backup files become larger, many index tables are produced by the LRU-based index partition module, and the number of index tables that must be checked to exclude false positives increases. During the backup of the Linux kernel files, the bloom filter module considers 207717 chunks to have been saved. While excluding the false positives, the system generates 2107052 queries in non-target index tables; an average of 11.1 queries is needed to exclude the false positives.

During the backup of the multimedia files, the bloom filter module considers 5250296 chunks to have been saved. While excluding the false positives, the system generates 392528094 queries in non-target index tables, which decreases performance; an average of 75.8 queries is needed to exclude the false positives.
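The two averages quoted above can be reproduced from the raw counts. Adding one extra query per flagged chunk (the lookup that finally reaches the target tablet) is our reading of how the averages were computed; the derivation is not stated explicitly here.

```python
# Linux kernel: 207717 chunks flagged by the bloom filter,
# 2107052 queries issued in non-target index tables.
linux_avg = 2107052 / 207717 + 1      # +1: the query in the target tablet

# Multimedia: 5250296 flagged chunks, 392528094 non-target queries.
media_avg = 392528094 / 5250296 + 1

assert round(linux_avg, 1) == 11.1    # matches Table 5
assert round(media_avg, 1) == 75.8    # matches Table 5
```

By contrast, the clustering module issues exactly one query per flagged chunk, which is the "1" column of Table 5.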
Our approach greatly reduces the time to exclude false positives because the clustering module omits the queries in non-target index tables and only needs to search one index table.

Fig.6. Performance of excluding false positives

Table 4 Backup speed of excluding false positives
Backup speed (MB/sec)   Clustering   LRU-based index partition
Linux kernel            219.1        97.2
Multimedia              279.3        8.2

Table 5 Comparison of average query time
Average query time      Clustering   LRU-based index partition
Linux kernel            1            11.1
Multimedia              1            75.8

V. CONCLUSION

We apply a clustering module to manage chunk IDs. The chunk IDs are stored separately in different clusters according to formula (1). While excluding false positives, the system learns whether a chunk is saved by querying a single index table.

The proposed method excludes the bloom filter's false positives without checking all of the index tables. It only needs to query one index table, effectively reducing the time to exclude false positives.

ACKNOWLEDGMENT

Part of this study was supported by the Ministry of Science and Technology of the Republic of China under Contract MOST 103-2221-E-035-056.

REFERENCES
[1] W. Can, Q. Guang Zhi, C. Feng Sheng, and P. Jing, "Deduplication oriented backup data encryption method," Journal of Computer Applications, vol. 30, no. 7, Jul. 2010, pp. 1764-1766, doi: 10.3724/SP.J.1087.2010.01763.
[2] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, Jul. 1970, pp. 422-426, doi: 10.1145/362686.362692.
[3] M. Jaehong, Y. Daeyoung, and W. Youjip, "Efficient deduplication techniques for modern backup operation," IEEE Transactions on Computers, vol. 60, no. 8, Jun. 2011, pp. 824-840, doi: 10.1109/TC.2010.263.
[4] D. Meister and A. Brinkman, "dedupv1: improving deduplication throughput using solid state drives (SSD)," IEEE 26th Symposium on Mass Storage Systems and Technologies, May 2010, pp. 1-6, doi: 10.1109/MSST.2010.5496992.
[5] W. Jiansheng, J. Hong, Z. Ke, and F. Dan, "Efficiently representing membership for variable large data sets," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 4, Apr. 2014, pp. 960-970, doi: 10.1109/TPDS.2013.66.
[6] Z. Zhike, J. Zejun, L. Zhiqiang, and P. Chengzhang, "LHS: a novel method of information retrieval avoiding an index using linear hashing with key groups in deduplication," International Conference on Machine Learning and Cybernetics, Jul. 2012, pp. 1312-1318, doi: 10.1109/ICMLC.2012.6359555.
[7] M. Lillibridge et al., "Sparse indexing: large scale, inline deduplication using sampling and locality," in Proceedings of the Eighth USENIX Conference on File and Storage Technologies, Dec. 2009, pp. 111-123.
[8] Y. Tianming et al., "3DNBS: a data de-duplication disk-based network backup system," IEEE International Conference on Networking, Architecture, and Storage, Jul. 2009, pp. 9-11, doi: 10.1109/NAS.2009.56.
[9] H. Qinlu, L. Zhanhuai, and Z. Xiao, "Data deduplication techniques," International Conference on Future Information Technology and Management Engineering, vol. 1, Oct. 2010, pp. 430-433, doi: 10.1109/FITME.2010.5656539.
[10] W. Can, Q. Guang Zhi, Y. Lei, and W. Juan, "A fast duplicate chunk identifying method based on hierarchical indexing structure," International Conference on Industrial Control and Electronics Engineering, Aug. 2012, pp. 624-627, doi: 10.1109/ICICEE.2012.169.
[11] Z. Zhike, J. Zejun, L. Zhiqiang, and P. Chengzhang, "Extreme Binning: scalable, parallel deduplication for chunk-based file backup," IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems, Sep. 2009, pp. 1-9, doi: 10.1109/MASCOT.2009.5366623.
[12] Z. Zhike, J. Zejun, L. Zhiqiang, and P. Chengzhang, "LHS: a novel method of information retrieval avoiding an index using linear hashing with key groups in deduplication," International Conference on Machine Learning and Cybernetics, vol. 4, Jul. 2012, pp. 1312-1318, doi: 10.1109/ICMLC.2012.6359555.
[13] F. Yinjin, J. Hong, X. Nong, T. Lei, and L. Fang, "AA-Dedupe: an application-aware source deduplication approach for cloud backup services in the personal computing environment," IEEE International Conference on Cluster Computing, Sep. 2011, pp. 112-120, doi: 10.1109/CLUSTER.2011.20.
[14] L. Guanlin, N. Jin Young, and D. David Hung-Chang, "BloomStore: Bloom-Filter based memory-efficient key-value store for indexing of data deduplication on flash," IEEE 28th Symposium on Mass Storage Systems and Technologies, Apr. 2012, pp. 1-11, doi: 10.1109/MSST.2012.6232390.
[15] Timdoug, "bloom_filter.c," timdoug.com [Online]. Available: http://www.timdoug.com/etc/bloom_filter.c. [Accessed: Aug. 2014].