PY028

This document discusses a data deduplication technique using file checksums in Python to identify and remove duplicate files. The technique calculates checksums for uploaded files and compares them to checksums stored in a database. If a match is found, the file is considered a duplicate and is not uploaded again, saving storage space. The system has modules for administrators to manage files and for users to upload, view, and log out of files. It uses a waterfall development model and requires Python, Django, MySQL, and minimum hardware specifications. Removing duplicate files can help save storage space, speed up indexing, and reduce backup time and size.

Uploaded by

Siddesh G d

Data Deduplication using File Checksum with Python
Routine file operations on a computer or other storage device tend to leave
behind many duplicate files, some of considerable size. This accumulated digital
junk can be a primary cause of storage shortages and reduced computer
performance, so duplicate files need to be found and erased from the hard drive.
It is also sometimes useful to know which files have replicas. If duplicates of a
requested file are present, every copy may be loaded into RAM, which can slow
the system down. We use a data deduplication technique in which, whenever a
file is uploaded, the system computes its checksum and compares it against the
checksum information stored in the database. If the file already exists, the
system updates the existing entry; otherwise it creates a new entry in the
database. The duplicate file finder and remover helps reclaim valuable disk space
and improve data efficiency. Deleting duplicates also speeds up indexing and
reduces backup time and size. The tool quickly and safely finds unwanted
duplicate files on the system and then deletes them or moves them to a separate
folder, according to the user's requirement.
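
The upload check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: SHA-256 is assumed as the checksum, an in-memory SQLite database stands in for the MySQL database, and the table layout and the names file_checksum, open_store and register_upload are hypothetical.

```python
import hashlib
import sqlite3


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the file contents (assumed checksum)."""
    return hashlib.sha256(data).hexdigest()


def open_store() -> sqlite3.Connection:
    """Create the checksum table; SQLite stands in for MySQL (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " checksum TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " uploads INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def register_upload(conn: sqlite3.Connection, name: str, data: bytes) -> bool:
    """Return True if the file was stored as new, False if it is a duplicate."""
    checksum = file_checksum(data)
    row = conn.execute(
        "SELECT 1 FROM files WHERE checksum = ?", (checksum,)
    ).fetchone()
    if row is not None:
        # Duplicate: update the existing entry instead of storing the file again.
        conn.execute(
            "UPDATE files SET uploads = uploads + 1 WHERE checksum = ?",
            (checksum,),
        )
        conn.commit()
        return False
    # New content: create a new entry in the database.
    conn.execute(
        "INSERT INTO files (checksum, name) VALUES (?, ?)", (checksum, name)
    )
    conn.commit()
    return True
```

With this sketch, uploading the same bytes twice under different names returns True for the first call and False for the second, because the comparison is made on content rather than on file name.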
 Modules:

The system comprises the following major modules with their sub-modules:

1. Admin:

 Manage File: Admin can manage files by deleting, updating and adding.

2. User:

 Upload File: User can upload a file.
 View File: User can view the uploaded file.
 Logout: User can log out.
Project Lifecycle:

Description
The waterfall model is a linear, sequential flow in which progress is seen as
flowing steadily downwards (like a waterfall) through the phases of software
implementation. Each phase of the development process begins only when the
previous phase is complete. The waterfall approach does not define a process for
going back to a previous phase to handle changes in requirements. It is one of
the earliest approaches used for software development.
 Hardware Requirement:

 Processor – Core i3
 Hard Disk – 160 GB
 Memory – 1 GB RAM
 Monitor

 Software Requirement:

 Windows 7 or higher
 Python
 Django framework
 MySQL database
 Advantages

 It saves time.
 It removes duplicate files.
 It frees storage space.
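
The duplicate removal and storage saving listed above could be sketched as below. This is only an illustration under assumptions: SHA-256 is used as the checksum, the first file seen (in sorted path order) is kept as the original, and the names checksum_of and move_duplicates and the numbered target file names are hypothetical. It does not guard against name clashes with files left in the folder by an earlier run.

```python
import hashlib
import shutil
from pathlib import Path


def checksum_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_duplicates(root: Path, quarantine: Path) -> int:
    """Move duplicate files under root into quarantine; return bytes reclaimed."""
    quarantine.mkdir(parents=True, exist_ok=True)
    seen = {}      # checksum -> first path found with that content
    moved = 0      # counter used to keep moved file names distinct
    reclaimed = 0  # total size of the duplicates moved out of root
    # Materialise the file list first so moving files does not disturb the walk.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if quarantine in path.parents:
            continue  # skip files already inside the separate folder
        digest = checksum_of(path)
        if digest in seen:
            reclaimed += path.stat().st_size
            target = quarantine / f"{moved}_{path.name}"
            shutil.move(str(path), str(target))
            moved += 1
        else:
            seen[digest] = path
    return reclaimed
```

Moving duplicates to a separate folder rather than deleting them outright leaves the user a chance to review them first; deleting instead would replace the shutil.move call with path.unlink().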

 Limitation
 Data needs to be entered properly; otherwise, the outcome may not be
accurate.

 Application
 This system can be used by multiple users to find and remove duplicate
files and reclaim storage space.

