
CN103412929A - Mass data storage method - Google Patents


Info

Publication number
CN103412929A
Authority
CN
China
Prior art keywords
file
data
value
service end
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013103596144A
Other languages
Chinese (zh)
Inventor
柯宗贵
柯宗庆
杨育斌
曹兴财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bluedon Information Security Technologies Co Ltd
Original Assignee
Bluedon Information Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bluedon Information Security Technologies Co Ltd
Priority to CN2013103596144A
Publication of CN103412929A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a mass data storage method comprising the following steps: when a user requests to submit a file, the server uses the received md5 value as an index value to judge whether the file already exists in the storage space; if the same md5 value exists, the server continues by comparing the md5 values of the file's slices; if an identical file exists, the data block record is updated; if a slice differs from the md5 value recorded for the data block, the source file slice and its md5 value information are uploaded, and the server updates the recorded information for the data block. The method improves the efficiency of verifying and uploading duplicate data in mass data storage, adjusts copies dynamically, and improves the allocation of storage space.

Description

A mass data storage method
Technical field
The present invention relates to the technical field of data storage, and in particular to a mass data storage method.
Background art
In a mass data storage system, the presence of a large amount of duplicate data not only increases cost but also reduces retrieval efficiency. Deleting duplicate data, and thereby reducing the storage space used, is a problem that urgently needs solving. Keeping multiple copies guarantees the reliability of the system: when a single node fails, the copies on other nodes can continue to provide service and keep the system running normally. Increasing the number of copies, however, raises the cost of keeping the copies consistent, and synchronizing multiple copies also consumes more bandwidth. Copies should therefore be laid out rationally while data reliability is taken into account.
In the prior art, the Linux source-side deduplication technique divides a file into a number of small blocks; a simple checksum is compared first, and where the checksum comparison does not settle the match, the md5 values are then compared.
HDFS adopts a full backup policy: by default it creates 3 backups of each file and places the 3 copies in a dispersed manner, which prevents a possible single point of failure.
In mass data, however, identifying a file by its file name is not very reliable, because the system may hold data whose names differ but whose contents are identical. If a large file is divided into many small data blocks that are compared one by one, the computation takes too long. Moreover, files in the system differ from one another and are accessed at different frequencies, so if all files use the same backup policy, storage space cannot be used efficiently.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a mass data storage method, the specific flow of which is as follows:
When a user requests to submit a file, the server uses the received md5 value as an index value to judge whether the file exists. If an identical md5 value exists, the server continues by comparing the md5 values of the file's slices; if an identical file already exists, the data block record is updated. If a slice differs from the md5 value recorded for the data block, the source file slice and its md5 value information are uploaded, and the server updates the recorded information for the data block.
In the above method, the strategy for judging whether a file already exists in the storage space is as follows:
Let the source file size in bytes be size, and let a given constant m_size serve as the decision threshold. When size is less than or equal to m_size, the md5 of the whole file is computed, and the md5 value and the source file length are then passed to the server. If the source file size is greater than m_size, the constant m_length is used as the computation length, and an md5 value is computed for each of three parts of the source file: the head, the middle, and the tail. These three md5 values are concatenated with the source file length to form a character string, the md5 value of that string is computed, and this md5 value and the source file length are sent to the server.
Beneficial effects of the technical solution of the present invention:
The present invention not only improves the efficiency of verifying and uploading duplicate data in mass data storage, but also adjusts copies dynamically and improves the allocation of storage space.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a flowchart of the file judgment strategy of the present invention;
Fig. 3 is a structure diagram of the storage nodes that record a complete file in the present invention;
Fig. 4 is a flowchart of file retrieval in the present invention;
Fig. 5 is a flowchart of multi-copy eviction in the present invention.
Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The present invention provides a mass data storage method that addresses two problems: first, improving the efficiency of duplicate data verification when a file is submitted; second, adjusting copies dynamically so that hard disk space is used rationally.
The flow of the method of the present invention is shown in Fig. 1 and is as follows:
S1: When a user requests to submit a file, the server uses the received md5 value as an index value to judge whether the file exists. If an identical md5 value exists, the server continues by comparing the md5 values of the file's slices; if an identical file already exists, the data block record is updated.
S2: If a slice differs from the md5 value recorded for the data block, the source file slice and its md5 value information are uploaded, and the server updates the recorded information for the data block.
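Steps S1 and S2 can be illustrated with a small in-memory sketch. It is not the patented implementation: the class `DedupServer`, the functions `split` and `submit`, the fixed slice size, and the use of a whole-content md5 as the file index value are all assumptions made for illustration.

```python
import hashlib


class DedupServer:
    """Toy in-memory server (hypothetical): a file index plus a slice store."""

    def __init__(self):
        self.files = {}   # file md5 (index value) -> ordered list of slice md5 values
        self.slices = {}  # slice md5 -> slice bytes

    def missing_slices(self, file_md5, slice_md5s):
        """S1: return the slice digests the client still has to upload."""
        if self.files.get(file_md5) == list(slice_md5s):
            return []  # identical file already stored; only the record changes
        # dict.fromkeys removes repeated digests while keeping their order
        return [d for d in dict.fromkeys(slice_md5s) if d not in self.slices]

    def commit(self, file_md5, slice_md5s, uploaded):
        """S2: store the uploaded slices and update the file record."""
        for digest, payload in uploaded.items():
            if hashlib.md5(payload).hexdigest() != digest:
                raise ValueError("slice digest mismatch")
            self.slices[digest] = payload
        self.files[file_md5] = list(slice_md5s)


def split(data, slice_size):
    """Cut the content into fixed-size slices (slice size is an assumption)."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def submit(server, data, slice_size=4):
    """Client side: upload only what the server lacks; return the upload count."""
    parts = split(data, slice_size)
    digests = [hashlib.md5(p).hexdigest() for p in parts]
    file_md5 = hashlib.md5(data).hexdigest()
    needed = server.missing_slices(file_md5, digests)
    by_digest = dict(zip(digests, parts))
    server.commit(file_md5, digests, {d: by_digest[d] for d in needed})
    return len(needed)
```

Submitting the same content twice uploads nothing the second time, and a file that shares slices with an earlier one uploads only the slices that differ.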
In the above method, the strategy for judging whether a file already exists in the storage space is shown in Fig. 2 and is as follows:
Let the source file size in bytes be size, and let a given constant m_size serve as the decision threshold. When size is less than or equal to m_size, the md5 of the whole file is computed, and the md5 value and the source file length are then passed to the server. If the source file size is greater than m_size, the constant m_length is used as the computation length, and an md5 value is computed for each of three parts of the source file: the head, the middle, and the tail. These three md5 values are concatenated with the source file length to form a character string, the md5 value of that string is computed, and this md5 value and the source file length are sent to the server.
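The judgment strategy can be sketched as follows. The names m_size and m_length come from the description; their default values, the function name `fingerprint`, the exact position of the middle window, and the way the three digests are joined with the length are assumptions, since the source does not specify them. The sketch works on bytes in memory rather than on a file so that it stays short.

```python
import hashlib

M_SIZE = 1024 * 1024   # assumed threshold; the source gives no value
M_LENGTH = 64 * 1024   # assumed sample length (must be >= 1); also not given


def fingerprint(data: bytes, m_size: int = M_SIZE, m_length: int = M_LENGTH):
    """Return (md5 value, length) as the client would send them to the server."""
    size = len(data)
    if size <= m_size:
        # Small file: hash the whole content.
        return hashlib.md5(data).hexdigest(), size

    # Large file: hash m_length bytes from the head, the middle, and the tail.
    mid_start = (size - m_length) // 2  # centred window, an assumed choice
    head = data[:m_length]
    middle = data[mid_start:mid_start + m_length]
    tail = data[-m_length:]

    # Join the three digests with the file length, then hash the string.
    joined = "".join(hashlib.md5(p).hexdigest() for p in (head, middle, tail))
    joined += str(size)
    return hashlib.md5(joined.encode("ascii")).hexdigest(), size
```

For a large file, only 3 × m_length bytes are hashed, which is what makes the check cheap. The trade-off is that two large files of equal length that differ only outside the sampled windows produce the same value, which is why the method goes on to compare the md5 values of the slices.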
Fig. 3 shows the structure of the storage nodes that record a complete file:
Head nodes such as SID1 and SID2 store information such as the md5 value and the file name, and serve as the file index. The nodes linked after a head node are pointers to the file's slices and serve as the slice index; together, these nodes form the complete file.
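One possible reading of this node structure is a linked list per file: a head node holding the index information, followed by one node per slice. The sketch below follows that reading; the class names `HeadNode` and `SliceNode` and every field name are hypothetical, and Fig. 3 itself is not available here to confirm the layout.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class SliceNode:
    """Points at one stored slice of the file (identified here by its md5)."""
    number: int
    slice_md5: str
    next: Optional["SliceNode"] = None


@dataclass
class HeadNode:
    """Head node such as SID1: holds the file md5 and file name as the index."""
    sid: str
    file_md5: str
    file_name: str
    first: Optional[SliceNode] = None

    def append(self, slice_md5: str) -> SliceNode:
        """Link a new slice node at the end of the chain."""
        node = SliceNode(number=0, slice_md5=slice_md5)
        if self.first is None:
            self.first = node
            return node
        last = self.first
        while last.next is not None:
            last = last.next
        node.number = last.number + 1
        last.next = node
        return node

    def slices(self) -> Iterator[SliceNode]:
        """Walk the chain in order; the slices together form the whole file."""
        node = self.first
        while node is not None:
            yield node
            node = node.next
```

Walking `slices()` from a head node yields the slice pointers in numbered order, which is the order needed to rebuild the file.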
In a specific implementation of the present invention, if the user asks to retrieve a file after storing it, the flow is as shown in Fig. 4 and is as follows:
First, it is judged whether the file exists; if it does not, the response is that the file does not exist.
If the requested file exists, the client receives the data blocks sent by the server. If the client wants to delete the file, the server additionally decrements the reference count of the stored blocks by 1; if the reference count reaches 0, all information of the whole file is deleted. The client merges the data blocks sent by the server in numbered order to obtain the original file.
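The retrieval and deletion flow can be sketched with a reference-counted block store. This is an illustration under assumptions: the source does not say whether the count is kept per block or per file, so the sketch keeps one count per block, and the names `BlockStore`, `put`, `fetch`, `delete`, and `reassemble` are hypothetical.

```python
import hashlib


class BlockStore:
    """Toy server-side store with one reference count per block (assumed)."""

    def __init__(self):
        self.blocks = {}  # block md5 -> [payload, reference count]
        self.files = {}   # file key -> ordered list of block md5 values

    def put(self, file_key, parts):
        if file_key in self.files:
            return  # already recorded; avoid counting references twice
        digests = []
        for payload in parts:
            digest = hashlib.md5(payload).hexdigest()
            entry = self.blocks.setdefault(digest, [payload, 0])
            entry[1] += 1
            digests.append(digest)
        self.files[file_key] = digests

    def fetch(self, file_key):
        """Return numbered blocks, or None when the file does not exist."""
        if file_key not in self.files:
            return None
        return [(n, self.blocks[d][0]) for n, d in enumerate(self.files[file_key])]

    def delete(self, file_key):
        """Drop the file record and release blocks nobody references any more."""
        for digest in self.files.pop(file_key, []):
            entry = self.blocks[digest]
            entry[1] -= 1
            if entry[1] == 0:
                del self.blocks[digest]


def reassemble(numbered_blocks):
    """Client side: merge the received blocks in numbered order."""
    ordered = sorted(numbered_blocks, key=lambda item: item[0])
    return b"".join(payload for _, payload in ordered)
```

Because a block shared by two files carries a count of 2, deleting one file leaves the block in place for the other.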
In a specific implementation, some expired data needs to be evicted in order to use storage space rationally. The flow is shown in Fig. 5 and is as follows:
The server adopts two-level storage when storing data. The SAS hard disk is selected preferentially; according to the LRU algorithm, expired data is first evicted to the SATA hard disk. If the space left after this eviction is still not enough to hold the data to be stored, data that has been accessed fewer times within a certain period is evicted to the SATA hard disk. If SATA hard disk space is insufficient, the data on the SATA disk is compared with the data evicted from the SAS disk, and expired data is evicted first. If, after the expired data has been deleted, the free space is still not enough to hold the data to be stored, the data with fewer accesses within a certain period is evicted, and a message is sent to the log center warning that storage space is insufficient.
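A simplified sketch of two-level eviction is given below. It deliberately collapses the separate "expired" and "fewer accesses" passes of the description into a single least-recently-used order per tier, so it shows LRU demotion from a fast tier to a slow tier rather than the full flow of Fig. 5. The class `TwoTierStore`, its capacities, and the `warnings` list standing in for the log center are all assumptions.

```python
from collections import OrderedDict


class TwoTierStore:
    """LRU demotion from a fast (SAS) tier to a slow (SATA) tier."""

    def __init__(self, sas_capacity, sata_capacity):
        self.sas = OrderedDict()   # key -> size, least recently used first
        self.sata = OrderedDict()  # key -> size, least recently used first
        self.sas_capacity = sas_capacity
        self.sata_capacity = sata_capacity
        self.warnings = []         # stands in for messages to the log center

    @staticmethod
    def _used(tier):
        return sum(tier.values())

    def access(self, key):
        """Record an access and return the tier holding the key, if any."""
        for name, tier in (("SAS", self.sas), ("SATA", self.sata)):
            if key in tier:
                tier.move_to_end(key)  # mark as most recently used
                return name
        return None

    def put(self, key, size):
        """Store new data on SAS, demoting old data to SATA when needed."""
        if size > self.sas_capacity:
            raise ValueError("item larger than the SAS tier")
        self.sas.pop(key, None)
        self.sata.pop(key, None)
        while self._used(self.sas) + size > self.sas_capacity:
            old_key, old_size = self.sas.popitem(last=False)
            self._demote(old_key, old_size)
        self.sas[key] = size

    def _demote(self, key, size):
        """Move data to SATA, dropping the oldest SATA data if space runs out."""
        if size > self.sata_capacity:
            self.warnings.append(f"storage space insufficient: dropped {key}")
            return
        while self._used(self.sata) + size > self.sata_capacity:
            dropped, _ = self.sata.popitem(last=False)
            self.warnings.append(f"storage space insufficient: dropped {dropped}")
        self.sata[key] = size
```

A fuller version would track an expiry time and an access counter per item and run the expired pass before the low-access pass, as the description requires.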
The present invention not only improves the efficiency of verifying and uploading duplicate data in mass data storage, but also adjusts copies dynamically and improves the allocation of storage space.
The mass data storage method provided by the embodiments of the present invention has been described in detail above. Specific examples have been used here to set out the principle and the embodiments of the present invention, and the description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the present invention.

Claims (5)

1. A mass data storage method, characterized in that the specific flow of the method is:
When a user requests to submit a file, the server uses the received md5 value as an index value to judge whether the file exists; if an identical md5 value exists, the server continues by comparing the md5 values of the file's slices; if an identical file already exists, the data block record is updated;
If a slice differs from the md5 value recorded for the data block, the source file slice and its md5 value information are uploaded, and the server updates the recorded information for the data block.
2. The method according to claim 1, characterized in that, in the method, the strategy for judging whether a file already exists in the storage space is:
Let the source file size in bytes be size, and let a given constant m_size serve as the decision threshold; when size is less than or equal to m_size, the md5 of the whole file is computed, and the md5 value and the source file length are then passed to the server; if the source file size is greater than m_size, the constant m_length is used as the computation length, and an md5 value is computed for each of three parts of the source file: the head, the middle, and the tail; these three md5 values are concatenated with the source file length to form a character string, the md5 value of that string is computed, and this md5 value and the source file length are sent to the server.
3. The method according to claim 2, characterized in that the md5 value and the file name are stored in a head node, which serves as the file index; the nodes linked after the head node are pointers to the file's slices and serve as the slice index; and these nodes together form the complete file.
4. The method according to claim 1, characterized in that, when the user asks to retrieve a file after storing it, the specific operation flow is:
First, it is judged whether the file exists; if it does not, the response is that the file does not exist;
If the requested file exists, the client receives the data blocks sent by the server; if the client wants to delete the file, the server additionally decrements the reference count of the stored blocks by 1, and if the reference count reaches 0, all information of the whole file is deleted; the client merges the data blocks sent by the server in numbered order to obtain the original file.
5. The method according to claim 1, characterized in that, in order to use storage space rationally, some expired data needs to be evicted during data storage, the specific flow being:
The server adopts two-level storage when storing data and preferentially selects the SAS hard disk; according to the LRU algorithm, expired data is first evicted to the SATA hard disk; if the space left after this eviction is still not enough to hold the data to be stored, data that has been accessed fewer times within a certain period is evicted to the SATA hard disk; if SATA hard disk space is insufficient, the data on the SATA disk is compared with the data evicted from the SAS disk, and expired data is evicted first; if, after the expired data has been deleted, the free space is still not enough to hold the data to be stored, the data with fewer accesses within a certain period is evicted, and a message is sent to the log center warning that storage space is insufficient.
CN2013103596144A 2013-08-16 2013-08-16 Mass data storage method Pending CN103412929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013103596144A CN103412929A (en) 2013-08-16 2013-08-16 Mass data storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013103596144A CN103412929A (en) 2013-08-16 2013-08-16 Mass data storage method

Publications (1)

Publication Number Publication Date
CN103412929A true CN103412929A (en) 2013-11-27

Family

ID=49605941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013103596144A Pending CN103412929A (en) 2013-08-16 2013-08-16 Mass data storage method

Country Status (1)

Country Link
CN (1) CN103412929A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101908077A (en) * 2010-08-27 2010-12-08 华中科技大学 A data deduplication method suitable for cloud backup
WO2012081099A1 (en) * 2010-12-15 2012-06-21 富士通株式会社 Data transfer program, computer, and data transfer method
CN102307221A (en) * 2011-03-25 2012-01-04 国云科技股份有限公司 Cloud storage system and implementation method thereof
CN102833294A (en) * 2011-06-17 2012-12-19 阿里巴巴集团控股有限公司 File processing method and system based on cloud storage, and server cluster system
CN103246730A (en) * 2013-05-08 2013-08-14 网易(杭州)网络有限公司 File storage method and device and file sensing method and device

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103997517A (en) * 2014-05-06 2014-08-20 广州金山网络科技有限公司 CDN-node file synchronization method and device
CN104503862A (en) * 2014-12-09 2015-04-08 北京奇虎科技有限公司 Method and device for obtaining check value of application channel package
CN104503862B (en) * 2014-12-09 2017-12-19 北京奇虎科技有限公司 The method and apparatus for obtaining the check value using channel bag
CN104639629A (en) * 2015-01-30 2015-05-20 英华达(上海)科技有限公司 File comparing method and system at client and cloud
CN105528460A (en) * 2016-01-12 2016-04-27 中国测绘科学研究院 Establishing method of tile pyramid model and tile reading method
CN106202173A (en) * 2016-06-26 2016-12-07 厦门天锐科技股份有限公司 The Intelligent drainage weighing method of a kind of file repository storage and system
CN106202173B (en) * 2016-06-26 2019-11-12 厦门天锐科技股份有限公司 A kind of intelligent rearrangement and system of file repository storage
CN108243207A (en) * 2016-12-23 2018-07-03 航天星图科技(北京)有限公司 A kind of date storage method of network cloud disk
CN106951192A (en) * 2017-03-25 2017-07-14 广州硕点电子科技有限公司 A data storage method, device and system
CN110806949A (en) * 2019-11-05 2020-02-18 广东紫晶信息存储技术股份有限公司 Verification data generation method and system and data verification method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131127
