CN109684121A - A file recovery method and system - Google Patents
A file recovery method and system
- Publication number
- CN109684121A (application CN201811577499.7A)
- Authority
- CN
- China
- Prior art keywords
- file
- target
- type
- recovery
- target recovery
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The file recovery method and system provided in the embodiments of the present invention relate to the technical field of data recovery. Each keyword of a target recovery file is identified by using a k-means algorithm; the frequency of each keyword is calculated to generate a feature vector of the target recovery file; the feature vector is matched against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file; the file type of the target recovery file is determined according to the probability table; and a file template corresponding to that file type is selected to recover the target recovery file. In this way, severely damaged files can be identified, the range of file identification is expanded, and the accuracy of file type identification is improved.
Description
Technical Field
The invention relates to the technical field of data recovery, in particular to a file recovery method and a file recovery system.
Background
Existing file recovery techniques include using cache files, using residual file data left in disk sectors, and manual operation. These approaches share a common defect: they can identify the type only of files that are not severely damaged, and their recognition rate for severely damaged files, particularly executable files, is low. As a result, the range of file types they can identify is narrow and their accuracy is low.
Disclosure of Invention
In order to overcome the defects in the prior art, embodiments of the present invention provide a file recovery method and system.
In a first aspect, an embodiment of the present invention provides a file recovery method, where the method includes:
identifying each keyword of a target recovery file by using a k-means algorithm;
calculating the frequency of each keyword to generate a feature vector of the target recovery file;
matching the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file;
determining the file type of the target recovery file according to the probability table; and
selecting a corresponding file template according to the file type to recover the target recovery file.
Further, before identifying the keywords of the target recovery file using the k-means algorithm, the method further comprises:
obtaining keywords of common-type files by using the KMP algorithm, calculating the frequency of each of those keywords, and generating a plurality of standard feature vectors, wherein the common-type files are blank files and include files in the DOC, PDF and PE formats.
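The keyword-counting step can be illustrated with a minimal Python sketch of Knuth-Morris-Pratt (KMP) matching over raw bytes. The source does not say how the keywords are chosen or how frequency is normalized, so the function names and the relative-frequency definition (count divided by total count) are assumptions made for illustration only.

```python
def build_failure_table(pattern: bytes) -> list[int]:
    """For each prefix of pattern, length of its longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_count(data: bytes, pattern: bytes) -> int:
    """Count occurrences of pattern in data (overlapping matches included)."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    count = 0
    k = 0  # number of pattern bytes currently matched
    for byte in data:
        while k > 0 and byte != pattern[k]:
            k = table[k - 1]
        if byte == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = table[k - 1]  # fall back so overlapping matches are found
    return count


def keyword_frequencies(data: bytes, keywords: list[bytes]) -> list[float]:
    """Relative frequency of each keyword (an assumed definition of 'frequency')."""
    counts = [kmp_count(data, kw) for kw in keywords]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]
```

Calling `keyword_frequencies` on a blank file of each common type with that type's keyword list would yield one candidate standard feature vector per type.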
In a second aspect, an embodiment of the present invention provides a file recovery system, including:
an identification module, configured to identify each keyword of a target recovery file by using a k-means algorithm;
a calculation module, configured to calculate the frequency of each keyword and generate a feature vector of the target recovery file;
a matching module, configured to match the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file;
a determining module, configured to determine the file type of the target recovery file according to the probability table; and
a recovery module, configured to select a corresponding file template according to the file type to recover the target recovery file.
Further, the calculation module is further configured to:
obtain keywords of common-type files by using the KMP algorithm, calculate the frequency of each of those keywords, and generate a plurality of standard feature vectors, wherein the common-type files are blank files and include files in the DOC, PDF and PE formats.
The file recovery method and the file recovery system provided by the embodiment of the invention have the following beneficial effects:
the method can identify the seriously damaged files, enlarge the range of file identification and improve the accuracy of file type identification.
Drawings
Fig. 1 is a schematic flowchart of a file recovery method according to an embodiment of the present invention;
Fig. 2 is a schematic composition diagram of a file recovery system according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and the embodiments.
Referring to fig. 1, a file recovery method provided in an embodiment of the present invention includes the following steps:
S101: identifying each keyword of the target recovery file by using a k-means algorithm.
S102: calculating the frequency of each keyword and generating a feature vector of the target recovery file.
S103: matching the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file.
S104: determining the file type of the target recovery file according to the probability table.
As a specific example, if the similarity between the feature vector of the target recovery file and the standard feature vector of the DOC format is 70%, its similarity to the standard feature vector of the PDF format is 90%, and its similarity to the standard feature vector of the PE format is 80%, the format of the target recovery file is determined to be PDF, the format with the highest similarity.
S105: selecting a corresponding file template according to the file type to recover the target recovery file.
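The decision in S104 can be sketched with the DOC/PDF/PE similarity figures from the example above. The source does not state how similarities become a probability table; dividing each similarity by their sum is an assumption made here, and both function names are hypothetical.

```python
def type_probability_table(similarities: dict[str, float]) -> dict[str, float]:
    """Normalize per-type similarities so they sum to 1 (assumed normalization)."""
    total = sum(similarities.values())
    if total == 0:
        return {name: 0.0 for name in similarities}
    return {name: value / total for name, value in similarities.items()}


def determine_type(table: dict[str, float]) -> str:
    """Pick the file type with the highest probability."""
    return max(table, key=table.get)


# Similarities taken from the worked example in the description.
similarities = {"DOC": 0.70, "PDF": 0.90, "PE": 0.80}
table = type_probability_table(similarities)
best = determine_type(table)  # "PDF"
```

Because normalization preserves ordering, the selected type is the same whether the maximum is taken over the raw similarities or over the normalized table.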
Optionally, before identifying the keywords of the target recovery file by using a k-means algorithm, the method further comprises:
obtaining keywords of common-type files by using the KMP algorithm, calculating the frequency of each of those keywords, and generating a plurality of standard feature vectors, wherein the common-type files are blank files and include files in the DOC, PDF and PE formats.
The keyword frequencies of two or more common-type files can be combined to form a single standard feature vector.
A file is described by the set of keywords in its file header, and the corresponding file browser can use each keyword in the file to perform various operations on the file.
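One way to read the construction of standard feature vectors is sketched below: every file type is scored over a shared keyword vocabulary, and a damaged file is compared with each standard vector. The keyword lists, the in-memory "blank files", and the use of cosine similarity are all assumptions; the source names k-means for the matching step but gives no distance measure. For brevity the sketch counts with `bytes.count` (non-overlapping matches) instead of KMP.

```python
import math


def feature_vector(data: bytes, vocabulary: list[bytes]) -> list[float]:
    """Relative frequency of each vocabulary keyword in data."""
    counts = [data.count(keyword) for keyword in vocabulary]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists per type; their union is the shared vocabulary.
KEYWORDS = {
    "PDF": [b"%PDF", b"endobj", b"xref"],
    "PE": [b"MZ", b".text", b".data"],
}
VOCABULARY = sorted({kw for kws in KEYWORDS.values() for kw in kws})

# Tiny stand-ins for blank files of each type (not real, valid files).
BLANK_FILES = {
    "PDF": b"%PDF-1.4\n1 0 obj\nendobj\nxref\n",
    "PE": b"MZ\x90\x00.text\x00.data\x00",
}
STANDARD_VECTORS = {
    name: feature_vector(data, VOCABULARY) for name, data in BLANK_FILES.items()
}

# A "damaged" PDF whose header is gone but whose body keywords survive.
damaged = b"\x00\x00garbage 2 0 obj\nendobj\nxref\n"
target = feature_vector(damaged, VOCABULARY)
scores = {
    name: cosine_similarity(target, vector)
    for name, vector in STANDARD_VECTORS.items()
}
```

Even with the `%PDF` header missing, the surviving body keywords keep the damaged file closer to the PDF standard vector than to the PE one, which is the property the description relies on for severely damaged files.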
As shown in fig. 2, the file recovery system provided in the embodiment of the present invention includes an identification module, a calculation module, a matching module, a determination module, and a recovery module. Wherein,
the identification module is configured to identify each keyword of the target recovery file by using a k-means algorithm;
the calculation module is configured to calculate the frequency of each keyword and generate a feature vector of the target recovery file;
the matching module is configured to match the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file;
the determining module is configured to determine the file type of the target recovery file according to the probability table; and
the recovery module is configured to select a corresponding file template according to the file type to recover the target recovery file.
Optionally, the calculation module is further configured to obtain keywords of common-type files by using the KMP algorithm, calculate the frequency of each of those keywords, and generate a plurality of standard feature vectors, where the common-type files are blank files and include files in the DOC, PDF and PE formats.
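How the five modules might be wired together is sketched below in Python. Everything here is illustrative: the class name, the treatment of standard feature vectors as fixed centroids with nearest-centroid assignment (the assignment step of k-means), the distance-to-score conversion, and the reading of "recovering with a file template" as restoring a missing header are assumptions that the source does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass
class FileRecoverySystem:
    vocabulary: list[bytes]                    # keywords to look for
    standard_vectors: dict[str, list[float]]   # one fixed centroid per file type
    templates: dict[str, bytes]                # hypothetical known-good header per type

    def identify(self, data: bytes) -> list[int]:
        """Identification module: count each keyword in the file."""
        return [data.count(keyword) for keyword in self.vocabulary]

    def calculate(self, counts: list[int]) -> list[float]:
        """Calculation module: turn counts into a frequency feature vector."""
        total = sum(counts)
        return [c / total if total else 0.0 for c in counts]

    def match(self, vector: list[float]) -> dict[str, float]:
        """Matching module: score each centroid by closeness, then normalize."""
        scores = {
            name: 1.0 / (1.0 + math.dist(vector, centroid))
            for name, centroid in self.standard_vectors.items()
        }
        total = sum(scores.values())
        return {name: score / total for name, score in scores.items()}

    def determine(self, table: dict[str, float]) -> str:
        """Determining module: choose the most probable file type."""
        return max(table, key=table.get)

    def recover(self, data: bytes, file_type: str) -> bytes:
        """Recovery module: prepend the template header if it is missing."""
        header = self.templates[file_type]
        return data if data.startswith(header) else header + data

    def run(self, data: bytes) -> tuple[str, bytes]:
        table = self.match(self.calculate(self.identify(data)))
        file_type = self.determine(table)
        return file_type, self.recover(data, file_type)


system = FileRecoverySystem(
    vocabulary=[b"endobj", b"xref", b".text", b".data"],
    standard_vectors={"PDF": [0.5, 0.5, 0.0, 0.0], "PE": [0.0, 0.0, 0.5, 0.5]},
    templates={"PDF": b"%PDF-1.4\n", "PE": b"MZ"},
)
file_type, repaired = system.run(b"1 0 obj\nendobj\nxref\n")
```

Keeping each module as a separate method mirrors the module breakdown of Fig. 2 and lets any single stage, for example the matching measure, be swapped without touching the others.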
According to the file recovery method and system provided by the embodiment of the invention, each keyword of the target recovery file is identified by using a k-means algorithm, the frequency of each keyword is respectively calculated, the feature vector of the target recovery file is generated, the feature vector of the target recovery file is respectively matched with each standard feature vector by using the k-means algorithm, a type probability table of the target recovery file is obtained, the file type of the target recovery file is determined according to the probability table, and a corresponding file template is selected according to the file type to recover the target recovery file, so that the severely damaged file can be identified, the file identification range is expanded, and the accuracy of file type identification is improved.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that related features of the method and system described above may be cross-referenced. In addition, "first", "second", and the like in the above embodiments serve to distinguish the embodiments and do not indicate their relative merits.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In addition, the memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (4)
1. A file recovery method, comprising:
identifying each keyword of a target recovery file by using a k-means algorithm;
calculating the frequency of each keyword to generate a feature vector of the target recovery file;
matching the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file;
determining the file type of the target recovery file according to the probability table; and
selecting a corresponding file template according to the file type to recover the target recovery file.
2. The file recovery method of claim 1, wherein before identifying each keyword of the target recovery file by using the k-means algorithm, the method further comprises:
obtaining keywords of common-type files by using the KMP algorithm, calculating the frequency of each of those keywords, and generating a plurality of standard feature vectors, wherein the common-type files are blank files and include files in the DOC, PDF and PE formats.
3. A file recovery system, comprising:
an identification module, configured to identify each keyword of a target recovery file by using a k-means algorithm;
a calculation module, configured to calculate the frequency of each keyword and generate a feature vector of the target recovery file;
a matching module, configured to match the feature vector of the target recovery file against each standard feature vector by using the k-means algorithm to obtain a type probability table of the target recovery file;
a determining module, configured to determine the file type of the target recovery file according to the probability table; and
a recovery module, configured to select a corresponding file template according to the file type to recover the target recovery file.
4. The file recovery system of claim 3, wherein the calculation module is further configured to:
obtain keywords of common-type files by using the KMP algorithm, calculate the frequency of each of those keywords, and generate a plurality of standard feature vectors, wherein the common-type files are blank files and include files in the DOC, PDF and PE formats.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811577499.7A CN109684121A (en) | 2018-12-20 | 2018-12-20 | A file recovery method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811577499.7A CN109684121A (en) | 2018-12-20 | 2018-12-20 | A file recovery method and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109684121A (en) | 2019-04-26 |
Family
ID=66188117
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811577499.7A Pending CN109684121A (en) | A file recovery method and system | 2018-12-20 | 2018-12-20 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109684121A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113326511A (en) * | 2021-06-25 | 2021-08-31 | 深信服科技股份有限公司 | File repair method, system, device and medium |
| CN118394574A (en) * | 2024-04-26 | 2024-07-26 | 广州锦高信息科技有限公司 | A virus-invaded system recovery method and system |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1458580A (en) * | 2002-03-01 | 2003-11-26 | 惠普开发有限公司 | File classification method and device |
| CN101853250A (en) * | 2009-04-03 | 2010-10-06 | 华为技术有限公司 | Method and device for classifying documents |
| US20110047168A1 (en) * | 2006-05-31 | 2011-02-24 | Ellingsworth Martin E | Method and system for classifying documents |
| TW201516713A (en) * | 2013-10-16 | 2015-05-01 | Chunghwa Telecom Co Ltd | File classification method based on group characteristic values |
| CN107862051A (en) * | 2017-11-08 | 2018-03-30 | 郑州云海信息技术有限公司 | A kind of file classifying method, system and a kind of document classification equipment |
-
2018
- 2018-12-20 CN CN201811577499.7A patent/CN109684121A/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| 完美下载小客服: "想快速还原指定文件?用数据恢复精灵来帮忙", 《HTTPS://TECH.WMZHE.COM/ARTICLE/7838.HTML》 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113326511A (en) * | 2021-06-25 | 2021-08-31 | 深信服科技股份有限公司 | File repair method, system, device and medium |
| CN113326511B (en) * | 2021-06-25 | 2024-04-09 | 深信服科技股份有限公司 | File repair method, system, equipment and medium |
| CN118394574A (en) * | 2024-04-26 | 2024-07-26 | 广州锦高信息科技有限公司 | A virus-invaded system recovery method and system |
| CN118394574B (en) * | 2024-04-26 | 2025-02-14 | 广州锦高信息科技有限公司 | A virus-invaded system recovery method and system |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106649346B (en) | Data repeatability checking method and device | |
| CN106557486A (en) | A kind of storage method and device of data | |
| KR102091913B1 (en) | Method and apparatus of controlling network payment | |
| CN110704418A (en) | Block chain information query method, device and equipment | |
| CN112487083B (en) | A data verification method and device | |
| CN112579623A (en) | Method, device, storage medium and equipment for storing data | |
| CN114239097A (en) | Method and device for generating process file, storage medium and electronic device | |
| CN109684121A (en) | A kind of file access pattern method and system | |
| CN108073595A (en) | It is a kind of to realize data update and the method and device of snapshot in olap database | |
| CN108228443B (en) | Web application testing method and device | |
| CN106648839B (en) | Data processing method and device | |
| CN106445960A (en) | Data clustering method and device | |
| CN115293243A (en) | Method, device and equipment for realizing intelligent matching of data assets | |
| CN110990096A (en) | Method, device and equipment for generating user interface and storage medium | |
| CN107016028B (en) | Data processing method and apparatus thereof | |
| CN110019357B (en) | Database query script generation method and device | |
| CN116361671B (en) | A high-entropy KNN clustering method, equipment and medium based on post-correction | |
| CN112612915B (en) | Picture labeling method and device | |
| US20240296554A1 (en) | Method and system for data balancing and hair-line fracture detection | |
| CN117520645A (en) | Financial product-based user determination method, device and electronic device | |
| CN116415156A (en) | Method, device and medium for calculating document similarity | |
| CN108121733B (en) | Data query method and device | |
| CN111125165A (en) | Set merging method, device, processor and machine-readable storage medium | |
| US20160364366A1 (en) | Entity Matching Method and Apparatus | |
| CN110210030B (en) | Statement analysis method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190426 |
|
| RJ01 | Rejection of invention patent application after publication |