CN115454948A - Metadata-based data interaction method and system - Google Patents

Info

Publication number
CN115454948A
Authority
CN
China
Prior art keywords
file
data
metadata
description
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211285466.1A
Other languages
Chinese (zh)
Inventor
刘驼峰
徐良
袁泉
刘源
古松景
慕荣臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN202211285466.1A
Publication of CN115454948A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a metadata-based data interaction method and system. The method includes generating a metadata warehouse file using a metadata compression algorithm, which comprises: generating a virtual directory description file for each target data file; decomposing the target data file into a plurality of data blocks and adding the data blocks to the description file; compressing the decomposed data blocks into 7z data folder blocks and adding 7z data folder description database nodes to the description file; and generating a primary data file, counting by traversal the occurrences of metadata in the data blocks of the compressed data files, and sorting the counts to generate the metadata warehouse file. Once the metadata warehouse file has been completely downloaded, the target file is decompressed using a decompression algorithm, which comprises: restoring the primary data file and decompressing its last data block to obtain the virtual directory description file; decompressing the first data block to restore all file description data blocks; and restoring all data files and directories to complete decompression. The method and system improve the utilization of the publishing website's storage space with little impact on decompression speed.

Description

Metadata-based data interaction method and system
Technical Field
The invention relates to the technical field of big data processing and data compression, in particular to a data interaction method and system based on metadata.
Background
With the improving computing and storage performance of terminal hardware such as computers and mobile phones, audio and video streams have become one of the main sources of network data traffic, while application data packages such as software installers, disk images, and Docker containers have also grown steadily in size. The present invention seeks to transmit, share, and publish large files on the Internet more efficiently and quickly.
Many mainstream compression algorithms and tools are available and achieve good compression ratios, but their effectiveness declines rapidly as the entropy of the data increases. Raising the compression ratio further requires searching for redundancy over a much larger data range. The compression process of the present invention therefore consumes substantial computing resources and disk space, which is very difficult to provide on a personal computer and is practical only with large-scale cloud computing resources. Because the compressed files are smaller, using them together with the BitTorrent sharing protocol during download can greatly improve data transmission efficiency.
Disclosure of Invention
To solve the above technical problems in the prior art, the present invention provides a metadata-based data interaction method and system.
According to one aspect of the invention, a data interaction method based on metadata is provided, which comprises the following steps:
S1: generating a metadata warehouse file using a metadata compression algorithm, wherein the metadata compression algorithm comprises the following steps:
S11: generating a virtual directory description file for each target data file;
S12: according to the extension of each target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file;
S13: compressing the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file;
S14: generating a primary data file, counting by traversal the number of occurrences of each metadata item in the data blocks of each compressed data file, and sorting the counts to generate the metadata warehouse file;
S2: once the metadata warehouse file has been completely downloaded, decompressing the target file using a decompression algorithm, wherein the decompression algorithm comprises the following steps:
S21: restoring the primary data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file;
S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, and restoring all file description data blocks;
S23: restoring all data files and directories based on the virtual directory description file and step S13 of the metadata compression algorithm, thereby completing file decompression.
In some specific embodiments, the virtual directory description file is in XML format and records the directory hierarchy of the target file, the basic information of the data files, and the data block decomposition information of each file.
In some specific embodiments, the data blocks include a file description data block, an original file data block, and a 7z data folder block, each arranged in a storage order of the original file.
In some specific embodiments, S13 includes:
after all file description data blocks are merged, calling a 7z algorithm to compress and convert the file description data blocks into 7z data folder blocks, and adding 7z data folder description database nodes in the virtual directory description file;
and respectively calling a 7z algorithm for each original file data block to compress and convert the original file data block into a 7z data folder block, and adding a 7z data folder description database node in the virtual directory description file.
In some specific embodiments, the metadata warehouse file is at most 16 GB and is generated as follows:
generating secondary data files from the primary data files, wherein each secondary data file consists of a compressed record data set obtained by sorting the data blocks of a primary data file and then compressing the result;
merging the secondary data files one by one into a single tertiary data file, sorting the tertiary data file by repetition count, and extracting at most the top 2^31 records to generate a metadata warehouse file of at most 16 GB.
In some specific embodiments, the metadata compression algorithm further includes step S15: and reconstructing the data blocks of all the primary data files according to the size of the metadata by using the metadata warehouse file as an index table to generate a final data release file.
In some specific embodiments, step S131 is further included before step S14: constructing a primary compressed data file comprising file header description information, data blocks, and a tail alignment data block, wherein the file header description information records the file offset of the last data block, and the last data block is the 7z data folder block obtained by compressing the virtual directory description file with the 7z algorithm.
In some specific embodiments, the primary data file is generated by compressing all the template files through steps S12, S13, and S131.
In some specific embodiments, in S21 the primary data file is restored by performing the inverse of step S15.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, a metadata-based data interaction system is provided, the system comprising:
a compression unit configured to generate a metadata warehouse file using a metadata compression algorithm, wherein the metadata compression algorithm comprises the following steps: S11: generating a virtual directory description file for each target data file; S12: according to the extension of each target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file; S13: compressing the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file; S14: generating a primary data file, counting by traversal the number of occurrences of each metadata item in the data blocks of each compressed data file, and sorting the counts to generate the metadata warehouse file;
a decompression unit configured to decompress the target file using a decompression algorithm once the metadata warehouse file has been completely downloaded, wherein the decompression algorithm comprises the following steps: S21: restoring the primary data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file; S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, and restoring all file description data blocks; S23: restoring all data files and directories based on the virtual directory description file and step S13 of the metadata compression algorithm, thereby completing file decompression.
In some specific embodiments, the virtual directory description file is described in an XML file format, and is used to record the directory hierarchy of the target file, the basic information of the data file, and the file data block decomposition information; the data blocks include a file description data block, an original file data block, and a 7z data folder block, each of which is arranged in a storage order of the original file.
In some specific embodiments, S13 includes: after all file description data blocks are merged, calling a 7z algorithm to compress and convert the file description data blocks into 7z data folder blocks, and adding 7z data folder description database nodes in the virtual directory description file; and respectively calling a 7z algorithm for each original file data block to compress and convert the original file data block into a 7z data folder block, and adding a 7z data folder description database node in the virtual directory description file.
In some specific embodiments, the metadata warehouse file is at most 16 GB and is generated as follows: generating secondary data files from the primary data files, wherein each secondary data file consists of a compressed record data set obtained by sorting the data blocks of a primary data file and then compressing the result; merging the secondary data files one by one into a single tertiary data file, sorting the tertiary data file by repetition count, and extracting at most the top 2^31 records to generate a metadata warehouse file of at most 16 GB.
In some specific embodiments, step S131 is further included before step S14: constructing a first-level compressed data file, wherein the compressed data file comprises file header description information, data blocks and tail alignment data blocks, the file offset distance of the last data block is recorded in the file header description information, and the last data block in the data blocks is a 7z data folder block which is compressed and converted by calling a 7z algorithm for a virtual directory description file; the generation mode of the primary data file in the S14 is as follows: compressing all the template files in S12, S13 and S131 into a primary data file; s14 is followed by S15: and reconstructing the data blocks of all the primary data files according to the size of the metadata by using the metadata warehouse file as an index table to generate a final data release file.
The invention provides a metadata-based data interaction method and system in the field of PB-scale big data processing and data compression. To obtain target data from a metadata publishing website, a user first downloads the corresponding metadata warehouse file and then downloads the target data. Using the generated metadata warehouse file, the algorithm of the invention lets the publishing website compress the interaction data beyond the compression ratio of current mainstream compression tools, improving the utilization of the website's storage space with little impact on decompression speed.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a metadata-based data interaction method of an embodiment of the present application;
FIG. 2 is a flow diagram of file decompression according to a specific embodiment of the present application;
FIG. 3 is a block diagram of a metadata-based data interaction system according to an embodiment of the present application;
FIG. 4 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of a metadata-based data interaction method according to an embodiment of the present application. As shown in fig. 1, the method includes:
S1: generating a metadata warehouse file using a metadata compression algorithm, wherein the metadata compression algorithm comprises the following steps:
S11: generating a virtual directory description file for each target data file. The virtual directory description file is in XML format and records the directory hierarchy of the target file, the basic information of the data files, and the data block decomposition information of each file.
S12: according to the extension of each target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file. The data blocks comprise file description data blocks, original file data blocks, and 7z data folder blocks, arranged in the storage order of the original file.
S13: compressing the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file. In a specific embodiment, this step comprises: after all file description data blocks are merged, calling the 7z algorithm to compress them into a 7z data folder block and adding a 7z data folder description database node to the virtual directory description file; and calling the 7z algorithm on each original file data block separately to compress it into a 7z data folder block, adding a 7z data folder description database node to the virtual directory description file for each.
S14: generating a primary data file, counting by traversal the number of occurrences of each metadata item in the data blocks of each compressed data file, and sorting the counts to generate the metadata warehouse file.
In a specific embodiment, step S131 is further included before step S14: constructing a primary compressed data file comprising file header description information, data blocks, and a tail alignment data block, wherein the file header description information records the file offset of the last data block, and the last data block is the 7z data folder block obtained by compressing the virtual directory description file with the 7z algorithm. The primary data file is generated by compressing all the template files through steps S12, S13, and S131.
In a specific embodiment, the metadata warehouse file is at most 16 GB and is generated as follows:
generating secondary data files from the primary data files, wherein each secondary data file consists of a compressed record data set obtained by sorting the data blocks of a primary data file and then compressing the result;
merging the secondary data files one by one into a single tertiary data file, sorting the tertiary data file by repetition count, and extracting at most the top 2^31 records to generate a metadata warehouse file of at most 16 GB.
In some specific embodiments, the metadata compression algorithm further includes step S15: and reconstructing the data blocks of all the primary data files according to the size of the metadata by using the metadata warehouse file as an index table to generate a final data release file.
S2: once the metadata warehouse file has been completely downloaded, decompressing the target file using a decompression algorithm, wherein the decompression algorithm comprises the following steps:
S21: restoring the primary data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file. The primary data file is restored by performing the inverse of step S15.
S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, and restoring all file description data blocks;
S23: restoring all data files and directories based on the virtual directory description file and step S13 of the metadata compression algorithm, thereby completing file decompression.
For data-publishing websites with abundant data resources, the method and system compress the site's entire PB-scale data based on metadata. The basic idea of the invention is as follows: call the 7z algorithm on all data interaction files one by one to generate multiple grouped 7z data Folder blocks; traverse all compressed data, taking 64 bits as the metadata size, to select the 2^31 sorted metadata entries with the highest utilization and build them into a database; and finally re-index the target files against the sorted metadata warehouse to obtain a published file cluster with a compression ratio of 30-48%.
For the data receiver in a data interaction, the corresponding metadata warehouse must be downloaded before any data is downloaded for the first time, followed by the compressed target file. The method specifically comprises the following steps:
First step: generate the virtual directory description file. The virtual directory description file is in XML format and records the directory hierarchy of the target file, the basic information of the data files, and the data block decomposition information of each file.
In a specific embodiment, the nodes are divided into directory nodes, file nodes, data block nodes, and 7z data Folder description nodes, with the following attributes and contents:
the directory node is <D n="directory name">file nodes</D>;
the file node is <F n="file name" ex="file type" sid="inverse operation algorithm ID" uuid="file unique code">data block nodes</F>;
the data block node is <B t="data block type" len="block byte length">uuid of the 7z data Folder description node</B>;
the 7z data Folder description node is <Z uuid="file unique code" arg="7z compression parameters" len="block byte length">data block node uuid, ...</Z>.
There are three data block types: file description data blocks, original file data blocks, and 7z data folder blocks. The content of a file description data block is mainly text; an original file data block is a block of arbitrary binary data; a 7z data folder block is data in the standard 7z folder format that has already been compressed with the 7z algorithm.
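The node layout above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the patent's implementation: the helper names, the block type codes `"desc"`/`"orig"`, all ids, and the choice to place `<Z>` nodes as children of the directory node are assumptions; the `arg` value `-m0=lzma2` is only an illustrative 7-Zip switch string.

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers; node and attribute names follow the description above.
# Block type codes ("desc", "orig") and all ids are illustrative only.


def add_file_node(parent, name, ext, sid, file_uuid, blocks):
    """Append an <F> node; blocks is a list of (block_type, byte_length, folder_uuid)."""
    f = ET.SubElement(parent, "F", n=name, ex=ext, sid=sid, uuid=file_uuid)
    for block_type, length, folder_uuid in blocks:
        b = ET.SubElement(f, "B", t=block_type, len=str(length))
        b.text = folder_uuid  # uuid of the <Z> node that stores this block
    return f


def add_folder_node(parent, folder_uuid, arg, length, members):
    """Append a <Z> node listing the (hypothetical) ids of the blocks packed into it."""
    z = ET.SubElement(parent, "Z", uuid=folder_uuid, arg=arg, len=str(length))
    z.text = ",".join(members)
    return z


root = ET.Element("D", n="videos")
add_file_node(root, "B.mp4", "mp4", "mp4-split", "f-001",
              [("desc", 1024, "z-desc"), ("orig", 4096, "z-0001")])
add_folder_node(root, "z-desc", "-m0=lzma2", 310, ["f-001#0"])
add_folder_node(root, "z-0001", "-m0=lzma2", 3900, ["f-001#1"])
print(ET.tostring(root, encoding="unicode"))
```

Keeping the `<B>` text as a pointer to a `<Z>` uuid lets one folder node serve several blocks, which matches the third step, where all file description blocks share one folder.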
Second step: according to the extension of the target file, call the corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and add the data blocks to the virtual directory description file.
In a specific embodiment, the data blocks are arranged in the storage order of the original file and fall into three types: file description data blocks, original file data blocks, and 7z data folder blocks. For example, xml, txt, docx, xlsx, http, and similar plain-text files are marked directly as a single file description data block; a 7z file is decomposed according to the 7z file format into a header description block, a tail folder description block, and multiple 7z data folder blocks; an mp4 file is decomposed according to the mp4 file format into multiple box data blocks, where the ftyp and moov boxes, which serve a descriptive function, are registered as file description data blocks and all other boxes are original file data blocks. File decomposition algorithms for other data types are constructed in the same way.
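The extension-based dispatch and the MP4 split described above can be sketched as follows. This is an illustrative sketch only: it walks the top-level boxes of an ISO BMFF (MP4) file using the standard 4-byte size and 4-byte type header, treats `ftyp` and `moov` as file description blocks as the text states, and falls back to a single original block for other types. The function names and type codes are hypothetical, and the format-aware 7z split is omitted.

```python
import struct

# Hypothetical type codes for the block kinds.
DESCRIPTION, ORIGINAL = "desc", "orig"
TEXT_EXTS = {"xml", "txt", "docx", "xlsx"}  # marked as one file description block
DESCRIPTIVE_BOXES = {b"ftyp", b"moov"}


def split_mp4(data: bytes):
    """Split an MP4 (ISO BMFF) byte string into its top-level boxes, in storage order."""
    blocks, pos = [], 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size == 1:    # 64-bit "largesize" follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
        elif size == 0:  # box runs to the end of the file
            size = len(data) - pos
        if size < 8:
            raise ValueError(f"corrupt box header at offset {pos}")
        kind = DESCRIPTION if box_type in DESCRIPTIVE_BOXES else ORIGINAL
        blocks.append((kind, data[pos:pos + size]))
        pos += size
    if pos < len(data):  # trailing bytes that do not form a full box header
        blocks.append((ORIGINAL, data[pos:]))
    return blocks


def decompose(ext: str, data: bytes):
    """Dispatch on the file extension, as in the second step."""
    ext = ext.lower()
    if ext in TEXT_EXTS:
        return [(DESCRIPTION, data)]
    if ext == "mp4":
        return split_mp4(data)
    # Assumption: types without a dedicated algorithm (including 7z, whose
    # format-aware split is omitted here) become a single original block.
    return [(ORIGINAL, data)]
```

Because every block is a contiguous slice taken in storage order, concatenating the blocks reproduces the original file, which is what the decompression side relies on.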
Third step: after all file description data blocks are merged, call the 7z algorithm to compress them into a 7z data Folder block, and add a 7z data Folder description database node to the virtual directory description file.
Fourth step: call the 7z algorithm on each original file data block separately to compress it into a 7z data Folder block, and add a 7z data Folder description database node to the virtual directory description file.
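The third and fourth steps can be sketched with Python's standard `lzma` module. This is a substitution, not the patent's 7z call: a raw LZMA2 stream (7-Zip's default codec) stands in for a genuine 7z data Folder, and the function names are hypothetical. The sketch merges all description blocks into one folder, compresses each original block into its own folder, and passes already-compressed 7z folder blocks through unchanged (an assumption).

```python
import lzma
import uuid

# Assumption: a raw LZMA2 stream (7-Zip's default codec) stands in for a real
# 7z data Folder; a production system would write genuine 7z folders.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def to_folder(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)


def from_folder(payload: bytes) -> bytes:
    return lzma.decompress(payload, format=lzma.FORMAT_RAW, filters=FILTERS)


def compress_blocks(blocks):
    """blocks: list of (kind, bytes), kind in {"desc", "orig", "7z"}.
    Returns (folder_uuid, payload, member_indices) tuples."""
    folders = []
    desc_idx = [i for i, (kind, _) in enumerate(blocks) if kind == "desc"]
    if desc_idx:
        # Third step: merge all file description blocks, then compress once.
        merged = b"".join(blocks[i][1] for i in desc_idx)
        folders.append((str(uuid.uuid4()), to_folder(merged), desc_idx))
    for i, (kind, data) in enumerate(blocks):
        if kind == "orig":
            # Fourth step: one folder per original file data block.
            folders.append((str(uuid.uuid4()), to_folder(data), [i]))
        elif kind == "7z":
            # Already 7z-compressed: kept as-is rather than recompressed.
            folders.append((str(uuid.uuid4()), data, [i]))
    return folders
```

Merging the small, text-like description blocks before compression gives the codec more context, while keeping each original block in its own folder preserves per-block random access.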
Fifth step: construct the primary compressed data file, which consists of file header description information, data blocks, and a tail alignment data block. The last of the data blocks is the 7z compressed folder block of the virtual directory description file, obtained by calling the 7z algorithm on the virtual directory description file. The file header description information records the file offset of the last data block (preset to 36 bytes) so that PB-scale files are supported. The tail alignment data block is binary data appended after the data blocks; its length is determined by the remainder of the total number of bytes of the data blocks divided by 8. After the second to fifth steps, the template file has been compressed into a primary data file.
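The primary data file layout can be sketched as below. This is a minimal sketch under stated assumptions, not the patent's byte format: the preset 36 bytes is read as the header size, the header stores the last block's offset and also its length (the length is an addition so a reader can cut the block out without parsing the padding), and the tail alignment pads the data blocks to a whole number of 8-byte words. All names are hypothetical.

```python
import struct

HEADER_SIZE = 36  # assumption: the preset 36 bytes is the size of the header


def build_primary(blocks):
    """Lay out header + data blocks + tail alignment block.
    blocks: list of bytes; the last is the compressed virtual directory."""
    body = b"".join(blocks)
    last_offset = HEADER_SIZE + len(body) - len(blocks[-1])
    # Assumption: besides the offset, the header also stores the last block's
    # length so a reader can cut it out without parsing the padding.
    header = struct.pack("<QQ", last_offset, len(blocks[-1])).ljust(HEADER_SIZE, b"\0")
    pad = (-len(body)) % 8  # pad the data blocks to a whole number of 8-byte words
    return header + body + b"\0" * pad


def read_last_block(primary: bytes) -> bytes:
    """Decompression side: locate the directory block via the header."""
    offset, length = struct.unpack_from("<QQ", primary, 0)
    return primary[offset:offset + length]
```

Placing the directory block last and recording its offset up front lets the decompressor reach the virtual directory description file with one seek, which is why decompression starts from the last data block.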
Sixth step: traverse each compressed data file, count the occurrences of metadata in its data blocks, and sort the counts to generate a metadata warehouse file of at most 16 GB. The specific algorithm is as follows:
Step one: generate a secondary data file from each primary data file. A secondary data file consists of a compressed record data set; after the data blocks of the primary file are sorted, the record data set is compressed into the following format:
{
    { metadata (8 bytes), repetition count (8 bytes) }    16 bytes (128 bits) per record
    { metadata, repetition count }
    ...
};
Step two: merge the secondary data files one by one into a single tertiary data file, which also consists of a compressed record data set. During merging, if the number of records in the tertiary data file approaches 2^32, the metadata records with the fewest repetitions are discarded.
Step three: sort the tertiary data file by repetition count and extract at most the top 2^31 records to generate a metadata warehouse file of at most 16 GB.
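The three sub-steps can be sketched with Python's `collections.Counter`. This sketch assumes metadata items are 8-byte-aligned 64-bit words (the text fixes the 64-bit size but not the alignment) and uses tiny caps in place of 2^32 and 2^31. If the warehouse stores only the 8-byte metadata values, 2^31 entries occupy 2^31 × 8 = 2^34 bytes = 16 GiB, consistent with the 16 GB bound. Function names and the little-endian count are hypothetical.

```python
import struct
from collections import Counter

WORD = 8  # 64-bit metadata unit


def count_words(data: bytes) -> Counter:
    """Step one (secondary file): occurrence count of each 8-byte-aligned word."""
    return Counter(data[i:i + WORD] for i in range(0, len(data) - WORD + 1, WORD))


def merge_capped(total: Counter, part: Counter, cap: int) -> Counter:
    """Step two (tertiary file): merge counts, dropping the least-repeated
    records once the table would exceed `cap` entries (2**32 in the text)."""
    total.update(part)
    if len(total) > cap:
        total = Counter(dict(total.most_common(cap)))
    return total


def build_warehouse(primaries, cap: int, top: int):
    """Step three: keep at most `top` words (2**31 in the text), most frequent first."""
    total = Counter()
    for data in primaries:
        total = merge_capped(total, count_words(data), cap)
    return [word for word, _ in total.most_common(top)]


def pack_record(word: bytes, count: int) -> bytes:
    """One 16-byte {metadata, repetition count} record (little-endian count assumed)."""
    return word + struct.pack("<Q", count)
```

Capping the merged table bounds memory during the merge at the cost of possibly undercounting words that only become frequent in later files, a trade-off the described algorithm also accepts.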
Seventh step: using the metadata warehouse file as an index table, reconstruct the data blocks of all primary data files in units of the metadata size to generate the final data release file.
In a specific embodiment, if a metadata item from the warehouse appears in the data block, the record is its metadata index; otherwise the record begins with 128, followed by 8 bytes of original data:
[Figure: record format of the data release file; image not reproduced]
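Because the exact record layout is in the unreproduced figure, the following is only a hypothetical encoding consistent with the text: each 8-byte word found in the warehouse becomes a 4-byte big-endian index, and any other word becomes the marker byte 128 (0x80) followed by its 8 raw bytes. Since at most 2^31 entries exist, every index is below 2^31, so its first byte is at most 0x7F and can never be confused with the marker. `decode` is the inverse operation used in the second decompression step. All names and the 4-byte index width are assumptions.

```python
import struct

WORD = 8
LITERAL = 0x80  # marker byte: "begins with 128", followed by 8 raw bytes


def encode(data: bytes, warehouse):
    """Replace each 8-byte word found in the warehouse by its index."""
    if len(data) % WORD:
        raise ValueError("data must be padded to a multiple of 8 bytes")
    if len(warehouse) > 2**31:
        raise ValueError("at most 2**31 warehouse entries")
    index = {word: i for i, word in enumerate(warehouse)}
    out = bytearray()
    for pos in range(0, len(data), WORD):
        word = data[pos:pos + WORD]
        i = index.get(word)
        if i is None:
            out.append(LITERAL)
            out += word
        else:
            out += struct.pack(">I", i)  # i < 2**31, so the first byte is < 0x80
    return bytes(out)


def decode(blob: bytes, warehouse) -> bytes:
    """Inverse operation (decompression, second step)."""
    out, pos = bytearray(), 0
    while pos < len(blob):
        if blob[pos] == LITERAL:
            out += blob[pos + 1:pos + 1 + WORD]
            pos += 1 + WORD
        else:
            (i,) = struct.unpack_from(">I", blob, pos)
            out += warehouse[i]
            pos += 4
    return bytes(out)
```

Under this assumed layout a matched word shrinks from 8 to 4 bytes while an unmatched word grows from 8 to 9, so the scheme only pays off when the warehouse covers a large share of the words, which is why the warehouse keeps the most frequent metadata.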
The downloading and decompression process of the present invention uses a decompression algorithm. FIG. 2 shows a file decompression flowchart of a specific embodiment of the present application; as shown in FIG. 2, it proceeds as follows:
first, ensure that the metadata warehouse file has been completely downloaded;
second, perform the inverse of the seventh step of the compression algorithm to restore the primary data file;
third, call the compression algorithm to decompress the last data block and obtain the virtual directory description file;
fourth, according to the virtual directory description file and the fifth step of the compression algorithm, decompress the first data block and restore all file description data blocks;
fifth, restore all data files and directories according to the virtual directory description file and the fourth step of the compression algorithm, completing file decompression.
In a specific embodiment, the inventors tested the method on two separate target files: A.7z at 50 GB and B.mp4 at 34 GB. Applying the compression steps described above produced three files: a 16 GB metadata warehouse file, a 25 GB A.7z.y file, and a 15 GB B.mp4.y file. If a user downloads A.7z.y for the first time, 31 GB of data must be downloaded, giving a compression ratio of 62%. If the user has already downloaded the metadata warehouse file and then downloads the B file, 15 GB must be downloaded in total, a compression ratio of 44%. If the user downloads the metadata warehouse file and then both files A and B, 40 GB must be downloaded, a compression ratio of 54%. Building on the compression ratio of current mainstream compression tools, the method further reduces the bytes of interaction data through the generated metadata warehouse file, improving the utilization of the website's storage space with little impact on decompression speed.
With continued reference to FIG. 3, FIG. 3 illustrates a block diagram of a metadata-based data interaction system according to an embodiment of the present invention. The system comprises a compression unit 301 and a decompression unit 302. The compression unit 301 is configured to generate a metadata warehouse file using a metadata compression algorithm comprising the following steps: S11: generating a virtual directory description file for each target data file; S12: according to the extension of each target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file; S13: compressing the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file; S14: generating a primary data file, counting by traversal the number of occurrences of each metadata item in the data blocks of each compressed data file, and sorting the counts to generate the metadata warehouse file. The decompression unit 302 is configured to decompress the target file using a decompression algorithm once the metadata warehouse file has been completely downloaded, comprising: S21: restoring the primary data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file; S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, and restoring all file description data blocks; S23: restoring all data files and directories based on the virtual directory description file and step S13 of the metadata compression algorithm, thereby completing file decompression.
Referring now to FIG. 4, a block diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application is shown. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 4, the computer system includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When executed by the Central Processing Unit (CPU) 401, the computer program performs the above-described functions defined in the method of the present application. Note that the computer readable medium of the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present application, however, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer readable program code. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical fiber cable, RF, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: generate a metadata warehouse file using a metadata compression algorithm, comprising: generating a virtual directory description file for each target data file; decomposing the target data file into a plurality of data blocks and adding the data blocks to the description file; compressing and converting the decomposed data blocks into 7z data folder blocks, and adding 7z data folder description database nodes to the description file; generating a first-level data file, traversing the data blocks in the compressed data files to count the number of occurrences of each metadata item, and sorting the counts to generate the metadata warehouse file; and, once the metadata warehouse file has been completely downloaded, decompress files based on a decompression algorithm comprising: restoring the first-level data file and decompressing the last data block to obtain the virtual directory description file; decompressing the first data block to restore all file description data blocks; and restoring all data file directories to complete file decompression.
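The counting-and-sorting step that produces the metadata warehouse file, together with the index-table reconstruction of step S15 (claim 6) and its inverse (claim 9), can be sketched as below. All of the following are hypothetical assumptions made for illustration: a "metadata" item is taken to be a fixed 8-byte record (`RECORD_SIZE`), chosen only because 2^31 entries of 8 bytes come to 16 GiB, the cap stated in claim 5; per-file `Counter` objects play the role of the second-level data files and their merged total the third-level data file; and encoded blocks are kept as Python token lists rather than a serialized byte format.

```python
from collections import Counter

RECORD_SIZE = 8        # hypothetical record width; 2**31 * 8 bytes = 16 GiB
MAX_ENTRIES = 2 ** 31  # entry cap from claim 5; pass a smaller value when testing


def records(block: bytes, size: int = RECORD_SIZE):
    """Yield every complete fixed-size record in a block (a short tail is skipped)."""
    for i in range(0, len(block) - size + 1, size):
        yield block[i:i + size]


def build_warehouse(primary_files: list[list[bytes]], max_entries: int = MAX_ENTRIES) -> list[bytes]:
    """S14 sketch: count record occurrences across all files and keep the most frequent."""
    merged = Counter()
    for blocks in primary_files:
        per_file = Counter(r for block in blocks for r in records(block))  # "second-level" tally
        merged.update(per_file)                                            # "third-level" tally
    # Sort by repetition count, breaking ties on the record bytes for determinism.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return [record for record, _ in ranked[:max_entries]]


def encode_block(block: bytes, index: dict[bytes, int]) -> list:
    """S15 sketch: replace warehouse records with their index; keep other bytes literally."""
    tokens = []
    full = len(block) - len(block) % RECORD_SIZE
    for i in range(0, full, RECORD_SIZE):
        record = block[i:i + RECORD_SIZE]
        tokens.append(index.get(record, record))  # int = warehouse hit, bytes = literal
    if full < len(block):
        tokens.append(block[full:])  # trailing partial record kept as-is
    return tokens


def decode_block(tokens: list, warehouse: list[bytes]) -> bytes:
    """Inverse of encode_block: look each index up in the warehouse."""
    return b"".join(warehouse[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    files = [[b"ABCDEFGH" * 3 + b"12345678"], [b"ABCDEFGHxyz"]]
    warehouse = build_warehouse(files, max_entries=1)
    index = {record: i for i, record in enumerate(warehouse)}
    tokens = encode_block(b"ABCDEFGHxyz", index)
    print(tokens)  # [0, b'xyz']
    assert decode_block(tokens, warehouse) == b"ABCDEFGHxyz"
```

The demo shows why a user who already holds the warehouse downloads less: a frequent record shrinks to an index, and only the non-matching bytes travel literally.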
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A metadata-based data interaction method, characterized by comprising the following steps:
S1: generating a metadata warehouse file using a metadata compression algorithm, wherein the metadata compression algorithm comprises the following steps:
S11: generating a virtual directory description file for each target data file;
S12: according to the extension of a single target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file;
S13: compressing and converting the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file;
S14: generating a first-level data file, traversing the data blocks in each compressed data file to count the number of occurrences of each metadata item, and sorting the counts to generate the metadata warehouse file;
S2: once the metadata warehouse file has been completely downloaded, decompressing files based on a decompression algorithm, comprising:
S21: restoring the first-level data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file;
S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, to restore all file description data blocks;
S23: restoring all data file directories based on the virtual directory description file and step S13 of the metadata compression algorithm, to complete file decompression.
2. The metadata-based data interaction method according to claim 1, wherein the virtual directory description file is described in an XML file format, and is used for recording a directory hierarchy of the target file and basic information of the data file, and file data block decomposition information.
3. The metadata-based data interaction method of claim 1, wherein the data blocks comprise a file description data block, an original file data block and a 7z data folder block, and each of the data blocks is arranged according to a storage order of an original file.
4. The metadata-based data interaction method according to claim 3, wherein the S13 comprises:
after all the file description data blocks are merged, calling a 7z algorithm to compress and convert the file description data blocks into 7z data folder blocks, and adding 7z data folder description database nodes in the virtual directory description file;
and respectively calling a 7z algorithm to compress and convert each original file data block into a 7z data folder block, and adding a 7z data folder description database node in the virtual directory description file.
5. The metadata-based data interaction method according to claim 1, wherein the metadata warehouse file is a 16GB metadata warehouse file, and the specific generation manner includes:
generating a secondary data file from the primary data file, wherein the secondary data file consists of a compressed record data set obtained by sorting the data blocks of the primary data file and then compressing them;
merging the second-level data files one by one to generate a unique third-level data file, sorting the entries of the third-level data file by number of repetitions, and extracting the top 2^31 most repeated record entries to generate a metadata warehouse file of at most 16GB.
6. The metadata-based data interaction method according to claim 1, wherein the metadata compression algorithm further comprises step S15: using the metadata warehouse file as an index table, reconstructing the data blocks of all primary data files according to metadata size to generate a final data release file.
7. The metadata-based data interaction method according to claim 1, wherein S14 is preceded by step S131: constructing a first-level compressed data file, wherein the compressed data file comprises file header description information, data blocks and a tail alignment data block, the file offset of the last data block is recorded in the file header description information, and the last of the data blocks is a 7z data folder block obtained by compressing and converting the virtual directory description file with the 7z algorithm.
8. The metadata-based data interaction method according to claim 7, wherein the primary data file is generated by compressing all the template files in S12, S13 and S131 into a primary data file.
9. The metadata-based data interaction method according to claim 6, wherein in S21, the inverse of step S15 is performed to restore the primary data file.
10. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any of claims 1 to 9.
11. A metadata-based data interaction system, the system comprising:
a compression unit configured to generate a metadata warehouse file using a metadata compression algorithm, wherein the metadata compression algorithm comprises the following steps: S11: generating a virtual directory description file for each target data file; S12: according to the extension of the single target file, calling a corresponding file decomposition algorithm to decompose the target data file into a plurality of data blocks, and adding the data blocks to the virtual directory description file; S13: compressing and converting the decomposed data blocks into 7z data folder blocks using the 7z algorithm, and adding 7z data folder description database nodes to the virtual directory description file; S14: generating a first-level data file, traversing the data blocks in each compressed data file to count the number of occurrences of each metadata item, and sorting the counts to generate the metadata warehouse file;
a decompression unit configured to, once the metadata warehouse file has been completely downloaded, decompress files based on a decompression algorithm comprising: S21: restoring the primary data file, and calling the metadata compression algorithm to decompress the last data block to obtain the virtual directory description file; S22: decompressing the first data block based on the virtual directory description file and step S14 of the metadata compression algorithm, to restore all file description data blocks; S23: restoring all data file directories based on the virtual directory description file and step S13 of the metadata compression algorithm, to complete file decompression.
12. The metadata-based data interaction system of claim 11, wherein the virtual directory description file is described in an XML file format, and is used for recording a directory hierarchy of a target file, basic information of a data file, and file data block decomposition information; the data blocks comprise file description data blocks, original file data blocks and 7z data folder blocks, and each data block is arranged according to the storage sequence of the original file.
13. The metadata-based data interaction system of claim 12, wherein the S13 comprises: after all the file description data blocks are merged, calling a 7z algorithm to compress and convert the file description data blocks into 7z data folder blocks, and adding 7z data folder description database nodes in the virtual directory description file; and respectively calling a 7z algorithm to compress and convert each original file data block into a 7z data folder block, and adding a 7z data folder description database node in the virtual directory description file.
14. The metadata-based data interaction system of claim 11, wherein the metadata warehouse file is a 16GB metadata warehouse file, and is specifically generated by: generating a secondary data file from the primary data file, wherein the secondary data file consists of a compressed record data set obtained by sorting the data blocks of the primary data file and then compressing them; merging the second-level data files one by one to generate a unique third-level data file, sorting the entries of the third-level data file by number of repetitions, and extracting the top 2^31 most repeated record entries to generate a metadata warehouse file of at most 16GB.
15. The metadata-based data interaction system of claim 11, wherein S14 is preceded by step S131: constructing a first-level compressed data file, wherein the compressed data file comprises file header description information, data blocks and a tail alignment data block, the file offset of the last data block is recorded in the file header description information, and the last of the data blocks is a 7z data folder block obtained by compressing and converting the virtual directory description file with the 7z algorithm; the first-level data file in S14 is generated by compressing all the template files in S12, S13 and S131 into a primary data file; and S14 is followed by step S15: using the metadata warehouse file as an index table, reconstructing the data blocks of all primary data files according to metadata size to generate a final data release file.
16. The metadata-based data interaction system of claim 15, wherein in S21, based on the step S15, an inverse operation is performed to restore the primary data file.
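The layout of the first-level compressed data file described in claims 7 and 15 (file header description information recording the offset of the last data block, the data blocks, and a tail alignment data block), and the first part of S21, which uses that offset to recover the virtual directory description file, can be sketched as follows. The header format (`HEADER`, `MAGIC`), the alignment size `ALIGN`, and the use of Python's `lzma` module in place of a real 7z data folder block are hypothetical choices made only for illustration.

```python
import lzma
import struct

MAGIC = b"METAY\x00\x00\x00"    # hypothetical 8-byte file signature
HEADER = struct.Struct("<8sQ")  # hypothetical header: magic + offset of the last block
ALIGN = 4096                    # hypothetical size the tail alignment block pads to


def build_primary_file(data_blocks: list[bytes], description_xml: bytes) -> bytes:
    """Lay out header | data blocks | compressed description block | tail alignment."""
    body = b"".join(data_blocks)
    # The last data block holds the compressed virtual directory description file.
    last_block = lzma.compress(description_xml, format=lzma.FORMAT_XZ)
    last_offset = HEADER.size + len(body)
    out = HEADER.pack(MAGIC, last_offset) + body + last_block
    padding = (-len(out)) % ALIGN  # zero bytes needed to reach the next ALIGN boundary
    return out + b"\x00" * padding


def read_description(primary: bytes) -> bytes:
    """S21 sketch: jump to the recorded offset and decompress only the last block."""
    magic, last_offset = HEADER.unpack_from(primary, 0)
    if magic != MAGIC:
        raise ValueError("not a first-level data file")
    # A single-stream decompressor stops at the end of the .xz stream.
    decomp = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    xml = decomp.decompress(primary[last_offset:])
    if not decomp.eof:
        raise ValueError("truncated description block")
    return xml  # the alignment padding is left in decomp.unused_data


if __name__ == "__main__":
    blob = build_primary_file([b"x" * 100, b"y" * 50], b"<file name='A.7z'/>")
    print(len(blob) % ALIGN, read_description(blob))
```

Because the offset sits in a fixed-size header, a reader can reach the directory description without scanning the data blocks that precede it.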
CN202211285466.1A 2022-10-20 2022-10-20 Metadata-based data interaction method and system Pending CN115454948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211285466.1A CN115454948A (en) 2022-10-20 2022-10-20 Metadata-based data interaction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211285466.1A CN115454948A (en) 2022-10-20 2022-10-20 Metadata-based data interaction method and system

Publications (1)

Publication Number Publication Date
CN115454948A true CN115454948A (en) 2022-12-09

Family

ID=84311467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211285466.1A Pending CN115454948A (en) 2022-10-20 2022-10-20 Metadata-based data interaction method and system

Country Status (1)

Country Link
CN (1) CN115454948A (en)

Similar Documents

Publication Publication Date Title
US10901950B2 (en) Efficient data compression and analysis as a service
CN101777056B (en) Data storage method and device
CN108733317B (en) Data storage method and device
CN107395209B (en) Data compression method, data decompression method and equipment thereof
CN110147239B (en) Method, equipment and storage medium for multiple compression of game installation inclusion
US20050027731A1 (en) Compression dictionaries
CN112527752B (en) Data compression method, data compression device, computer readable storage medium and electronic equipment
CN113742518B (en) Methods, apparatus and computer program products for storing and providing video
CN106503003A (en) A kind of compression of expandable mark language XML document, decompressing method and device
CN112948486A (en) Batch data synchronization method and system and electronic equipment
CN105975495A (en) Big data storage and search method and apparatus
CN113553300A (en) File processing method and device, readable medium and electronic equipment
CN117540021B (en) Large language model training method, device, electronic equipment and computer readable medium
CN110505289B (en) File downloading method and device, computer readable medium and wireless communication equipment
CN110019985B (en) Index file establishing and inquiring methods and devices
CN117633197B (en) Search information generation method and device applied to paraphrasing document and electronic equipment
CN115454948A (en) Metadata-based data interaction method and system
CN111726475A (en) Video processing method, system, electronic device and storage medium
CN102750287A (en) Method for including index information and download authentication server
CN114840634B (en) Information storage method and device, electronic equipment and computer readable medium
JP5194936B2 (en) File conversion apparatus, file conversion method, and program
CN113448920B (en) Method, apparatus and computer program product for managing indexes in a storage system
US9189488B2 (en) Determination of landmarks
CN115080571A (en) Index updating method and device, electronic equipment and computer readable medium
CN110545108B (en) Data processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination