
CN119315994A - Data compression method, data decompression method, device, equipment and storage medium - Google Patents


Info

Publication number
CN119315994A
Authority
CN
China
Prior art keywords
data
character string
coding
encoding
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411256964.2A
Other languages
Chinese (zh)
Inventor
宋法志
杨彭年
李彬
于冬梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
China Mobile Information System Integration Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Xiongan ICT Co Ltd
China Mobile System Integration Co Ltd
China Mobile Information System Integration Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Xiongan ICT Co Ltd, China Mobile System Integration Co Ltd, China Mobile Information System Integration Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202411256964.2A priority Critical patent/CN119315994A/en
Publication of CN119315994A publication Critical patent/CN119315994A/en
Pending legal-status Critical Current

Links

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data compression method for solving the problem of low compression efficiency in existing data compression schemes. The method comprises: performing block processing on original data to obtain at least two data blocks; constructing an encoding dictionary corresponding to the original data; searching for matching character strings in each data block according to the encoding dictionary, and determining the matching positions and character string lengths corresponding to the matching character strings; encoding each data block according to the encoding dictionary, the matching positions and the character string lengths to obtain a first encoding character string corresponding to the data block; encoding the matching positions and the character string lengths to obtain a second encoding character string corresponding to each data block; performing predictive compression encoding on each data block according to the first encoding character string and the second encoding character string to obtain block compression data corresponding to each data block; and combining the block compression data to obtain compression data corresponding to the original data.

Description

Data compression method, data decompression method, device, equipment and storage medium
Technical Field
The present application relates to the field of data compression, and in particular, to a data compression method, a data decompression method, a device, an apparatus, a storage medium, and a computer program product.
Background
With the rapid development of internet technology and cloud computing technology, services and applications built around networks and data have grown explosively. These services and applications make daily life and work more convenient for users, but they also generate massive amounts of data whose storage and transmission costs are very high. To reduce the consumption of data transmission and storage resources, the common approach at present is to compress massive data before transmitting and storing it.
Data compression refers to a technique that, without losing useful information, reduces the data volume, or reorganizes the data according to a certain algorithm to reduce data redundancy, so as to save storage space and improve the efficiency of data transmission, storage and processing.
Existing data compression schemes are mainly implemented by encoding and decoding the original data. Specifically, in the data compression process, a sender encodes the original data using a compression algorithm, converting it into a smaller representation, and a receiver decodes the compressed data using the same compression algorithm to recover the original data. Common data compression algorithms include the Huffman coding algorithm, the LZ77 coding algorithm, and the like. These algorithms apply different compression strategies according to the characteristics and structure of the data, so as to reduce storage and transmission costs.
However, in the encoding compression process, the existing data compression algorithm needs to sequentially calculate characters or symbols corresponding to the original data, which results in slower compression speed and lower data compression rate.
Therefore, how to improve the data compression efficiency and the data compression rate is a problem to be solved at present.
Disclosure of Invention
The embodiment of the application provides a data compression method which is used for solving the problem of lower compression efficiency in the existing data compression scheme.
The embodiment of the application also provides a data decompression method which is used for solving the problem of lower compression efficiency of the existing data compression scheme.
The embodiment of the application also provides a data compression device which is used for solving the problem of lower compression efficiency in the existing data compression scheme.
The embodiment of the application also provides data compression equipment, which is used for solving the problem of lower compression efficiency in the existing data compression scheme.
The embodiment of the application also provides a computer readable storage medium which is used for solving the problem of low compression efficiency existing in the existing data compression scheme.
The embodiment of the application also provides a computer program product which is used for solving the problem of lower compression efficiency existing in the existing data compression scheme.
The embodiment of the application adopts the following technical scheme:
A data compression method comprises the steps of conducting block processing on original data to obtain at least two data blocks, constructing an encoding dictionary corresponding to the original data, searching matching character strings in the data blocks according to the encoding dictionary, determining matching positions and character string lengths corresponding to the matching character strings, conducting encoding on the data blocks according to the encoding dictionary, the matching positions and the character string lengths to obtain first encoding character strings corresponding to the data blocks, conducting encoding on the matching positions and the character string lengths to obtain second encoding character strings corresponding to the data blocks, conducting predictive compression encoding on the data blocks according to the first encoding character strings and the second encoding character strings to obtain block compression data corresponding to the data blocks, and combining the block compression data to obtain compression data corresponding to the original data.
A data decompression method comprises the steps of decompressing compressed data to be decompressed to obtain coding dictionary information, coding character strings and coding mode information corresponding to the compressed data, constructing an empty decoding dictionary database according to the coding dictionary information, decoding the coding character strings according to the coding mode information to obtain matching positions, character string lengths and un-decoded character strings, and decoding the un-decoded character strings according to the coding dictionary information, the matching positions and the character string lengths to obtain original data corresponding to the compressed data.
The data compression device comprises a block unit, a dictionary construction unit, a coding unit, a matching unit and a compression unit, wherein the block unit is used for performing block processing on original data to obtain at least two data blocks; the dictionary construction unit is used for constructing a coding dictionary corresponding to the original data; the coding unit is used for respectively coding each data block according to the coding dictionary to obtain a first coding character string corresponding to the data block; the matching unit is used for respectively searching for matching character strings in each data block according to the coding dictionary and determining the matching position and character string length corresponding to each matching character string; the coding unit is further used for coding the matching position and the character string length to obtain a second coding character string corresponding to each data block; and the compression unit is used for respectively performing predictive compression coding on each data block according to the first coding character string and the second coding character string to obtain block compressed data corresponding to each data block, and for combining the block compressed data to obtain compressed data corresponding to the original data.
The data decompression device comprises a decompression unit, a dictionary construction unit and a decoding unit, wherein the decompression unit is used for decompressing compressed data to be decompressed to obtain coding dictionary information, a coding character string and coding mode information corresponding to the compressed data; the dictionary construction unit is used for constructing an empty decoding dictionary database according to the coding dictionary information; the decoding unit is used for decoding the coding character string according to the coding mode information to obtain a first decoding character string and an un-decoded character string; the decoding unit is further used for decoding the un-decoded character string according to a range coding method to obtain a matching position and a character string length; and the decoding unit is further used for performing secondary decoding on the first decoding character string according to the coding dictionary information, the matching position and the character string length to obtain original data corresponding to the compressed data.
A data compression device comprises a processor and a memory arranged to store computer executable instructions, wherein the executable instructions when executed enable the processor to perform block processing on original data to obtain at least two data blocks, construct a coding dictionary corresponding to the original data, search matching character strings in the data blocks according to the coding dictionary respectively, determine matching positions and character string lengths corresponding to the matching character strings, encode the data blocks according to the coding dictionary, the matching positions and the character string lengths respectively to obtain first coding character strings corresponding to the data blocks, encode the matching positions and the character string lengths to obtain second coding character strings corresponding to the data blocks, predict and compress the data blocks according to the first coding character strings and the second coding character strings to obtain block compressed data corresponding to the data blocks, and combine the block compressed data to obtain compressed data corresponding to the original data.
The computer readable storage medium stores one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to perform block processing on original data to obtain at least two data blocks, construct an encoding dictionary corresponding to the original data, search for matching strings in each data block according to the encoding dictionary, determine matching positions and string lengths corresponding to the matching strings, encode each data block according to the encoding dictionary, the matching positions and the string lengths to obtain a first encoding string corresponding to the data block, encode the matching positions and the string lengths to obtain a second encoding string corresponding to each data block, perform predictive compression encoding on each data block according to the first encoding string and the second encoding string to obtain block compressed data corresponding to each data block, and combine the block compressed data to obtain compressed data corresponding to the original data.
A computer program product comprises a computer program, wherein the computer program is implemented when being executed by a processor and is used for carrying out block processing on original data to obtain at least two data blocks, constructing a coding dictionary corresponding to the original data, searching matching character strings in the data blocks according to the coding dictionary, determining matching positions and character string lengths corresponding to the matching character strings, respectively coding the data blocks according to the coding dictionary, the matching positions and the character string lengths to obtain first coding character strings corresponding to the data blocks, coding the matching positions and the character string lengths to obtain second coding character strings corresponding to the data blocks, respectively carrying out predictive compression coding on the data blocks according to the first coding character strings and the second coding character strings to obtain block compression data corresponding to the data blocks, and combining the block compression data to obtain compression data corresponding to the original data.
The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects:
According to the data compression method provided by the embodiment of the application, before data transmission, a communication system can perform block processing on the original data to be transmitted to obtain at least two data blocks, and encode each data block by constructing an encoding dictionary to obtain a first encoding character string corresponding to the data block. Meanwhile, according to the encoding dictionary, the communication system searches for matching character strings in each data block, determines the matching position and character string length corresponding to each matching character string, and uses the range encoding method to perform secondary encoding on the matching position and the character string length to obtain a second encoding character string. Finally, the first encoding character string and the second encoding character string are compressed using a predictive compression encoding method to obtain the final compressed data. Because the original data is encoded and compressed in sequence using several encoding methods (an encoding dictionary, range encoding and predictive compression encoding), the method can achieve a higher compression rate than existing compression schemes that use only a single encoding method. On the one hand, this improves data compression efficiency; on the other hand, transmitting the compressed data greatly reduces the bandwidth occupied during data transmission and improves data transmission efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a specific flow chart of a data compression method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a data decompression method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a specific structure of a data compression device according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a specific structure of a data decompression device according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a specific structure of data compression equipment according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
The data compression method provided by the embodiment of the application is used for solving the problem of lower compression efficiency in the existing data compression scheme.
The execution body of the data compression method provided by the embodiment of the application can be at least one of a communication server, a data transmission server, a data storage server, a video server, an audio server and the like, and in addition, the execution body of the method can also be an Application (APP) running on the servers or a system itself.
For convenience of description, an embodiment of the method is described below taking a communication system that performs data transmission as the execution body of the method. It will be appreciated that using a communication system as the execution body is merely an example and should not be construed as limiting the method.
The specific implementation flow diagram of the data compression method provided by the application is shown in fig. 1, and mainly comprises the following steps:
Step 11, performing block processing on the original data to obtain at least two data blocks;
The original data is data to be transmitted, which is acquired by the communication system, and in order to improve the data transmission efficiency and reduce bandwidth resources occupied in the data transmission process, the communication system compresses the original data to be transmitted before the data transmission.
In order to avoid the problem that the time consumed for directly encoding and compressing the original data is long because the data volume of the original data is large, in the embodiment of the application, the original data can be divided into a plurality of data blocks, and then the communication system can simultaneously encode and compress the data blocks when the data encoding and compressing are carried out, so that the data compression efficiency is improved.
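As an illustration of the block processing in step 11, the following minimal sketch splits a byte sequence into consecutive fixed-size blocks. The function name `split_into_blocks` and the `block_size` parameter are assumptions made for illustration; the embodiment does not specify how block boundaries are chosen.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Each slice is one data block; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    blocks = split_into_blocks(b"abracadabra", 4)
    print(blocks)
```

Because the blocks are independent slices, they could be handed to separate workers for encoding, which is the parallelism the paragraph above relies on.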
Step 12, constructing a coding dictionary corresponding to the original data;
In the embodiment of the application, the communication system can construct the coding dictionary according to the following sub-steps:
Sub-step 1201, constructing an empty original coding dictionary;
Sub-step 1202, initializing the original coding dictionary constructed in sub-step 1201;
Specifically, the initializing operation of the original coding dictionary includes:
1. A dictionary database (or dictionary data table) is created, wherein the constructed dictionary database (or dictionary data table) typically includes at least three fields: a reference tag, a sample string, and a hash value or other functional field used for retrieval from the dictionary database.
2. An initial character string is preset. The initial character string can have any initial value; for example, the initial value can be a space.
Sub-step 1203, scanning the original data, and sequentially obtaining at least one character from the original data to obtain a character string to be processed.
In the embodiment of the application, the communication system sequentially reads all characters contained in the original data through scanning, and adds the read characters to the data buffer area for subsequent processing.
Sub-step 1204, judging whether the character string to be processed has a matching sub-character string in the coding dictionary database;
Specifically, when the communication system scans the original data to be transmitted for the first time, the communication system may read the first character in the original data, and add the read character to the data buffer area, and at the same time, find whether there is a record corresponding to the character in the dictionary database, and execute the sub-step 1205 when the determination result is yes;
Sub-step 1205, when the judgment result is yes, generating a matching character string according to the character string to be processed;
When it is determined that the record of the character string to be processed exists in the encoding dictionary database, the character string to be processed may be reserved in the data buffer area, and the sub-steps 1203 to 1204 may be executed in a circulating manner, and the new character string is continuously obtained from the original data as a second character string to be processed, and the second character string to be processed is added to the tail of the character string currently cached in the data buffer area, so as to obtain the new character string to be processed, and further determine whether the matching sub-character string exists in the encoding dictionary database for the new character string to be processed.
Specifically, when it is determined that there is a record of a character string to be processed in the dictionary database, the communication system may move the data pointer (or the sliding time window) backward by at least one character position, and then the communication system may read a new character string to be processed according to the position pointed by the moved pointer (or the area included in the sliding time window), and continue to execute the substep 1204 until the communication system reads a character string not recorded in the dictionary database in the original data (for convenience of description, the character string not recorded in the dictionary database is hereinafter referred to as an unrecorded character string). At this time, in the data buffer, the character strings before the unrecorded character string are all recorded in the code dictionary database, so that the character strings do not need to be repeatedly recorded in the code dictionary database, and the character strings before the unrecorded character string can be further used as matching character strings, and the matching character strings can be deleted from the data buffer.
Sub-step 1206, when the judgment result is no, adding the character string to be processed into the coding dictionary database;
When the judgment result is no, the character string to be processed is not yet recorded in the code dictionary database, so the character string to be processed is stored in the code dictionary database to update and supplement it. After storing the character string to be processed in the dictionary database, the communication system may empty the character string temporarily cached in the data buffer area, and perform the above sub-steps 1203 to 1204 in a loop.
Sub-step 1207, completing scanning of all characters in the original data, and obtaining the coding dictionary corresponding to the original data.
By scanning all characters in the original data, the communication system may use the character strings contained in the original data to update and refine the dictionary database initialized by performing the sub-step 1202, thereby obtaining the dictionary corresponding to the original data.
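The scan-and-extend loop of sub-steps 1201 to 1207 can be sketched as follows. This is a simplified reading under stated assumptions, not the embodiment's exact procedure: the dictionary is modelled as a plain Python `dict` mapping a sample string to an integer reference tag (the hash-value field is left to Python's own hashing), and the names `build_coding_dictionary` and `initial` are hypothetical.

```python
def build_coding_dictionary(text: str, initial: str = " ") -> dict[str, int]:
    """Build a string -> reference-tag dictionary by scanning text once."""
    # Sub-steps 1201-1202: create the dictionary and preset an initial string.
    dictionary = {initial: 0}
    buffer = ""  # data buffer holding the character string to be processed
    for ch in text:
        # Sub-step 1203: read the next character and append it to the buffer.
        buffer += ch
        # Sub-steps 1204-1205: if the buffered string is already recorded,
        # keep it and extend it with the next character.
        if buffer in dictionary:
            continue
        # Sub-step 1206: otherwise record it and empty the buffer.
        dictionary[buffer] = len(dictionary)
        buffer = ""
    # Sub-step 1207: all characters have been scanned.
    return dictionary


if __name__ == "__main__":
    print(build_coding_dictionary("abababab"))
```

In this sketch each new entry is a previously recorded string extended by one character, so the dictionary grows toward longer repeated strings as the scan proceeds.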
Step 13, searching for matching character strings in each data block according to the coding dictionary, and determining matching positions and character string lengths corresponding to the matching character strings;
In one embodiment, the communication system may search for repeated character strings in each data block according to the coding dictionary, use the repeated character strings as matching character strings, and record the position of each matching character string in the data block and its character string length.
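One way to picture step 13 is a greedy scan that, at each position of a data block, looks for the longest string recorded in the coding dictionary and notes where it starts and how long it is. The sketch below makes several assumptions that are not stated in the embodiment: matches are greedy and non-overlapping, a minimum match length `min_len` is applied, and the function name `find_matches` is hypothetical.

```python
def find_matches(block: str, dictionary: dict[str, int],
                 min_len: int = 2) -> list[tuple[int, int]]:
    """Return (matching position, string length) pairs found in block."""
    max_len = max((len(s) for s in dictionary), default=0)
    matches = []
    i = 0
    while i < len(block):
        best = 0
        # Try the longest candidate first so the match is greedy.
        for length in range(min(max_len, len(block) - i), min_len - 1, -1):
            if block[i:i + length] in dictionary:
                best = length
                break
        if best:
            matches.append((i, best))
            i += best  # skip past the matched string
        else:
            i += 1
    return matches


if __name__ == "__main__":
    example_dictionary = {" ": 0, "a": 1, "b": 2, "ab": 3, "aba": 4}
    print(find_matches("abababab", example_dictionary))
```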
Step 14, coding each data block according to the coding dictionary, the matching position and the character string length to obtain a first coding character string corresponding to the data block;
After determining the coding dictionary corresponding to the original data by performing step 12, the communication system may encode each data block obtained in step 11 according to the coding dictionary, so as to implement the first coding compression of each data block.
Specifically, in one embodiment, the communication system may encode a data block through the following sub-steps:
Sub-step 1401, for characters in each data block, sequentially searching whether the characters are recorded in the coding dictionary;
Sub-step 1402, when judging that the character is recorded in the coding dictionary, converting the character into a corresponding code according to the code in the coding dictionary, and completing the coding of the character in the data block;
Sub-step 1403, when it is determined that the character is not recorded in the encoding dictionary, converting the character into a binary number with a specific number of bits according to the encoding algorithm to complete the encoding of the character, and adding the character and its corresponding code to the encoding dictionary;
Sub-step 1404, replacing the repeated strings (i.e., the matching strings) in the encoded string with their matching positions and string lengths to complete the first compression encoding of the data block, thereby obtaining the first encoded string corresponding to the data block.
Specifically, the first code character string records the codes corresponding to the characters in the data block, and records the codes corresponding to the repeated characters only once for the repeated characters, and records the positions of the repeated characters and the character string length.
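The following sketch illustrates the idea of sub-steps 1401 to 1404 with an LZ77-style token stream: a character is emitted as its dictionary code, and a repeated string is emitted as a reference to the position of an earlier occurrence plus a length. The token layout, the `min_len` and `max_len` limits, the way new codes are assigned, and the function names are all assumptions for illustration; the embodiment does not fix these details.

```python
def first_encode(block: str, dictionary: dict[str, int],
                 min_len: int = 3, max_len: int = 16):
    """Encode block as ("lit", code) and ("ref", position, length) tokens."""
    codes = dict(dictionary)  # work on a copy of the coding dictionary
    tokens = []
    i = 0
    while i < len(block):
        ref = None
        # Look for the longest string at i that already occurred before i.
        for length in range(min(max_len, len(block) - i), min_len - 1, -1):
            pos = block.find(block[i:i + length], 0, i)
            if pos != -1:
                ref = (pos, length)
                break
        if ref:
            # Sub-step 1404: store only the matching position and length.
            tokens.append(("ref", ref[0], ref[1]))
            i += ref[1]
        else:
            ch = block[i]
            if ch not in codes:
                # Sub-step 1403: assign a new code and record it.
                codes[ch] = max(codes.values(), default=-1) + 1
            # Sub-step 1402: emit the code recorded for the character.
            tokens.append(("lit", codes[ch]))
            i += 1
    return tokens, codes


def first_decode(tokens, codes) -> str:
    """Invert first_encode to check that no information was lost."""
    by_code = {code: s for s, code in codes.items()}
    out = ""
    for token in tokens:
        if token[0] == "lit":
            out += by_code[token[1]]
        else:
            _, pos, length = token
            out += out[pos:pos + length]
    return out


if __name__ == "__main__":
    toks, table = first_encode("abcabcabc", {"a": 0, "b": 1})
    print(toks)
    print(first_decode(toks, table))
```

The (position, length) pairs in the "ref" tokens are the values that the secondary encoding of step 15 operates on.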
Step 15, performing secondary coding on the matching position and the character string length in the first coding character string to obtain a second coding character string corresponding to each data block;
In order to further improve the compression rate, in the embodiment of the present application, the matching position and the string length in the first encoded string obtained through the first encoding may be secondarily encoded, so as to further reduce the data capacity.
In the embodiment of the present application, the communication system may use a range encoding method (also referred to as interval encoding) to perform secondary encoding on the matching position and the string length in the first encoded string. Specifically, range coding starts from a sufficiently large integer range and a probability estimate for the occurrence of each symbol or number. The initial interval is divided into subintervals whose sizes are proportional to the probabilities of the symbols they represent; the subinterval of the symbol being coded becomes the current interval, which is in turn divided into subintervals according to the probabilities of the next symbol to be coded. Each symbol in a message can be coded in this way.
In the embodiment of the application, the matching position and the character string length in the first coding character string can be subjected to secondary coding by the following method, wherein the method comprises the steps of determining a coding range according to the matching position and the character string length, and coding the matching position and the character string length according to the coding range to obtain a second coding character string corresponding to each data block.
In one embodiment, the communication system can specifically determine the coding range according to the following method, wherein the method comprises the steps of constructing a probability model, determining the occurrence probability of the matching position and the character string length according to the probability model, and determining the coding range according to the occurrence probability.
Specifically, the communication system may use the constructed probability model to count the occurrence frequency of the matching position and string length of each matching string determined in step 13, thereby determining the occurrence probability of the matching position and string length corresponding to each matching string. The communication system may then use the cumulative distribution function (CDF) to map the cumulative probability of each matching position and string length onto the interval [0, 1], thereby obtaining the coding range.
It should be noted that range encoding is a technical means commonly used in the field, so the specific range encoding procedure is not described in detail in the embodiment of the present application.
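To make the probability model and coding range concrete, the sketch below counts how often each (matching position, string length) pair occurs, maps the cumulative probabilities onto [0, 1], and then narrows an interval symbol by symbol. It uses exact fractions instead of the fixed-width integer arithmetic with renormalization that practical range coders use, so it shows only the interval-narrowing idea. The function names are hypothetical, and treating each (position, length) pair as one symbol is an assumption.

```python
from collections import Counter
from fractions import Fraction


def build_cdf(symbols):
    """Map each distinct symbol to its [low, high) slice of [0, 1]."""
    counts = Counter(symbols)
    total = sum(counts.values())
    cdf = {}
    low = Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)  # occurrence probability
        cdf[sym] = (low, low + p)         # cumulative range for this symbol
        low += p
    return cdf


def narrow_interval(symbols, cdf):
    """Narrow [0, 1) once per symbol; the final interval encodes the message."""
    low, width = Fraction(0), Fraction(1)
    for sym in symbols:
        s_low, s_high = cdf[sym]
        low += width * s_low
        width *= s_high - s_low
    return low, low + width


def decode_interval(value, cdf, count):
    """Recover count symbols from any value inside the final interval."""
    out = []
    for _ in range(count):
        for sym, (s_low, s_high) in cdf.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return out


if __name__ == "__main__":
    pairs = [(0, 3), (0, 3), (4, 2), (0, 3)]
    table = build_cdf(pairs)
    low, high = narrow_interval(pairs, table)
    print(table, low, high, decode_interval(low, table, len(pairs)))
```

Frequent pairs receive wide subintervals, so a message dominated by them ends in a comparatively wide final interval that can be identified with fewer bits.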
Step 16, respectively carrying out predictive compression coding on each data block according to the first coding character string and the second coding character string to obtain block compressed data corresponding to each data block;
Predictive compression coding is a data compression technique that mainly exploits correlation in signal or image data. Its core idea is that, if there is a strong correlation between certain parts of a signal or image, the amount of data required can be reduced by storing or transmitting only the differences between the actual data and the predicted data, thereby achieving compression of the data.
In one embodiment, the specific implementation manner of step 16 may include respectively predicting the first encoding string and the second encoding string according to a prediction model to obtain a prediction string, determining a prediction error according to the prediction string and an actual string, and encoding the prediction error according to a preset compression algorithm to obtain block compression data corresponding to each data block.
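The predict, compare, and encode sequence described above can be sketched as follows. This is only an illustration under stated assumptions: the predictor simply repeats the previous byte (a stand-in for the prediction model of the embodiment), the prediction error is a byte difference modulo 256, and zlib stands in for the unspecified preset compression algorithm. All function names are hypothetical.

```python
import zlib


def prediction_errors(data: bytes) -> bytes:
    """Replace each byte with its difference from the predicted byte."""
    previous = 0  # assumed prediction for the first byte
    errors = bytearray()
    for actual in data:
        predicted = previous  # stand-in predictor: repeat the last byte
        errors.append((actual - predicted) % 256)
        previous = actual
    return bytes(errors)


def reconstruct(errors: bytes) -> bytes:
    """Invert prediction_errors by re-running the same predictor."""
    previous = 0
    data = bytearray()
    for error in errors:
        actual = (previous + error) % 256
        data.append(actual)
        previous = actual
    return bytes(data)


def compress_block(block: bytes) -> bytes:
    # zlib is an assumed stand-in for the preset compression algorithm.
    return zlib.compress(prediction_errors(block))


def decompress_block(blob: bytes) -> bytes:
    return reconstruct(zlib.decompress(blob))
```

When the predictor is accurate, most errors are small or zero, which is what makes the error stream easier to compress than the original data.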
In one embodiment, the prediction model may be constructed according to a Transformer model, and the prediction model is trained as follows:
Sub-step 1601, constructing a set of prediction samples from the historical transmitted data;
Sub-step 1602, extracting features from each sample data to be predicted in the prediction sample set to obtain feature data;
Sub-step 1603, inputting the extracted feature data into a pre-constructed Transformer model for forward propagation training to obtain a first prediction result;
Sub-step 1604, calculating an error between the first prediction result and the actual result according to a preset error loss function;
Sub-step 1605, performing back propagation according to the error, updating the weights of the Transformer model, and repeating the above steps until a preset condition is reached, thereby obtaining the trained prediction model.
It should be noted that the embodiment of the present application does not limit the specific construction or training method of the prediction model.
Step 17, combining the block compressed data to obtain compressed data corresponding to the original data.
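One possible way to split the original data into blocks and later combine the block compressed data is sketched below. The length-prefixed layout (a 4-byte block count followed by a 4-byte size before each block) and the function names are assumptions made for illustration; the application does not define a container format.

```python
import struct
import zlib


def split_blocks(data: bytes, block_size: int) -> list:
    """Cut the original data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def combine_blocks(compressed_blocks) -> bytes:
    """Concatenate blocks, prefixing the block count and each block's size."""
    out = bytearray(struct.pack(">I", len(compressed_blocks)))
    for block in compressed_blocks:
        out += struct.pack(">I", len(block)) + block
    return bytes(out)


def separate_blocks(stream: bytes) -> list:
    """Inverse of combine_blocks: recover the individual compressed blocks."""
    (count,) = struct.unpack_from(">I", stream, 0)
    offset = 4
    blocks = []
    for _ in range(count):
        (size,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        blocks.append(stream[offset:offset + size])
        offset += size
    return blocks


original = b"abcabcabcabc" * 10
blocks = split_blocks(original, 32)
# zlib stands in for the per-block compression of the embodiment.
packed = combine_blocks([zlib.compress(b) for b in blocks])
restored = b"".join(zlib.decompress(b) for b in separate_blocks(packed))
```

Recording each block's size lets the receiver separate the blocks again without scanning their contents.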
With the data compression method provided by the embodiment of the application, before data transmission, the communication system can perform block processing on the original data to be transmitted to obtain at least two data blocks, and encode each data block by constructing a coding dictionary to obtain a first encoding character string corresponding to the data block. According to the coding dictionary, it also searches each data block for matching character strings, determines the matching position and character string length corresponding to the matching character strings, and performs secondary encoding on the matching position and character string length with the range encoding method to obtain a second encoding character string. Finally, it compresses the first encoding character string and the second encoding character string with the predictive compression encoding method to obtain the final compressed data. Because the original data is encoded and compressed successively with several encoding methods (coding dictionary, range encoding, and predictive compression encoding), the method can achieve a high compression rate compared with existing compression schemes that use only a single encoding method. On the one hand, this improves data compression efficiency; on the other hand, transmitting the compressed data greatly reduces the occupation of bandwidth resources during data transmission and improves data transmission efficiency.
In an implementation manner, the embodiment of the application further provides a data decompression method, which is used for solving the problem of low compression efficiency existing in the existing data compression scheme.
The specific implementation flow diagram of the data decompression method provided by the application is shown in fig. 2, and mainly comprises the following steps:
Step 21, decompressing the compressed data to be decompressed to obtain coding dictionary information, coding character strings and coding mode information corresponding to the compressed data;
In one embodiment, for the received compressed data, the communication system may parse the compression header of the compressed data, thereby obtaining key parameters such as the coding dictionary information, the coding character string, and the coding mode information corresponding to the compressed data, for use in the subsequent decompression process.
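Parsing such a compression header might look like the sketch below. The header layout is entirely hypothetical (one byte for the coding mode, four bytes for the dictionary size, and four bytes for the payload length, all big-endian), as are the field and function names; the application does not describe the header format.

```python
import struct
from collections import namedtuple

# Hypothetical layout: coding mode (1 byte), dictionary size, payload length.
HEADER_FORMAT = ">BII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

CompressionHeader = namedtuple(
    "CompressionHeader", ["coding_mode", "dictionary_size", "payload_length"]
)


def parse_header(blob: bytes):
    """Split compressed data into its header fields and the encoded payload."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("compressed data is shorter than the header")
    header = CompressionHeader(*struct.unpack_from(HEADER_FORMAT, blob, 0))
    payload = blob[HEADER_SIZE:HEADER_SIZE + header.payload_length]
    if len(payload) != header.payload_length:
        raise ValueError("payload is truncated")
    return header, payload


blob = struct.pack(HEADER_FORMAT, 1, 256, 5) + b"hello"
header, payload = parse_header(blob)
```

Checking the declared payload length against the bytes actually received is a cheap way to reject truncated data before decoding begins.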
Step 22, constructing an empty decoding dictionary database according to the coding dictionary information;
In the embodiment of the application, the communication system can create an empty decoding dictionary database according to the coding dictionary information in the compression header, and store the decompressed data in the decompression process through the decoding dictionary database for subsequent matching and prediction operations.
Step 23, decoding the encoded character string according to the encoding mode information to obtain a first decoded character string and an undecoded character string;
Step 24, decoding the undecoded character string according to a range coding method to obtain a matching position and a character string length;
Step 25, performing secondary decoding on the first decoding character string according to the coding dictionary information, the matching position and the character string length to obtain the original data corresponding to the compressed data.
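The role of the matching position and character string length in the final decoding step can be illustrated with a small sketch. It assumes, purely for illustration, that the decoded stream is a list of tokens that are either literal bytes or (position, length) references into the data already reconstructed; the token format and the function name are not taken from the application.

```python
def expand_matches(tokens) -> bytes:
    """Rebuild data from literal tokens and (position, length) match tokens."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out += token[1]
        else:
            _, position, length = token
            if position >= len(out):
                raise ValueError("match refers to data not yet decoded")
            # Copy byte by byte so a match may overlap the bytes it produces.
            for i in range(length):
                out.append(out[position + i])
    return bytes(out)


tokens = [("literal", b"abc"), ("match", 0, 3), ("literal", b"d"), ("match", 1, 5)]
decoded = expand_matches(tokens)
```

Each match token replaces a repeated substring with two small integers, which is why the positions and lengths are worth encoding compactly in the earlier step.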
In an implementation manner, the embodiment of the application further provides a data compression device, which is used for solving the problem of low compression efficiency existing in the existing data compression scheme. The specific structure of the data compression device is schematically shown in fig. 3, and includes a partitioning unit 31, a dictionary construction unit 32, a matching unit 33, an encoding unit 34, and a compression unit 35.
The partitioning unit 31 is configured to perform block processing on the original data to obtain at least two data blocks;
The dictionary construction unit 32 is configured to construct a coding dictionary corresponding to the original data;
The matching unit 33 is configured to search for matching character strings in each of the data blocks according to the coding dictionary, and determine a matching position and a character string length corresponding to the matching character strings;
The encoding unit 34 is configured to encode each data block according to the coding dictionary, so as to obtain a first encoding character string corresponding to the data block;
The encoding unit 34 is further configured to encode the matching position and the character string length to obtain a second encoding character string corresponding to each data block;
The compression unit 35 is configured to perform predictive compression encoding on each data block according to the first encoding character string and the second encoding character string, so as to obtain block compressed data corresponding to each data block;
The compression unit 35 is further configured to combine the block compressed data to obtain compressed data corresponding to the original data.
In one implementation, the dictionary construction unit 32 is specifically configured to: construct an empty original coding dictionary; initialize the original coding dictionary and determine a coding dictionary database corresponding to the original coding dictionary; scan the original data, sequentially obtaining at least one character from the original data to obtain a character string to be processed; determine whether the character string to be processed has a matching sub-character string in the coding dictionary database; when the result of the determination is yes, generate a matching character string according to the character string to be processed; when the result of the determination is no, add the character string to be processed to the coding dictionary database; and, once all characters in the original data have been scanned, obtain the coding dictionary corresponding to the original data.
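The scanning procedure above resembles LZ78-style dictionary construction, and a minimal sketch under that assumption is given below. The names are hypothetical, the dictionary starts empty with no pre-loaded entries, and the "matching position" is taken to be the offset in the scanned data where the match begins; the application may define these details differently.

```python
def build_coding_dictionary(data: str):
    """Scan data once, growing a dictionary and recording where matches occur."""
    dictionary = {}  # character string -> index in the dictionary database
    matches = []     # (matching position, character string length) pairs
    pending = ""     # longest string matched so far
    start = 0        # offset in data where the pending match begins
    for i, ch in enumerate(data):
        candidate = pending + ch
        if candidate in dictionary:
            pending = candidate  # still matching, keep extending
        else:
            if pending:
                matches.append((start, len(pending)))
            dictionary[candidate] = len(dictionary)  # add the unseen string
            pending = ""
            start = i + 1
    if pending:
        matches.append((start, len(pending)))
    return dictionary, matches


dictionary, matches = build_coding_dictionary("abababa")
```

Here each unseen string is added to the dictionary the first time it appears, so later repetitions of it are reported as matches rather than stored again.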
In one embodiment, the encoding unit 34 is specifically configured to determine an encoding range according to the matching position and the string length, and encode the matching position and the string length according to the encoding range to obtain a second encoded string corresponding to each data block.
In one embodiment, the encoding unit 34 is specifically configured to construct a probability model, determine an occurrence probability of the matching location and the character string length according to the probability model, and determine the encoding range according to the occurrence probability.
In one embodiment, the compression unit 35 is specifically configured to predict the first encoding string and the second encoding string according to a prediction model to obtain a prediction string, determine a prediction error according to the prediction string and an actual string, and encode the prediction error according to a preset compression algorithm to obtain block compressed data corresponding to each data block.
According to the data compression device provided by the embodiment of the application, before data transmission, a communication system can carry out block processing on original data to be transmitted to obtain at least two data blocks, each data block is respectively encoded in a mode of constructing an encoding dictionary to obtain a first encoding character string corresponding to the data block, meanwhile, according to the encoding dictionary, matching character strings in each data block are respectively searched, the matching position and the character string length corresponding to the matching character strings are determined, a range encoding method is utilized to carry out secondary encoding on the matching position and the character string length to obtain a second encoding character string, and finally, a prediction compression encoding method is utilized to carry out compression processing on the first encoding character string and the second character string to obtain final compressed data. By adopting the data compression method provided by the scheme, as the original data are sequentially encoded and compressed by using a plurality of encoding methods (an encoding dictionary, range encoding and predictive compression encoding), compared with the existing compression scheme which only uses a single encoding method, the data compression method provided by the scheme can realize data compression with high compression rate, on one hand, the data compression efficiency is improved, and on the other hand, the occupation of bandwidth resources in the data transmission process is greatly reduced and the data transmission efficiency is improved by transmitting the compressed data.
In an implementation manner, the embodiment of the application further provides a data decompression device, which is used for solving the problem of low compression efficiency existing in the existing data decompression scheme. The specific structure of the data decompression device is shown in fig. 4, and the data decompression device comprises a decompression unit 41, a dictionary construction unit 42 and a decoding unit 43.
The decompression unit 41 is configured to decompress compressed data to be decompressed, and obtain coding dictionary information, a coding character string, and coding mode information corresponding to the compressed data;
The dictionary construction unit 42 is configured to construct an empty decoding dictionary database according to the coding dictionary information;
The decoding unit 43 is configured to decode the coding character string according to the coding mode information, to obtain a first decoded character string and an undecoded character string;
The decoding unit 43 is further configured to decode the undecoded character string according to a range coding method, to obtain a matching position and a character string length;
The decoding unit 43 is further configured to perform secondary decoding on the first decoded character string according to the coding dictionary information, the matching position and the character string length, to obtain the original data corresponding to the compressed data.
By adopting the data decompression device provided by the embodiment of the application, the communication system can carry out blocking processing on the original data to be transmitted before carrying out data transmission to obtain at least two data blocks, each data block is respectively encoded in a mode of constructing an encoding dictionary to obtain a first encoding character string corresponding to the data block, meanwhile, according to the encoding dictionary, the matching character strings in each data block are respectively searched, the matching position and the character string length corresponding to the matching character string are determined, the range encoding method is utilized to carry out secondary encoding on the matching position and the character string length to obtain a second encoding character string, and finally, the first encoding character string and the second character string are compressed by utilizing the predictive compression encoding method to obtain final compressed data. By adopting the data compression method provided by the scheme, as the original data are sequentially encoded and compressed by using a plurality of encoding methods (an encoding dictionary, range encoding and predictive compression encoding), compared with the existing compression scheme which only uses a single encoding method, the data compression method provided by the scheme can realize data compression with high compression rate, on one hand, the data compression efficiency is improved, and on the other hand, the occupation of bandwidth resources in the data transmission process is greatly reduced and the data transmission efficiency is improved by transmitting the compressed data.
Fig. 5 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 5, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in fig. 5, but this does not mean that there is only one bus or only one type of bus.
The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming a data compression device at the logical level. The processor executes the program stored in the memory and is specifically configured to: perform block processing on original data to obtain at least two data blocks; construct an encoding dictionary corresponding to the original data; search for matching character strings in each data block according to the encoding dictionary, and determine the matching position and character string length corresponding to the matching character strings; encode each data block according to the encoding dictionary, the matching position, and the character string length to obtain a first encoding character string corresponding to the data block; encode the matching position and the character string length to obtain a second encoding character string corresponding to each data block; perform predictive compression encoding on each data block according to the first encoding character string and the second encoding character string to obtain block compressed data corresponding to each data block; and combine the block compressed data to obtain compressed data corresponding to the original data.
The method performed by the data compression electronic device disclosed in the embodiment of the application shown in fig. 5 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like, or may be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
Of course, other implementations, such as a logic device or a combination of hardware and software, are not excluded from the electronic device of the present application, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or a logic device.
The embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 1, and in particular to perform the operations of:
The method comprises the steps of carrying out block processing on original data to obtain at least two data blocks, constructing an encoding dictionary corresponding to the original data, searching matching character strings in the data blocks according to the encoding dictionary, determining matching positions and character string lengths corresponding to the matching character strings, encoding the data blocks according to the encoding dictionary, the matching positions and the character string lengths to obtain first encoding character strings corresponding to the data blocks, encoding the matching positions and the character string lengths to obtain second encoding character strings corresponding to the data blocks, carrying out predictive compression encoding on the data blocks according to the first encoding character strings and the second encoding character strings to obtain block compression data corresponding to the data blocks, and combining the block compression data to obtain compression data corresponding to the original data.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method of data compression, comprising:
performing block processing on the original data to obtain at least two data blocks;
constructing a coding dictionary corresponding to the original data;
searching for matching character strings in each data block according to the coding dictionary, and determining matching positions and character string lengths corresponding to the matching character strings;
encoding each data block according to the coding dictionary, the matching position and the character string length to obtain a first coding character string corresponding to the data block;
encoding the matching position and the character string length to obtain a second coding character string corresponding to each data block;
performing predictive compression coding on each data block according to the first coding character string and the second coding character string to obtain block compressed data corresponding to each data block;
and combining the block compressed data to obtain compressed data corresponding to the original data.
2. The method according to claim 1, wherein the constructing the coding dictionary corresponding to the original data specifically comprises:
constructing an empty original coding dictionary;
initializing the original coding dictionary and determining a coding dictionary database corresponding to the original coding dictionary;
scanning the original data, and sequentially acquiring at least one character from the original data to obtain a character string to be processed;
judging whether the character string to be processed has a matching sub-character string in the coding dictionary database;
when the judgment result is yes, generating a matching character string according to the character string to be processed;
when the judgment result is no, adding the character string to be processed into the coding dictionary database;
and finishing scanning all characters in the original data to obtain the coding dictionary corresponding to the original data.
3. The method of claim 1, wherein the encoding the matching position and the character string length to obtain a second coding character string corresponding to each data block specifically comprises:
determining a coding range according to the matching position and the character string length;
and encoding the matching position and the character string length according to the coding range to obtain a second coding character string corresponding to each data block.
4. The method according to claim 3, wherein the determining the coding range according to the matching position and the character string length comprises:
constructing a probability model;
determining the occurrence probability of the matching position and the character string length according to the probability model;
and determining the coding range according to the occurrence probability.
5. The method according to claim 1, wherein the performing predictive compression coding on each data block according to the first coding character string and the second coding character string to obtain block compressed data corresponding to each data block specifically comprises:
predicting the first coding character string and the second coding character string respectively according to a prediction model to obtain a prediction character string;
determining a prediction error according to the prediction character string and the actual character string;
and encoding the prediction error according to a preset compression algorithm to obtain block compressed data corresponding to each data block.
6. A method of decompressing data, comprising:
decompressing compressed data to be decompressed to obtain coding dictionary information, a coding character string and coding mode information corresponding to the compressed data;
constructing an empty decoding dictionary database according to the coding dictionary information;
decoding the coding character string according to the coding mode information to obtain a first decoded character string and an undecoded character string;
decoding the undecoded character string according to a range coding method to obtain a matching position and a character string length;
and performing secondary decoding on the first decoded character string according to the coding dictionary information, the matching position and the character string length to obtain original data corresponding to the compressed data.
7. A data compression apparatus, comprising:
a block unit, used for performing block processing on the original data to obtain at least two data blocks;
a dictionary construction unit, used for constructing a coding dictionary corresponding to the original data;
a matching unit, used for searching for matching character strings in each data block according to the coding dictionary and determining the matching position and the character string length corresponding to the matching character strings;
a coding unit, used for coding each data block according to the coding dictionary to obtain a first coding character string corresponding to the data block;
the coding unit being further used for coding the matching position and the character string length to obtain a second coding character string corresponding to each data block;
a compression unit, used for performing predictive compression coding on each data block according to the first coding character string and the second coding character string to obtain block compressed data corresponding to each data block;
the compression unit being further used for combining the block compressed data to obtain compressed data corresponding to the original data.
8. A data compression apparatus comprising:
a processor; and
a memory arranged to store computer-executable instructions that, when executed, cause the processor to:
performing block processing on the original data to obtain at least two data blocks;
constructing a coding dictionary corresponding to the original data;
searching for matching character strings in each data block according to the coding dictionary, and determining matching positions and character string lengths corresponding to the matching character strings;
according to the coding dictionary, the matching position and the character string length, respectively coding each data block to obtain a first coding character string corresponding to the data block;
encoding the matching position and the character string length to obtain a second encoded character string corresponding to each data block;
according to the first coding character string and the second coding character string, respectively carrying out predictive compression coding on each data block to obtain block compression data corresponding to each data block;
and combining the block compressed data to obtain compressed data corresponding to the original data.
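The first steps listed in claim 8 (blocking, then searching each block for matching character strings and recording their positions and lengths) can be sketched as follows. The claims do not say how the coding dictionary is built or searched, so this sketch assumes the simplest case: the dictionary is the earlier part of the same block, and the search is a brute-force longest-match scan. The names `split_blocks` and `find_matches`, the fixed block size, and the minimum match length are all hypothetical.

```python
from typing import List, Tuple, Union

# Assumed token layout: literal bytes, or (position, length) within the block.
Token = Union[bytes, Tuple[int, int]]


def split_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the original data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def find_matches(block: bytes, min_len: int = 3) -> List[Token]:
    """Greedy longest-match search against the earlier part of the block.

    Brute force, quadratic in the block length; meant only as an illustration.
    """
    tokens: List[Token] = []
    literal = bytearray()
    i = 0
    while i < len(block):
        best_pos, best_len = -1, 0
        for pos in range(i):  # every earlier start position is a candidate
            length = 0
            while i + length < len(block) and block[pos + length] == block[i + length]:
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= min_len:
            if literal:  # flush pending literal bytes before the match
                tokens.append(bytes(literal))
                literal.clear()
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            literal.append(block[i])
            i += 1
    if literal:
        tokens.append(bytes(literal))
    return tokens
```

In this sketch the literal runs play the role of the first coding character string and the `(position, length)` pairs are the input to the second one. A practical implementation would likely replace the quadratic scan with a hash-based index.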
9. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the data compression method of any of claims 1-6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a data compression method as claimed in any one of claims 1 to 6.
CN202411256964.2A 2024-09-09 2024-09-09 Data compression method, data decompression method, device, equipment and storage medium Pending CN119315994A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411256964.2A CN119315994A (en) 2024-09-09 2024-09-09 Data compression method, data decompression method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411256964.2A CN119315994A (en) 2024-09-09 2024-09-09 Data compression method, data decompression method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN119315994A (en) 2025-01-14

Family

ID=94179907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411256964.2A Pending CN119315994A (en) 2024-09-09 2024-09-09 Data compression method, data decompression method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119315994A (en)

Similar Documents

Publication Publication Date Title
US9363309B2 (en) Systems and methods for compressing packet data by predicting subsequent data
CN100553152C (en) Encoding method and device and decoding method and device based on CABAC
WO2019153700A1 (en) Encoding and decoding method, apparatus and encoding and decoding device
WO2010044100A1 (en) Lossless compression
US20240221729A1 (en) Voice recognition method and apparatus, medium, and electronic device
JP2003218703A (en) Data coder and data decoder
CN118349528B (en) A file attribute-based adaptive compression method, system and storage medium
CN114764557A (en) Data processing method and device, electronic equipment and storage medium
CN113595557B (en) Data processing method and device
US20120110025A1 (en) Coding order-independent collections of words
JP2003273748A (en) Improved Huffman decoding method and apparatus
US12189601B2 (en) Data compression method, data decompression method, and electronic device
CN108880559B (en) Data compression method, data decompression method, compression device and decompression device
US20250139060A1 (en) System and method for intelligent data access and analysis
CN119315994A (en) Data compression method, data decompression method, device, equipment and storage medium
CN120297328A (en) Method, medium, device and program product for calculating attention score based on CPU
CN118244993B (en) Data storage method, data processing method and device, electronic equipment and medium
CN118694375A (en) A method and computing device for compressing numerical data
CN108092749B (en) Soft bit storage method and device
CN117335811A (en) Column data compression method and device and storage medium
CN117951100A (en) Data compression method, device and computer storage medium
CN119250020B (en) Text compression method, text decompression method, model training method, device and equipment
CN114070471B (en) Test data packet transmission method, device, system, equipment and medium
US12547833B2 (en) Lossless and lossy large language model-based text compression via arithmetic coding
CN118921410B (en) Protocol message transmission method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination