CN106849956A

CN106849956A - Compression method, decompression method, device and data processing system

Info

Publication number: CN106849956A
Application number: CN201611270254.0A
Authority: CN
Inventors: 任麒斌; 陆超; 柯继伟
Original assignee: Huawei Machine Co Ltd
Current assignee: Huawei Machine Co Ltd
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2017-06-13
Anticipated expiration: 2036-12-30
Also published as: CN106849956B

Abstract

The present application discloses a compression method, a decompression method, a device and a data processing system, and belongs to the field of data processing. The decompression method comprises: obtaining (a*N+b)/a sub-code segments according to a code segment with a code length of a*N+b read from compressed data in the xth clock cycle; concurrently decompressing (a*N+b)/a sub-code segments to obtain (a*N+b)/a characters; and determining the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to a variable-length coding table and the code segment read in the xth clock cycle. The present application concurrently decompresses multiple sub-code segments to obtain multiple characters, and then determines the target decompressed data of the xth clock cycle from the multiple characters. There is no need to wait for a code word to be decompressed before decompressing the next code word according to the length of the code word, and the decompression speed is high.

Description

Compression method, decompression method, device and data processing system

技术领域technical field

本申请涉及数据处理领域，特别涉及一种压缩方法、解压缩方法、装置和数据处理系统。The present application relates to the field of data processing, in particular to a compression method, decompression method, device and data processing system.

背景技术Background technique

变长编码(英文：Variable Length Coding)是一种压缩数据时使用的编码方式，在该编码方式中，待压缩数据中的字符可以由长度不同的码字(码字为若干位二进制码)来表示，通常将出现概率较高的字符用较短的码字(如01)来表示，而将出现概率较低的字符用较长的码字来表示(如111001)，而字符与码字的对应关系可以记录在变长编码表中。Variable length coding (English: Variable Length Coding) is a coding method used when compressing data. In this coding method, the characters in the data to be compressed can be represented by codewords of different lengths (the codewords are several binary codes). Indicates that characters with a higher probability of occurrence are usually represented by a shorter codeword (such as 01), and characters with a lower probability of occurrence are represented by a longer codeword (such as 111001), and the character and codeword The corresponding relationship can be recorded in the variable-length coding table.

目前，采用变长编码对数据进行压缩所得到的数据称为压缩数据封装，该压缩数据封装可以包括：压缩数据和变长编码表等，对该压缩数据封装进行解压缩时，可以先对压缩数据封装进行解析，得到变长编码表，然后从压缩数据的第一位开始，根据该变长编码表依次对压缩数据进行解码，得到解压缩后的数据。At present, the data obtained by using variable-length coding to compress data is called compressed data packaging. The compressed data packaging may include: compressed data and variable-length coding tables, etc. When decompressing the compressed data The data is encapsulated and analyzed to obtain a variable-length code table, and then, starting from the first bit of the compressed data, the compressed data is sequentially decoded according to the variable-length code table to obtain decompressed data.

由于在解压缩前，上述压缩数据封装中的压缩数据中每个码字的长度是未知的，对压缩数据进行解压缩时，解出一个码字后才能确定下一个码字的起始位置，因而需要按照压缩数据的顺序一个码字接一个码字地进行解压缩，解压速度较低。Since the length of each codeword in the compressed data in the above-mentioned compressed data package is unknown before decompression, when the compressed data is decompressed, the starting position of the next codeword can only be determined after decompressing a codeword. Therefore, it is necessary to decompress one codeword after another according to the order of the compressed data, and the decompression speed is low.

发明内容Contents of the invention

为了解决解压缩时解压速度较低的问题，本发明实施例提供了一种压缩方法、解压缩方法、装置和数据处理系统。所述技术方案如下：In order to solve the problem of low decompression speed during decompression, embodiments of the present invention provide a compression method, decompression method, device and data processing system. Described technical scheme is as follows:

本发明实施例提供的解压缩方法可以由解压引擎执行，该解压引擎可以设置在x86架构(英文：The X86 architecture)的处理器或高级精简指令集处理器(英文：Advanced RISC Machines；简称：ARM)架构的处理器中。The decompression method provided by the embodiment of the present invention can be executed by a decompression engine, and the decompression engine can be set on a processor of x86 architecture (English: The X86 architecture) or an advanced reduced instruction set processor (English: Advanced RISC Machines; abbreviation: ARM ) architecture processors.

第一方面，本发明实施例提供了一种解压缩方法，该方法包括：In a first aspect, an embodiment of the present invention provides a decompression method, the method comprising:

解压引擎根据第x时钟周期从压缩数据中读取的码长为a*N+b的码段，得到的(a*N+b)/a个子码段；其中，x为大于或等于1的整数；(a*N+b)/a个子码段中每个子码段的码长为c，(a*N+b)/a个子码段中包括首个子码段，首个子码段的起始位与第x时钟周期读取的码段的起始位重合，且(a*N+b)/a个子码段中相邻两个子码段的起始位之间间隔的码长为a-1，所述N为大于0的整数。The decompression engine reads the code segment with a code length of a*N+b from the compressed data according to the xth clock cycle, and obtains (a*N+b)/a sub-code segments; where x is greater than or equal to 1 Integer; the code length of each subcode segment in (a*N+b)/a subcode segments is c, and (a*N+b)/a subcode segments include the first subcode segment, and the starting point of the first subcode segment The start bit coincides with the start bit of the code segment read in the xth clock cycle, and the code length between the start bits of two adjacent sub-code segments in (a*N+b)/a sub-code segments is a -1, the N is an integer greater than 0.

解压引擎基于变长编码表，并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，其中，解压引擎解压每一子码段能够得到一个字符。The decompression engine decompresses (a*N+b)/a sub-code segments concurrently based on the variable-length code table to obtain (a*N+b)/a characters, wherein the decompression engine decompresses each sub-code segment able to get a character.

该变长编码表可以包括多个码字，(a*N+b)/a个字符中每个字符对应于变长编码表中的一个码字；多个码字中至少两个码字对应的码长不同，多个码字中对应码长最长的码字对应的码长为c,多个码字中对应码长最短的码字对应的码长为a,且a为多个码字中每一码字对应的码长的最大公约数，其中，a为大于或等于2的整数，c为大于a的整数，c与a的差值为b。The variable-length coding table may include a plurality of codewords, and each character in the (a*N+b)/a characters corresponds to a codeword in the variable-length coding table; at least two codewords in the multiple codewords correspond to The code lengths of different code words are different, the code length corresponding to the code word with the longest code length among multiple code words is c, the code length corresponding to the code word with the shortest code length among multiple code words is a, and a is multiple codes The greatest common divisor of the code length corresponding to each code word in the word, where a is an integer greater than or equal to 2, c is an integer greater than a, and the difference between c and a is b.

解压引擎根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据，其中，第x时钟周期的目标解压数据中，每相邻的两个有效字符中一个有效字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个有效字符对应的码字的起始码元在第x时钟周期读取的码段中的位置相邻。第x时钟周期的目标解压数据包括按照特定顺序排列的多个有效字符。The decompression engine determines the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read by the xth clock cycle, wherein the target of the xth clock cycle In the decompressed data, the position of the last code element of the code word corresponding to one valid character among the two adjacent valid characters and the start of the code word corresponding to the other valid character in the code segment read in the xth clock cycle The positions of the symbols in the code segment read in the xth clock cycle are adjacent. The target decompressed data of the xth clock cycle includes a plurality of valid characters arranged in a specific order.

本发明实施例提供的解压缩方法，在对压缩数据进行解压缩时，通过并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，然后根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。在进行解压缩时可以同时从多个位置开始解压缩，解压速度较高。In the decompression method provided by the embodiment of the present invention, when decompressing the compressed data, the (a*N+b)/a subcode segments are decompressed concurrently to obtain (a*N+b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. During decompression, decompression can be started from multiple locations at the same time, and the decompression speed is relatively high.

可选的，第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合。Optionally, the last b bits of the code segment with a code length of a*N+b read in the xth clock cycle are in phase with the first b bits of the code segment with a code length of a*N+b read in the x+1th clock cycle coincide.

本发明实施例提供的解压缩方法，解压引擎通过在读取两个相邻的时钟周期时，使后一时钟周期的前b位为前一时钟周期的后b位，避免了在解压时遗漏未解压的码段。In the decompression method provided by the embodiment of the present invention, the decompression engine avoids omission of Uncompressed code segment.

可选的，在x为等于1的情况下，解压引擎根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据，具体包括：Optionally, when x is equal to 1, the decompression engine determines the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read by the xth clock cycle The target decompression data, including:

解压引擎从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据，第x时钟周期的目标解压数据中第一个有效字符对应的码字的起始码元和第x时钟周期读取的码段的起始码元相重合。The decompression engine determines the target decompressed data of the xth clock cycle from (a*N+b)/a characters, and the starting code element and The starting symbols of the code segments read in the xth clock cycle are coincident.

本发明实施例提供的解压缩方法，在x等于1时，通过目标解压数据的起始码元的位置确定了第1时钟周期的目标解压数据。In the decompression method provided by the embodiment of the present invention, when x is equal to 1, the target decompressed data of the first clock cycle is determined by the position of the start symbol of the target decompressed data.

可选的，在x为大于或等于2的整数的情况下，解压引擎根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据，具体包括：Optionally, when x is an integer greater than or equal to 2, the decompression engine determines from (a*N+b)/a characters The target decompressed data of the xth clock cycle, including:

解压引擎从(a*N+b)/a个字符中确定出c/a组候选解压数据；每组候选解压数据包括多个字符，且每相邻的两个字符中，一个字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是相邻的；每两组候选解压数据内第一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是不相同的，且每组候选解压数据内第一个字符对应的码字的起始码元为第x时钟周期读取的码段中的第W*a个码元，W为大于或等于0且小于或等于b/a的整数。The decompression engine determines the c/a group of candidate decompression data from (a*N+b)/a characters; each group of candidate decompression data includes multiple characters, and every two adjacent characters, the code corresponding to one character The position in the code segment read by the last code element of the word is adjacent to the position in the code segment read by the code word corresponding to another character in the x clock cycle; The position of the start symbol of the code word corresponding to the first character in each group of candidate decompressed data is different in the code segment read in the xth clock cycle, and the first character in each group of candidate decompressed data corresponds to The starting symbol of the code word is the W*a-th symbol in the code segment read in the xth clock cycle, and W is an integer greater than or equal to 0 and less than or equal to b/a.

解压引擎从c/a组候选解压数据中确定出第x时钟周期的目标解压数据；在压缩数据中，第x-1时钟周期的目标解压数据中最后一个字符对应的码字的末位码元和第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元是相邻的，第x-1时钟周期的目标解压数据是指第x-1时钟周期读取的码段对应的目标解压数据。The decompression engine determines the target decompressed data of the xth clock cycle from the candidate decompressed data of group c/a; in the compressed data, the last symbol of the codeword corresponding to the last character in the target decompressed data of the x-1th clock cycle The start symbol of the codeword corresponding to the first character in the target decompressed data of the xth clock cycle is adjacent, and the target decompressed data of the x-1th clock cycle refers to the code read in the x-1th clock cycle segment corresponding to the target decompressed data.

本发明实施例提供的解压缩方法，在x为大于或等于2的整数的情况下，通过先获取第x周期读取的数据的可能的c/a个候选解压数据，然后再根据第x-1时钟周期的目标解压数据选出第x时钟周期的目标解压数据，使得解压引擎可以在上一时钟周期的数据还未解压完成时就开始获取候选解压数据，提高了解压数据的速度。在x等于1的情况下，第x-1时钟周期的目标解压数据即为对第1时钟周期读取的码段进行解压得到的按照特定顺序排列的多个有效字符。In the decompression method provided by the embodiment of the present invention, when x is an integer greater than or equal to 2, by first obtaining possible c/a candidate decompression data of the data read in the x-th period, and then according to the x-th The target decompression data of 1 clock cycle selects the target decompression data of the xth clock cycle, so that the decompression engine can start to obtain the candidate decompression data before the decompression of the data of the previous clock cycle is completed, and the speed of decompression data is improved. When x is equal to 1, the target decompressed data of the x-1th clock cycle is a plurality of valid characters arranged in a specific order obtained by decompressing the code segment read in the first clock cycle.

可选的，解压引擎在确定出第x时钟周期的目标解压数据之后，该方法还包括：Optionally, after the decompression engine determines the target decompressed data of the xth clock cycle, the method further includes:

解压引擎根据压缩数据，将获取的多个时钟周期的解压数据进行拼接，获取压缩数据对应的解压数据，拼接后相邻的两个时钟周期中，一个时钟周期的目标解压数据内最后一个字符对应的码字的末位码元在压缩数据中的位置和另一个时钟周期的目标解压数据内第一个字符对应的码字的起始码元在压缩数据中的位置是相邻的。The decompression engine splices the obtained decompressed data of multiple clock cycles according to the compressed data, and obtains the decompressed data corresponding to the compressed data. The position of the last symbol of the code word in the compressed data is adjacent to the position of the start symbol of the code word corresponding to the first character in the target decompressed data of another clock cycle in the compressed data.

本发明实施例提供的解压缩方法，在得到多个时钟周期的解压数据后，将这些结果拼接起来以获得压缩数据的解压数据，解压速度较高。In the decompression method provided by the embodiment of the present invention, after obtaining the decompressed data of multiple clock cycles, these results are spliced together to obtain the decompressed data of the compressed data, and the decompression speed is relatively high.

第二方面，本发明实施例提供一种压缩方法，该压缩方法可以由压缩引擎执行，该方法包括：In a second aspect, an embodiment of the present invention provides a compression method, which can be executed by a compression engine, and the method includes:

压缩引擎确定变长编码表中每个码字对应的码长的目标公约数；变长编码表包括多个码字，目标公约数为大于或等于2的整数，待压缩数据中的每个字符对应于变长编码表中的一个码字；The compression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length coding table; the variable-length coding table includes multiple codewords, the target common divisor is an integer greater than or equal to 2, and each character in the data to be compressed Corresponding to a codeword in the variable-length coding table;

压缩引擎根据待压缩数据中的每个字符和目标公约数，生成变长编码表；变长编码表中至少两个码字的码长不同，待压缩数据中出现概率较高的字符对应的码字的码长小于出现概率较低的字符对应的码字的码长，变长编码表中码字的码长的最大公约数为目标公约数；The compression engine generates a variable-length code table according to each character in the data to be compressed and the target common divisor; the code lengths of at least two codewords in the variable-length code table are different, and the code corresponding to the character with a higher probability of appearing in the data to be compressed The code length of word is less than the code length of the corresponding code word of character with lower probability of occurrence, and the greatest common divisor of the code length of code word in the variable-length coding table is target common divisor;

压缩引擎根据变长编码表对待压缩数据进行压缩，得到压缩数据；The compression engine compresses the data to be compressed according to the variable-length code table to obtain the compressed data;

压缩引擎基于压缩数据和变长编码表生成压缩数据封装。The compression engine generates compressed data packages based on the compressed data and the variable-length code table.

本发明实施例提供的压缩方法，根据待压缩数据中的每个字符和目标公约数，生成变长编码表，并根据该变长编码表对待压缩数据进行压缩，使得解压引擎在解压该压缩数据时，可以从多个位置并发开始解压，而不用等待解压完一个码字后再根据该码字的长度解压下一个码字，解压速度较高。The compression method provided by the embodiment of the present invention generates a variable-length code table according to each character in the data to be compressed and the target common divisor, and compresses the data to be compressed according to the variable-length code table, so that the decompression engine decompresses the compressed data , can start decompression concurrently from multiple locations, instead of waiting for a codeword to be decompressed and then decompress the next codeword according to the length of the codeword, and the decompression speed is relatively high.

可选的，解压引擎确定变长编码表中每个码字对应的码长的目标公约数，包括：Optionally, the decompression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length coding table, including:

解压引擎首先可以确定至少一个公约数，根据对应于至少一个公约数中每一公约数的变长编码表，对待压缩数据进行压缩，得到的压缩率均小于预设值，其中，根据至少一个公约数中每一公约数，均能够生成一个对应的变长编码表；The decompression engine can first determine at least one common divisor, and compress the data to be compressed according to the variable-length coding table corresponding to each common divisor in the at least one common divisor. Each common divisor in the number can generate a corresponding variable-length code table;

之后解压引擎可以从至少一个公约数中确定出目标公约数，目标公约数的数值大于至少一个公约数中除目标公约数之外的每一公约数的数值。Afterwards, the decompression engine may determine the target common divisor from the at least one common divisor, and the value of the target common divisor is greater than the value of each common divisor in the at least one common divisor except the target common divisor.

最大公约数a越大，本发明实施例提供的压缩方法压缩的数据的解压速度就越高，但是压缩率可能会在最大公约数变大时升高。本发明实施例提供的压缩方法，通过预先确定对应的压缩率小于预设值的至少一个公约数，并从中选出最大的公约数作为目标公约数，保证了压缩率和解压速度都在一个较高的范围。The greater the greatest common divisor a, the higher the decompression speed of the data compressed by the compression method provided by the embodiment of the present invention, but the compression rate may increase when the greatest common divisor becomes larger. In the compression method provided by the embodiment of the present invention, by pre-determining at least one common divisor whose corresponding compression rate is smaller than the preset value, and selecting the largest common divisor as the target common divisor, it is ensured that both the compression rate and the decompression speed are within a relatively low high range.

第三方面，本发明实施例提供一种解压缩装置，该解压缩装置包括多个单元，该多个单元用于实现上述第一方面或第一方面中任意一种可能的实现方式所提供的解压缩方法。In the third aspect, the embodiment of the present invention provides a decompression device, the decompression device includes a plurality of units, and the plurality of units are used to implement the above-mentioned first aspect or any one of the possible implementations of the first aspect. Unzip method.

第四方面，本发明实施例提供一种压缩装置，该压缩装置包括多个单元，该多个单元用于实现上述第二方面或第二方面中任意一种可能的实现方式所提供的压缩方法。In a fourth aspect, an embodiment of the present invention provides a compression device, the compression device includes a plurality of units, and the plurality of units are used to implement the compression method provided in the second aspect or any possible implementation manner of the second aspect .

第五方面，本发明实施例提供一种数据处理系统，该系统包括第三方面提供的解压缩装置和第四方面提供的压缩装置，所述压缩装置用于对待压缩数据进行压缩得到压缩数据；所述解压缩装置用于对所述压缩数据进行解压处理。In a fifth aspect, an embodiment of the present invention provides a data processing system, the system includes the decompression device provided in the third aspect and the compression device provided in the fourth aspect, the compression device is used to compress data to be compressed to obtain compressed data; The decompression device is used for decompressing the compressed data.

第六方面，提供一种解压缩装置，所述解压缩装置包括：底板以及设置在底板上的中央处理器(英文：Central Processing Unit；简称：CPU)和内存，该底板连接有总线接口，该总线接口可以连接有解压引擎，该解压引擎用于执行第一方面提供的解压缩方法。In a sixth aspect, a decompression device is provided, the decompression device includes: a base plate and a central processing unit (English: Central Processing Unit; CPU for short) and memory arranged on the base plate, the base plate is connected with a bus interface, the The bus interface may be connected with a decompression engine, and the decompression engine is used to execute the decompression method provided in the first aspect.

第七方面，提供一种解压缩装置，所述解压缩装置包括：处理器芯片、与处理器芯片连接的内存、以及设置在处理器芯片上的CPU和总线接口，解压引擎可以集成在处理器芯片上，并通过总线接口和CPU连接，解压引擎用于执行第一方面提供的解压缩方法。In a seventh aspect, a decompression device is provided, the decompression device includes: a processor chip, a memory connected to the processor chip, and a CPU and a bus interface arranged on the processor chip, and the decompression engine can be integrated in the processor On the chip and connected to the CPU through the bus interface, the decompression engine is used to execute the decompression method provided by the first aspect.

第八方面，提供的一种压缩装置，该压缩装置包括：至少一个处理器、至少一个网络接口、存储器以及至少一个总线，存储器与网络接口分别通过总线与处理器相连；处理器被配置为执行存储器中存储的指令；处理器通过执行指令来实现第二方面提供的压缩方法。In an eighth aspect, a compression device is provided, the compression device includes: at least one processor, at least one network interface, memory, and at least one bus, the memory and the network interface are respectively connected to the processor through the bus; the processor is configured to execute Instructions stored in the memory; the processor implements the compression method provided by the second aspect by executing the instructions.

综上所述，本发明实施例提供的解压缩方法，在对压缩数据进行解压缩时，通过并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，然后根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同时从多个位置开始解压缩，解压速度较高。In summary, the decompression method provided by the embodiment of the present invention, when decompressing compressed data, decompresses (a*N+b)/a subcode segments concurrently to obtain (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related technology is solved. When decompressing, it can be decompressed from multiple locations at the same time, and the decompression speed is relatively high.

附图说明Description of drawings

图1A是本发明实施例提供的一种解压缩装置的结构方框图；FIG. 1A is a structural block diagram of a decompression device provided by an embodiment of the present invention;

图1B是本发明实施例提供的另一种解压缩装置的结构方框图；FIG. 1B is a structural block diagram of another decompression device provided by an embodiment of the present invention;

图1C是本发明实施例提供的一种压缩装置的结构方框图；Fig. 1C is a structural block diagram of a compression device provided by an embodiment of the present invention;

图2A是本发明实施例示出的一种压缩方法的流程图；FIG. 2A is a flowchart of a compression method shown in an embodiment of the present invention;

图2B是图2A所示实施例中一种确定目标公约数的流程图；Fig. 2B is a flow chart of determining the common divisor of the target in the embodiment shown in Fig. 2A;

图3A是本发明实施例示出的一种解压缩方法的流程图；FIG. 3A is a flowchart of a decompression method shown in an embodiment of the present invention;

图3B是图3A所示实施例中第x时钟周期读取的数据中码字与字符的对应关系示意图；Fig. 3B is a schematic diagram of the corresponding relationship between codewords and characters in the data read in the xth clock cycle in the embodiment shown in Fig. 3A;

图3C是图3A所示实施例中解压引擎所占用的芯片面积条形图；Fig. 3C is a bar graph of chip area occupied by the decompression engine in the embodiment shown in Fig. 3A;

图4是本发明实施例提供的一种压缩装置的结构方框图；Fig. 4 is a structural block diagram of a compression device provided by an embodiment of the present invention;

图5A是本发明实施例提供的一种解压缩装置的结构方框图；FIG. 5A is a structural block diagram of a decompression device provided by an embodiment of the present invention;

图5B是本发明实施例提供的一种解压缩装置的结构方框图；FIG. 5B is a structural block diagram of a decompression device provided by an embodiment of the present invention;

图6为本发明实施例提供的一种数据处理系统的结构示意图。FIG. 6 is a schematic structural diagram of a data processing system provided by an embodiment of the present invention.

具体实施方式detailed description

为了使本公开的目的、技术方案和优点更加清楚，下面将结合附图对本公开作进一步地详细描述，显然，所描述的实施例仅仅是本公开一部份实施例，而不是全部的实施例。In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be further described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. .

请参考图1A，其示出了本发明实施例提供的一种解压缩装置的结构方框图，该解压缩装置可以包括：底板11以及设置在底板11上的CPU12和内存13，该底板11连接有总线接口14，该总线接口14可以连接有解压引擎15，解压引擎15可以是扩展卡，而扩展卡可以为现场可编程门阵列(英文：Field-Programmable Gate Array；简称：FPGA)或应用专用集成电路(英文：Application Specific Integrated Circuit；简称：ASIC)。总线接口14可以为PCIe(一种总线接口)。Please refer to FIG. 1A, which shows a structural block diagram of a decompression device provided by an embodiment of the present invention. The decompression device may include: a base plate 11, a CPU 12 and a memory 13 arranged on the base plate 11, and the base plate 11 is connected with Bus interface 14, this bus interface 14 can be connected with decompression engine 15, and decompression engine 15 can be expansion card, and expansion card can be Field-Programmable Gate Array (English: Field-Programmable Gate Array; Abbreviation: FPGA) or application-specific integration Circuit (English: Application Specific Integrated Circuit; Abbreviation: ASIC). The bus interface 14 may be PCIe (a bus interface).

请参考图1B，其示出了本发明实施例提供的另一种解压缩装置的结构方框图，该解压缩装置可以包括：处理器芯片16、与处理器芯片16连接的内存17，以及设置在处理器芯片16上的CPU161和总线接口162，解压引擎163可以集成在处理器芯片16上，并通过总线接口162和CPU161连接。其中，内存17可以为双倍速率同步动态随机存储器(英文：DoubleData Rate SDRAM；简称：DDR)，总线接口162可以为高级可扩展接口(英文：AdvancedeXtensible Interface；简称：AXI)。Please refer to FIG. 1B, which shows a structural block diagram of another decompression device provided by an embodiment of the present invention. The decompression device may include: a processor chip 16, a memory 17 connected to the processor chip 16, and a The CPU 161 and the bus interface 162 on the processor chip 16 and the decompression engine 163 may be integrated on the processor chip 16 and connected to the CPU 161 through the bus interface 162 . Wherein, the memory 17 may be a double rate synchronous dynamic random access memory (English: Double Data Rate SDRAM; abbreviation: DDR), and the bus interface 162 may be an advanced extensible interface (English: AdvancedeXtensible Interface; abbreviation: AXI).

请参考图1C，其示出了本发明实施例提供的一种压缩装置的结构方框图，该压缩装置20可以包括：至少一个处理器21、至少一个网络接口22、存储器23以及至少一个总线24，存储器23与网络接口22分别通过总线24与处理器21相连；处理器21被配置为执行存储器23中存储的指令231。存储器23可能包含高速随机存取存储器(英文：Random AccessMemory；简称：RAM)，也可能还包括非不稳定的存储器(英文：non-volatile memory)，例如至少一个磁盘存储器。Please refer to FIG. 1C, which shows a structural block diagram of a compression device provided by an embodiment of the present invention. The compression device 20 may include: at least one processor 21, at least one network interface 22, memory 23, and at least one bus 24, The memory 23 and the network interface 22 are respectively connected to the processor 21 through the bus 24 ; the processor 21 is configured to execute instructions 231 stored in the memory 23 . The memory 23 may include a high-speed random access memory (English: Random Access Memory; RAM for short), and may also include a non-volatile memory (English: non-volatile memory), such as at least one disk memory.

图2A是本发明实施例示出的一种压缩方法的流程图，本实施例以该压缩方法应用于对待压缩数据进行压缩来举例说明。该压缩方法可以包括如下几个步骤：FIG. 2A is a flow chart of a compression method shown in an embodiment of the present invention. This embodiment is described by taking the compression method applied to compress data to be compressed as an example. The compression method may include the following steps:

步骤201、压缩引擎获取待压缩数据中各种字符的概率分布。Step 201, the compression engine obtains the probability distribution of various characters in the data to be compressed.

在使用本发明实施例提供的压缩方法时，压缩引擎可以分析待压缩数据，以获取待压缩数据中各种字符的概率分布。待压缩数据可以是由多个不同的字符构成的，不同字符的出现概率可能不同，概率分布中可以包括每一字符的出现概率，示例性的，字符“A”的出现概率可以为4％(百分号)，字符“B”的出现概率可以为6％。其中，待压缩数据中的字符是指存储于存储装置中未压缩的字符，存储装置在存储数据时，通常是以二进制位(英文：bit)为单位的，根据编码方式的不同，每个字符所占的二进制位的个数不同，如以美国信息交换标准代码(英文：American Standard Code for Information Interchange；简称：ASCII)作为编码方式的待压缩数据中，每个字符可以占8个二进制位。When using the compression method provided by the embodiment of the present invention, the compression engine can analyze the data to be compressed to obtain the probability distribution of various characters in the data to be compressed. The data to be compressed may be composed of a plurality of different characters, and the occurrence probability of different characters may be different, and the probability distribution may include the occurrence probability of each character. Exemplarily, the occurrence probability of the character "A" may be 4% ( percent sign), the occurrence probability of the character "B" may be 6%. Wherein, the characters in the data to be compressed refer to the uncompressed characters stored in the storage device. When the storage device stores data, it usually uses binary bits (English: bit) as the unit. According to different encoding methods, each character The number of binary bits occupied is different. For example, in data to be compressed that uses American Standard Code for Information Interchange (English: American Standard Code for Information Interchange; ASCII for short) as the encoding method, each character can occupy 8 binary bits.

本发明实施例中的压缩引擎可以通过软件或硬件结合在图1C所示的压缩装置中。The compression engine in the embodiment of the present invention may be integrated in the compression device shown in FIG. 1C through software or hardware.

步骤202、压缩引擎确定变长编码表中每个码字对应的码长的目标公约数。Step 202, the compression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length coding table.

该变长编码表可以包括多个码字，目标公约数为大于或等于2的整数，待压缩数据中的每个字符对应于所述变长编码表中的一个码字。The variable-length coding table may include multiple codewords, the target common divisor is an integer greater than or equal to 2, and each character in the data to be compressed corresponds to a codeword in the variable-length coding table.

该目标公约数为编码表中每一码字对应码长的最大公约数，而该编码表是由压缩引擎后续生成的。需要说明的是，码字由若干个码元组成，计算机通信中通常表现为若干位二进制数据。一个码元可以为一个二进制数。The target common divisor is the greatest common divisor of the code length corresponding to each codeword in the coding table, and the coding table is subsequently generated by the compression engine. It should be noted that a codeword is composed of several code elements, which are usually expressed as several bits of binary data in computer communication. A symbol can be a binary number.

如图2B所示，压缩引擎确定目标公约数的流程可以包括下面2个子步骤：As shown in Figure 2B, the process of the compression engine determining the target common divisor may include the following two sub-steps:

子步骤2021、压缩引擎确定至少一个公约数，根据对应于至少一个公约数中每一公约数的变长编码表，对待压缩数据进行压缩，得到的压缩率均小于预设值。Sub-step 2021, the compression engine determines at least one common divisor, and compresses the data to be compressed according to the variable-length coding table corresponding to each common divisor in the at least one common divisor, and the obtained compression ratios are all lower than the preset value.

其中，根据至少一个公约数中每一公约数，均能够生成一个对应的变长编码表。Wherein, according to each common divisor in at least one common divisor, a corresponding variable-length coding table can be generated.

子步骤2021可以包括：Substep 2021 may include:

1)压缩引擎首先可以预先设置多个公约数，这多个公约数均为大于或等于2且小于S的整数，S为待压缩数据中每个字符所占的二进制位的个数(这是因为如果公约数大于或等于字符所占的二进制位的个数，以该公约数为压缩数据中每个码字的码长的最大公约数时，压缩数据的数据量会大于待压缩数据，难以起到压缩的目的)。示例性的，待压缩数据中字符所占的位的个数为8，则公约数可以预先设置为2至7这6个数。1) The compression engine can first set multiple common divisors in advance, and these multiple common divisors are all integers greater than or equal to 2 and less than S, where S is the number of binary bits occupied by each character in the data to be compressed (this is Because if the common divisor is greater than or equal to the number of binary bits occupied by characters, when the common divisor is the greatest common divisor of the code length of each codeword in the compressed data, the data volume of the compressed data will be greater than the data to be compressed, which is difficult for the purpose of compression). Exemplarily, the number of bits occupied by characters in the data to be compressed is 8, and the common divisor can be preset as 6 numbers from 2 to 7.

2)在设置了多个公约数后，压缩引擎可以估计这多个公约数中每个公约数对应的压缩率(英文：Compression ratio)，压缩率是数据压缩后的大小与压缩前的大小之比。示例性的，待压缩数据在压缩前的大小为100兆字节(英文：Megabytes；简称：M)，而压缩后的大小为10M，则压缩率为10％。2) After setting multiple common divisors, the compression engine can estimate the compression ratio (English: Compression ratio) corresponding to each common divisor among the multiple common divisors. The compression ratio is the difference between the size of the data after compression and the size before compression. Compare. Exemplarily, if the size of the data to be compressed is 100 megabytes (English: Megabytes; M for short) before compression, and the size after compression is 10M, the compression rate is 10%.

压缩引擎在估计每个公约数的压缩率时，可以对待压缩数据进行采样得到采样数据，然后根据每一个公约数生成一个变长编码表(根据公约数生成编码表的过程可以参考步骤203)，并根据每个变长编码表对采样数据进行压缩得到每个公约数对应的压缩率。其中，根据这多个公约数中任一公约数生成的编码表中，记录有采样数据中每个字符和该字符对应的码字之间的对应关系，同时，变长编码表中每个字符对应的码字的码长为该任一公约数的正整数倍。When the compression engine estimates the compression rate of each common divisor, it can sample the data to be compressed to obtain sampled data, and then generate a variable-length code table according to each common divisor (the process of generating a code table according to the common divisor can refer to step 203), And compress the sampling data according to each variable-length coding table to obtain the compression rate corresponding to each common divisor. Among them, in the coding table generated according to any of the multiple common divisors, the corresponding relationship between each character in the sampling data and the code word corresponding to the character is recorded, and at the same time, each character in the variable-length coding table The code length of the corresponding code word is a positive integer multiple of any common divisor.

3)从多个公约数中筛选出对应的压缩率小于预设值的至少一个公约数。3) Selecting at least one common divisor whose corresponding compression ratio is smaller than a preset value from multiple common divisors.

预设值可以为操作人员预先设置的，该预设值在设置时可以考虑对于解压速度的要求，对于解压速度要求高时，可以设置一个较高的预设值，而在对解压速度要求低时，可以设置一个较小的预设值。这是因为通常压缩率越小，解压速度就会越慢。The preset value can be preset for the operator, and the preset value can consider the requirement for decompression speed when setting it. When the requirement for decompression speed is high, a higher preset value can be set, and when the requirement for decompression speed is low , you can set a smaller preset value. This is because generally the smaller the compression ratio, the slower the decompression speed will be.

子步骤2022、压缩引擎从至少一个公约数中确定出目标公约数，目标公约数的数值大于至少一个公约数中除目标公约数之外的每一公约数的数值。In sub-step 2022, the compression engine determines a target common divisor from at least one common divisor, and the value of the target common divisor is greater than the value of each common divisor in the at least one common divisor except the target common divisor.

压缩引擎确定出目标公约数，该目标公约数是指对应的压缩率小于预设值的至少一个公约数中具有最大值的公约数。The compression engine determines a target common divisor, where the target common divisor refers to a common divisor having a maximum value among at least one common divisor whose corresponding compression rate is smaller than a preset value.

应当明白的是，在前述“至少一个公约数”中，在该“至少一个公约数”的数量为1的情况下，也即该“至少一个公约数”就是指一个公约数，则该公约数即为目标公约数。只有在该“至少一个公约数”的数量大于或等于2的情况下，目标公约数才是指大于至少一个公约数中除目标公约数之外的每一公约数的数值的公约数。It should be understood that, in the aforementioned "at least one common divisor", when the number of the "at least one common divisor" is 1, that is, the "at least one common divisor" refers to a common divisor, then the common divisor is the objective common divisor. Only when the number of "at least one common divisor" is greater than or equal to 2, the target common divisor refers to the common divisor that is greater than the value of each common divisor in the at least one common divisor except the target common divisor.

由于本发明实施例提供的压缩方法对应的解压缩方法(参考图3A所示的实施例)中，变长编码表中码字的最大公约数越大，解压引擎每个时钟周期读取的数据就越多，解压速度也就越高，因而在考虑解压速度的情况下，可以将压缩率小于预设值的公约数中最大的公约数确定为目标公约数。In the decompression method corresponding to the compression method provided by the embodiment of the present invention (refer to the embodiment shown in Figure 3A), the greater the greatest common divisor of the codeword in the variable-length coding table, the greater the data read by the decompression engine per clock cycle The more there are, the higher the decompression speed is. Therefore, in consideration of the decompression speed, the largest common divisor among the common divisors whose compression rate is smaller than the preset value can be determined as the target common divisor.

此外，由于最大公约数越大，表示字符的码字的码长也会越长，进而用于表示待压缩数据的压缩数据(压缩数据是由一个个码字构成的)的数据量也会越大，因而最大公约数越大，压缩率也会越大。在考虑压缩率的情况下，也可以将压缩率小于预设值的公约数中最小的公约数确定为目标公约数。In addition, since the greater the greatest common divisor, the longer the code length of the code word representing the character will be, and the larger the data volume of the compressed data (compressed data is composed of each code word) used to represent the data to be compressed will be. Larger, so the greater the greatest common divisor, the greater the compression rate. In the case of considering the compression ratio, the smallest common divisor among the common divisors whose compression ratio is smaller than the preset value may also be determined as the target common divisor.

步骤203，压缩引擎根据待压缩数据中的每个字符和目标公约数，生成变长编码表。Step 203, the compression engine generates a variable-length code table according to each character in the data to be compressed and the target common divisor.

变长编码表中至少两个码字的码长不同，待压缩数据中出现概率较高的字符对应的码字的码长小于出现概率较低的字符对应的码字的码长，变长编码表中码字的码长的最大公约数为目标公约数。The code lengths of at least two code words in the variable-length coding table are different, and the code length of the code word corresponding to the character with a higher probability of occurrence in the data to be compressed is smaller than the code length of the code word corresponding to the character with a lower probability of occurrence. The greatest common divisor of the code lengths of the codewords in the table is the target common divisor.

变长编码表是记录待压缩数据中字符与码字的对应关系的映射表，生成变长编码表的过程是确定待压缩数据中字符与码字的对应关系的过程。其中，变长编码表中每一码字对应于待压缩数据中的一个字符，相同的字符对应的码字是相同的，不同的字符对应的码字是不同的。对于“不同的字符对应的码字不同”，一种情况是不同的字符对应的码字的位数(或者说“码长”)不同；另一种情况是虽然不同的字符对应的码字的位数相同，但是码字中的二进制数据的排列次序不同。参见表1，字符A对应的码字是000，字符B对应码字是001，则字符A对应的码字000和字符B对应码字001的虽然位数是相同的，但是码字中二进制数据的排列顺序不同。进一步参见表1，字符A对应的代码是000，字符F对应代码的101000，则字符A对应的代码000和字符F对应代码101000的位数是不同的。The variable-length coding table is a mapping table that records the correspondence between characters and codewords in the data to be compressed, and the process of generating the variable-length coding table is the process of determining the correspondence between characters and codewords in the data to be compressed. Wherein, each codeword in the variable-length coding table corresponds to a character in the data to be compressed, the codewords corresponding to the same character are the same, and the codewords corresponding to different characters are different. For "different characters correspond to different codewords", one situation is that the number of digits (or "code length") of codewords corresponding to different characters is different; another situation is that although the codewords corresponding to different characters have The number of bits is the same, but the arrangement order of the binary data in the codeword is different. Referring to Table 1, the codeword corresponding to character A is 000, and the codeword corresponding to character B is 001. Although the codeword 000 corresponding to character A and the codeword 001 corresponding to character B have the same number of digits, the binary data in the codeword are sorted in a different order. Further referring to Table 1, the code corresponding to character A is 000, and the code corresponding to character F is 101000, so the code 000 corresponding to character A and the code 101000 corresponding to character F have different digits.

在生成变长编码表时，可以按照待压缩数据中每个字符出现概率从大到小的顺序，为每个字符分配长度从小到大的码字，即字符出现的概率越大，就将该字符与长度越短的码字对应，这样就能够以较少的数据量来表示待压缩数据。When generating the variable-length code table, you can assign codewords with lengths from small to large for each character in the descending order of the occurrence probability of each character in the data to be compressed, that is, the higher the probability of a character, the more A character corresponds to a codeword with a shorter length, so that the data to be compressed can be expressed with a smaller amount of data.

示例性的，目标公约数为3时，变长编码表可以如表1所示：Exemplarily, when the target common divisor is 3, the variable-length coding table may be as shown in Table 1:

表1Table 1

在表1中，字符列中，码长较短的码字对应的字符出现的概率要大于码长较长的码字对应的字符出现的概率。与字符位于同一行的码长，代表该字符对应的码字的码长，或者代表该字符对应的码字包含的码元的数量，例如与字符“P”位于同一行的码长6代表的就是字符“P”对应的码字“110010”的码长。而与字符位于同一行的码字代表该字符对应的码字，例如与字符“Q”位于同一行的码字“110011”代表该字符“Q”对应的码字。待压缩数据中字符的排列顺序与压缩数据中码字的排列顺序是一致的。In Table 1, in the character column, the occurrence probability of a character corresponding to a code word with a shorter code length is greater than that of a character corresponding to a longer code word. The code length on the same line as the character represents the code length of the code word corresponding to the character, or the number of code elements contained in the code word corresponding to the character, for example, the code length 6 on the same line as the character "P" represents It is the code length of the codeword "110010" corresponding to the character "P". The codeword located on the same row as the character represents the codeword corresponding to the character, for example, the codeword “110011” located on the same row as the character “Q” represents the codeword corresponding to the character “Q”. The arrangement order of the characters in the data to be compressed is consistent with the arrangement order of the codewords in the compressed data.

本发明实施例中的变长编码可以是一种熵编码(英文：entropy encoding)，熵编码是无损压缩时使用的一种编码方式。The variable length encoding in the embodiment of the present invention may be an entropy encoding (English: entropy encoding), and the entropy encoding is an encoding method used in lossless compression.

步骤204、压缩引擎根据变长编码表对待压缩数据进行压缩，得到压缩数据。Step 204, the compression engine compresses the data to be compressed according to the variable-length coding table to obtain compressed data.

在得到了变长编码表之后，可以根据变长编码表对待压缩数据进行压缩，得到压缩数据。示例性的，变长编码表为表1，根据表1对待压缩数据“ABC”进行压缩后，得到的压缩数据为“000 001 010”。After the variable-length coding table is obtained, the data to be compressed can be compressed according to the variable-length coding table to obtain compressed data. Exemplarily, the variable-length coding table is Table 1, and after compressing the data "ABC" to be compressed according to Table 1, the obtained compressed data is "000 001 010".

步骤205、压缩引擎基于压缩数据和变长编码表生成压缩数据封装。Step 205, the compression engine generates a compressed data package based on the compressed data and the variable-length coding table.

在获取了变长编码表和压缩数据之后，压缩引擎基于压缩数据和变长编码表生成压缩数据封装。压缩数据封装还可以包括其他数据，如效验码(英文：CHECKSUM)等。After obtaining the variable-length encoding table and the compressed data, the compression engine generates compressed data encapsulation based on the compressed data and the variable-length encoding table. The compressed data encapsulation may also include other data, such as checksum (English: CHECKSUM) and the like.

本发明实施例提供的压缩方法的压缩率相较于基于LZ4(一种编码方法)的压缩方法的压缩率下降了30％左右。The compression rate of the compression method provided by the embodiment of the present invention is about 30% lower than that of the compression method based on LZ4 (a coding method).

综上所述，本发明实施例提供的压缩方法，通过生成码字的长度具有最大公约数的变长编码表，并根据该变长编码表对待压缩数据进行压缩，使得解压引擎在解压该压缩数据时，可以从多个位置并发开始解压，而不用等待解压完一个码字后再根据该码字的长度解压下一个码字，解压速度较高。To sum up, the compression method provided by the embodiment of the present invention generates a variable-length coding table with the largest common divisor in the length of the codeword, and compresses the data to be compressed according to the variable-length coding table, so that the decompression engine decompresses the compressed data. Data can be decompressed concurrently from multiple locations, instead of waiting for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword, and the decompression speed is relatively high.

图3A是本发明实施例示出的一种解压缩方法的流程图，本实施例以该解压缩方法应用于对图2A所示实施例提供的压缩方法生成的压缩数据封装进行解压来举例说明，该解压缩方法可以被图1A或图1B中的解压引擎来实现。该解压缩方法可以包括如下几个步骤：FIG. 3A is a flow chart of a decompression method shown in an embodiment of the present invention. In this embodiment, the decompression method is applied to decompress the compressed data package generated by the compression method provided in the embodiment shown in FIG. 2A as an example. The decompression method can be implemented by the decompression engine in FIG. 1A or FIG. 1B . The decompression method may include the following steps:

步骤301、解压引擎分析压缩数据封装，得到变长编码表和压缩数据。Step 301, the decompression engine analyzes the compressed data package, and obtains the variable-length code table and the compressed data.

在使用本发明实施例提供的解压缩方法时，解压引擎首先可以分析压缩数据封装，得到变长编码表和压缩数据。其中，压缩数据封装可以是图2A所示实施例提供的压缩方法压缩得到的压缩数据封装，该压缩数据封装中可以包括变长编码表、压缩数据和其他数据。When using the decompression method provided by the embodiment of the present invention, the decompression engine can first analyze the compressed data package to obtain the variable-length code table and compressed data. Wherein, the compressed data package may be a compressed data package compressed by the compression method provided in the embodiment shown in FIG. 2A , and the compressed data package may include a variable-length code table, compressed data and other data.

本步骤是可选的步骤，即解压引擎也可以直接得到变长编码和压缩数据。This step is an optional step, that is, the decompression engine can also directly obtain variable-length coded and compressed data.

步骤302、解压引擎按照压缩数据的生成顺序在每个时钟周期从压缩数据中读取码长为a*N+b的码段，第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合。Step 302, the decompression engine reads the code segment with a code length of a*N+b from the compressed data in each clock cycle according to the generation sequence of the compressed data, and the code segment read at the xth clock cycle is a*N+b The last b bits of the code segment coincide with the first b bits of the code segment whose code length is a*N+b read in the x+1th clock cycle.

在获取了变长编码表后，解压引擎可以按照压缩数据的生成顺序(即先生成的数据先读取，后生成数据的后读取)在每个时钟周期从压缩数据中读取码长为a*N+b的码段(a*N+b大于或等于压缩数据中最长的码字的码长c)，第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合，其中，N为大于0的整数，a代表上述实施例中提到的目标公约数，N的值可以根据硬件情况预先设置，b为压缩数据中最长码字的码长c和最短码字的码长a(最短码字的码长等于目标公约数a)的差值，a*N可以表示本发明实施例提供的解压方法的解压速度，即a和N越大，本发明实施例提供的解压方法的解压速度就越高。示例性的，在a＝3，c＝12，b＝c-a＝9时，解压引擎每个时钟周期读取3N+9位的数据。After obtaining the variable-length encoding table, the decompression engine can read the code length from the compressed data in each clock cycle according to the generation sequence of the compressed data (that is, the first-generated data is read first, and the second-generated data is read later). The code segment of a*N+b (a*N+b is greater than or equal to the code length c of the longest code word in the compressed data), the code length read in the xth clock cycle is the code segment of a*N+b The last b bits coincide with the first b bits of the code segment whose code length is a*N+b read in the x+1th clock cycle, where N is an integer greater than 0, and a represents the target common divisor mentioned in the above embodiment , the value of N can be preset according to the hardware situation, b is the difference between the code length c of the longest code word in the compressed data and the code length a of the shortest code word (the code length of the shortest code word is equal to the target common divisor a), a *N may represent the decompression speed of the decompression method provided by the embodiment of the present invention, that is, the larger a and N are, the higher the decompression speed of the decompression method provided by the embodiment of the present invention. Exemplarily, when a=3, c=12, b=c-a=9, the decompression engine reads 3N+9 bits of data every clock cycle.

压缩数据中，码字的码长的最大公约数a、压缩数据中最长码字的码长和最短码字的码长的差值b等数值，可以是解压引擎在解压之前就预先获知的，示例性的，可以是在设置解压引擎时就将这些数值写入了解压引擎。或者，码字的码长的最大公约数a、压缩数据中最长码字的码长和最短码字的码长的差值b等数值也可以是包括在变长编码表中，解压引擎可以在解析变长编码表时获取这些数值。In the compressed data, values such as the greatest common divisor a of the code length of the code word, the difference b between the code length of the longest code word and the code length of the shortest code word in the compressed data, etc., can be known in advance by the decompression engine before decompression , Exemplarily, these values may be written into the decompression engine when the decompression engine is set. Or, values such as the greatest common divisor a of the code length of the code word, the difference b between the code length of the longest code word and the code length of the shortest code word in the compressed data can also be included in the variable-length coding table, and the decompression engine can Get these values when parsing a variable-length coded table.

时钟周期(英文：Clock Cycle)也称为振荡周期，时钟周期是计算机中的时间单位。在一个时钟周期内，CPU完成一个基本的动作。A clock cycle (English: Clock Cycle) is also called an oscillation cycle, and a clock cycle is a unit of time in a computer. In one clock cycle, the CPU completes a basic action.

在步骤302中，解压引擎从压缩数据中每个时钟周期读取了固定码长的码段，因此可以以一个稳定的带宽向解压引擎传输数据，解压引擎对于带宽的利用率较高，本发明实施例提供的解压方法可以应用于带宽压缩场景，例如神经网络参数在线解压应用，而相关技术中由于每个时钟周期解压的码字的码长可能不同，因而解压引擎读取的码段的码长也不同，难以应用于带宽压缩场景。In step 302, the decompression engine reads the code segment of fixed code length from each clock cycle in the compressed data, so the data can be transmitted to the decompression engine with a stable bandwidth, and the decompression engine has a higher utilization rate of the bandwidth. The present invention The decompression method provided by the embodiment can be applied to bandwidth compression scenarios, such as the online decompression application of neural network parameters. In the related art, since the code length of the code word decompressed in each clock cycle may be different, the code segment of the code segment read by the decompression engine The length is also different, and it is difficult to apply to bandwidth compression scenarios.

在本发明实施例提供的解压方法中，解压引擎可以通过调整N的值来调节每个时钟周期读取的数据量，进而调节解压速度。In the decompression method provided by the embodiment of the present invention, the decompression engine can adjust the amount of data read in each clock cycle by adjusting the value of N, thereby adjusting the decompression speed.

由于相邻的两个时钟周期中的前一个时钟周期读取的码段末尾可能存在未解压的多位码元，为了避免这些码元遗漏，解压引擎在每个时钟周期读取码段时，将前一个时钟周期读取的码段的最后b位重复读取，这是因为整个压缩数据的码长是a的整数倍，前一个时钟周期最多可能未解压的码元的个数为b(在前一个时钟周期读取的码段中，最后一个码字为最大的长度c时，前一个时钟周期遗漏的未解压码元为最大个数b)，前一个时钟周期最少可能未解压的码元的个数为0。需要说明的是，实际上，前一个时钟周期未解压的码元的个数为区间[0,K]内一个整数与a的乘积,其中K等于b/a。Since there may be undecompressed multi-bit symbols at the end of the code segment read in the previous clock cycle of two adjacent clock cycles, in order to avoid the omission of these symbols, when the decompression engine reads the code segment in each clock cycle, The last b bits of the code segment read in the previous clock cycle are repeatedly read, because the code length of the entire compressed data is an integer multiple of a, and the number of symbols that may not be decompressed in the previous clock cycle is b( In the code segment read in the previous clock cycle, when the last code word has the maximum length c, the number of undecompressed symbols missed in the previous clock cycle is the maximum number b), and the least possible uncompressed code in the previous clock cycle The number of elements is 0. It should be noted that, in fact, the number of uncompressed symbols in the previous clock cycle is the product of an integer in the interval [0, K] and a, where K is equal to b/a.

步骤303、解压引擎根据第x时钟周期从压缩数据中读取的码长为a*N+b的码段，得到的(a*N+b)/a个子码段。Step 303, the decompression engine reads the code segment with a code length of a*N+b from the compressed data according to the xth clock cycle, and obtains (a*N+b)/a sub-code segments.

其中，x为大于或等于1的整数；(a*N+b)/a个子码段中每个子码段的码长为c，(a*N+b)/a个子码段中包括首个子码段，首个子码段的起始位与第x时钟周期读取的码段的起始位重合，且(a*N+b)/a个子码段中相邻两个子码段的起始位之间间隔的码长为a-1。Wherein, x is an integer greater than or equal to 1; the code length of each subcode segment in (a*N+b)/a subcode segments is c, and includes the first subcode segment in (a*N+b)/a subcode segments Code segment, the start bit of the first sub-code segment coincides with the start bit of the code segment read in the xth clock cycle, and the start of two adjacent sub-code segments in (a*N+b)/a sub-code segments The code length of the interval between bits is a-1.

步骤304、解压引擎基于变长编码表，并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，其中，解压每一子码段得到一个字符。在x等于1时，执行步骤305；在x为大于或等于2的整数时，执行步骤306。Step 304, the decompression engine decompresses (a*N+b)/a subcode segments concurrently based on the variable-length code table to obtain (a*N+b)/a characters, wherein each subcode is decompressed segment gets one character. When x is equal to 1, execute step 305; when x is an integer greater than or equal to 2, execute step 306.

在第x时钟周期读取的a*N+b位数据中，解压引擎可以并发的对多个子码段进行解压处理，在对每个子码段进行解压处理得到的一个字符对应的码字的起始码元为每个子码段的起始码元。In the a*N+b-bit data read in the x-th clock cycle, the decompression engine can decompress multiple sub-code segments concurrently, starting from the code word corresponding to a character obtained by decompressing each sub-code segment The initial symbol is the initial symbol of each subcode segment.

其中，解压引擎对码长为a*N+b的码段中的起始位为第N*a位至第(N+K-1)*a位开始的多个子码段根据变长编码表进行解压缩时，码长为a*N+b的码段中剩余的码元的个数不够c个，解压引擎可以将这些码元不够c个的子码段以预设值(该预设值可以随意设置，如全为0或全为1，或者0和1交替等)将码长填充至c，然后再对这些子码段进行解压缩，得到多个字符，之后可以检测每个子码段解压得到的字符在变长编码表中对应的码字的码长，如某个子码段解压得到的字符在变长编码表中对应的码字的码长大于该子码段在填充前的码长，则可以将该子码段解压得到的字符删除，在删除字符后，a*N+b位数据中存在未解压的若干码元。Among them, the decompression engine is based on the variable-length coding table for the multiple sub-code segments whose start bit is the N*ath bit to the (N+K-1)*a bit in the code segment whose code length is a*N+b When decompressing, if the number of remaining symbols in the code segment with a code length of a*N+b is less than c, the decompression engine can use a preset value (the preset The value can be set arbitrarily, such as all 0 or all 1, or 0 and 1 alternately, etc.) fill the code length to c, and then decompress these subcode segments to obtain multiple characters, and then each subcode can be detected The code length of the corresponding code word in the variable-length code table of the character obtained by segment decompression, such as the code length of the code word corresponding to the character obtained by decompressing a certain sub-code segment in the variable-length code table is greater than the code length of the sub-code segment before filling If the code length is long, the characters obtained by decompressing the sub-code segment can be deleted. After the characters are deleted, there are some code elements that have not been decompressed in the a*N+b bit data.

示例性的，第x时钟周期读取的码段为“000 001 010 011 100”，a＝3，c＝6，依照表1所示的变长编码表，同时对子码段“000 001”、“001 010”、“010 011”、“011 100”和“100000”进行解压缩，得到的字符为码字000对应的字符“A”，码字001对应的字符“B”，码字010对应的字符“C”、码字011对应的字符“D”以及码字“100”对应的字符“E”。其中“100 000”为填充为6位的子码段，根据该6位子码段解压得到的字符“E”对应的码字的码长为3，不大于“100 000”在填充前的长度，因而可以不删除字符“E”。Exemplarily, the code segment read in the xth clock cycle is "000 001 010 011 100", a=3, c=6, according to the variable-length coding table shown in Table 1, and the sub-code segment "000 001" , "001 010", "010 011", "011 100" and "100000" are decompressed, and the obtained characters are the character "A" corresponding to the code word 000, the character "B" corresponding to the code word 001, and the code word 010 The corresponding character "C", the character "D" corresponding to the code word 011, and the character "E" corresponding to the code word "100". Among them, "100 000" is a sub-code segment filled with 6 bits, and the code length of the code word corresponding to the character "E" obtained by decompressing the 6-bit sub-code segment is 3, which is not greater than the length of "100 000" before filling, Thus, the character "E" may not be deleted.

需要说明的是，压缩数据中的码字与将该压缩数据进行解压得到的目标解压数据中的字符是一一对应的。压缩数据中的第K个码字是与目标解压数据中的第K个字符对应的，确定了某个码字的起始位，就能够确定该码字(从起始位开始读取最少a位码元，最多c位码元，能够从变长编码表中得到这些码元表示的一个码字)。现有技术中，压缩数据中的每个码字的码长不全相同，不按照压缩数据中的顺序解压出一个码字前，解压引擎无法获知该码字的码长，进而也无法获知下一个码字的起始位。但由于本发明实施例提供的解压方法中的压缩数据中的码字的码长具有最大公约数a，因而解压数据中每个字符对应的码字的起始位是包括在(a*N+b)/a个子码段中每个子码段的起始位中的，可以称对压缩数据解压得到的目标解压数据中每个字符对应的码字(从变长编码表中可以得知每个字符对应的码字)的起始位为(a*N+b)/a个子码段的起始位中的正确起始位(正确起始位是以该位在整个压缩数据中的顺序来确定的，如解压数据中第一个字符对应的码字的起始位为整个压缩数据的第一位，则整个压缩数据中的第一位即为解压数据中第一个字符对应的码字的正确起始位)，而将(a*N+b)/a个子码段的起始位中除正确起始位的其他位称为错误起始位。步骤304并发对(a*N+b)/a个子码段进行解压处理得到的多个字符中，包括从正确起始位开始解压得到的有效字符和从错误起始位开始解压得到的无效字符。It should be noted that there is a one-to-one correspondence between the codewords in the compressed data and the characters in the target decompressed data obtained by decompressing the compressed data. The Kth codeword in the compressed data is corresponding to the Kth character in the target decompressed data, and the start bit of a certain codeword is determined, and the codeword can be determined (read at least a bit code elements, up to c bit code elements, can obtain a codeword represented by these code elements from the variable-length coding table). In the prior art, the code length of each codeword in the compressed data is not the same. Before a codeword is decompressed out of the order in the compressed data, the decompression engine cannot know the code length of the codeword, and thus cannot know the next codeword. The start bit of the codeword. But because the code length of the code word in the compressed data in the decompression method that the embodiment of the present invention provides has the greatest common divisor a, thereby the start bit of the code word corresponding to each character in the decompressed data is included in (a*N+ In the start bit of each subcode segment in b)/a subcode segments, it can be said that the code word corresponding to each character in the target decompressed data obtained by decompressing the compressed data (from the variable-length coding table, it can be known that each The start bit of the codeword corresponding to the character) is the correct start bit in the start bits of (a*N+b)/a subcode segments (the correct start bit is based on the order of the bit in the entire compressed data Determined, if the start bit of the codeword corresponding to the first character in the decompressed data is the first bit of the entire compressed data, then the first bit in the entire compressed data is the codeword corresponding to the first character in the decompressed data The correct start bit), and the other bits except the correct start bit among the start bits of (a*N+b)/a subcode segments are called wrong start bits. In step 304, multiple characters obtained by decompressing (a*N+b)/a subcode segments concurrently include valid characters decompressed from the correct start bit and invalid characters decompressed from the wrong start bit .

步骤305，在x等于1时，解压引擎从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据，第x时钟周期的目标解压数据中第一个有效字符对应的码字的起始码元和第x时钟周期读取的码段的起始码元相重合。Step 305, when x is equal to 1, the decompression engine determines the target decompressed data of the xth clock cycle from the (a*N+b)/a characters, and the first valid character in the target decompressed data of the xth clock cycle corresponds to The starting symbol of the code word coincides with the starting symbol of the code segment read in the xth clock cycle.

其中，第x时钟周期的目标解压数据中，每相邻的两个有效字符中一个有效字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个有效字符对应的码字的起始码元在第x时钟周期读取的码段中的位置相邻。第x时钟周期的目标解压数据包括按照特定顺序排列的多个有效字符。Among them, in the target decompressed data of the xth clock cycle, the position of the last code element of the codeword corresponding to one valid character in every two adjacent valid characters is the same as the other effective The starting symbols of the codewords corresponding to the characters are adjacent to each other in the code segment read in the xth clock cycle. The target decompressed data of the xth clock cycle includes a plurality of valid characters arranged in a specific order.

解压引擎在得到了(a*N+b)/a个字符后，就得知了这(a*N+b)/a个字符中每个字符对应的码字的码长。因为这多个字符中包括从错误起始位开始解压得到的无效字符，因而可以从中选出多个有效字符进行拼接，以得到第x时钟周期的目标解压数据，而选择的依据可以是先确定第x时钟周期读取的码长为a*N+b的码段中第一个码字的正确起始位，然后根据该正确起始位以及第一个码字的码长来确定第二个码字的正确起始位(第二个码字的正确起始位为第一个码字的末位的下一位)，以此类推，解压引擎就能够获得每个有效字符对应的码字的正确起始位，进而就能够选出多个有效字符，将这多个有效字符以每个有效字符对应的码字在第x时钟周期读取的a*N+b位数据中的先后顺序进行拼接，就得到了第x时钟周期读取的数据的目标解压数据。After the decompression engine obtains (a*N+b)/a characters, it knows the code length of the code word corresponding to each character in the (a*N+b)/a characters. Because these multiple characters include invalid characters decompressed from the wrong start bit, multiple valid characters can be selected from them for splicing to obtain the target decompressed data of the xth clock cycle, and the basis for selection can be to first determine The correct start bit of the first code word in the code segment whose code length is a*N+b read in the xth clock cycle, and then determine the second code word according to the correct start bit and the code length of the first code word The correct start bit of the first codeword (the correct start bit of the second codeword is the next bit after the last bit of the first codeword), and so on, the decompression engine can obtain the code corresponding to each valid character The correct start bit of the word, and then just be able to select a plurality of valid characters, these multiple valid characters are in order of the a*N+b bit data read in the code word corresponding to each valid character in the xth clock cycle By splicing in sequence, the target decompressed data of the data read in the xth clock cycle is obtained.

在x＝1时，第x时钟周期读取的码长为a*N+b的码段的起始位即为压缩数据的起始位，该起始位为第x时钟周期读取的码长为a*N+b的码段中第一个码字的正确起始位。When x=1, the start bit of the code segment whose code length is a*N+b read in the xth clock cycle is the start bit of the compressed data, and the start bit is the code read in the xth clock cycle The correct start bit of the first codeword in the code segment of length a*N+b.

示例性的，如图3B所示，其为第x时钟周期读取码长为a*N+b的码段中的前24位码元，在步骤304中，可以得到其中每个码字的码长以及每个码字对应的字符，即0至5位构成的码长为6的码字对应字符A，3至8位构成的码长为6的码字对应字符B，6至14位构成的码长为9的码字对应字符C，9至14位构成的码长为6的码字对应字符D，12至23位构成的码长为12的码字对应字符E，15至17位构成的码长为3的码字对应字符F，其中，码字的码长以及码字对应的字符可以如表2所示：Exemplarily, as shown in Figure 3B, it reads the first 24 code elements in the code segment whose code length is a*N+b for the xth clock cycle, and in step 304, the value of each codeword can be obtained The code length and the characters corresponding to each code word, that is, a code word with a code length of 6 composed of 0 to 5 bits corresponds to character A, a code word with a code length of 6 composed of 3 to 8 bits corresponds to character B, and 6 to 14 bits The formed code length is the corresponding character C of the code word of 9, the code word corresponding character D of the code length of 6 is the code word formed by the code length of 9 to 14 bits, and the corresponding character E of the code word of the code length of 12 formed by the 12 to 23 bits, 15 to 17 The code length that the bit constitutes is that the code word corresponding character F of 3, wherein, the code length of code word and the character corresponding to code word can be as shown in Table 2:

表2Table 2

在表2中，每个字符自上而下的顺序，和每个字符对应的码字的起始位在第x时钟周期读取码长为a*N+b的码段中的先后顺序是一致的。从图3B示出的24位码元以及该24位码元对应的多个字符中确定有效字符的方式可以为：以该24位码元的起始位作为第一个码字的起始位(该起始位为第一个有效字符对应的码字的正确起始位)，确定了第一个码字的码长为6，且第一个码字对应的字符为A，由于第一个码字的码长为6，因而第二个码字的正确起始位为该24位码元的第6位(前0至5位被第一个码字占用)，而以该24位码元的第6位为起始位置的码字对应的字符为C，因而可以将字符C作为第二个有效字符，由于第二个有效字符对应的码字的码长为9，因而第三个码字的正确起始位为该24位码元的第15位(前0至14位被第一个码字和第二个码字占用)，以第15位为起始位置的码字对应的字符为F，因而得到的解压数据为A C F。In Table 2, the order of each character from top to bottom, and the start bit of the code word corresponding to each character in the code segment with code length a*N+b read in the xth clock cycle is consistent. The mode of determining valid characters from the 24-bit code element shown in Figure 3B and the corresponding characters of the 24-bit code element can be as follows: the start bit of the 24-bit code element is used as the start bit of the first code word (the start bit is the correct start bit of the codeword corresponding to the first valid character), the code length of the first codeword is determined to be 6, and the character corresponding to the first codeword is A, because the first The code length of a codeword is 6, thus the correct start bit of the second codeword is the 6th bit of the 24-bit symbol (the first 0 to 5 bits are occupied by the first codeword), and the 24-bit The 6th bit of the code element is the character corresponding to the code word of the starting position is C, so character C can be used as the second effective character, because the code length of the code word corresponding to the second effective character is 9, so the third The correct start bit of a code word is the 15th bit of the 24-bit code element (the first 0 to 14 bits are occupied by the first code word and the second code word), and the code word with the 15th bit as the starting position The corresponding character is F, so the obtained decompressed data is A C F.

步骤306、在x为大于或等于2的整数时，解压引擎从(a*N+b)/a个字符中确定出c/a组候选解压数据。Step 306, when x is an integer greater than or equal to 2, the decompression engine determines c/a group of candidate decompression data from (a*N+b)/a characters.

其中，每组候选解压数据包括多个字符，且每相邻的两个字符中，一个字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是相邻的；每两组候选解压数据内第一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是不相同的，且每组候选解压数据内第一个字符对应的码字的起始码元为第x时钟周期读取的码段中的第W*a个码元，W为大于或等于0且小于或等于b/a的整数。Wherein, each group of candidate decompressed data includes a plurality of characters, and in every two adjacent characters, the position of the last code element of the code word corresponding to a character in the code segment read in the xth clock cycle and the position of another character The position of the start symbol of the corresponding codeword in the code segment read in the xth clock cycle is adjacent; the start symbol of the codeword corresponding to the first character in each two groups of candidate decompressed data is at the xth clock cycle The position in the code segment read by the clock cycle is different, and the starting code element of the code word corresponding to the first character in each group of candidate decompressed data is the W*th in the code segment read by the xth clock cycle a symbols, W is an integer greater than or equal to 0 and less than or equal to b/a.

在x为大于或等于2的整数时，由于第x-1时钟周期读取的数据还未完成解压缩，解压引擎难以获知第x-1时钟周期读取的数据中未解压的码元的个数，因而解压引擎当前难以确定读取的码长为a*N+b的码段中第一个码字的正确起始位。此时，解压引擎可以确定出所有可能的正确起始位(所有可能的正确起始位为第x时钟周期读取的码长为a*N+b的码段中的第H*a位，H取0至K中的每个整数，K＝b/a)，然后分别以这些所有可能的正确起始位为第一个码字的正确起始位，以步骤305中的方式得到c/a个候选解压数据。When x is an integer greater than or equal to 2, since the data read in the x-1th clock cycle has not been decompressed, it is difficult for the decompression engine to know the number of undecompressed symbols in the data read in the x-1th clock cycle Therefore, it is currently difficult for the decompression engine to determine the correct start bit of the first codeword in the read code segment with a code length of a*N+b. At this point, the decompression engine can determine all possible correct start bits (all possible correct start bits are the H*a bit in the code segment whose code length is a*N+b read in the xth clock cycle, H gets each integer from 0 to K, K=b/a), then respectively use these all possible correct start bits as the correct start bits of the first code word, and obtain c/ in the manner in step 305 a candidate decompressed data.

其中，由于第x时钟周期读取的码段的前b位与第x-1时钟周期读取的码段的后b位是重复的，而这重复的b位中的前几位可能存在第x-1时钟周期读取的码段中最后一个码字的一部分，第x时钟周期读取的码段中第一个码字的正确起始位为第x-1时钟周期读取的码段中最后一个码字的末位的下一位。Wherein, since the first b bits of the code segment read in the x-th clock cycle are repeated with the last b bits of the code segment read in the x-1 clock cycle, and the first few bits in the repeated b bits may have the first A part of the last code word in the code segment read by x-1 clock cycle, the correct start bit of the first code word in the code segment read by x-1 clock cycle is the code segment read by x-1 clock cycle The next digit of the last digit of the last codeword in .

由于压缩数据中码字的码长的最大公约数为a，因而第x时钟周期读取的码段的前b位中存在的第x-1时钟周期读取的码段的长度可以为0至K中的任一整数与a的乘积(在长度为0*a时，表明第x时钟周期读取的码段的前b位中不存在第x-1时钟周期读取的码段中最后一个码字的一部分)，因而第x时钟周期读取的码段的前b位码元中的第H*a位可能是第x时钟周期读取的码段中第一个码字的正确起始位，而候选解压数据的个数取决于0至K中整数的个数，该个数等于K+1＝b/a+1＝(a+b)/a＝c/a，其中，给K加1是因为0也算作0至K中的一个整数。Since the greatest common divisor of the code length of the code word in the compressed data is a, the length of the code segment read in the x-1th clock cycle that exists in the first b bits of the code segment read in the xth clock cycle can be from 0 to The product of any integer in K and a (when the length is 0*a, it indicates that the last bit of the code segment read in the x-1 clock cycle does not exist in the first b bits of the code segment read in the xth clock cycle Part of the code word), so the H*a bit in the first b-bit symbols of the code segment read in the xth clock cycle may be the correct start of the first code word in the code segment read in the xth clock cycle bit, and the number of candidate decompressed data depends on the number of integers from 0 to K, and this number is equal to K+1=b/a+1=(a+b)/a=c/a, wherein, for K 1 is added because 0 also counts as an integer from 0 to K.

示例性的，c＝9，a＝3，第x时钟周期读取的码段的前9位为“000 001 010”，则其中的第0位、第3位和第6位为可能的正确起始位，解压引擎可以分别将第0位、第3位和第6位作为3个候选解压数据中第一个字符对应的码字的起始位，并分别以步骤305中的方式得到3个候选解压数据。Exemplarily, c=9, a=3, the first 9 digits of the code segment read in the xth clock cycle are "000 001 010", then the 0th, 3rd and 6th digits are possible correct Start bit, the decompression engine can respectively use the 0th bit, the 3rd bit and the 6th bit as the start bit of the codeword corresponding to the first character in the 3 candidate decompressed data, and obtain 3 in the manner in step 305 respectively candidate decompressed data.

步骤307、解压引擎从c/a组候选解压数据中确定出第x时钟周期的目标解压数据。Step 307, the decompression engine determines the target decompressed data of the xth clock cycle from the c/a group of candidate decompressed data.

在压缩数据中，第x-1时钟周期的目标解压数据中最后一个字符对应的码字的末位码元和第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元是相邻的，第x-1时钟周期的目标解压数据是指第x-1时钟周期读取的码段对应的目标解压数据。在x等于1的情况下，第x-1时钟周期的目标解压数据即为对第1时钟周期读取的码段进行解压得到的按照特定顺序排列的多个有效字符。In the compressed data, the last code element of the codeword corresponding to the last character in the target decompressed data of the x-1th clock cycle and the start code of the codeword corresponding to the first character in the target decompressed data of the xth clock cycle elements are adjacent, and the target decompressed data of the x-1th clock cycle refers to the target decompressed data corresponding to the code segment read in the x-1th clock cycle. When x is equal to 1, the target decompressed data of the x-1th clock cycle is a plurality of valid characters arranged in a specific order obtained by decompressing the code segment read in the first clock cycle.

解压引擎在得到了第x时钟周期的c/a个候选解压数据后，同时也完成了对第x-1时钟周期读取的码段的解压，得到了第x-1时钟周期的目标解压数据，解压引擎可以根据第x-1时钟周期的目标解压数据确定第x时钟周期读取的数据中第一个码字的正确起始位，并从c/a个候选解压数据中选出第x时钟周期的目标解压数据。After the decompression engine obtains the c/a candidate decompressed data of the xth clock cycle, it also completes the decompression of the code segment read in the x-1th clock cycle, and obtains the target decompressed data of the x-1th clock cycle , the decompression engine can determine the correct start bit of the first codeword in the data read in the xth clock cycle according to the target decompressed data of the x-1th clock cycle, and select the xth from the c/a candidate decompressed data Clock cycle target to decompress data.

示例性的，第x-1时钟周期的解压数据为“ABCD”，对应的码字为“000 001 010011”，第x-1时钟周期读取的码段为“000 001 010 011 101”，b＝6，a＝3，c＝9， N＝4，a*N+b＝15，第x时钟周期读取的码段为“011 101000 101001”，其中前6位为第x-1时钟周期读取的码段的后6位，第x-1时钟周期读取的码段中后三位“101”为一个6位的码字的前三位，以步骤304中的方式不会对这三位进行解压缩，这三位数据为未解压的码段，可以将这三位码段的起始码元确定为第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元。Exemplarily, the decompressed data of the x-1 clock cycle is "ABCD", the corresponding code word is "000 001 010011", and the code segment read in the x-1 clock cycle is "000 001 010 011 101", b =6, a=3, c=9, N=4, a*N+b=15, the code segment read in the xth clock cycle is "011 101000 101001", where the first 6 bits are the x-1th clock cycle The last 6 bits of the read code segment, the last three "101" in the code segment read in the x-1th clock cycle are the first three bits of a 6-bit code word, which will not be corrected in the manner in step 304 Three bits are decompressed, and these three bits of data are code segments not decompressed, and the starting code element of these three bits of code segments can be determined as the starting code word corresponding to the first character in the target decompressed data of the xth clock cycle start code unit.

需要说明的是，本发明实施例在对每个时钟周期读取的码段是并发的进行处理的，即解压引擎在第x-1时钟周期读取了码段后，解压引擎在对在第x-1时钟周期读取的码段进行解压缩时，解压引擎又在第x时钟周期读取了码段，并同时开始对第x时钟周期读取的码段进行解压。第x-1时钟周期读取的码段比第x时钟周期读取的码段先读取一个时钟周期，因而解压引擎在对第x时钟周期读取的码段进行步骤303至步骤307的任何一种处理时，都比解压引擎在对x-1时钟周期读取的码段进行相同处理慢一个步骤。因此，解压引擎在对第x时钟周期读取的码段的处理过程中，在执行步骤306时，解压引擎已经完成了对第x-1时钟周期读取的码段的解压，得到了第x-1时钟周期的目标解压数据。It should be noted that, in the embodiment of the present invention, the code segment read in each clock cycle is processed concurrently, that is, after the decompression engine reads the code segment in the x-1th clock cycle, the decompression engine processes the When the code segment read in the x-1 clock cycle is decompressed, the decompression engine reads the code segment in the xth clock cycle and starts to decompress the code segment read in the xth clock cycle at the same time. The code segment read in the x-1th clock cycle is read one clock cycle earlier than the code segment read in the xth clock cycle, so the decompression engine performs any step from step 303 to step 307 on the code segment read in the xth clock cycle Each process is one step slower than the decompression engine performing the same process on the code segment read by x-1 clock cycles. Therefore, when the decompression engine is processing the code segment read in the x-th clock cycle, when step 306 is executed, the decompression engine has completed the decompression of the code segment read in the x-1 clock cycle, and the x-th clock cycle is obtained. -1 clock cycle target to decompress data.

通过步骤302至步骤307能够将压缩数据中的全部数据进行解压缩，获取每个时钟周期的目标解压数据。Through steps 302 to 307, all the data in the compressed data can be decompressed, and the target decompressed data of each clock cycle can be obtained.

步骤308、解压引擎根据压缩数据，将获取的多个时钟周期的解压数据进行拼接，获取压缩数据对应的解压数据。Step 308, the decompression engine splices the obtained decompressed data of multiple clock cycles according to the compressed data, and obtains decompressed data corresponding to the compressed data.

其中，拼接后相邻的两个时钟周期中，一个时钟周期的目标解压数据内最后一个字符对应的码字的末位码元在压缩数据中的位置和另一个时钟周期的目标解压数据内第一个字符对应的码字的起始码元在压缩数据中的位置是相邻的。Wherein, in two adjacent clock cycles after splicing, the position of the last code element of the code word corresponding to the last character in the target decompressed data of one clock cycle in the compressed data and the first position in the target decompressed data of another clock cycle The positions of the starting code elements of the code word corresponding to a character in the compressed data are adjacent.

如图3C所示，其为应用本发明实施例提供的解压缩方法时、应用基于霍夫曼编码的解压缩方法时以及应用基于LZ4的解压缩方法时，解压引擎所占用的芯片面积的条形图，其中，纵轴代表芯片面积，芯片的面积单位为平方毫米，芯片是指16纳米(nm)工艺的芯片。其中，条形31代表应用基于霍夫曼编码的解压缩方法时，解压引擎所占用的芯片面积，该面积为0.05平方毫米。条形32代表最大公约数为3，应用本发明实施例提供的解压缩方法时，解压引擎所占用的芯片面积，该面积为0.035平方毫米。条形33代表最大公约数为2，应用本发明实施例提供的解压缩方法时，解压引擎所占用的芯片面积，该面积为 0.025平方毫米。条形34代表应用基于LZ4的解压缩方法时，解压引擎所占用的芯片面积，该面积为0.16平方毫米。可以看出本发明实施例提供的解压缩方法，所占用的芯片面积小于相关技术中的解压缩方法。As shown in Figure 3C, it is the bar of the chip area occupied by the decompression engine when applying the decompression method provided by the embodiment of the present invention, when applying the decompression method based on Huffman coding, and when applying the decompression method based on LZ4. In the graph, the vertical axis represents the area of the chip, the unit of the area of the chip is square millimeter, and the chip refers to a chip of 16 nanometer (nm) process. Wherein, the bar 31 represents the chip area occupied by the decompression engine when the decompression method based on Huffman coding is applied, and the area is 0.05 square millimeters. The bar 32 represents that the greatest common divisor is 3. When the decompression method provided by the embodiment of the present invention is applied, the chip area occupied by the decompression engine is 0.035 square millimeters. Bar 33 represents that the greatest common divisor is 2. When the decompression method provided by the embodiment of the present invention is applied, the chip area occupied by the decompression engine is 0.025 square millimeters. Bar 34 represents the chip area occupied by the decompression engine when the LZ4-based decompression method is applied, and the area is 0.16 square millimeters. It can be seen that the decompression method provided by the embodiment of the present invention occupies less chip area than the decompression method in the related art.

此外，本发明实施例提供的解压缩方法的解压速度可以达到根据哈夫曼编码生成的压缩数据封装的解压速度的10倍左右，并与基于LZ4生成的压缩数据封装的解压速度基本持平。In addition, the decompression speed of the decompression method provided by the embodiment of the present invention can reach about 10 times the decompression speed of the compressed data package generated based on Huffman coding, and is basically the same as the decompression speed of the compressed data package generated based on LZ4.

综上所述，本发明实施例提供的解压缩方法，在对压缩数据进行解压缩时，通过并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，然后根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同时从多个位置开始解压缩，解压速度较高。In summary, the decompression method provided by the embodiment of the present invention, when decompressing compressed data, decompresses (a*N+b)/a subcode segments concurrently to obtain (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related technology is solved. During decompression, decompression can be started from multiple locations at the same time, and the decompression speed is relatively high.

图4是本发明实施例提供的一种压缩装置的结构方框图，该压缩装置可以成为终端或服务器的部分或者全部。该压缩装置400可以包括：Fig. 4 is a structural block diagram of a compression device provided by an embodiment of the present invention, and the compression device may be part or all of a terminal or a server. The compression device 400 may include:

公约数确定单元410，用于执行上述实施例中的步骤202。The common divisor determination unit 410 is configured to execute step 202 in the above embodiment.

编码表生成单元420，用于执行上述实施例中的步骤203和步骤201。The coding table generating unit 420 is configured to execute step 203 and step 201 in the above-mentioned embodiment.

压缩单元430，用于执行上述实施例中的步骤204。The compression unit 430 is configured to execute step 204 in the above embodiment.

封装生成单元440，用于执行上述实施例中的步骤205。The package generation unit 440 is configured to execute step 205 in the above embodiment.

综上所述，本发明实施例提供的压缩装置，通过生成码字的长度具有最大公约数的变长编码表，并根据该变长编码表对待压缩数据进行压缩，使得解压引擎在解压该压缩数据时，可以从多个位置并发开始解压，而不用等待解压完一个码字后再根据该码字的长度解压下一个码字，解压速度较高。To sum up, the compression device provided by the embodiment of the present invention generates a variable-length coding table whose codeword length has the greatest common divisor, and compresses the data to be compressed according to the variable-length coding table, so that the decompression engine decompresses the compressed data. Data can be decompressed concurrently from multiple locations, instead of waiting for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword, and the decompression speed is relatively high.

图5A是本发明实施例提供的一种解压缩装置的结构方框图。该解压缩装置500可以包括：Fig. 5A is a structural block diagram of a decompression device provided by an embodiment of the present invention. The decompression device 500 may include:

子码段确定单元510，用于执行上述实施例中的步骤301、302和303。The subcode segment determining unit 510 is configured to execute steps 301, 302 and 303 in the above-mentioned embodiment.

解压单元520，用于执行上述实施例中的步骤304。The decompression unit 520 is configured to execute step 304 in the above embodiment.

解压数据确定单元530，用于执行上述实施例中的步骤305、306和307。The decompressed data determining unit 530 is configured to execute steps 305, 306, and 307 in the above-mentioned embodiments.

可选的，如图5B所示，该解压缩装置500还可以包括：Optionally, as shown in FIG. 5B, the decompression device 500 may also include:

拼接单元540，用于执行上述实施例中的步骤308。The splicing unit 540 is configured to execute step 308 in the foregoing embodiment.

综上所述，本发明实施例提供的解压缩装置，在对压缩数据进行解压缩时，通过并发地对(a*N+b)/a个子码段进行解压处理，得到(a*N+b)/a个字符，然后根据变长编码表和第x时钟周期读取的码段，从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同时从多个位置开始解压缩，解压速度较高。In summary, the decompression device provided by the embodiment of the present invention, when decompressing compressed data, decompresses (a*N+b)/a sub-code segments concurrently to obtain (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length code table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related technology is solved. During decompression, decompression can be started from multiple locations at the same time, and the decompression speed is relatively high.

参见附图6，为本发明实施例提供的一种数据处理系统的结构示意图，该数据处理系统包括压缩装置61和解压缩装置63，压缩装置61可以为前述实施例所述的压缩装置，解压缩装置63可以为前述实施例所述的解压缩装置。压缩装置61用于对待压缩数据进行压缩得到压缩数据；解压缩装置63用于对所述压缩数据进行解压处理。该数据处理系统可以应用于终端或服务器中。其中，终端可以包括智能手机、平板电脑、计算机和膝上型计算机，而服务器可以是一台服务器，或者由若干台服务器组成的服务器集群，或者是一个云计算服务中心。Referring to accompanying drawing 6, it is a schematic structural diagram of a data processing system provided by an embodiment of the present invention, the data processing system includes a compression device 61 and a decompression device 63, the compression device 61 can be the compression device described in the foregoing embodiment, and the decompression The device 63 may be the decompression device described in the foregoing embodiments. The compression device 61 is used to compress the data to be compressed to obtain compressed data; the decompression device 63 is used to decompress the compressed data. The data processing system can be applied to a terminal or a server. Wherein, the terminal may include a smart phone, a tablet computer, a computer and a laptop computer, and the server may be a server, or a server cluster composed of several servers, or a cloud computing service center.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成，也可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，上述提到的存储介质可以是只读存储器，磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps for implementing the above embodiments can be completed by hardware, and can also be completed by instructing related hardware through a program. The program can be stored in a computer-readable storage medium. The above-mentioned The storage medium mentioned may be a read-only memory, a magnetic disk or an optical disk, and the like.

以上所述仅为本发明实施例的可选实施例，并不用以限制本申请，凡在本申请的精神和原则之内，所作的任何修改、等同替换、改进等，均应包括在本申请的保护范围之内。The above descriptions are only optional embodiments of the embodiments of the present invention, and are not intended to limit the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall be included in the present application within the scope of protection.

Claims

1. a kind of decompression method, it is characterised in that methods described includes：

The code length read from compressed data according to the xth clock cycle is the code section of a*N+b, (a*N+b) for obtaining/a subcode Section；Wherein, x is the integer more than or equal to 1；The code length of each subcode section is c in (a*N+b)/a subcode section, described (a*N+b)/a subcode section includes first subcode section, and the start bit of the first subcode section reads with the xth clock cycle The start bit of code section overlap, interval between the start bit of two neighboring subcode section in and (a*N+b)/a subcode section Code length is a-1, and the N is the integer more than 0；

Based on variable length coding table, decompression processing concomitantly is carried out to (a*N+b)/a subcode section, obtain (a*N+b)/a Character, wherein, decompressing each subcode section can obtain a character；

The variable length coding table includes multiple code words, and each character corresponds to the elongated volume in (a*N+b)/a character A code word in code table；The corresponding code length of at least two code words is different in the multiple code word, correspondence in the multiple code word The corresponding code length of code length code word most long is c, and the most short corresponding code length of code word of correspondence code length is a in the multiple code word, and a It is the greatest common divisor of the corresponding code length of each code word in the multiple code word, wherein, a is the integer more than or equal to 2, and c is Integer more than a, the difference of c and a is b；

According to the code section that the variable length coding table and the xth clock cycle read, determine from (a*N+b)/a character Go out the target decompression data of xth clock cycle, wherein, in the target decompression data of the xth clock cycle, per adjacent two Position of the last bit symbols of the corresponding code word of significant character in the code section that the xth clock cycle reads in significant character Position of the start element of code word corresponding with another significant character in the code section that the xth clock cycle reads is adjacent.

2. method according to claim 1, it is characterised in that

The code length that the xth clock cycle reads is that the rear b code length read with the (x+1)th clock cycle of the code section of a*N+b is B coincides before the code section of a*N+b.

3. method according to claim 1 and 2, it is characterised in that described according to the change in the case of being equal to 1 in x The code section that long codes table and the xth clock cycle read, the xth clock cycle is determined from (a*N+b)/a character Target decompression data, specifically include：

The target decompression data of xth clock cycle, the target of the xth clock cycle are determined from (a*N+b)/a character The starting of the code section that the start element of the corresponding code word of first significant character and the xth clock cycle read in decompression data Code element coincides.

4. method according to claim 1 and 2, it is characterised in that in the case where x is the integer more than or equal to 2, institute The code section read according to the variable length coding table and the xth clock cycle is stated, is determined from (a*N+b)/a character The target decompression data of xth clock cycle, specifically include：

Determine that c/a groups candidate decompresses data from (a*N+b)/a character；Every group of candidate's decompression packet includes multiple characters, And per in two adjacent characters, the code section that the last bit symbols of the corresponding code word of character read in the xth clock cycle In the position of the start element in code that the xth clock cycle reads section of position code word corresponding with another character be Adjacent；The start element of the corresponding code word of first character is read in the xth clock cycle in every two groups of candidates decompression data Position in the code section for taking differs, and in every group of candidate decompression data the corresponding code word of first character start element The W*a code element in the code section read for the xth clock cycle, W is more than or equal to 0 and whole less than or equal to b/a Number；

Determine that the target of xth clock cycle decompresses data from c/a groups candidate decompression data；In the compressed data In, the last bit symbols and the xth clock of the corresponding code word of last character in the target decompression data of the clock cycle of xth -1 The start element of the corresponding code word of first character is adjacent, the clock cycle of xth -1 in the target decompression data in cycle Target decompression data refer to the corresponding target decompression data of the code section for reading clock cycle of xth -1.

5. method according to claim 1, it is characterised in that the target for determining the xth clock cycle decompress data it Afterwards, methods described also includes：

According to the compressed data, the decompression data of the multiple clock cycle for obtaining are spliced, obtained the compressed data Corresponding decompression data, in two clock cycle adjacent after splicing, in the target decompression data of clock cycle last The target decompression number of position and another clock cycle of the last bit symbols of the corresponding code word of individual character in the compressed data It is adjacent according to position of the start element of the corresponding code word of interior first character in the compressed data.

6. a kind of compression method, it is characterised in that methods described includes：

Determine the target common divisor of the corresponding code length of each code word in variable length coding table；The variable length coding table includes multiple codes Word, the target common divisor is the integer more than or equal to 2, and each character in data to be compressed corresponds to the variable-length encoding A code word in table；

Each character and the target common divisor in the data to be compressed, generate the variable length coding table；The change The code length of at least two code words is different in long codes table, the probability of occurrence corresponding code word of character higher in the data to be compressed Code length of the code length less than the relatively low corresponding code word of character of probability of occurrence, the maximum of the code length of code word in the variable length coding table Common divisor is the target common divisor；

The data to be compressed are compressed according to the variable length coding table, obtain compressed data；

Based on the compressed data and variable length coding table generation compressed data encapsulation.

7. method according to claim 6, it is characterised in that the corresponding code of each code word in the determination variable length coding table Target common divisor long, including：

Determine at least one common divisor, according to the variable length coding table corresponding to each common divisor at least one common divisor, The data to be compressed are compressed, the compression ratio for obtaining respectively less than preset value, wherein, according at least one common divisor In each common divisor, can generate a corresponding variable length coding table；

Determine the target common divisor from least one common divisor, the numerical value of the target common divisor more than it is described extremely The numerical value of each common divisor in a few common divisor in addition to the target common divisor.

8. a kind of decompressing device, it is characterised in that the decompressing device includes：

Subcode section determining unit, the code length for being read from compressed data according to the xth clock cycle is the code section of a*N+b, is obtained (a*N+b) for arriving/a subcode section；Wherein, x is the integer more than or equal to 1；Per height in (a*N+b)/a subcode section The code length of code section is c, and (a*N+b)/a subcode section includes first subcode section, the start bit of the first subcode section and The start bit of the code section that the xth clock cycle reads overlaps, and two neighboring subcode section in (a*N+b)/a subcode section Start bit between be spaced code length be a-1, the N is the integer more than 0；

Decompression units, for based on variable length coding table, concomitantly carrying out decompression processing to (a*N+b)/a subcode section, obtain To (a*N+b)/a character, wherein, decompressing each subcode section can obtain a character；The variable length coding table includes multiple Code word, each character corresponds to a code word in the variable length coding table in (a*N+b)/a character；The multiple code The corresponding code length of at least two code words is different in word, and the correspondence code length corresponding code length of code word most long is c in the multiple code word, The most short corresponding code length of code word of correspondence code length is a in the multiple code word, and a is each code word correspondence in the multiple code word Code length greatest common divisor, wherein, a is the integer more than or equal to 2, and c is the integer more than a, and the difference of c and a is b；

Decompression data determination unit, for the code section read according to the variable length coding table and the xth clock cycle, from described (a*N+b) the target decompression data of xth clock cycle are determined in/a character, wherein, the target solution of the xth clock cycle In pressure data, per the last bit symbols of the corresponding code word of significant character in two adjacent significant characters in the xth clock The start element of the position code word corresponding with another significant character in the code section that the cycle reads is read in the xth clock cycle Position in the code section for taking is adjacent.

9. decompressing device according to claim 1, it is characterised in that

10. decompressing device according to claim 8 or claim 9, it is characterised in that in the case of being equal to 1 in x, decompresses number According to determining unit, specifically for：

11. decompressing device according to claim 8 or claim 9, it is characterised in that in the feelings that x is the integer more than or equal to 2 Under condition, the decompression data determination unit is specifically included：

12. decompressing devices according to claim 8, it is characterised in that the decompressing device also includes：

Concatenation unit, for according to the compressed data, the decompression data of the multiple clock cycle for obtaining being spliced, obtains The corresponding decompression data of the compressed data, in two clock cycle adjacent after splicing, the target decompression of clock cycle Position of the last bit symbols of the corresponding code word of last character in the compressed data and another clock cycle in data Target decompression data in the position of the start element in the compressed data of the corresponding code word of first character be adjacent.

13. a kind of compression sets, it is characterised in that the compression set includes：

Common divisor determining unit, the target common divisor for determining the corresponding code length of each code word in variable length coding table；The change Long codes table includes multiple code words, and the target common divisor is the integer more than or equal to 2, each character in data to be compressed Corresponding to a code word in the variable length coding table；

Coding schedule generation unit, for each character in the data to be compressed and the target common divisor, generates institute State variable length coding table；The code length of at least two code words is different in the variable length coding table, probability of occurrence in the data to be compressed The code length of the code length of the corresponding code word of the character higher character corresponding code word relatively low less than probability of occurrence, the variable-length encoding The greatest common divisor of the code length of code word is the target common divisor in table；

Compression unit, for being compressed to the data to be compressed according to the variable length coding table, obtains compressed data；

Encapsulation generation unit, for based on the compressed data and variable length coding table generation compressed data encapsulation.

14. compression sets according to claim 13, it is characterised in that the common divisor determining unit, are used for：

15. a kind of data handling systems, it is characterised in that the system includes compression set and decompressing device,

The compression set includes the compression set described in claim 13 or 14, and the compression set is used to treat compressed data It is compressed and obtains compressed data；

The decompressing device includes any described decompressing devices of claim 8-12；The decompressing device is used for institute Stating compressed data carries out decompression processing.