CN106849956B - Compression method, decompression method, apparatus and data processing system - Google Patents
Compression method, decompression method, apparatus and data processing system Download PDFInfo
- Publication number
- CN106849956B CN106849956B CN201611270254.0A CN201611270254A CN106849956B CN 106849956 B CN106849956 B CN 106849956B CN 201611270254 A CN201611270254 A CN 201611270254A CN 106849956 B CN106849956 B CN 106849956B
- Authority
- CN
- China
- Prior art keywords
- code
- data
- character
- common divisor
- clock cycle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
技术领域technical field
本申请涉及数据处理领域,特别涉及一种压缩方法、解压缩方法、装置和数据处理系统。The present application relates to the field of data processing, and in particular, to a compression method, a decompression method, an apparatus and a data processing system.
背景技术Background technique
变长编码(英文:Variable Length Coding)是一种压缩数据时使用的编码方式,在该编码方式中,待压缩数据中的字符可以由长度不同的码字(码字为若干位二进制码)来表示,通常将出现概率较高的字符用较短的码字(如01)来表示,而将出现概率较低的字符用较长的码字来表示(如111001),而字符与码字的对应关系可以记录在变长编码表中。Variable Length Coding (English: Variable Length Coding) is an encoding method used when compressing data. In this encoding method, the characters in the data to be compressed can be encoded by code words with different lengths (the code words are binary codes of several bits). Usually, the characters with higher occurrence probability are represented by shorter codewords (such as 01), and the characters with lower probability of occurrence are represented by longer codewords (such as 111001), and the difference between the characters and the codewords is Correspondence can be recorded in the variable-length coding table.
目前,采用变长编码对数据进行压缩所得到的数据称为压缩数据封装,该压缩数据封装可以包括:压缩数据和变长编码表等,对该压缩数据封装进行解压缩时,可以先对压缩数据封装进行解析,得到变长编码表,然后从压缩数据的第一位开始,根据该变长编码表依次对压缩数据进行解码,得到解压缩后的数据。At present, the data obtained by compressing data using variable-length encoding is called compressed data encapsulation. The compressed data encapsulation may include: compressed data and variable-length encoding tables, etc. When decompressing the compressed data encapsulation, the The data is encapsulated and parsed to obtain a variable-length coding table, and then starting from the first bit of the compressed data, the compressed data is sequentially decoded according to the variable-length coding table to obtain decompressed data.
由于在解压缩前,上述压缩数据封装中的压缩数据中每个码字的长度是未知的,对压缩数据进行解压缩时,解出一个码字后才能确定下一个码字的起始位置,因而需要按照压缩数据的顺序一个码字接一个码字地进行解压缩,解压速度较低。Since the length of each codeword in the compressed data in the above-mentioned compressed data package is unknown before decompression, when the compressed data is decompressed, the starting position of the next codeword can only be determined after a codeword is decompressed. Therefore, it is necessary to decompress one codeword after another according to the sequence of the compressed data, and the decompression speed is low.
发明内容SUMMARY OF THE INVENTION
为了解决解压缩时解压速度较低的问题,本发明实施例提供了一种压缩方法、解压缩方法、装置和数据处理系统。所述技术方案如下:In order to solve the problem of low decompression speed during decompression, embodiments of the present invention provide a compression method, a decompression method, an apparatus, and a data processing system. The technical solution is as follows:
本发明实施例提供的解压缩方法可以由解压引擎执行,该解压引擎可以设置在x86架构(英文:The X86 architecture)的处理器或高级精简指令集处理器(英文:Advanced RISC Machines;简称:ARM)架构的处理器中。The decompression method provided by the embodiment of the present invention may be executed by a decompression engine, and the decompression engine may be set on a processor of an x86 architecture (English: The X86 architecture) or an advanced reduced instruction set processor (English: Advanced RISC Machines; abbreviation: ARM) ) architecture processor.
第一方面,本发明实施例提供了一种解压缩方法,该方法包括:In a first aspect, an embodiment of the present invention provides a decompression method, which includes:
解压引擎根据第x时钟周期从压缩数据中读取的码长为a*N+b的码段,得到的(a*N+b)/a个子码段;其中,x为大于或等于1的整数;(a*N+b)/a个子码段中每个子码段的码长为c,(a*N+b)/a个子码段中包括首个子码段,首个子码段的起始位与第x时钟周期读取的码段的起始位重合,且(a*N+b)/a个子码段中相邻两个子码段的起始位之间间隔的码长为a-1,所述N为大于0的整数。The decompression engine reads the code segment with a code length of a*N+b from the compressed data according to the xth clock cycle, and obtains (a*N+b)/a subcode segments; wherein, x is greater than or equal to 1. Integer; (a*N+b)/the code length of each subcode segment in a subcode segment is c, (a*N+b)/a subcode segment includes the first subcode segment, and the start of the first subcode segment The start bit coincides with the start bit of the code segment read in the xth clock cycle, and the code length of the interval between the start bits of two adjacent subcode segments in (a*N+b)/a subcode segments is a -1, the N is an integer greater than 0.
解压引擎基于变长编码表,并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,其中,解压引擎解压每一子码段能够得到一个字符。The decompression engine decompresses (a*N+b)/a subcode segments concurrently based on the variable-length encoding table to obtain (a*N+b)/a characters, wherein the decompression engine decompresses each subcode segment can get a character.
该变长编码表可以包括多个码字,(a*N+b)/a个字符中每个字符对应于变长编码表中的一个码字;多个码字中至少两个码字对应的码长不同,多个码字中对应码长最长的码字对应的码长为c,多个码字中对应码长最短的码字对应的码长为a,且a为多个码字中每一码字对应的码长的最大公约数,其中,a为大于或等于2的整数,c为大于a的整数,c与a的差值为b。The variable-length encoding table may include multiple codewords, each of (a*N+b)/a characters corresponds to one codeword in the variable-length encoding table; at least two codewords in the multiple codewords correspond to The code lengths are different, the code length corresponding to the code word with the longest corresponding code length among the multiple code words is c, the code length corresponding to the code word with the shortest corresponding code length among the multiple code words is a, and a is a plurality of codes The greatest common divisor of the code length corresponding to each codeword in the word, where a is an integer greater than or equal to 2, c is an integer greater than a, and the difference between c and a is b.
解压引擎根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据,其中,第x时钟周期的目标解压数据中,每相邻的两个有效字符中一个有效字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个有效字符对应的码字的起始码元在第x时钟周期读取的码段中的位置相邻。第x时钟周期的目标解压数据包括按照特定顺序排列的多个有效字符。The decompression engine determines the target decompression data of the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle, where the target decompression data of the xth clock cycle In the decompressed data, the position of the last symbol of the code word corresponding to one of the two adjacent valid characters in the code segment read in the xth clock cycle and the start of the code word corresponding to the other valid character. The symbols are located adjacent to each other in the code segment read at the xth clock cycle. The target decompressed data for the xth clock cycle includes a plurality of valid characters arranged in a specific order.
本发明实施例提供的解压缩方法,在对压缩数据进行解压缩时,通过并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,然后根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。在进行解压缩时可以同时从多个位置开始解压缩,解压速度较高。In the decompression method provided by the embodiment of the present invention, when decompressing compressed data, by concurrently decompressing (a*N+b)/a subcode segments, (a*N+b)/a subcode segments are obtained. character, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read at the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. When decompressing, you can start decompressing from multiple locations at the same time, and the decompression speed is high.
可选的,第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合。Optionally, the last b bits of the code segment with a code length of a*N+b read in the xth clock cycle are in phase with the first b bits of the code segment with a code length of a*N+b read in the x+1th clock cycle. coincide.
本发明实施例提供的解压缩方法,解压引擎通过在读取两个相邻的时钟周期时,使后一时钟周期的前b位为前一时钟周期的后b位,避免了在解压时遗漏未解压的码段。In the decompression method provided by the embodiment of the present invention, when the decompression engine reads two adjacent clock cycles, the first b bits of the next clock cycle are the last b bits of the previous clock cycle, so as to avoid omission during decompression Uncompressed code segment.
可选的,在x为等于1的情况下,解压引擎根据变长编码表和第x时钟周 期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据,具体包括:Optionally, when x is equal to 1, the decompression engine determines the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle. The target decompressed data, including:
解压引擎从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据,第x时钟周期的目标解压数据中第一个有效字符对应的码字的起始码元和第x时钟周期读取的码段的起始码元相重合。The decompression engine determines the target decompression data of the xth clock cycle from (a*N+b)/a characters, and the start code element and the codeword corresponding to the first valid character in the target decompression data of the xth clock cycle. The start symbols of the code segment read in the xth clock cycle coincide.
本发明实施例提供的解压缩方法,在x等于1时,通过目标解压数据的起始码元的位置确定了第1时钟周期的目标解压数据。In the decompression method provided by the embodiment of the present invention, when x is equal to 1, the target decompressed data in the first clock cycle is determined by the position of the start symbol of the target decompressed data.
可选的,在x为大于或等于2的整数的情况下,解压引擎根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据,具体包括:Optionally, when x is an integer greater than or equal to 2, the decompression engine determines from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle. Target decompressed data for the xth clock cycle, including:
解压引擎从(a*N+b)/a个字符中确定出c/a组候选解压数据;每组候选解压数据包括多个字符,且每相邻的两个字符中,一个字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是相邻的;每两组候选解压数据内第一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是不相同的,且每组候选解压数据内第一个字符对应的码字的起始码元为第x时钟周期读取的码段中的第W*a个码元,W为大于或等于0且小于或等于b/a的整数。The decompression engine determines c/a groups of candidate decompression data from (a*N+b)/a characters; each group of candidate decompression data includes multiple characters, and in each adjacent two characters, the code corresponding to one character The position of the last symbol of the word in the code segment read in the xth clock cycle is adjacent to the position of the start symbol of the codeword corresponding to another character in the code segment read in the xth clock cycle; The positions of the start symbols of the code words corresponding to the first characters in each group of candidate decompressed data are different in the code segment read in the xth clock cycle, and the first characters in each group of candidate decompressed data correspond to The starting symbol of the codeword is the W*a-th symbol in the code segment read in the xth clock cycle, where W is an integer greater than or equal to 0 and less than or equal to b/a.
解压引擎从c/a组候选解压数据中确定出第x时钟周期的目标解压数据;在压缩数据中,第x-1时钟周期的目标解压数据中最后一个字符对应的码字的末位码元和第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元是相邻的,第x-1时钟周期的目标解压数据是指第x-1时钟周期读取的码段对应的目标解压数据。The decompression engine determines the target decompression data of the xth clock cycle from the candidate decompression data of the c/a group; in the compressed data, the last symbol of the codeword corresponding to the last character in the target decompression data of the x-1th clock cycle The starting symbol of the code word corresponding to the first character in the target decompressed data of the xth clock cycle is adjacent to the start symbol, and the target decompression data of the x-1th clock cycle refers to the code read in the x-1th clock cycle. The target decompressed data corresponding to the segment.
本发明实施例提供的解压缩方法,在x为大于或等于2的整数的情况下,通过先获取第x周期读取的数据的可能的c/a个候选解压数据,然后再根据第x-1时钟周期的目标解压数据选出第x时钟周期的目标解压数据,使得解压引擎可以在上一时钟周期的数据还未解压完成时就开始获取候选解压数据,提高了解压数据的速度。在x等于1的情况下,第x-1时钟周期的目标解压数据即为对第1时钟周期读取的码段进行解压得到的按照特定顺序排列的多个有效字符。In the decompression method provided by the embodiment of the present invention, in the case where x is an integer greater than or equal to 2, firstly obtain possible c/a candidate decompression data of the data read in the xth cycle, and then according to the xth - The target decompression data of 1 clock cycle selects the target decompression data of the xth clock cycle, so that the decompression engine can start to obtain candidate decompression data before the decompression of the data in the previous clock cycle is completed, which improves the speed of decompressing data. In the case where x is equal to 1, the target decompressed data of the x-1th clock cycle is a plurality of valid characters arranged in a specific order obtained by decompressing the code segment read in the first clock cycle.
可选的,解压引擎在确定出第x时钟周期的目标解压数据之后,该方法还包括:Optionally, after the decompression engine determines the target decompressed data of the xth clock cycle, the method further includes:
解压引擎根据压缩数据,将获取的多个时钟周期的解压数据进行拼接,获取压缩数据对应的解压数据,拼接后相邻的两个时钟周期中,一个时钟周期的目标解压数据内最后一个字符对应的码字的末位码元在压缩数据中的位置和另一个时钟周期的目标解压数据内第一个字符对应的码字的起始码元在压缩数据中的位置是相邻的。The decompression engine splices the obtained decompressed data of multiple clock cycles according to the compressed data, and obtains the decompressed data corresponding to the compressed data. In the two adjacent clock cycles after splicing, the last character in the target decompressed data of one clock cycle corresponds to The position of the last symbol of the codeword in the compressed data is adjacent to the position of the start symbol of the codeword corresponding to the first character in the target decompressed data of another clock cycle in the compressed data.
本发明实施例提供的解压缩方法,在得到多个时钟周期的解压数据后,将这些结果拼接起来以获得压缩数据的解压数据,解压速度较高。In the decompression method provided by the embodiment of the present invention, after obtaining decompressed data of multiple clock cycles, these results are spliced to obtain decompressed data of compressed data, and the decompression speed is high.
第二方面,本发明实施例提供一种压缩方法,该压缩方法可以由压缩引擎执行,该方法包括:In a second aspect, an embodiment of the present invention provides a compression method, where the compression method can be executed by a compression engine, and the method includes:
压缩引擎确定变长编码表中每个码字对应的码长的目标公约数;变长编码表包括多个码字,目标公约数为大于或等于2的整数,待压缩数据中的每个字符对应于变长编码表中的一个码字;The compression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length coding table; the variable-length coding table includes multiple codewords, and the target common divisor is an integer greater than or equal to 2. Each character in the data to be compressed Corresponds to a codeword in the variable-length coding table;
压缩引擎根据待压缩数据中的每个字符和目标公约数,生成变长编码表;变长编码表中至少两个码字的码长不同,待压缩数据中出现概率较高的字符对应的码字的码长小于出现概率较低的字符对应的码字的码长,变长编码表中码字的码长的最大公约数为目标公约数;The compression engine generates a variable-length encoding table according to each character in the data to be compressed and the common divisor of the target; at least two code words in the variable-length encoding table have different code lengths, and the code corresponding to a character with a higher probability of appearing in the data to be compressed The code length of the word is less than the code length of the code word corresponding to the character with a lower occurrence probability, and the greatest common divisor of the code length of the code word in the variable-length coding table is the target common divisor;
压缩引擎根据变长编码表对待压缩数据进行压缩,得到压缩数据;The compression engine compresses the data to be compressed according to the variable-length coding table to obtain compressed data;
压缩引擎基于压缩数据和变长编码表生成压缩数据封装。The compression engine generates a compressed data package based on the compressed data and the variable-length encoding table.
本发明实施例提供的压缩方法,根据待压缩数据中的每个字符和目标公约数,生成变长编码表,并根据该变长编码表对待压缩数据进行压缩,使得解压引擎在解压该压缩数据时,可以从多个位置并发开始解压,而不用等待解压完一个码字后再根据该码字的长度解压下一个码字,解压速度较高。The compression method provided by the embodiment of the present invention generates a variable-length encoding table according to each character in the data to be compressed and the common divisor of the target, and compresses the data to be compressed according to the variable-length encoding table, so that the decompression engine decompresses the compressed data. When decompression is performed concurrently from multiple locations, it is not necessary to wait for a codeword to be decompressed and then decompress the next codeword according to the length of the codeword, and the decompression speed is high.
可选的,解压引擎确定变长编码表中每个码字对应的码长的目标公约数,包括:Optionally, the decompression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length encoding table, including:
解压引擎首先可以确定至少一个公约数,根据对应于至少一个公约数中每一公约数的变长编码表,对待压缩数据进行压缩,得到的压缩率均小于预设值,其中,根据至少一个公约数中每一公约数,均能够生成一个对应的变长编码表;The decompression engine can first determine at least one common divisor, and compress the data to be compressed according to the variable-length coding table corresponding to each common divisor of the at least one common divisor, and the obtained compression rate is smaller than the preset value, wherein according to at least one common divisor Each common divisor in the number can generate a corresponding variable-length coding table;
之后解压引擎可以从至少一个公约数中确定出目标公约数,目标公约数的数值大于至少一个公约数中除目标公约数之外的每一公约数的数值。Afterwards, the decompression engine may determine the target common divisor from the at least one common divisor, and the value of the target common divisor is greater than the value of each common divisor except the target common divisor in the at least one common divisor.
最大公约数a越大,本发明实施例提供的压缩方法压缩的数据的解压速度 就越高,但是压缩率可能会在最大公约数变大时升高。本发明实施例提供的压缩方法,通过预先确定对应的压缩率小于预设值的至少一个公约数,并从中选出最大的公约数作为目标公约数,保证了压缩率和解压速度都在一个较高的范围。The larger the greatest common divisor a, the higher the decompression speed of the data compressed by the compression method provided by the embodiment of the present invention, but the compression rate may increase when the greatest common divisor becomes larger. In the compression method provided by the embodiment of the present invention, by predetermining at least one common divisor of which the corresponding compression ratio is smaller than the preset value, and selecting the largest common divisor as the target common divisor, it is ensured that the compression ratio and the decompression speed are at a relatively high level. high range.
第三方面,本发明实施例提供一种解压缩装置,该解压缩装置包括多个单元,该多个单元用于实现上述第一方面或第一方面中任意一种可能的实现方式所提供的解压缩方法。In a third aspect, an embodiment of the present invention provides a decompression apparatus, where the decompression apparatus includes multiple units, and the multiple units are configured to implement the above-mentioned first aspect or any one of the possible implementation manners of the first aspect. Decompression method.
第四方面,本发明实施例提供一种压缩装置,该压缩装置包括多个单元,该多个单元用于实现上述第二方面或第二方面中任意一种可能的实现方式所提供的压缩方法。In a fourth aspect, an embodiment of the present invention provides a compression device, where the compression device includes a plurality of units, and the plurality of units are used to implement the compression method provided by the second aspect or any possible implementation manner of the second aspect .
第五方面,本发明实施例提供一种数据处理系统,该系统包括第三方面提供的解压缩装置和第四方面提供的压缩装置,所述压缩装置用于对待压缩数据进行压缩得到压缩数据;所述解压缩装置用于对所述压缩数据进行解压处理。In a fifth aspect, an embodiment of the present invention provides a data processing system, which includes the decompression device provided in the third aspect and the compression device provided in the fourth aspect, where the compression device is configured to compress the data to be compressed to obtain compressed data; The decompression device is used for decompressing the compressed data.
第六方面,提供一种解压缩装置,所述解压缩装置包括:底板以及设置在底板上的中央处理器(英文:Central Processing Unit;简称:CPU)和内存,该底板连接有总线接口,该总线接口可以连接有解压引擎,该解压引擎用于执行第一方面提供的解压缩方法。In a sixth aspect, a decompression device is provided, the decompression device includes: a base plate, a central processing unit (English: Central Processing Unit; CPU for short) and a memory disposed on the base plate, the base plate is connected with a bus interface, the The bus interface may be connected with a decompression engine, where the decompression engine is configured to execute the decompression method provided by the first aspect.
第七方面,提供一种解压缩装置,所述解压缩装置包括:处理器芯片、与处理器芯片连接的内存、以及设置在处理器芯片上的CPU和总线接口,解压引擎可以集成在处理器芯片上,并通过总线接口和CPU连接,解压引擎用于执行第一方面提供的解压缩方法。In a seventh aspect, a decompression device is provided, the decompression device includes: a processor chip, a memory connected to the processor chip, and a CPU and a bus interface provided on the processor chip, and the decompression engine can be integrated in the processor. On the chip, and connected to the CPU through a bus interface, the decompression engine is used to execute the decompression method provided by the first aspect.
第八方面,提供的一种压缩装置,该压缩装置包括:至少一个处理器、至少一个网络接口、存储器以及至少一个总线,存储器与网络接口分别通过总线与处理器相连;处理器被配置为执行存储器中存储的指令;处理器通过执行指令来实现第二方面提供的压缩方法。In an eighth aspect, a compression device is provided, the compression device includes: at least one processor, at least one network interface, memory, and at least one bus, the memory and the network interface are respectively connected to the processor through the bus; the processor is configured to execute The instructions stored in the memory; the processor implements the compression method provided in the second aspect by executing the instructions.
综上所述,本发明实施例提供的解压缩方法,在对压缩数据进行解压缩时,通过并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,然后根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同 时从多个位置开始解压缩,解压速度较高。To sum up, in the decompression method provided by the embodiments of the present invention, when decompressing compressed data, by concurrently decompressing (a*N+b)/a subcode segments, (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related art is solved. When decompressing, the decompression can be started from multiple locations at the same time, and the decompression speed is high.
附图说明Description of drawings
图1A是本发明实施例提供的一种解压缩装置的结构方框图;1A is a structural block diagram of a decompression apparatus provided by an embodiment of the present invention;
图1B是本发明实施例提供的另一种解压缩装置的结构方框图;1B is a structural block diagram of another decompression apparatus provided by an embodiment of the present invention;
图1C是本发明实施例提供的一种压缩装置的结构方框图;1C is a structural block diagram of a compression apparatus provided by an embodiment of the present invention;
图2A是本发明实施例示出的一种压缩方法的流程图;2A is a flowchart of a compression method according to an embodiment of the present invention;
图2B是图2A所示实施例中一种确定目标公约数的流程图;Fig. 2B is a kind of flow chart of determining target common divisor in the embodiment shown in Fig. 2A;
图3A是本发明实施例示出的一种解压缩方法的流程图;3A is a flowchart of a decompression method according to an embodiment of the present invention;
图3B是图3A所示实施例中第x时钟周期读取的数据中码字与字符的对应关系示意图;3B is a schematic diagram of the correspondence between code words and characters in the data read in the xth clock cycle in the embodiment shown in FIG. 3A;
图3C是图3A所示实施例中解压引擎所占用的芯片面积条形图;3C is a bar graph of the chip area occupied by the decompression engine in the embodiment shown in FIG. 3A;
图4是本发明实施例提供的一种压缩装置的结构方框图;4 is a structural block diagram of a compression apparatus provided by an embodiment of the present invention;
图5A是本发明实施例提供的一种解压缩装置的结构方框图;5A is a structural block diagram of a decompression apparatus provided by an embodiment of the present invention;
图5B是本发明实施例提供的一种解压缩装置的结构方框图;5B is a structural block diagram of a decompression apparatus provided by an embodiment of the present invention;
图6为本发明实施例提供的一种数据处理系统的结构示意图。FIG. 6 is a schematic structural diagram of a data processing system according to an embodiment of the present invention.
具体实施方式Detailed ways
为了使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开作进一步地详细描述,显然,所描述的实施例仅仅是本公开一部份实施例,而不是全部的实施例。In order to make the purpose, technical solutions and advantages of the present disclosure clearer, the present disclosure will be described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all the embodiments. .
请参考图1A,其示出了本发明实施例提供的一种解压缩装置的结构方框图,该解压缩装置可以包括:底板11以及设置在底板11上的CPU12和内存13,该底板11连接有总线接口14,该总线接口14可以连接有解压引擎15,解压引擎15可以是扩展卡,而扩展卡可以为现场可编程门阵列(英文:Field-Programmable Gate Array;简称:FPGA)或应用专用集成电路(英文:Application Specific Integrated Circuit;简称:ASIC)。总线接口14可以为PCIe(一种总线接口)。Please refer to FIG. 1A , which shows a structural block diagram of a decompression device provided by an embodiment of the present invention. The decompression device may include: a
请参考图1B,其示出了本发明实施例提供的另一种解压缩装置的结构方框图,该解压缩装置可以包括:处理器芯片16、与处理器芯片16连接的内存17,以及设置在处理器芯片16上的CPU161和总线接口162,解压引擎163可以集 成在处理器芯片16上,并通过总线接口162和CPU161连接。其中,内存17可以为双倍速率同步动态随机存储器(英文:DoubleData Rate SDRAM;简称:DDR),总线接口162可以为高级可扩展接口(英文:AdvancedeXtensible Interface;简称:AXI)。Please refer to FIG. 1B , which shows a structural block diagram of another decompression apparatus provided by an embodiment of the present invention. The decompression apparatus may include: a
请参考图1C,其示出了本发明实施例提供的一种压缩装置的结构方框图,该压缩装置20可以包括:至少一个处理器21、至少一个网络接口22、存储器23以及至少一个总线24,存储器23与网络接口22分别通过总线24与处理器21相连;处理器21被配置为执行存储器23中存储的指令231。存储器23可能包含高速随机存取存储器(英文:Random AccessMemory;简称:RAM),也可能还包括非不稳定的存储器(英文:non-volatile memory),例如至少一个磁盘存储器。Please refer to FIG. 1C, which shows a structural block diagram of a compression apparatus provided by an embodiment of the present invention. The compression apparatus 20 may include: at least one
图2A是本发明实施例示出的一种压缩方法的流程图,本实施例以该压缩方法应用于对待压缩数据进行压缩来举例说明。该压缩方法可以包括如下几个步骤:FIG. 2A is a flowchart of a compression method according to an embodiment of the present invention. This embodiment is illustrated by applying the compression method to compressing data to be compressed. The compression method may include the following steps:
步骤201、压缩引擎获取待压缩数据中各种字符的概率分布。Step 201: The compression engine obtains the probability distribution of various characters in the data to be compressed.
在使用本发明实施例提供的压缩方法时,压缩引擎可以分析待压缩数据,以获取待压缩数据中各种字符的概率分布。待压缩数据可以是由多个不同的字符构成的,不同字符的出现概率可能不同,概率分布中可以包括每一字符的出现概率,示例性的,字符“A”的出现概率可以为4%(百分号),字符“B”的出现概率可以为6%。其中,待压缩数据中的字符是指存储于存储装置中未压缩的字符,存储装置在存储数据时,通常是以二进制位(英文:bit)为单位的,根据编码方式的不同,每个字符所占的二进制位的个数不同,如以美国信息交换标准代码(英文:American Standard Code for Information Interchange;简称:ASCII)作为编码方式的待压缩数据中,每个字符可以占8个二进制位。When using the compression method provided by the embodiment of the present invention, the compression engine may analyze the data to be compressed to obtain the probability distribution of various characters in the data to be compressed. The data to be compressed may be composed of a plurality of different characters, and the occurrence probability of different characters may be different. The probability distribution may include the occurrence probability of each character. For example, the occurrence probability of the character "A" may be 4% ( percent sign), the probability of occurrence of the character "B" can be 6%. Among them, the characters in the data to be compressed refer to the uncompressed characters stored in the storage device. When the storage device stores data, it usually takes binary bits (English: bit) as the unit. According to different encoding methods, each character The number of occupied binary bits is different. For example, in the data to be compressed using the American Standard Code for Information Interchange (English: American Standard Code for Information Interchange; ASCII for short) as the encoding method, each character can occupy 8 binary bits.
本发明实施例中的压缩引擎可以通过软件或硬件结合在图1C所示的压缩装置中。The compression engine in this embodiment of the present invention may be integrated into the compression apparatus shown in FIG. 1C through software or hardware.
步骤202、压缩引擎确定变长编码表中每个码字对应的码长的目标公约数。Step 202: The compression engine determines the target common divisor of the code length corresponding to each codeword in the variable-length coding table.
该变长编码表可以包括多个码字,目标公约数为大于或等于2的整数,待压缩数据中的每个字符对应于所述变长编码表中的一个码字。The variable-length encoding table may include multiple codewords, the target common divisor is an integer greater than or equal to 2, and each character in the data to be compressed corresponds to a codeword in the variable-length encoding table.
该目标公约数为编码表中每一码字对应码长的最大公约数,而该编码表是由压缩引擎后续生成的。需要说明的是,码字由若干个码元组成,计算机通信中通常表现为若干位二进制数据。一个码元可以为一个二进制数。The target common divisor is the greatest common divisor of the corresponding code length of each codeword in the encoding table, and the encoding table is subsequently generated by the compression engine. It should be noted that a codeword is composed of several symbols, which are usually represented as several bits of binary data in computer communication. A symbol can be a binary number.
如图2B所示,压缩引擎确定目标公约数的流程可以包括下面2个子步骤:As shown in FIG. 2B , the process for the compression engine to determine the target common divisor may include the following two sub-steps:
子步骤2021、压缩引擎确定至少一个公约数,根据对应于至少一个公约数中每一公约数的变长编码表,对待压缩数据进行压缩,得到的压缩率均小于预设值。Sub-step 2021: The compression engine determines at least one common divisor, and compresses the data to be compressed according to the variable-length coding table corresponding to each common divisor of the at least one common divisor, and the obtained compression ratios are all less than the preset value.
其中,根据至少一个公约数中每一公约数,均能够生成一个对应的变长编码表。Wherein, according to each common divisor of the at least one common divisor, a corresponding variable-length coding table can be generated.
子步骤2021可以包括:Sub-step 2021 may include:
1)压缩引擎首先可以预先设置多个公约数,这多个公约数均为大于或等于2且小于S的整数,S为待压缩数据中每个字符所占的二进制位的个数(这是因为如果公约数大于或等于字符所占的二进制位的个数,以该公约数为压缩数据中每个码字的码长的最大公约数时,压缩数据的数据量会大于待压缩数据,难以起到压缩的目的)。示例性的,待压缩数据中字符所占的位的个数为8,则公约数可以预先设置为2至7这6个数。1) The compression engine can first preset multiple common divisors, which are integers greater than or equal to 2 and less than S, where S is the number of binary bits occupied by each character in the data to be compressed (this is Because if the common divisor is greater than or equal to the number of binary bits occupied by characters, when the common divisor is the greatest common divisor of the code length of each codeword in the compressed data, the data volume of the compressed data will be larger than the data to be compressed, and it is difficult to for compression purposes). Exemplarily, if the number of bits occupied by characters in the data to be compressed is 8, the common divisor may be preset to 6 numbers from 2 to 7.
2)在设置了多个公约数后,压缩引擎可以估计这多个公约数中每个公约数对应的压缩率(英文:Compression ratio),压缩率是数据压缩后的大小与压缩前的大小之比。示例性的,待压缩数据在压缩前的大小为100兆字节(英文:Megabytes;简称:M),而压缩后的大小为10M,则压缩率为10%。2) After setting multiple common divisors, the compression engine can estimate the compression ratio (English: Compression ratio) corresponding to each common divisor of the multiple common divisors. The compression ratio is the sum of the compressed size of the data and the size before compression. Compare. Exemplarily, the size of the data to be compressed before compression is 100 megabytes (English: Megabytes; abbreviation: M), and the size after compression is 10M, and the compression rate is 10%.
压缩引擎在估计每个公约数的压缩率时,可以对待压缩数据进行采样得到采样数据,然后根据每一个公约数生成一个变长编码表(根据公约数生成编码表的过程可以参考步骤203),并根据每个变长编码表对采样数据进行压缩得到每个公约数对应的压缩率。其中,根据这多个公约数中任一公约数生成的编码表中,记录有采样数据中每个字符和该字符对应的码字之间的对应关系,同时,变长编码表中每个字符对应的码字的码长为该任一公约数的正整数倍。When the compression engine estimates the compression ratio of each common divisor, it can sample the data to be compressed to obtain sampled data, and then generate a variable-length encoding table according to each common divisor (for the process of generating the encoding table according to the common divisor, refer to step 203), And compress the sampled data according to each variable-length coding table to obtain the compression ratio corresponding to each common divisor. Wherein, in the coding table generated according to any common divisor of the plurality of common divisors, the correspondence between each character in the sampled data and the code word corresponding to the character is recorded, and at the same time, each character in the variable-length coding table is recorded. The code length of the corresponding codeword is a positive integer multiple of any of the common divisors.
3)从多个公约数中筛选出对应的压缩率小于预设值的至少一个公约数。3) Screening out at least one common divisor whose compression ratio is smaller than a preset value from the plurality of common divisors.
预设值可以为操作人员预先设置的,该预设值在设置时可以考虑对于解压速度的要求,对于解压速度要求高时,可以设置一个较高的预设值,而在对解压速度要求低时,可以设置一个较小的预设值。这是因为通常压缩率越小,解 压速度就会越慢。The preset value can be preset by the operator. The preset value can be set in consideration of the decompression speed requirement. When the decompression speed requirement is high, a higher preset value can be set, and the decompression speed requirement is low. , you can set a smaller preset value. This is because generally the smaller the compression ratio, the slower the decompression speed will be.
子步骤2022、压缩引擎从至少一个公约数中确定出目标公约数,目标公约数的数值大于至少一个公约数中除目标公约数之外的每一公约数的数值。Sub-step 2022, the compression engine determines a target common divisor from at least one common divisor, and the value of the target common divisor is greater than the value of each common divisor except the target common divisor in the at least one common divisor.
压缩引擎确定出目标公约数,该目标公约数是指对应的压缩率小于预设值的至少一个公约数中具有最大值的公约数。The compression engine determines a target common divisor, where the target common divisor refers to a common divisor with the largest value among at least one common divisor whose corresponding compression ratio is smaller than a preset value.
应当明白的是,在前述“至少一个公约数”中,在该“至少一个公约数”的数量为1的情况下,也即该“至少一个公约数”就是指一个公约数,则该公约数即为目标公约数。只有在该“至少一个公约数”的数量大于或等于2的情况下,目标公约数才是指大于至少一个公约数中除目标公约数之外的每一公约数的数值的公约数。It should be understood that, in the aforementioned "at least one common divisor", if the number of the "at least one common divisor" is 1, that is, the "at least one common divisor" refers to a common divisor, then the common divisor is the target common factor. Only if the number of the "at least one common divisor" is greater than or equal to 2, the target common divisor refers to a common divisor that is greater than the value of each of the at least one common divisor except the target common divisor.
由于本发明实施例提供的压缩方法对应的解压缩方法(参考图3A所示的实施例)中,变长编码表中码字的最大公约数越大,解压引擎每个时钟周期读取的数据就越多,解压速度也就越高,因而在考虑解压速度的情况下,可以将压缩率小于预设值的公约数中最大的公约数确定为目标公约数。Since in the decompression method corresponding to the compression method provided by the embodiment of the present invention (refer to the embodiment shown in FIG. 3A ), the greater the greatest common divisor of the codewords in the variable-length coding table, the greater the data read by the decompression engine in each clock cycle. The larger the number, the higher the decompression speed. Therefore, considering the decompression speed, the largest common divisor among the common divisors of which the compression ratio is smaller than the preset value can be determined as the target common divisor.
此外,由于最大公约数越大,表示字符的码字的码长也会越长,进而用于表示待压缩数据的压缩数据(压缩数据是由一个个码字构成的)的数据量也会越大,因而最大公约数越大,压缩率也会越大。在考虑压缩率的情况下,也可以将压缩率小于预设值的公约数中最小的公约数确定为目标公约数。In addition, since the greatest common divisor is larger, the code length of the code word representing the character will be longer, and the data volume of the compressed data used to represent the data to be compressed (the compressed data is composed of code words) will also increase. Therefore, the greater the greatest common divisor, the greater the compression ratio. In the case of considering the compression ratio, the smallest common divisor among the common divisors of which the compression ratio is smaller than the preset value may also be determined as the target common divisor.
步骤203,压缩引擎根据待压缩数据中的每个字符和目标公约数,生成变长编码表。
变长编码表中至少两个码字的码长不同,待压缩数据中出现概率较高的字符对应的码字的码长小于出现概率较低的字符对应的码字的码长,变长编码表中码字的码长的最大公约数为目标公约数。The code lengths of at least two code words in the variable-length coding table are different, and the code length of the code word corresponding to the character with a higher occurrence probability in the data to be compressed is smaller than the code length of the code word corresponding to the character with a lower occurrence probability. The greatest common divisor of the code lengths of the codewords in the table is the target common divisor.
变长编码表是记录待压缩数据中字符与码字的对应关系的映射表,生成变长编码表的过程是确定待压缩数据中字符与码字的对应关系的过程。其中,变长编码表中每一码字对应于待压缩数据中的一个字符,相同的字符对应的码字是相同的,不同的字符对应的码字是不同的。对于“不同的字符对应的码字不同”,一种情况是不同的字符对应的码字的位数(或者说“码长”)不同;另一种情况是虽然不同的字符对应的码字的位数相同,但是码字中的二进制数据的排列次序不同。参见表1,字符A对应的码字是000,字符B对应码字是001, 则字符A对应的码字000和字符B对应码字001的虽然位数是相同的,但是码字中二进制数据的排列顺序不同。进一步参见表1,字符A对应的代码是000,字符F对应代码的101000,则字符A对应的代码000和字符F对应代码101000的位数是不同的。The variable-length encoding table is a mapping table that records the correspondence between characters and codewords in the data to be compressed, and the process of generating the variable-length encoding table is the process of determining the correspondence between characters and codewords in the data to be compressed. Wherein, each codeword in the variable-length encoding table corresponds to a character in the data to be compressed, the codeword corresponding to the same character is the same, and the codeword corresponding to different characters is different. For "different characters correspond to different code words", one case is that the number of digits (or "code length") of the code words corresponding to different characters is different; the other case is that although the code words corresponding to different characters have different The number of bits is the same, but the order of arrangement of the binary data in the codeword is different. Referring to Table 1, the code word corresponding to character A is 000, and the code word corresponding to character B is 001, then the code word 000 corresponding to character A and the code word 001 corresponding to character B have the same number of digits, but the binary data in the code word is the same. are arranged in different order. Further referring to Table 1, the code corresponding to the character A is 000, and the character F corresponds to the code 101000, so the number of digits of the code 000 corresponding to the character A and the code 101000 corresponding to the character F are different.
在生成变长编码表时,可以按照待压缩数据中每个字符出现概率从大到小的顺序,为每个字符分配长度从小到大的码字,即字符出现的概率越大,就将该字符与长度越短的码字对应,这样就能够以较少的数据量来表示待压缩数据。When generating a variable-length code table, each character can be assigned a code word with a decreasing length to each character in the order of the probability of occurrence of each character in the data to be compressed, that is, the greater the probability of a character, the Characters correspond to codewords with shorter lengths, so that the data to be compressed can be represented with a smaller amount of data.
示例性的,目标公约数为3时,变长编码表可以如表1所示:Exemplarily, when the target common divisor is 3, the variable-length encoding table can be as shown in Table 1:
表1Table 1
在表1中,字符列中,码长较短的码字对应的字符出现的概率要大于码长较长的码字对应的字符出现的概率。与字符位于同一行的码长,代表该字符对应的码字的码长,或者代表该字符对应的码字包含的码元的数量,例如与字符“P”位于同一行的码长6代表的就是字符“P”对应的码字“110010”的码长。而与字符位于同一行的码字代表该字符对应的码字,例如与字符“Q”位于同一行的码字“110011”代表该字符“Q”对应的码字。待压缩数据中字符的排列顺序与压缩数据中码字的排列顺序是一致的。In Table 1, in the character column, the probability of occurrence of a character corresponding to a codeword with a shorter code length is greater than that of a character corresponding to a codeword with a longer code length. The code length that is located on the same line as the character, represents the code length of the code word corresponding to the character, or represents the number of code elements contained in the code word corresponding to the character, such as the code length 6 that is located on the same line as the character "P". It is the code length of the code word "110010" corresponding to the character "P". The codeword on the same row as the character represents the codeword corresponding to the character, for example, the codeword "110011" on the same row as the character "Q" represents the codeword corresponding to the character "Q". The arrangement order of the characters in the data to be compressed is consistent with the arrangement order of the code words in the compressed data.
本发明实施例中的变长编码可以是一种熵编码(英文:entropy encoding),熵编码是无损压缩时使用的一种编码方式。The variable-length encoding in the embodiment of the present invention may be an entropy encoding (English: entropy encoding), and entropy encoding is an encoding manner used in lossless compression.
步骤204、压缩引擎根据变长编码表对待压缩数据进行压缩,得到压缩数据。Step 204: The compression engine compresses the data to be compressed according to the variable length coding table to obtain compressed data.
在得到了变长编码表之后,可以根据变长编码表对待压缩数据进行压缩,得到压缩数据。示例性的,变长编码表为表1,根据表1对待压缩数据“ABC”进行压缩后,得到的压缩数据为“000 001 010”。After the variable-length encoding table is obtained, the data to be compressed can be compressed according to the variable-length encoding table to obtain compressed data. Exemplarily, the variable-length coding table is Table 1. After compressing the to-be-compressed data "ABC" according to Table 1, the obtained compressed data is "000 001 010".
步骤205、压缩引擎基于压缩数据和变长编码表生成压缩数据封装。Step 205: The compression engine generates a compressed data package based on the compressed data and the variable-length encoding table.
在获取了变长编码表和压缩数据之后,压缩引擎基于压缩数据和变长编码表生成压缩数据封装。压缩数据封装还可以包括其他数据,如效验码(英文:CHECKSUM)等。After acquiring the variable-length encoding table and the compressed data, the compression engine generates a compressed data package based on the compressed data and the variable-length encoding table. The compressed data encapsulation may also include other data, such as a check code (English: CHECKSUM) and the like.
本发明实施例提供的压缩方法的压缩率相较于基于LZ4(一种编码方法)的压缩方法的压缩率下降了30%左右。Compared with the compression rate of the compression method based on LZ4 (an encoding method), the compression rate of the compression method provided by the embodiment of the present invention is reduced by about 30%.
综上所述,本发明实施例提供的压缩方法,通过生成码字的长度具有最大公约数的变长编码表,并根据该变长编码表对待压缩数据进行压缩,使得解压引擎在解压该压缩数据时,可以从多个位置并发开始解压,而不用等待解压完一个码字后再根据该码字的长度解压下一个码字,解压速度较高。To sum up, the compression method provided by the embodiment of the present invention generates a variable-length encoding table with the greatest common divisor of the length of the codeword, and compresses the data to be compressed according to the variable-length encoding table, so that the decompression engine is decompressing the compressed data. Data can be decompressed concurrently from multiple locations without waiting for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword, and the decompression speed is high.
图3A是本发明实施例示出的一种解压缩方法的流程图,本实施例以该解压缩方法应用于对图2A所示实施例提供的压缩方法生成的压缩数据封装进行解压来举例说明,该解压缩方法可以被图1A或图1B中的解压引擎来实现。该解 压缩方法可以包括如下几个步骤:3A is a flowchart of a decompression method shown in an embodiment of the present invention. This embodiment is illustrated by applying the decompression method to decompressing the compressed data package generated by the compression method provided by the embodiment shown in FIG. 2A , The decompression method can be implemented by the decompression engine in FIG. 1A or FIG. 1B . The decompression method may include the following steps:
步骤301、解压引擎分析压缩数据封装,得到变长编码表和压缩数据。Step 301: The decompression engine analyzes the compressed data encapsulation to obtain a variable-length code table and compressed data.
在使用本发明实施例提供的解压缩方法时,解压引擎首先可以分析压缩数据封装,得到变长编码表和压缩数据。其中,压缩数据封装可以是图2A所示实施例提供的压缩方法压缩得到的压缩数据封装,该压缩数据封装中可以包括变长编码表、压缩数据和其他数据。When using the decompression method provided by the embodiment of the present invention, the decompression engine may first analyze the encapsulation of the compressed data, and obtain the variable-length coding table and the compressed data. The compressed data encapsulation may be the compressed data encapsulation obtained by the compression method provided by the embodiment shown in FIG. 2A , and the compressed data encapsulation may include a variable-length encoding table, compressed data and other data.
本步骤是可选的步骤,即解压引擎也可以直接得到变长编码和压缩数据。This step is optional, that is, the decompression engine can also directly obtain variable-length encoded and compressed data.
步骤302、解压引擎按照压缩数据的生成顺序在每个时钟周期从压缩数据中读取码长为a*N+b的码段,第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合。
在获取了变长编码表后,解压引擎可以按照压缩数据的生成顺序(即先生成的数据先读取,后生成数据的后读取)在每个时钟周期从压缩数据中读取码长为a*N+b的码段(a*N+b大于或等于压缩数据中最长的码字的码长c),第x时钟周期读取的码长为a*N+b的码段的后b位与第x+1时钟周期读取的码长为a*N+b的码段前b位相重合,其中,N为大于0的整数,a代表上述实施例中提到的目标公约数,N的值可以根据硬件情况预先设置,b为压缩数据中最长码字的码长c和最短码字的码长a(最短码字的码长等于目标公约数a)的差值,a*N可以表示本发明实施例提供的解压方法的解压速度,即a和N越大,本发明实施例提供的解压方法的解压速度就越高。示例性的,在a=3,c=12,b=c-a=9时,解压引擎每个时钟周期读取3N+9位的数据。After obtaining the variable-length code table, the decompression engine can read the code length from the compressed data in each clock cycle according to the generation sequence of the compressed data (that is, the data generated first is read first, and the data generated last is read later). The code segment of a*N+b (a*N+b is greater than or equal to the code length c of the longest codeword in the compressed data), the code length read in the xth clock cycle is the code segment of a*N+b The last b bits coincide with the first b bits of the code segment with a code length of a*N+b read in the x+1th clock cycle, where N is an integer greater than 0, and a represents the target common divisor mentioned in the above embodiment , the value of N can be preset according to the hardware situation, b is the difference between the code length c of the longest code word in the compressed data and the code length a of the shortest code word (the code length of the shortest code word is equal to the target common divisor a), a *N may represent the decompression speed of the decompression method provided by the embodiment of the present invention, that is, the larger a and N are, the higher the decompression speed of the decompression method provided by the embodiment of the present invention. Exemplarily, when a=3, c=12, b=c-a=9, the decompression engine reads 3N+9 bits of data per clock cycle.
压缩数据中,码字的码长的最大公约数a、压缩数据中最长码字的码长和最短码字的码长的差值b等数值,可以是解压引擎在解压之前就预先获知的,示例性的,可以是在设置解压引擎时就将这些数值写入了解压引擎。或者,码字的码长的最大公约数a、压缩数据中最长码字的码长和最短码字的码长的差值b等数值也可以是包括在变长编码表中,解压引擎可以在解析变长编码表时获取这些数值。In the compressed data, the values such as the greatest common divisor a of the code length of the code word, the difference between the code length of the longest code word and the code length of the shortest code word in the compressed data, etc., can be known in advance by the decompression engine before decompression. , for example, these values can be written into the decompression engine when the decompression engine is set. Alternatively, values such as the greatest common divisor a of the code length of the code word, the difference between the code length of the longest code word in the compressed data and the code length of the shortest code word b, etc., can also be included in the variable-length encoding table, and the decompression engine can These values are obtained when parsing the variable-length encoding table.
时钟周期(英文:Clock Cycle)也称为振荡周期,时钟周期是计算机中的时间单位。在一个时钟周期内,CPU完成一个基本的动作。The clock cycle (English: Clock Cycle) is also called the oscillation cycle, and the clock cycle is the unit of time in the computer. In one clock cycle, the CPU completes a basic action.
在步骤302中,解压引擎从压缩数据中每个时钟周期读取了固定码长的码段,因此可以以一个稳定的带宽向解压引擎传输数据,解压引擎对于带宽的利 用率较高,本发明实施例提供的解压方法可以应用于带宽压缩场景,例如神经网络参数在线解压应用,而相关技术中由于每个时钟周期解压的码字的码长可能不同,因而解压引擎读取的码段的码长也不同,难以应用于带宽压缩场景。In
在本发明实施例提供的解压方法中,解压引擎可以通过调整N的值来调节每个时钟周期读取的数据量,进而调节解压速度。In the decompression method provided by the embodiment of the present invention, the decompression engine can adjust the amount of data read in each clock cycle by adjusting the value of N, thereby adjusting the decompression speed.
由于相邻的两个时钟周期中的前一个时钟周期读取的码段末尾可能存在未解压的多位码元,为了避免这些码元遗漏,解压引擎在每个时钟周期读取码段时,将前一个时钟周期读取的码段的最后b位重复读取,这是因为整个压缩数据的码长是a的整数倍,前一个时钟周期最多可能未解压的码元的个数为b(在前一个时钟周期读取的码段中,最后一个码字为最大的长度c时,前一个时钟周期遗漏的未解压码元为最大个数b),前一个时钟周期最少可能未解压的码元的个数为0。需要说明的是,实际上,前一个时钟周期未解压的码元的个数为区间[0,K]内一个整数与a的乘积,其中K等于b/a。Since there may be uncompressed multi-bit symbols at the end of the code segment read in the previous clock cycle in two adjacent clock cycles, in order to avoid the omission of these symbols, when the decompression engine reads the code segment in each clock cycle, The last b bits of the code segment read in the previous clock cycle are read repeatedly, because the code length of the entire compressed data is an integer multiple of a, and the maximum number of symbols that may not be decompressed in the previous clock cycle is b ( In the code segment read in the previous clock cycle, when the last codeword is the maximum length c, the number of undecompressed symbols missed in the previous clock cycle is the maximum number b), and the least possible undecompressed code in the previous clock cycle The number of elements is 0. It should be noted that, in fact, the number of uncompressed symbols in the previous clock cycle is the product of an integer in the interval [0, K] and a, where K is equal to b/a.
步骤303、解压引擎根据第x时钟周期从压缩数据中读取的码长为a*N+b的码段,得到的(a*N+b)/a个子码段。Step 303: The decompression engine obtains (a*N+b)/a sub-code segments according to the code segment with the code length a*N+b read from the compressed data according to the xth clock cycle.
其中,x为大于或等于1的整数;(a*N+b)/a个子码段中每个子码段的码长为c,(a*N+b)/a个子码段中包括首个子码段,首个子码段的起始位与第x时钟周期读取的码段的起始位重合,且(a*N+b)/a个子码段中相邻两个子码段的起始位之间间隔的码长为a-1。Among them, x is an integer greater than or equal to 1; the code length of each sub-code segment in (a*N+b)/a sub-code segments is c, and (a*N+b)/a sub-code segments include the first sub-code segment Code segment, the start bit of the first subcode segment coincides with the start bit of the code segment read in the xth clock cycle, and the start of (a*N+b)/a subcode segment of two adjacent subcode segments The code length of the interval between bits is a-1.
步骤304、解压引擎基于变长编码表,并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,其中,解压每一子码段得到一个字符。在x等于1时,执行步骤305;在x为大于或等于2的整数时,执行步骤306。
在第x时钟周期读取的a*N+b位数据中,解压引擎可以并发的对多个子码段进行解压处理,在对每个子码段进行解压处理得到的一个字符对应的码字的起始码元为每个子码段的起始码元。In the a*N+b-bit data read in the xth clock cycle, the decompression engine can decompress multiple subcode segments concurrently. The start symbol is the start symbol of each subcode segment.
其中,解压引擎对码长为a*N+b的码段中的起始位为第N*a位至第(N+K-1)*a位开始的多个子码段根据变长编码表进行解压缩时,码长为a*N+b的码段中剩余的码元的个数不够c个,解压引擎可以将这些码元不够c个的子码段以预设值(该预设值可以随意设置,如全为0或全为1,或者0和1交替等)将码长填充至c,然后再对这些子码段进行解压缩,得到多个字符,之后可以检 测每个子码段解压得到的字符在变长编码表中对应的码字的码长,如某个子码段解压得到的字符在变长编码表中对应的码字的码长大于该子码段在填充前的码长,则可以将该子码段解压得到的字符删除,在删除字符后,a*N+b位数据中存在未解压的若干码元。Wherein, the decompression engine uses the variable-length coding table for multiple sub-code segments starting from the N*ath bit to the (N+K-1)*ath bit in the code segment with a code length of a*N+b. During decompression, if the number of remaining symbols in the code segment whose code length is a*N+b is not enough to c, the decompression engine can use the preset value (the preset The value can be set arbitrarily, such as all 0 or all 1, or alternating 0 and 1, etc.) Fill the code length to c, and then decompress these subcode segments to obtain multiple characters, and then each subcode can be detected. The code length of the corresponding code word in the variable-length encoding table of the character obtained by segment decompression. For example, the code length of the corresponding code word in the variable-length encoding table of the character obtained by decompressing a sub-code segment is greater than that of the sub-code segment before filling. If the code length is longer, the characters obtained by decompressing the sub-code segment can be deleted. After the characters are deleted, there are several undecompressed symbols in the a*N+b bit data.
示例性的,第x时钟周期读取的码段为“000 001 010 011 100”,a=3,c=6,依照表1所示的变长编码表,同时对子码段“000 001”、“001 010”、“010 011”、“011 100”和“100000”进行解压缩,得到的字符为码字000对应的字符“A”,码字001对应的字符“B”,码字010对应的字符“C”、码字011对应的字符“D”以及码字“100”对应的字符“E”。其中“100 000”为填充为6位的子码段,根据该6位子码段解压得到的字符“E”对应的码字的码长为3,不大于“100 000”在填充前的长度,因而可以不删除字符“E”。Exemplarily, the code segment read in the xth clock cycle is "000 001 010 011 100", a=3, c=6, according to the variable-length coding table shown in Table 1, and the sub-code segment "000 001" , "001 010", "010 011", "011 100" and "100000" are decompressed, and the obtained character is the character "A" corresponding to the codeword 000, the character "B" corresponding to the codeword 001, and the codeword 010 The corresponding character "C", the character "D" corresponding to the codeword 011, and the character "E" corresponding to the codeword "100". Among them, "100 000" is a sub-code segment filled with 6 bits. The code length of the code word corresponding to the character "E" obtained by decompressing the 6-bit sub-code segment is 3, which is not greater than the length of "100 000" before filling. Therefore, the character "E" may not be deleted.
需要说明的是,压缩数据中的码字与将该压缩数据进行解压得到的目标解压数据中的字符是一一对应的。压缩数据中的第K个码字是与目标解压数据中的第K个字符对应的,确定了某个码字的起始位,就能够确定该码字(从起始位开始读取最少a位码元,最多c位码元,能够从变长编码表中得到这些码元表示的一个码字)。现有技术中,压缩数据中的每个码字的码长不全相同,不按照压缩数据中的顺序解压出一个码字前,解压引擎无法获知该码字的码长,进而也无法获知下一个码字的起始位。但由于本发明实施例提供的解压方法中的压缩数据中的码字的码长具有最大公约数a,因而解压数据中每个字符对应的码字的起始位是包括在(a*N+b)/a个子码段中每个子码段的起始位中的,可以称对压缩数据解压得到的目标解压数据中每个字符对应的码字(从变长编码表中可以得知每个字符对应的码字)的起始位为(a*N+b)/a个子码段的起始位中的正确起始位(正确起始位是以该位在整个压缩数据中的顺序来确定的,如解压数据中第一个字符对应的码字的起始位为整个压缩数据的第一位,则整个压缩数据中的第一位即为解压数据中第一个字符对应的码字的正确起始位),而将(a*N+b)/a个子码段的起始位中除正确起始位的其他位称为错误起始位。步骤304并发对(a*N+b)/a个子码段进行解压处理得到的多个字符中,包括从正确起始位开始解压得到的有效字符和从错误起始位开始解压得到的无效字符。It should be noted that there is a one-to-one correspondence between the codewords in the compressed data and the characters in the target decompressed data obtained by decompressing the compressed data. The Kth codeword in the compressed data corresponds to the Kth character in the target decompressed data. If the start bit of a codeword is determined, the codeword can be determined (read at least a from the start bit) Bit symbols, up to c bit symbols, can obtain a codeword represented by these symbols from the variable-length coding table). In the prior art, the code lengths of each codeword in the compressed data are not all the same. Before decompressing a codeword according to the order in the compressed data, the decompression engine cannot know the code length of the codeword, and thus cannot know the next codeword. The start bit of the codeword. However, since the code length of the code word in the compressed data in the decompression method provided by the embodiment of the present invention has the greatest common divisor a, the start bit of the code word corresponding to each character in the decompressed data is included in (a*N+ b) In the start bit of each sub-code segment in the/a sub-code segments, it can be called the code word corresponding to each character in the target decompressed data obtained by decompressing the compressed data (from the variable-length coding table, we can know that each character The starting bit of the code word corresponding to the character) is the correct starting bit in the starting bits of (a*N+b)/a sub-code segments (the correct starting bit is the order of the bit in the entire compressed data) Determined, if the start bit of the code word corresponding to the first character in the decompressed data is the first bit of the entire compressed data, then the first bit in the entire compressed data is the code word corresponding to the first character in the decompressed data. The correct start bit), and the other bits except the correct start bit among the start bits of (a*N+b)/a subcode segments are called error start bits. In
步骤305,在x等于1时,解压引擎从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据,第x时钟周期的目标解压数据中第一个有效字符对应的 码字的起始码元和第x时钟周期读取的码段的起始码元相重合。
其中,第x时钟周期的目标解压数据中,每相邻的两个有效字符中一个有效字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个有效字符对应的码字的起始码元在第x时钟周期读取的码段中的位置相邻。第x时钟周期的目标解压数据包括按照特定顺序排列的多个有效字符。Among them, in the target decompressed data of the xth clock cycle, the position of the last symbol of the code word corresponding to one valid character in each adjacent two valid characters in the code segment read in the xth clock cycle and the other valid character The starting symbols of the code words corresponding to the characters are located adjacent to each other in the code segment read in the xth clock cycle. The target decompressed data for the xth clock cycle includes a plurality of valid characters arranged in a specific order.
解压引擎在得到了(a*N+b)/a个字符后,就得知了这(a*N+b)/a个字符中每个字符对应的码字的码长。因为这多个字符中包括从错误起始位开始解压得到的无效字符,因而可以从中选出多个有效字符进行拼接,以得到第x时钟周期的目标解压数据,而选择的依据可以是先确定第x时钟周期读取的码长为a*N+b的码段中第一个码字的正确起始位,然后根据该正确起始位以及第一个码字的码长来确定第二个码字的正确起始位(第二个码字的正确起始位为第一个码字的末位的下一位),以此类推,解压引擎就能够获得每个有效字符对应的码字的正确起始位,进而就能够选出多个有效字符,将这多个有效字符以每个有效字符对应的码字在第x时钟周期读取的a*N+b位数据中的先后顺序进行拼接,就得到了第x时钟周期读取的数据的目标解压数据。After the decompression engine obtains (a*N+b)/a characters, it knows the code length of the codeword corresponding to each of the (a*N+b)/a characters. Because these multiple characters include invalid characters obtained from decompression from the wrong start bit, multiple valid characters can be selected for splicing to obtain the target decompressed data of the xth clock cycle, and the selection basis can be determined first. The code length read in the xth clock cycle is the correct start bit of the first code word in the code segment of a*N+b, and then the second code word is determined according to the correct start bit and the code length of the first code word. The correct start bit of each codeword (the correct start bit of the second codeword is the next bit of the last bit of the first codeword), and so on, the decompression engine can obtain the code corresponding to each valid character The correct start bit of the word, and then multiple valid characters can be selected, and the multiple valid characters are the order of the code words corresponding to each valid character in the a*N+b bit data read in the xth clock cycle. Splicing is performed sequentially, and the target decompressed data of the data read in the xth clock cycle is obtained.
在x=1时,第x时钟周期读取的码长为a*N+b的码段的起始位即为压缩数据的起始位,该起始位为第x时钟周期读取的码长为a*N+b的码段中第一个码字的正确起始位。When x=1, the start bit of the code segment with the code length a*N+b read in the xth clock cycle is the start bit of the compressed data, and the start bit is the code read in the xth clock cycle The correct start bit of the first codeword in a code segment of length a*N+b.
示例性的,如图3B所示,其为第x时钟周期读取码长为a*N+b的码段中的前24位码元,在步骤304中,可以得到其中每个码字的码长以及每个码字对应的字符,即0至5位构成的码长为6的码字对应字符A,3至8位构成的码长为6的码字对应字符B,6至14位构成的码长为9的码字对应字符C,9至14位构成的码长为6的码字对应字符D,12至23位构成的码长为12的码字对应字符E,15至17位构成的码长为3的码字对应字符F,其中,码字的码长以及码字对应的字符可以如表2所示:Exemplarily, as shown in FIG. 3B , it reads the first 24-bit symbols in the code segment whose code length is a*N+b in the xth clock cycle. The code length and the character corresponding to each code word, that is, the code word composed of 0 to 5 bits with a code length of 6 corresponds to character A, and the code word composed of 3 to 8 bits with a code length of 6 corresponds to character B, and 6 to 14 bits. The code word formed with the code length of 9 corresponds to the character C, the code word formed by 9 to 14 bits is the corresponding character D of the code length 6, the code word formed by the 12 to 23 bits is the corresponding character E of the code length of 12, and 15 to 17 The code word with the code length of 3 corresponding to the character F, wherein the code length of the code word and the character corresponding to the code word can be as shown in Table 2:
表2Table 2
在表2中,每个字符自上而下的顺序,和每个字符对应的码字的起始位在第x时钟周期读取码长为a*N+b的码段中的先后顺序是一致的。从图3B示出的24位码元以及该24位码元对应的多个字符中确定有效字符的方式可以为:以该24位码元的起始位作为第一个码字的起始位(该起始位为第一个有效字符对应的码字的正确起始位),确定了第一个码字的码长为6,且第一个码字对应的字符为A,由于第一个码字的码长为6,因而第二个码字的正确起始位为该24位码元的第6位(前0至5位被第一个码字占用),而以该24位码元的第6位为起始位置的码字对应的字符为C,因而可以将字符C作为第二个有效字符,由于第二个有效字符对应的码字的码长为9,因而第三个码字的正确起始位为该24位码元的第15位(前0至14位被第一个码字和第二个码字占用),以第15位为起始位置的码字对应的字符为F,因而得到的解压数据为A C F。In Table 2, the top-down sequence of each character, and the sequence of the start bit of the codeword corresponding to each character in the code segment with the code length a*N+b read in the xth clock cycle are: consistent. The way to determine a valid character from the 24-bit symbol shown in FIG. 3B and a plurality of characters corresponding to the 24-bit symbol may be: taking the start bit of the 24-bit symbol as the start bit of the first codeword (The start bit is the correct start bit of the code word corresponding to the first valid character), it is determined that the code length of the first code word is 6, and the character corresponding to the first code word is A. The code length of each codeword is 6, so the correct start bit of the second codeword is the 6th bit of the 24-bit symbol (the first 0 to 5 bits are occupied by the first codeword), and the 24-bit The character corresponding to the code word whose 6th bit of the code element is the starting position is C, so the character C can be used as the second valid character. Since the code length of the code word corresponding to the second valid character is 9, the third The correct start bit of a codeword is the 15th bit of the 24-bit symbol (the first 0 to 14 bits are occupied by the first codeword and the second codeword), and the codeword with the 15th bit as the starting position The corresponding character is F, so the decompressed data obtained is A C F.
步骤306、在x为大于或等于2的整数时,解压引擎从(a*N+b)/a个字符中确定出c/a组候选解压数据。Step 306: When x is an integer greater than or equal to 2, the decompression engine determines c/a groups of candidate decompressed data from (a*N+b)/a characters.
其中,每组候选解压数据包括多个字符,且每相邻的两个字符中,一个字符对应的码字的末位码元在第x时钟周期读取的码段中的位置和另一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是相邻的;每两组候选解压数据内第一个字符对应的码字的起始码元在第x时钟周期读取的码段中的位置是不相同的,且每组候选解压数据内第一个字符对应的码字的起始码元为第x时钟周期读取的码段中的第W*a个码元,W为大于或等于0且小于或等于b/a的整数。Among them, each group of candidate decompression data includes multiple characters, and in each adjacent two characters, the position of the last symbol of the code word corresponding to one character in the code segment read in the xth clock cycle and another character The position of the starting symbol of the corresponding codeword in the code segment read in the xth clock cycle is adjacent; the starting symbol of the codeword corresponding to the first character in each two groups of candidate decompressed data is located at the xth The positions in the code segments read by the clock cycle are different, and the start symbol of the code word corresponding to the first character in each group of candidate decompressed data is W* in the code segment read in the xth clock cycle a symbols, W is an integer greater than or equal to 0 and less than or equal to b/a.
在x为大于或等于2的整数时,由于第x-1时钟周期读取的数据还未完成解压缩,解压引擎难以获知第x-1时钟周期读取的数据中未解压的码元的个数,因而解压引擎当前难以确定读取的码长为a*N+b的码段中第一个码字的正确起始位。此时,解压引擎可以确定出所有可能的正确起始位(所有可能的正确起始位为第x时钟周期读取的码长为a*N+b的码段中的第H*a位,H取0至K中的每个整数,K=b/a),然后分别以这些所有可能的正确起始位为第一个码字的正确起始位,以步骤305中的方式得到c/a个候选解压数据。When x is an integer greater than or equal to 2, since the data read in the x-1 clock cycle has not been decompressed, it is difficult for the decompression engine to know the number of uncompressed symbols in the data read in the x-1 clock cycle. Therefore, it is difficult for the decompression engine to determine the correct start bit of the first codeword in the code segment whose code length is a*N+b. At this point, the decompression engine can determine all possible correct start bits (all possible correct start bits are the H*a bit in the code segment with a code length of a*N+b read in the xth clock cycle, H takes each integer from 0 to K, K=b/a), and then takes these all possible correct start bits as the correct start bits of the first codeword respectively, and obtains c/ a candidate decompressed data.
其中,由于第x时钟周期读取的码段的前b位与第x-1时钟周期读取的码段的后b位是重复的,而这重复的b位中的前几位可能存在第x-1时钟周期读取的码段中最后一个码字的一部分,第x时钟周期读取的码段中第一个码字的正确起始位为第x-1时钟周期读取的码段中最后一个码字的末位的下一位。Among them, because the first b bits of the code segment read in the xth clock cycle and the last b bits of the code segment read in the x-1th clock cycle are repeated, and the first few bits of the repeated b bits may exist in the A part of the last code word in the code segment read in the x-1 clock cycle, the correct start bit of the first code word in the code segment read in the x-th clock cycle is the code segment read in the x-1 clock cycle The digit next to the last digit of the last codeword in .
由于压缩数据中码字的码长的最大公约数为a,因而第x时钟周期读取的码段的前b位中存在的第x-1时钟周期读取的码段的长度可以为0至K中的任一整数与a的乘积(在长度为0*a时,表明第x时钟周期读取的码段的前b位中不存在第x-1时钟周期读取的码段中最后一个码字的一部分),因而第x时钟周期读取的码段的前b位码元中的第H*a位可能是第x时钟周期读取的码段中第一个码字的正确起始位,而候选解压数据的个数取决于0至K中整数的个数,该个数等于K+1=b/a+1=(a+b)/a=c/a,其中,给K加1是因为0也算作0至K中的一个整数。Since the greatest common divisor of the code length of the code word in the compressed data is a, the length of the code segment read in the x-1th clock cycle that exists in the first b bits of the code segment read in the xth clock cycle can be 0 to The product of any integer in K and a (when the length is 0*a, it indicates that there is no last bit in the code segment read in the x-1th clock cycle in the first b bits of the code segment read in the xth clock cycle part of the codeword), so the H*a bit in the first b-bit symbols of the code segment read at the xth clock cycle may be the correct start of the first codeword in the code segment read at the xth clock cycle bits, and the number of candidate decompressed data depends on the number of integers from 0 to K, which is equal to K+1=b/a+1=(a+b)/a=c/a, where K+1=b/a+1=(a+b)/a=c/a 1 is added because 0 also counts as an integer from 0 to K.
示例性的,c=9,a=3,第x时钟周期读取的码段的前9位为“000 001 010”,则其中的第0位、第3位和第6位为可能的正确起始位,解压引擎可以分别将第0位、第3位和第6位作为3个候选解压数据中第一个字符对应的码字的起始位,并分别以步骤305中的方式得到3个候选解压数据。Exemplarily, c=9, a=3, the first 9 bits of the code segment read in the xth clock cycle is "000 001 010", then the 0th, 3rd and 6th bits are possible correct start bit, the decompression engine can take the 0th bit, the 3rd bit and the 6th bit as the start bit of the code word corresponding to the first character in the three candidate decompressed data, and obtain 3 bits in the manner in
步骤307、解压引擎从c/a组候选解压数据中确定出第x时钟周期的目标解压数据。Step 307: The decompression engine determines the target decompression data of the xth clock cycle from the candidate decompression data of the group c/a.
在压缩数据中,第x-1时钟周期的目标解压数据中最后一个字符对应的码字的末位码元和第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元是相邻的,第x-1时钟周期的目标解压数据是指第x-1时钟周期读取的码段对应的目标解压数据。在x等于1的情况下,第x-1时钟周期的目标解压数据即为对第1时钟周期读取的码段进行解压得到的按照特定顺序排列的多个有效字符。In the compressed data, the last symbol of the codeword corresponding to the last character in the target decompressed data of the x-1th clock cycle and the start code of the codeword corresponding to the first character in the target decompressed data of the xth clock cycle The elements are adjacent, and the target decompression data of the x-1th clock cycle refers to the target decompression data corresponding to the code segment read in the x-1th clock cycle. In the case where x is equal to 1, the target decompressed data of the x-1th clock cycle is a plurality of valid characters arranged in a specific order obtained by decompressing the code segment read in the first clock cycle.
解压引擎在得到了第x时钟周期的c/a个候选解压数据后,同时也完成了对第x-1时钟周期读取的码段的解压,得到了第x-1时钟周期的目标解压数据,解压引擎可以根据第x-1时钟周期的目标解压数据确定第x时钟周期读取的数据中第一个码字的正确起始位,并从c/a个候选解压数据中选出第x时钟周期的目标解压数据。After the decompression engine obtains the c/a candidate decompressed data of the xth clock cycle, it also completes the decompression of the code segment read in the x-1th clock cycle, and obtains the target decompression data of the x-1th clock cycle. , the decompression engine can determine the correct start bit of the first codeword in the data read in the xth clock cycle according to the target decompression data of the x-1th clock cycle, and select the xth from the c/a candidate decompression data Target decompressed data in clock cycles.
示例性的,第x-1时钟周期的解压数据为“ABCD”,对应的码字为“000 001 010011”,第x-1时钟周期读取的码段为“000 001 010 011 101”,b=6,a=3,c=9, N=4,a*N+b=15,第x时钟周期读取的码段为“011 101000 101001”,其中前6位为第x-1时钟周期读取的码段的后6位,第x-1时钟周期读取的码段中后三位“101”为一个6位的码字的前三位,以步骤304中的方式不会对这三位进行解压缩,这三位数据为未解压的码段,可以将这三位码段的起始码元确定为第x时钟周期的目标解压数据中第一个字符对应的码字的起始码元。Exemplarily, the decompressed data of the x-1 clock cycle is "ABCD", the corresponding code word is "000 001 010011", and the code segment read in the x-1 clock cycle is "000 001 010 011 101", b =6, a=3, c=9, N=4, a*N+b=15, the code segment read in the xth clock cycle is "011 101000 101001", of which the first 6 bits are the x-1th clock cycle The last 6 bits of the read code segment, the last three bits "101" in the code segment read in the x-1 clock cycle are the first three bits of a 6-bit code word, and the method in
需要说明的是,本发明实施例在对每个时钟周期读取的码段是并发的进行处理的,即解压引擎在第x-1时钟周期读取了码段后,解压引擎在对在第x-1时钟周期读取的码段进行解压缩时,解压引擎又在第x时钟周期读取了码段,并同时开始对第x时钟周期读取的码段进行解压。第x-1时钟周期读取的码段比第x时钟周期读取的码段先读取一个时钟周期,因而解压引擎在对第x时钟周期读取的码段进行步骤303至步骤307的任何一种处理时,都比解压引擎在对x-1时钟周期读取的码段进行相同处理慢一个步骤。因此,解压引擎在对第x时钟周期读取的码段的处理过程中,在执行步骤306时,解压引擎已经完成了对第x-1时钟周期读取的码段的解压,得到了第x-1时钟周期的目标解压数据。It should be noted that, in the embodiment of the present invention, the code segments read in each clock cycle are processed concurrently, that is, after the decompression engine reads the code segments in the x-1th clock cycle, the decompression engine processes the code segments in the x-1 clock cycle. When the code segment read at the x-1 clock cycle is decompressed, the decompression engine reads the code segment again at the xth clock cycle, and simultaneously starts to decompress the code segment read at the xth clock cycle. The code segment read at the x-1th clock cycle is read one clock cycle before the code segment read at the xth clock cycle, so the decompression engine performs any of
通过步骤302至步骤307能够将压缩数据中的全部数据进行解压缩,获取每个时钟周期的目标解压数据。Through
步骤308、解压引擎根据压缩数据,将获取的多个时钟周期的解压数据进行拼接,获取压缩数据对应的解压数据。Step 308: The decompression engine splices the obtained decompressed data of multiple clock cycles according to the compressed data, and obtains the decompressed data corresponding to the compressed data.
其中,拼接后相邻的两个时钟周期中,一个时钟周期的目标解压数据内最后一个字符对应的码字的末位码元在压缩数据中的位置和另一个时钟周期的目标解压数据内第一个字符对应的码字的起始码元在压缩数据中的位置是相邻的。Among them, in the two adjacent clock cycles after splicing, the position of the last symbol of the code word corresponding to the last character in the target decompressed data of one clock cycle in the compressed data and the position of the last symbol in the target decompressed data of another clock cycle The positions of the start symbols of the code words corresponding to one character in the compressed data are adjacent.
如图3C所示,其为应用本发明实施例提供的解压缩方法时、应用基于霍夫曼编码的解压缩方法时以及应用基于LZ4的解压缩方法时,解压引擎所占用的芯片面积的条形图,其中,纵轴代表芯片面积,芯片的面积单位为平方毫米,芯片是指16纳米(nm)工艺的芯片。其中,条形31代表应用基于霍夫曼编码的解压缩方法时,解压引擎所占用的芯片面积,该面积为0.05平方毫米。条形32代表最大公约数为3,应用本发明实施例提供的解压缩方法时,解压引擎所占用的芯片面积,该面积为0.035平方毫米。条形33代表最大公约数为2,应用本发明实施例提供的解压缩方法时,解压引擎所占用的芯片面积,该面积为 0.025平方毫米。条形34代表应用基于LZ4的解压缩方法时,解压引擎所占用的芯片面积,该面积为0.16平方毫米。可以看出本发明实施例提供的解压缩方法,所占用的芯片面积小于相关技术中的解压缩方法。As shown in FIG. 3C , it is a bar of the chip area occupied by the decompression engine when the decompression method provided by the embodiment of the present invention is applied, when the decompression method based on Huffman coding is applied, and when the decompression method based on LZ4 is applied. Graph, wherein the vertical axis represents the area of the chip, the area of the chip is in square millimeters, and the chip refers to a chip with a 16 nanometer (nm) process. The
此外,本发明实施例提供的解压缩方法的解压速度可以达到根据哈夫曼编码生成的压缩数据封装的解压速度的10倍左右,并与基于LZ4生成的压缩数据封装的解压速度基本持平。In addition, the decompression speed of the decompression method provided by the embodiment of the present invention can reach about 10 times the decompression speed of the compressed data package generated according to Huffman coding, and is basically the same as the decompression speed of the compressed data package generated based on LZ4.
综上所述,本发明实施例提供的解压缩方法,在对压缩数据进行解压缩时,通过并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,然后根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同时从多个位置开始解压缩,解压速度较高。To sum up, in the decompression method provided by the embodiments of the present invention, when decompressing compressed data, by concurrently decompressing (a*N+b)/a subcode segments, (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related art is solved. When decompressing, you can start decompressing from multiple locations at the same time, and the decompression speed is high.
图4是本发明实施例提供的一种压缩装置的结构方框图,该压缩装置可以成为终端或服务器的部分或者全部。该压缩装置400可以包括:FIG. 4 is a structural block diagram of a compression apparatus provided by an embodiment of the present invention, and the compression apparatus may become part or all of a terminal or a server. The compression device 400 may include:
公约数确定单元410,用于执行上述实施例中的步骤202。The common
编码表生成单元420,用于执行上述实施例中的步骤203和步骤201。The coding
压缩单元430,用于执行上述实施例中的步骤204。The
封装生成单元440,用于执行上述实施例中的步骤205。The
综上所述,本发明实施例提供的压缩装置,通过生成码字的长度具有最大公约数的变长编码表,并根据该变长编码表对待压缩数据进行压缩,使得解压引擎在解压该压缩数据时,可以从多个位置并发开始解压,而不用等待解压完一个码字后再根据该码字的长度解压下一个码字,解压速度较高。To sum up, the compression device provided by the embodiment of the present invention generates a variable-length encoding table with the greatest common divisor of the length of the codeword, and compresses the data to be compressed according to the variable-length encoding table, so that the decompression engine decompresses the compressed data. Data can be decompressed concurrently from multiple locations without waiting for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword, and the decompression speed is high.
图5A是本发明实施例提供的一种解压缩装置的结构方框图。该解压缩装置500可以包括:FIG. 5A is a structural block diagram of a decompression apparatus provided by an embodiment of the present invention. The decompression apparatus 500 may include:
子码段确定单元510,用于执行上述实施例中的步骤301、302和303。The subcode
解压单元520,用于执行上述实施例中的步骤304。The
解压数据确定单元530,用于执行上述实施例中的步骤305、306和307。The decompressed
可选的,如图5B所示,该解压缩装置500还可以包括:Optionally, as shown in FIG. 5B , the decompression apparatus 500 may further include:
拼接单元540,用于执行上述实施例中的步骤308。The
综上所述,本发明实施例提供的解压缩装置,在对压缩数据进行解压缩时,通过并发地对(a*N+b)/a个子码段进行解压处理,得到(a*N+b)/a个字符,然后根据变长编码表和第x时钟周期读取的码段,从(a*N+b)/a个字符中确定出第x时钟周期的目标解压数据。不用等待解压完一个码字后再根据该码字的长度解压下一个码字。解决了相关技术中解压速度较低的问题。在进行解压缩时可以同时从多个位置开始解压缩,解压速度较高。To sum up, in the decompression apparatus provided by the embodiment of the present invention, when decompressing the compressed data, by concurrently decompressing (a*N+b)/a subcode segments, (a*N+ b)/a characters, and then determine the target decompressed data of the xth clock cycle from (a*N+b)/a characters according to the variable-length encoding table and the code segment read in the xth clock cycle. There is no need to wait for a codeword to be decompressed before decompressing the next codeword according to the length of the codeword. The problem of low decompression speed in the related art is solved. When decompressing, you can start decompressing from multiple locations at the same time, and the decompression speed is high.
参见附图6,为本发明实施例提供的一种数据处理系统的结构示意图,该数据处理系统包括压缩装置61和解压缩装置63,压缩装置61可以为前述实施例所述的压缩装置,解压缩装置63可以为前述实施例所述的解压缩装置。压缩装置61用于对待压缩数据进行压缩得到压缩数据;解压缩装置63用于对所述压缩数据进行解压处理。该数据处理系统可以应用于终端或服务器中。其中,终端可以包括智能手机、平板电脑、计算机和膝上型计算机,而服务器可以是一台服务器,或者由若干台服务器组成的服务器集群,或者是一个云计算服务中心。Referring to FIG. 6, it is a schematic structural diagram of a data processing system provided by an embodiment of the present invention. The data processing system includes a
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于 一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps of implementing the above embodiments can be completed by hardware, or can be completed by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium. The storage medium mentioned may be a read-only memory, a magnetic disk or an optical disk, etc.
以上所述仅为本发明实施例的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包括在本申请的保护范围之内。The above descriptions are only optional embodiments of the embodiments of the present invention, and are not intended to limit the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall be included in the present application. within the scope of protection.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611270254.0A CN106849956B (en) | 2016-12-30 | 2016-12-30 | Compression method, decompression method, apparatus and data processing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611270254.0A CN106849956B (en) | 2016-12-30 | 2016-12-30 | Compression method, decompression method, apparatus and data processing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106849956A CN106849956A (en) | 2017-06-13 |
CN106849956B true CN106849956B (en) | 2020-07-07 |
Family
ID=59118309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611270254.0A Active CN106849956B (en) | 2016-12-30 | 2016-12-30 | Compression method, decompression method, apparatus and data processing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106849956B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109981108B (en) * | 2017-12-27 | 2023-05-02 | 杭州海康威视数字技术股份有限公司 | Data compression method, decompression method, device and equipment |
CN110784225A (en) * | 2018-07-31 | 2020-02-11 | 华为技术有限公司 | Data compression, decompression method and related device, electronic equipment, system |
CN111384967B (en) * | 2018-12-28 | 2022-12-09 | 上海寒武纪信息科技有限公司 | Data encoding method |
CN111600610B (en) * | 2020-05-26 | 2023-04-28 | 北京思特奇信息技术股份有限公司 | Universal coding method, system and electronic equipment for variable-length integers |
CN114124106B (en) * | 2022-01-28 | 2022-04-26 | 苏州浪潮智能科技有限公司 | LZ4 decompression method, system, storage medium and equipment |
CN118337217A (en) * | 2023-01-12 | 2024-07-12 | 华为技术有限公司 | Method for compressing data, method for decompressing data, device, system and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1291826A (en) * | 1999-08-02 | 2001-04-18 | 三星电子株式会社 | Variable-length coding method and device |
CN1758761A (en) * | 2004-08-27 | 2006-04-12 | 松下电器产业株式会社 | Coding apparatus and imaging apparatus |
CN101150719A (en) * | 2006-09-20 | 2008-03-26 | 华为技术有限公司 | Method and device for parallel video coding |
US9036711B1 (en) * | 2008-11-06 | 2015-05-19 | Marvell International Ltd. | Visual data compression algorithm with parallel processing capability |
CN105933708A (en) * | 2016-04-15 | 2016-09-07 | 张彦刚 | Data compression-decompression method and device |
-
2016
- 2016-12-30 CN CN201611270254.0A patent/CN106849956B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1291826A (en) * | 1999-08-02 | 2001-04-18 | 三星电子株式会社 | Variable-length coding method and device |
CN1758761A (en) * | 2004-08-27 | 2006-04-12 | 松下电器产业株式会社 | Coding apparatus and imaging apparatus |
CN101150719A (en) * | 2006-09-20 | 2008-03-26 | 华为技术有限公司 | Method and device for parallel video coding |
US9036711B1 (en) * | 2008-11-06 | 2015-05-19 | Marvell International Ltd. | Visual data compression algorithm with parallel processing capability |
CN105933708A (en) * | 2016-04-15 | 2016-09-07 | 张彦刚 | Data compression-decompression method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106849956A (en) | 2017-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106849956B (en) | Compression method, decompression method, apparatus and data processing system | |
US10116325B2 (en) | Data compression/decompression device | |
US11463102B2 (en) | Data compression method, data decompression method, and related apparatus, electronic device, and system | |
JP7321208B2 (en) | Polar code rate matching method and apparatus | |
JP5498783B2 (en) | Data compression method | |
JP6512733B2 (en) | Data compression method and apparatus for performing the method | |
KR102381999B1 (en) | Method and system for decoding variable length coded input and method for modifying codebook | |
US9934234B2 (en) | Adaptive rate compression hash processor | |
CN104811209B (en) | A kind of the compressed file data embedding method and device of anti-most long matching detection | |
US20240364363A1 (en) | Systems and Methods for Lossless Compression of Tabular Numeric Data | |
US10230392B2 (en) | Techniques for parallel data decompression | |
JP6647340B2 (en) | Improved file compression and encryption | |
CN106293542B (en) | Method and device for decompressing file | |
US9160362B1 (en) | Lempel-Ziv (LZ)-based data compression employing implicit variable-length distance coding | |
US7439887B2 (en) | Method and apparatus for GIF decompression using fixed-size codeword table | |
CN108932315A (en) | A kind of method and relevant apparatus of data decompression | |
JP2005521324A (en) | Method and apparatus for lossless data compression and decompression | |
US9054730B2 (en) | Method and system for LZW based decompression | |
US8190584B1 (en) | Utilizing recursive application of a reversible transform which involves lexicographic ordering | |
CN114172521B (en) | A decompression chip verification method, apparatus, device and readable storage medium | |
CN110875744B (en) | Coding method and device | |
CN119727736A (en) | Entropy coding method and device of data, electronic equipment and storage medium | |
CN114416350A (en) | Decompression method, decompression device and readable storage medium | |
CN117640966A (en) | Image processing method, device, equipment and storage medium | |
TW202418807A (en) | Noniterative entropy coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |