[go: up one dir, main page]

CN107967296B - Improved LZO compression method with rapid low resource overhead - Google Patents

Improved LZO compression method with rapid low resource overhead Download PDF

Info

Publication number
CN107967296B
CN107967296B CN201711050579.2A CN201711050579A CN107967296B CN 107967296 B CN107967296 B CN 107967296B CN 201711050579 A CN201711050579 A CN 201711050579A CN 107967296 B CN107967296 B CN 107967296B
Authority
CN
China
Prior art keywords
compression
bits
len
new character
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711050579.2A
Other languages
Chinese (zh)
Other versions
CN107967296A (en
Inventor
张健
赖晓玲
周国昌
杨玉辰
巨艇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Institute of Space Radio Technology
Original Assignee
Xian Institute of Space Radio Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Institute of Space Radio Technology filed Critical Xian Institute of Space Radio Technology
Priority to CN201711050579.2A priority Critical patent/CN107967296B/en
Publication of CN107967296A publication Critical patent/CN107967296A/en
Application granted granted Critical
Publication of CN107967296B publication Critical patent/CN107967296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

一种快速低资源开销的改进LZO压缩方法,首先根据回指距离构建LZO压缩方法对新字符进行压缩的第一压缩格式及对应的第一压缩算法、第二压缩格式及对应的第二压缩算法,然后记录进行LZO压缩的新字符长度,根据新字符长度、回指距离选择的压缩格式及对应的压缩算法进行LZO压缩。本发明以LZO算法为基础,对比特文件进行统计分析,在保留哈希运算的前提下提出一套新的压缩格式,该压缩格式种类划分少,判断方式简单,在几乎不降低压缩率的前提下减小回指距离,压缩与解压缩速度均有较大提高,更便于硬件实现和宇航应用,具有很好的使用价值。

Figure 201711050579

An improved LZO compression method with fast and low resource overhead, firstly, a first compression format, a corresponding first compression algorithm, a second compression format and a corresponding second compression algorithm for compressing new characters by the LZO compression method are constructed according to the anaphora distance , and then record the new character length for LZO compression, and perform LZO compression according to the new character length, the compression format selected by the anaphora distance and the corresponding compression algorithm. Based on the LZO algorithm, the present invention performs statistical analysis on the bit file, and proposes a new compression format on the premise of retaining the hash operation. By reducing the anaphora distance, the compression and decompression speeds are greatly improved, which is more convenient for hardware implementation and aerospace applications, and has good use value.

Figure 201711050579

Description

一种快速低资源开销的改进LZO压缩方法An improved LZO compression method with fast and low resource overhead

技术领域technical field

本发明涉及比特文件压缩技术领域,特别是一种快速低资源开销的改进LZO压缩方法。The invention relates to the technical field of bit file compression, in particular to an improved LZO compression method with fast and low resource overhead.

背景技术Background technique

LZO算法基于传统字典压缩算法,使用哈希表来记录滑动窗口内的字典数据,减少了搜索相同字符的时间,极大地提高了算法的压缩速度,其通过把压缩格式精细化到按每比特使用,可以使每个字节利用更充分,增大数据的压缩比。2004年,美国太空总署将LZO无损压缩应用在了探测火星的机会号和精神号上。The LZO algorithm is based on the traditional dictionary compression algorithm, and uses a hash table to record the dictionary data in the sliding window, which reduces the time to search for the same character and greatly improves the compression speed of the algorithm. , which can make each byte more fully utilized and increase the data compression ratio. In 2004, NASA applied LZO lossless compression to the Mars explorations Opportunity and Spirit.

如图1所示为现有的LZO压缩算法流程图,其首先对输入数据做哈希运算,判断输入数据是否在哈希表内,如果输入数据不在哈希表内,则扩充哈希表,否则对输入数据按LZO压缩格式进行压缩。现有的LZO压缩算法压缩格式根据偏移距离和重复长度的不同共分为5种,如图2(a)、(b)、(c)、(d)、(e)所示为现有的LZO压缩算法5种LZO压缩格式。As shown in Figure 1 is the flow chart of the existing LZO compression algorithm, which firstly performs a hash operation on the input data to determine whether the input data is in the hash table, if the input data is not in the hash table, then expand the hash table, Otherwise, the input data is compressed according to the LZO compression format. The existing LZO compression algorithm compression formats are divided into 5 types according to the offset distance and repetition length, as shown in Figure 2(a), (b), (c), (d), (e) The LZO Compression Algorithm 5 LZO compression formats.

如图2(a)、(b)、(c)所示三种压缩格式的重复长度小于等于8字节,格式1回指距离的范围小于等于2k,格式2回指距离小于等于16k,格式3回指距离大于16k,小于等于48k。格式4格式5是重复长度大于8字节的情况,格式4回指距离小于等于16k,格式5回指距离大于16k小于等于48k。As shown in Figure 2(a), (b), (c), the repetition length of the three compression formats is less than or equal to 8 bytes. 3 back finger distance is greater than 16k, less than or equal to 48k. Format 4, Format 5, is the case where the repetition length is greater than 8 bytes. Format 4 means that the distance is less than or equal to 16k, and format 5 means that the distance is greater than 16k and less than or equal to 48k.

如图2所示的5种压缩格式最大可支持48k的回指距离,因此至少要存储48k的解压数据才能保证压缩数据被正确解压,过大的RAM深度使解压缩的速率变慢,RAM本身在解压缩时刷新频率降低,在空间辐射环境中,更易发生单粒子翻转。上述5种格式通过回指距离和重复长度共同判定使得判定过程繁琐,降低压缩速度;解压缩电路要区分5种压缩格式,需要大量的控制信号来保证解压缩正确性,使得解压缩过程复杂,增加了时序开销和硬件资源开销。The five compression formats shown in Figure 2 can support a maximum anaphora distance of 48k, so at least 48k of decompressed data must be stored to ensure that the compressed data is correctly decompressed. Excessive RAM depth will slow down the decompression rate, and the RAM itself The refresh rate is reduced during decompression, and single-event flipping is more likely to occur in a space radiation environment. The above five formats are jointly determined by the anaphora distance and the repetition length, which makes the determination process cumbersome and reduces the compression speed; the decompression circuit needs to distinguish the five compression formats, and a large number of control signals are needed to ensure the correctness of the decompression, which makes the decompression process complicated. Increased timing overhead and hardware resource overhead.

发明内容SUMMARY OF THE INVENTION

本发明解决的技术问题是:克服现有技术的不足,提供了一种快速低资源开销的改进LZO压缩方法,通过对现有的LZO压缩算法研究,在保留LZO压缩算法哈希运算的同时,通过统计分析比特文件的数据特点和LZO的五种压缩格式,设计了一种快速低资源开销的无损压缩方法,通过重新设计压缩格式,在几乎不影响压缩率前提下,压缩与解压缩速度较大提升、硬件资源开销大幅降低。The technical problem solved by the present invention is: overcoming the deficiencies of the prior art and providing an improved LZO compression method with fast and low resource overhead, through the research on the existing LZO compression algorithm, while retaining the hash operation of the LZO compression algorithm, Through statistical analysis of the data characteristics of the bit file and the five compression formats of LZO, a fast and low resource cost lossless compression method is designed. Greatly improved, hardware resource overhead is greatly reduced.

本发明的技术解决方案是:一种快速低资源开销的改进LZO压缩方法,包括如下步骤:The technical solution of the present invention is: an improved LZO compression method with fast and low resource overhead, comprising the following steps:

(1)根据回指距离构建LZO压缩方法对新字符进行压缩的第一压缩格式及对应的第一压缩算法、第二压缩格式及对应的第二压缩算法;所述的第一压缩格式适用且进行压缩的新字符的回指距离小于第二压缩格式适用且进行压缩的新字符的回指距离;(1) Constructing the first compression format and the corresponding first compression algorithm, the second compression format and the corresponding second compression algorithm for compressing the new character by the LZO compression method according to the anaphoric distance; the first compression format is applicable and The anaphora distance of the new character to be compressed is smaller than the anaphora distance of the new character to be compressed to which the second compression format is applicable;

(2)记录进行LZO压缩的新字符长度,根据新字符长度、回指距离选择的压缩格式及对应的压缩算法进行LZO压缩。(2) Record the new character length for LZO compression, and perform LZO compression according to the new character length, the compression format selected by the anaphora distance and the corresponding compression algorithm.

所述的第一压缩格式适用且进行压缩的新字符的回指距离不大于1k,第二压缩格式适用且进行压缩的新字符的回指距离大于1k小于等于16k。The first compression format is applicable and the anaphora distance of the new characters to be compressed is not greater than 1k, and the second compression format is applicable and the anaphora distance of the new characters to be compressed is greater than 1k and less than or equal to 16k.

所述的第一压缩格式的第一字节t(0)为8bit,最高位为1,t(0)的6、5、4bit记录重复长度的值,3≤重复长度≤8,3、2bit记录回指距离的低2位,1、0bit记录下次输入为新字符且新字符长度≤3时新字符长度的值,最后一字节u记录回指距离后8位。The first byte t(0) of the first compression format is 8 bits, the highest bit is 1, and 6, 5, and 4 bits of t(0) record the value of the repetition length, 3≤repetition length≤8, 3, 2 bits Record the lower 2 bits of the anaphora distance, 1 and 0 bits record the value of the new character length when the next input is a new character and the new character length is less than or equal to 3, and the last byte u records the last 8 bits of the anaphora distance.

所述的第一压缩格式对应的第一压缩算法包括如下步骤:The first compression algorithm corresponding to the first compression format includes the following steps:

步骤1、判断待压缩数据的回指距离,如果待压缩数据的回指距离≤1k且重复长度M_len≥3,转入步骤2,如果待压缩数据的重复长度<3,则不进行压缩;Step 1. Determine the anaphora distance of the data to be compressed. If the anaphora distance of the data to be compressed is less than or equal to 1k and the repetition length M_len ≥ 3, go to step 2. If the repetition length of the data to be compressed is less than 3, no compression is performed;

步骤2、当3≤重复长度M_len≤8时,将M_len-1的值记录在t(0)的6、5、4bit中;Step 2. When 3≤repetition length M_len≤8, record the value of M_len-1 in bits 6, 5, and 4 of t(0);

当8<M_len≤263时,将t(0)的6、5、4bit分别记为0,然后将M_len-8的值记录在第二字节t(1)中;When 8<M_len≤263, record bits 6, 5, and 4 of t(0) as 0, respectively, and then record the value of M_len-8 in the second byte t(1);

当263<M_len≤n*255+263时,n≥1,将t(0)的6、5、4bit分别记为0,将第2字节t(1)至第(n+1)字节t(n)的8bit分别记为0,将M_len-8-n*255的值记录在第(n+2)字节t(n+1)中;When 263<M_len≤n*255+263, n≥1, the 6, 5, and 4 bits of t(0) are recorded as 0 respectively, and the second byte t(1) to the (n+1)th byte The 8 bits of t(n) are respectively recorded as 0, and the value of M_len-8-n*255 is recorded in the (n+2)th byte t(n+1);

步骤3、如果下次输入数据为新字符且新字符长度≤3,则将新字符长度的值记录于t(0)的第1、0bit,如果新字符长度>3时,则将t(0)的最后两bit记为0。Step 3. If the next input data is a new character and the new character length is less than or equal to 3, the value of the new character length is recorded in the 1st and 0th bits of t(0). If the new character length is greater than 3, t(0 ), the last two bits are recorded as 0.

所述的第二压缩格式第一字节t(0)的高两位分别置为0、1,t(0)的5、4、3、2、1、0bit记录重复长度的值,4≤重复长度≤65,倒数第二字节u的7、6、5、4、3、2bit分别记录回指距离的低6位,u的1、0bit记录下次输入数据为新字符且新字符长度≤3时新字符长度的值,最后一字节v记录回指距离的后8位。The upper two bits of the first byte t(0) of the second compression format are set to 0, 1, respectively, and bits 5, 4, 3, 2, 1, and 0 of t(0) record the value of the repetition length, 4≤ Repeat length ≤ 65, bits 7, 6, 5, 4, 3, and 2 of the penultimate byte u respectively record the lower 6 bits of the anaphora distance, and bits 1 and 0 of u record that the next input data is a new character and the new character length When ≤3, the value of the new character length, the last byte v records the last 8 bits of the anaphoric distance.

所述的第二压缩格式对应的第二压缩算法包括如下步骤:The second compression algorithm corresponding to the second compression format includes the following steps:

步骤1、判断待压缩数据的回指距离,当1k<待压缩数据的回指距离≤16k且重复长度M_len≥4时,转入步骤2,当重复长度M_len<4时,不对数据进行压缩;Step 1. Determine the anaphora distance of the data to be compressed, when 1k<the anaphora distance of the data to be compressed≤16k and the repetition length M_len≥4, go to step 2, when the repetition length M_len<4, do not compress the data;

步骤2、当4≤M_len≤65时,将M_len-2的值记录在t(0)的5、4、3、2、1、0bit中;Step 2. When 4≤M_len≤65, record the value of M_len-2 in bits 5, 4, 3, 2, 1, and 0 of t(0);

当65<M_len≤320时,将t(0)的5、4、3、2、1、0bit分别置为0,将M_len-65的值记录在第二字节t(1)中;When 65<M_len≤320, set bits 5, 4, 3, 2, 1, and 0 of t(0) to 0, respectively, and record the value of M_len-65 in the second byte t(1);

当320<M_len≤n*255+320,t(0)的5、4、3、2、1bit位的值分别为0,将第2字节t(1)至第n+1字节t(n)的8bit分别记为0,将M_len-65-n*255的值记录于第n+2字节t(n+1)中。When 320<M_len≤n*255+320, the values of bits 5, 4, 3, 2, and 1 of t(0) are 0, respectively, and the second byte t(1) to the n+1th byte t( The 8 bits of n) are respectively recorded as 0, and the value of M_len-65-n*255 is recorded in the n+2th byte t(n+1).

步骤3、如果下次输入数据为新字符且新字符长度≤3,则将新字符长度的值记录于u的第1、0bit,如果新字符长度>3时,则将u的最后两bit记为0。Step 3. If the next input data is a new character and the new character length is less than or equal to 3, record the value of the new character length in the 1st and 0th bits of u. If the new character length is greater than 3, record the last two bits of u. is 0.

本发明与现有技术相比的优点在于:The advantages of the present invention compared with the prior art are:

(1)本发明以LZO算法为基础,对比特文件进行统计分析,基于统计结果对LZO算法进行改进,在保留哈希运算的前提下提出一套新的压缩格式,该压缩格式种类划分少,判断方式简单,在几乎不降低压缩率的前提下减小回指距离,压缩与解压缩速度均有较大提高;(1) the present invention is based on the LZO algorithm, carries out statistical analysis to the bit file, improves the LZO algorithm based on the statistical results, and proposes a set of new compression formats under the premise of retaining the hash operation, and the compression format types are less divided, The judgment method is simple, and the anaphora distance is reduced without reducing the compression rate, and the compression and decompression speeds are greatly improved;

(2)本发明通过减少压缩格式种类和回指距离,使得解压缩电路硬件开销(除IO)仅为LZO算法的35%,SRAM刷新频率为传统LZO算法的3倍,工作时钟频率提高,解压缩译码时钟周期减少,更便于硬件实现和宇航应用,具有很好的使用价值。(2) The present invention reduces the type of compression format and the anaphora distance, so that the hardware overhead of the decompression circuit (except IO) is only 35% of the LZO algorithm, the SRAM refresh frequency is 3 times that of the traditional LZO algorithm, the working clock frequency is increased, and the solution The compression decoding clock cycle is reduced, which is more convenient for hardware implementation and aerospace applications, and has good use value.

附图说明Description of drawings

图1为现有的LZO压缩算法流程图;Fig. 1 is the flow chart of the existing LZO compression algorithm;

图2为现有的LZO压缩算法对应的5种LZO压缩格式;Fig. 2 is 5 kinds of LZO compression formats corresponding to the existing LZO compression algorithm;

图3为本发明方法设计的压缩格式1;Fig. 3 is the compression format 1 designed by the method of the present invention;

图4为压缩格式1的运算流程;Fig. 4 is the operation flow of compression format 1;

图5为压缩格式格式2;Figure 5 is the compression format format 2;

图6为压缩格式2的运算流程;Fig. 6 is the operation flow of compression format 2;

图7为本发明算法的新字符编码格式;Fig. 7 is the new character encoding format of the algorithm of the present invention;

图8为新字符编码格式的运算流程;Fig. 8 is the operation flow of the new character encoding format;

图9为本发明方法对应的压缩算法设计流程图;Fig. 9 is the compression algorithm design flow chart corresponding to the method of the present invention;

图10为本发明方法对应的解压缩算法设计流程图;Fig. 10 is the decompression algorithm design flow chart corresponding to the method of the present invention;

图11为解压缩电路框图;11 is a block diagram of a decompression circuit;

图12为本发明快速低资源开销的比特文件无损压缩方法流程图。FIG. 12 is a flow chart of the method for fast and low resource overhead lossless compression of bit files according to the present invention.

具体实施方式Detailed ways

为了提升LZO压缩算法的压缩与解压缩速度,减少解压缩电路的硬件开销,提高电路抗单粒子翻转能力,本发明针对现有技术的不足,提出一种快速低资源开销的改进LZO压缩方法,本发明方法沿用传统LZO算法的哈希运算,并对压缩格式进行了改进,改进后的压缩算法压缩格式包括2种,只由回指距离判定区分;最大回指距离减小为16k,通过上述改进本发明解压缩电路硬件开销(除IO)仅为传统LZO算法的35%,RAM刷新频率为传统LZO算法的3倍,压缩速度提升18%,解压缩速度提升37%,压缩率减小0.3%以内。In order to improve the compression and decompression speed of the LZO compression algorithm, reduce the hardware overhead of the decompression circuit, and improve the circuit's ability to resist single-event flipping, the present invention aims at the shortcomings of the prior art, and proposes an improved LZO compression method with fast and low resource overhead, The method of the invention follows the hash operation of the traditional LZO algorithm, and improves the compression format. The improved compression algorithm includes two compression formats, which are only distinguished by the anaphora distance; the maximum anaphora distance is reduced to 16k. The hardware overhead (excluding IO) of the improved decompression circuit of the present invention is only 35% of the traditional LZO algorithm, the RAM refresh frequency is 3 times that of the traditional LZO algorithm, the compression speed is increased by 18%, the decompression speed is increased by 37%, and the compression rate is reduced by 0.3 % or less.

如图12所示为本发明一种快速低资源开销的改进LZO压缩方法的设计流程包括如下步骤:As shown in FIG. 12, the design flow of a fast and low resource overhead improved LZO compression method of the present invention includes the following steps:

(1)研究无损压缩技术,选择合适的压缩算法作为基础。了解目前的无损压缩算法的种类以及它们的优缺点。字典编码算法从理论上克服了其他算法可能使压缩率反弹的问题,带来较高的压缩率,本发明以字典编码算法为基础进行研究。(1) Research the lossless compression technology and choose the appropriate compression algorithm as the basis. Learn about the types of lossless compression algorithms available today and their advantages and disadvantages. The dictionary encoding algorithm theoretically overcomes the problem that other algorithms may rebound the compression rate, and brings about a higher compression rate. The present invention is based on the dictionary encoding algorithm for research.

(2)研究LZO压缩算法,分析其优缺点。传统的字典编码由于要与字典内的字符挨个比对,使得字典编码的压缩速度非常缓慢。LZO算法基于传统字典压缩算法,并使用哈希表来记录滑动窗口内的字典数据,使其搜索的复杂度一直为常数,在压缩速度上有很大提升。但LZO算法有5种不同的压缩格式,且最大回指距离为48k,这些都限制了LZO算法压缩与解压缩的速度,在解压缩电路实现时带来了较大的硬件开销,不利于宇航应用。(2) Study the LZO compression algorithm and analyze its advantages and disadvantages. The traditional dictionary encoding has to compare with the characters in the dictionary one by one, so the compression speed of dictionary encoding is very slow. The LZO algorithm is based on the traditional dictionary compression algorithm, and uses a hash table to record the dictionary data in the sliding window, so that the search complexity is always constant, and the compression speed is greatly improved. However, the LZO algorithm has 5 different compression formats, and the maximum anaphora distance is 48k, which limits the compression and decompression speed of the LZO algorithm, and brings a large hardware overhead when the decompression circuit is implemented, which is not conducive to aerospace. application.

(3)选取大量实际工作中的比特文件作为样本,统计比特文件中数据的回指距离的规律。(3) Select a large number of bit files in actual work as samples, and count the law of the anaphoric distance of the data in the bit files.

(4)对LZO五种压缩格式和比特文件统计结果的分析,精细化的设计压缩格式,在提高算法速度以及理论上降低资源消耗的同时保证压缩率。(4) Analysis of the five LZO compression formats and the statistical results of the bit file, and the refined design of the compression format ensures the compression rate while improving the algorithm speed and theoretically reducing resource consumption.

(5)设计本发明算法的压缩软件,验证算法的压缩率和压缩速度。(5) Design the compression software of the algorithm of the present invention, and verify the compression ratio and compression speed of the algorithm.

(6)设计本发明算法的解压缩软件,结合压缩软件验证算法的正确性以及解压缩速度。(6) Design the decompression software of the algorithm of the present invention, and verify the correctness of the algorithm and the decompression speed in combination with the compression software.

(7)设计算法的硬件电路和FPGA实现,验证解压缩算法的硬件实现和资源利用率。(7) Design the hardware circuit and FPGA implementation of the algorithm, and verify the hardware implementation and resource utilization of the decompression algorithm.

本发明方法的具体设计方法包括如下步骤:The specific design method of the method of the present invention comprises the following steps:

(1)分析LZO算法的5种压缩格式以及比特文件数据特点,设计压缩格式优化方案;(1) Analyze the five compression formats of the LZO algorithm and the characteristics of the bit file data, and design an optimization scheme for the compression format;

格式2和格式4,格式3和格式5的首字符格式编码方式相同,所不同的是在判断重复长度上。因此,本发明方法合并压缩格式2与格式4,格式3与格式5,从而使的压缩与解压缩更加方便,减小解压缩硬件开销与提高压缩和解压缩速度。统计回指距离,超过50%的比特文件的回指距离小于1k,而这种小于1k回指距离根据重复长度的不同可以按格式1、格式4进行压缩,第一种压缩格式的压缩结果为2字节,而第4种压缩格式压缩结果最少3字节。因此尽量多的数据按第一种压缩格式进行压缩对提高压缩率是非常有效的。Format 2 and Format 4, Format 3 and Format 5 have the same encoding method of the first character format, the difference is in judging the repetition length. Therefore, the method of the present invention combines the compression format 2 and the format 4, and the format 3 and the format 5, thereby making the compression and decompression more convenient, reducing the decompression hardware overhead and improving the compression and decompression speed. Statistical anaphora distance, more than 50% of the bit files have anaphora distance less than 1k, and this anaphora distance less than 1k can be compressed according to format 1 and format 4 according to different repetition lengths. The compression result of the first compression format is 2 bytes, while the 4th compression format compresses the result at least 3 bytes. Therefore, it is very effective to compress as much data as possible according to the first compression format to improve the compression rate.

最大48k的回指距离,使得解压缩时至少需要48k深度的RAM存储解压数据才能保证压缩数据被正确解压。为了提升压缩与解压缩速度,同时减少解压缩电路的硬件开销,提高电路抗单粒子翻转能力,本发明从减小压缩格式的回指距离入手。由于哈希表的深度为16k,在数据压缩时最多读入16k数据就会更新哈希表,同时超过95%的比特文件回指距离小于16k,因此本发明将回指距离也缩小为16k并不会带来过多的压缩率损耗,即可删除压缩格式3和5。The maximum anaphora distance is 48k, so that at least 48k deep RAM is required to store the decompressed data during decompression to ensure that the compressed data is correctly decompressed. In order to improve the compression and decompression speed, reduce the hardware overhead of the decompression circuit, and improve the circuit's anti-single event flipping capability, the present invention starts from reducing the anaphora distance of the compression format. Since the depth of the hash table is 16k, the hash table will be updated when a maximum of 16k data is read during data compression. At the same time, more than 95% of the bit files have an anaphoric distance of less than 16k. Therefore, the present invention also reduces the anaphoric distance to 16k. Compression formats 3 and 5 can be removed without excessive compression loss.

结合以上分析本发明方法得到一种简化的压缩格式编码,即压缩格式只由回指距离判定,压缩格式只有2种,回指距离小于等于16k。Combined with the above analysis, the method of the present invention obtains a simplified compression format encoding, that is, the compression format is determined only by the anaphora distance, there are only two compression formats, and the anaphora distance is less than or equal to 16k.

(2)结合上述分析,设计快速低资源开销的压缩算法编码格式(2) Combined with the above analysis, design a fast and low resource overhead compression algorithm encoding format

如图3所示为本发明方法的两种压缩格式的第一种,即回指距离小于等于1k的情况,该压缩格式的第一字节t(0)的最高位(第7bit位)的值为1,解压缩时判定方法为首字符大于等于128;t(0)的6、5、4bit记录重复长度的值,3≤重复长度(M_len)≤8;t(0)的3、2bit位用于记录回指距离的低2位的值;t(0)的1、0bit位用于记录下一次输入的数据为新字符(N_len),且N_len≤3时N_len的值;最后一字节u用于记录回指距离后8位;t(0)与u之间的字节数≥0,由重复长度决定。详细压缩运算流程如图4所示。Figure 3 shows the first of the two compression formats of the method of the present invention, that is, when the anaphoric distance is less than or equal to 1k, the highest bit (7th bit) of the first byte t(0) of the compression format is The value is 1, the first character is greater than or equal to 128 when decompressing; the 6, 5, and 4 bits of t(0) record the value of the repetition length, 3≤the repetition length (M_len)≤8; the 3 and 2 bits of t(0) Used to record the value of the lower 2 bits of the anaphora distance; bits 1 and 0 of t(0) are used to record the value of N_len when the next input data is a new character (N_len), and N_len≤3; the last byte u is used to record the last 8 bits of the anaphora distance; the number of bytes between t(0) and u is ≥ 0, which is determined by the repetition length. The detailed compression operation flow is shown in Figure 4.

如图4所示,t(0)为压缩格式第一字节,M_len为重复长度(压缩字符长度),u为压缩格式最后一字节式,N_len为下一次输入的新字节长度,n为t(0)与u之间0字节的个数。压缩格式1的回指距离为10位,其回指距离的范围小于等于1k,回指距离减1后记录于该压缩格式的offset中。重复长度不受限制,但M_len小于3时不对数据压缩,当M_len大于等于3,小于等于8时,M_len减1记录于第一个byte中的length的3bit中,该压缩格式的压缩结果为2byte;当M_len大于8时,第一个byte中的length的3bit全为0,第二个byte为M_len减去8;如果M_len减去8后还大于255,那么第二byte为全零,第三byte为M_len减8再减255……依次类推。第一字节的最后两bit用于存储新字符长度信息,当下一次输入的数据为新字符,且N_len小于等于3时,第一字节t(0)的最后两bit记录该长度,当N_len大于3时,t(0)的最后两bit为00。压缩格式判定方法为首字符大于128。As shown in Figure 4, t(0) is the first byte of the compressed format, M_len is the repetition length (compressed character length), u is the last byte of the compressed format, N_len is the new byte length of the next input, n is the number of 0 bytes between t(0) and u. The anaphora distance of compression format 1 is 10 bits, the range of the anaphora distance is less than or equal to 1k, and the anaphora distance is reduced by 1 and recorded in the offset of the compression format. The repetition length is not limited, but when M_len is less than 3, the data will not be compressed. When M_len is greater than or equal to 3 and less than or equal to 8, M_len minus 1 is recorded in 3 bits of length in the first byte, and the compression result of this compression format is 2 bytes. ; When M_len is greater than 8, the 3 bits of length in the first byte are all 0, and the second byte is M_len minus 8; if M_len minus 8 is still greater than 255, then the second byte is all zero, and the third byte is all zero. byte is M_len minus 8 and then minus 255... and so on. The last two bits of the first byte are used to store the new character length information. When the next input data is a new character and N_len is less than or equal to 3, the last two bits of the first byte t(0) record the length, when N_len When greater than 3, the last two bits of t(0) are 00. The compression format determination method is that the first character is greater than 128.

步骤1:回指距离(M_off)≤1k,且重复长度(M_len)≥3,该段重复字符按第一种压缩格式压缩;(M_len)<3不对该段字符压缩。Step 1: If the anaphora distance (M_off) is less than or equal to 1k, and the repetition length (M_len) is greater than or equal to 3, the repeated characters of this segment are compressed according to the first compression format; (M_len)<3, the segment of characters is not compressed.

M_off-1的值的前2位记录于第一字节t(0)的3、2bit位中,后8位记录于该压缩格式的最后一字节u中。The first 2 bits of the value of M_off-1 are recorded in bits 3 and 2 of the first byte t(0), and the last 8 bits are recorded in the last byte u of the compressed format.

步骤2:当3≤M_len≤8,(M_len-1)的值记录于t(0)的6、5、4bit位中;Step 2: When 3≤M_len≤8, the value of (M_len-1) is recorded in bits 6, 5, and 4 of t(0);

当8<M_len≤263,t(0)的6、5、4bit位的值分别为0,将M_len-8的值记录于第二字节t(1)中;When 8<M_len≤263, the values of bits 6, 5, and 4 of t(0) are respectively 0, and the value of M_len-8 is recorded in the second byte t(1);

当263<M_len≤n*255+263(n≥1),t(0)的6、5、4bit位的值分别为0,第二字节t(1)的8个bit位的值全为0……第(n+1)字节t(n)的8个bit位的值全为0,将M_len-8-n*255的值记录于第(n+2)字节t(n+1)中。When 263<M_len≤n*255+263(n≥1), the values of bits 6, 5, and 4 of t(0) are 0 respectively, and the values of 8 bits of the second byte t(1) are all 0... The values of the 8 bits of the (n+1)th byte t(n) are all 0, and the value of M_len-8-n*255 is recorded in the (n+2)th byte t(n+ 1) in.

步骤3:下一次输入的数据为新字符(N_len),且N_len≤3时,N_len的值记录于t(0)的第1、0bit位中;当N_len>3时,t(0)的最后两bit为00。重复字符按第一种压缩格式压缩完成。Step 3: The next input data is a new character (N_len), and when N_len≤3, the value of N_len is recorded in the 1st and 0th bits of t(0); when N_len>3, the last of t(0) Two bits are 00. Repeated characters are compressed according to the first compression format.

如图5所示为本发明算法的两种压缩格式的第二种,即回指距大于1k小于等于16k的情况,该压缩格式的第一字节t(0)的最高位和倒数第二位(第7bit位和第6bit位)的值为0和1,解压缩时判定方法为首字符大于等于64;t(0)的5、4、3、2、1、0bit记录重复长度的值,4≤重复长度(M_len)≤65;倒数第二字节u的7、6、5、4、3、2bit位用于记录回指距离的低6位的值;u的1、0bit位用于记录下一次输入的数据为新字符(N_len),且N_len≤3时N_len的值;最后一字节v用于记录回指距离后8位;t(0)与u之间的字节数≥0,由重复长度决定。详细压缩运算流程如图6所示。Figure 5 shows the second of the two compression formats of the algorithm of the present invention, that is, when the back index distance is greater than 1k and less than or equal to 16k, the highest bit and the penultimate second of the first byte t(0) of the compression format The values of the bits (the 7th bit and the 6th bit) are 0 and 1. When decompressing, the determination method is that the first character is greater than or equal to 64; bits 5, 4, 3, 2, 1, and 0 of t(0) record the value of the repetition length, 4≤repetition length (M_len)≤65; bits 7, 6, 5, 4, 3, and 2 of the penultimate byte u are used to record the value of the lower 6 bits of the anaphora distance; bits 1 and 0 of u are used for Record the value of N_len when the next input data is a new character (N_len) and N_len≤3; the last byte v is used to record the last 8 bits of the anaphora distance; the number of bytes between t(0) and u≥ 0, determined by the repeat length. The detailed compression operation flow is shown in Figure 6.

如图6所示,t(0)为压缩格式第一字节,M_len为重复长度,u为压缩格式倒数第二字节,v为第二种压缩格式的最后一字节,N_len为下一次输入的新字节长度,n为t(0)与u之间0byte的个数。第2种压缩格式的回指距离为14位,其回指距离的范围小于等于16k,减1后记录于最后两个byte的offset中。重复长度不受限制,但M_len小于4时不对数据压缩,当M_len大于等于4,小于等于65时,重复长度减2记录于第一个byte中的length的6bit中;当M_len大于65时,第一个byte中的length的6bit全为零,第二个bit为M_len减去65;如果M_len减去65后还大于255,那么第二byte为全零,第三byte为M_len减65再减255……依次类推。当M_len小于65时,压缩字符为3byte。判定方法为首字符大于等于64。u的最后两bit用于存储新字符长度信息,当下一次输入的数据为新字符,且N_len小于等于3时,u的最后两bit记录该长度,当N_len大于3时,u的最后两bit为00。As shown in Figure 6, t(0) is the first byte of the compressed format, M_len is the repetition length, u is the penultimate byte of the compressed format, v is the last byte of the second compressed format, and N_len is the next The new byte length of the input, n is the number of 0bytes between t(0) and u. The anaphora distance of the second compression format is 14 bits, and the range of the anaphora distance is less than or equal to 16k, which is recorded in the offset of the last two bytes after subtracting 1. The repetition length is not limited, but when M_len is less than 4, the data will not be compressed. When M_len is greater than or equal to 4 and less than or equal to 65, the repetition length minus 2 is recorded in 6 bits of length in the first byte; when M_len is greater than 65, the first The 6 bits of length in a byte are all zero, and the second bit is M_len minus 65; if M_len minus 65 is greater than 255, then the second byte is all zero, and the third byte is M_len minus 65 and then minus 255 ……And so on. When M_len is less than 65, the compressed characters are 3bytes. The determination method is that the first character is greater than or equal to 64. The last two bits of u are used to store new character length information. When the next input data is a new character and N_len is less than or equal to 3, the last two bits of u record the length. When N_len is greater than 3, the last two bits of u are 00.

步骤1:回指距离1k<(M_off)≤16k,且重复长度(M_len)≥4,该段重复字符按第二种压缩格式压缩;(M_len)<4不对该段字符压缩。Step 1: If the anaphora distance is 1k<(M_off)≤16k, and the repetition length (M_len)≥4, this segment of repeated characters is compressed according to the second compression format; (M_len)<4 does not compress this segment of characters.

M_off-1的值的前6位记录于u的7、6、5、4、3、2bit位中,后8位记录于该压缩格式的最后一字节v中。The first 6 bits of the value of M_off-1 are recorded in the 7, 6, 5, 4, 3, and 2 bits of u, and the last 8 bits are recorded in the last byte v of the compressed format.

步骤2:当4≤M_len≤65,M_len-2的值记录于t(0)的5、4、3、2、1、0bit位中;Step 2: When 4≤M_len≤65, the value of M_len-2 is recorded in bits 5, 4, 3, 2, 1, and 0 of t(0);

当65<M_len≤320,t(0)的5、4、3、2、1、0bit位的值分别为0,将M_len-65的值记录于第二字节t(1)中;When 65<M_len≤320, the values of bits 5, 4, 3, 2, 1, and 0 of t(0) are respectively 0, and the value of M_len-65 is recorded in the second byte t(1);

当320<M_len≤n*255+320(n≥1),t(0)的5、4、3、2、1bit位的值分别为0,t(1)的8个bit位的值全为0……第(n+1)字节t(n)的8个bit位的值全为0,将M_len-65-n*255的值记录于第(n+2)字节t(n+1)中。When 320<M_len≤n*255+320(n≥1), the values of bits 5, 4, 3, 2, and 1 of t(0) are 0, respectively, and the values of 8 bits of t(1) are all 0... The values of the 8 bits of the (n+1)th byte t(n) are all 0, and the value of M_len-65-n*255 is recorded in the (n+2)th byte t(n+ 1) in.

步骤3:下一次输入的数据为新字符(N_len),且N_len≤3时,N_len的值记录于u的第1、0bit位中;当N_len>3时,u的最后两bit为00。重复字符按第二种压缩格式压缩完成。Step 3: The next input data is a new character (N_len), and when N_len≤3, the value of N_len is recorded in the 1st and 0th bits of u; when N_len>3, the last two bits of u are 00. Repeated characters are compressed in the second compression format.

如图7为本发明算法的新字符格式,首先记录新字符长度,新字符长度占用的字节数由新字符长度决定,完成新字符长度的记录后,依次记录输入的新字符。详细压缩运算流程如图8所示。Figure 7 is the new character format of the algorithm of the present invention, first record the new character length, the number of bytes occupied by the new character length is determined by the new character length, after completing the recording of the new character length, record the new characters input in turn. The detailed compression operation flow is shown in Figure 8.

如图8所示,n_len为新字符长度。对于新字符串,本发明首先计算共有多少个新字符,当n_len小于4个的时候若回指距离大于1k会用的上面压缩格式2的倒数第二个字节的最后两个比特位记录,若回指距离小于等于1k,会用上面压缩格式1的第一字节的最后两个比特位记录。当n_len为4~66的时候,先把length减3,记录到第一个byte中;如果新字符长度大于66,记录0于第一byte,再记录新字符长度减去66的差值于第二个byte中;如减去66后仍然大于255,则在第一个0后再记录一个0,同时把差值再减去255得到新的差值,小于255,记录于第三byte中,大于255则记录一个0后继续减去255与255比较,直到最后的差值小于255时,记下最后的差值,记录完新字符的个数后,开始记录新字符。判定方法为首字符小于64。As shown in Figure 8, n_len is the new character length. For a new character string, the present invention first calculates how many new characters there are. When n_len is less than 4, if the anaphoric distance is greater than 1k, the last two bit records of the penultimate byte of the above compression format 2 will be used, If the anaphoric distance is less than or equal to 1k, it will be recorded with the last two bits of the first byte of the above compression format 1. When n_len is 4 to 66, first subtract 3 from length and record it in the first byte; if the length of the new character is greater than 66, record 0 in the first byte, and then record the difference between the length of the new character minus 66 in the first byte In two bytes; if it is still greater than 255 after subtracting 66, then record a 0 after the first 0, and subtract 255 from the difference to get a new difference. If it is less than 255, record it in the third byte. If it is greater than 255, record a 0 and then continue to subtract 255 and compare with 255. When the last difference is less than 255, record the last difference. After recording the number of new characters, start recording new characters. The determination method is that the first character is less than 64.

步骤1:判断输入字符串的字符类型,是否为新字符,是新字符计算新字符长度(n_len)。Step 1: Determine whether the character type of the input string is a new character, and calculate the new character length (n_len) if it is a new character.

步骤2:根据n_len的大小,选择合适的格式,记录n_len值。Step 2: According to the size of n_len, select an appropriate format and record the value of n_len.

当n_len≤3时,记录n_len于前一次的压缩格式内;When n_len≤3, record n_len in the previous compression format;

当3<n_len≤66时,纪录(n_len-3)于新字符格式的第一字节n0中;When 3<n_len≤66, record (n_len-3) in the first byte n0 of the new character format;

当66<n_len≤321,n0=00000000,第二字节n1=n_len-66;When 66<n_len≤321, n0=00000000, the second byte n1=n_len-66;

当321<n_len≤321+n*255(n≥1),n0=00000000,n1=00000000……nn=00000000,n(n+1)=n_len-66-n*255。When 321<n_len≤321+n*255(n≥1), n0=00000000, n1=00000000... nn=00000000, n(n+1)=n_len-66-n*255.

步骤3:输出对应长度的新字符。Step 3: Output new characters of corresponding length.

压缩后的比特文件是新字符与压缩字符交替出现,紧凑化判断条件,可提高比特文件压缩率。In the compressed bit file, new characters and compressed characters appear alternately, and the compaction judgment condition can improve the compression rate of the bit file.

(3)压缩和解压缩算法的整体设计,并验证算法的正确性、压缩率和速率:(3) The overall design of the compression and decompression algorithm, and verify the correctness, compression rate and rate of the algorithm:

本发明算法的压缩部分算法整体设计如图9所示:The overall design of the compression part of the algorithm of the present invention is shown in Figure 9:

步骤1:读入比特文件长度,用于判断数据是否结束。Step 1: Read the length of the bit file to determine whether the data is over.

步骤2:比特文件的前4byte不做任何处理,继续读入数据,记录索引于哈希表。Step 2: The first 4 bytes of the bit file do not do any processing, continue to read the data, and record the index in the hash table.

步骤3:字符类型判断。Step 3: Character type judgment.

接下来4byte数据做哈希运算,按所计算的哈希值从哈希表中取出数据索引,通过索引找到对应字符与做哈希运算的字符对比,若相同,则说明数据在哈希表内,不是新字符,更新哈希表中该字符索引,继续读入数据(每次读入1byte),重复以上操作,直到读入数据不在哈希表内为止,若读入重复长度小于3或重复长度小于4且回指距离大于1k按新字符处理,跳入步骤4;否则跳入步骤6。若不同,做第二次哈希运算并通过哈希地址中存储的索引值取值比较,若相同,继续读入数据,重复以上操作,直到读入数据不在哈希表内为止,若读入重复长度小于3或重复长度小于4且回指距离大于1k按新字符处理,跳入步骤4;否则跳入步骤6。The next 4 bytes of data are hashed, and the data index is taken from the hash table according to the calculated hash value, and the corresponding character is found through the index and compared with the character for the hash operation. If they are the same, it means that the data is in the hash table. , not a new character, update the character index in the hash table, continue to read data (1 byte each time), repeat the above operations until the read data is not in the hash table, if the read repetition length is less than 3 or repeated If the length is less than 4 and the anaphora distance is greater than 1k, it will be treated as a new character, and skip to step 4; otherwise, skip to step 6. If they are different, do the second hash operation and compare the index value stored in the hash address. If they are the same, continue to read the data and repeat the above operations until the read data is not in the hash table. If the repeat length is less than 3 or the repeat length is less than 4 and the anaphora distance is greater than 1k, it will be treated as a new character, and skip to step 4; otherwise, skip to step 6.

步骤4:记录新字符个数,按新字符长度格式进行编码Step 4: Record the number of new characters and encode according to the new character length format

步骤5:记录新字符,完成新字符个数的记录后,在其后记录新字符。继续读入数据。Step 5: Record the new characters, after completing the recording of the number of new characters, record the new characters thereafter. Continue reading in data.

步骤6:计算重复字符长度与回指距离,并压缩数据。回指距离小于等于1k用第1种压缩格式压缩数据,跳入步骤7;否则用第二种压缩格式压缩数据跳入步骤8。Step 6: Calculate the length of repeated characters and the anaphora distance, and compress the data. If the anaphora distance is less than or equal to 1k, use the first compression format to compress the data, and skip to step 7; otherwise, use the second compression format to compress the data and skip to step 8.

步骤7:第1种压缩格式编码Step 7: 1st Compression Format Encoding

步骤8:第2种压缩格式编码Step 8: 2nd Compression Format Encoding

步骤9:判断压缩是否结束,结束则输出数据和数据长度,否则继续读入数据并重复以上操作。Step 9: Determine whether the compression is over, output the data and the data length when it is over, or continue to read in the data and repeat the above operations.

本发明算法的解压缩部分算法整体设计如图10所示:The overall design of the decompression part of the algorithm of the present invention is shown in Figure 10:

步骤1:读入比特文件长度,用于判断数据是否结束。Step 1: Read the length of the bit file to determine whether the data is over.

步骤2:首字符判断。Step 2: Judgment of the first character.

若首字符0≤t(0)<64,则判断为新字符,跳入步骤3。否则判断为重复字符,跳入步骤5。If the first character 0≤t(0)<64, it is judged as a new character, and jumps to step 3. Otherwise, it is judged as a repeated character, and jumps to step 5.

步骤3:计算新字符长度Step 3: Calculate the new character length

若0<t(0)<64时,新字符长度等于t(0)+3(t(0)到t(n)记录新字符长度信息);If 0<t(0)<64, the new character length is equal to t(0)+3 (t(0) to t(n) record the new character length information);

若t(0)=0,当t(1)不等于0时,新字符长度n_len=66+t(1);If t(0)=0, when t(1) is not equal to 0, the new character length n_len=66+t(1);

若t(0)=0,当t(1)=0时,继续读入后面的数,直到t(n)≠0为止,n_len=(n-1)*255+66+t(n)。If t(0)=0, when t(1)=0, continue to read the following numbers until t(n)≠0, n_len=(n-1)*255+66+t(n).

步骤4:记录新字符Step 4: Record the new character

根据新字符的个数记录新字符Record new characters according to the number of new characters

步骤5:解压压缩数据Step 5: Decompress the compressed data

若t≥128时,该数据是按压缩格式1进行压缩的,按照对应的压缩格式计算重复长度和回指距离,跳入步骤6。If t≥128, the data is compressed according to the compression format 1, and the repetition length and the anaphora distance are calculated according to the corresponding compression format, and then jump to step 6.

若64≤t<128时,数据按压缩格式2进行压缩的,按照对应的压缩格式计算重复长度和回指距离,跳入步骤7。If 64≤t<128, and the data is compressed according to the compression format 2, calculate the repetition length and the anaphora distance according to the corresponding compression format, and skip to step 7.

步骤6:压缩格式1中新字符信息判定Step 6: Determination of new character information in compression format 1

若第一byte后两位不等于0时,其为新字符长度,解压数据完成后,跳入步骤4。否则,跳入步骤8。If the last two digits of the first byte are not equal to 0, it is the new character length. After decompressing the data, go to step 4. Otherwise, skip to step 8.

步骤7:压缩格式2中新字符信息判断Step 7: Judgment of new character information in compression format 2

若倒数第二byte后两位不等于0时,其为新字符长度,解压数据完成后,跳入步骤4。否则,跳入步骤8。If the last two digits of the second-to-last byte are not equal to 0, it is the new character length. After the data is decompressed, go to step 4. Otherwise, skip to step 8.

步骤8:判断压缩是否结束,结束则输出数据和数据长度,否则继续读入数据并重复以上操作。Step 8: Determine whether the compression is over, and output the data and data length when the compression is over, otherwise continue to read in the data and repeat the above operations.

压缩和解压缩软件使用C语言在Visual Studio 2008平台设计,对实际工作中产生的26组比特文件进行压缩和解压缩运算,压缩率均大于50%,平均压缩率为67.06%,LZO算法的平均压缩率67.21%,平均压缩率下降0.15%,26组比特文件压缩率较LZO压缩率下降均不超过0.3%;压缩速率提升18%,解压缩速率提升37%。The compression and decompression software is designed on the Visual Studio 2008 platform using C language, and performs compression and decompression operations on 26 sets of bit files generated in actual work. The compression rate is greater than 50%, and the average compression rate is 67.06%. 67.21%, the average compression rate decreased by 0.15%, the compression rate of 26 groups of bit files did not decrease by more than 0.3% compared with the LZO compression rate; the compression rate increased by 18%, and the decompression rate increased by 37%.

(5)本发明解压缩部分电路设计,验证功能、资源利用率及工作频率;(5) The present invention decompresses part of the circuit design, and verifies the function, resource utilization rate and operating frequency;

比特文件通过软件进行压缩,将压缩的结果上传给卫星,在卫星上实现解压缩功能。比特文件压缩在地面实现即可,无需FPGA,只需用本发明基于C语言开发的软件在电脑上实现即可。解压缩在卫星上实现,需设计FPGA电路。基本电路框图如图11所示:The bit file is compressed by software, the compressed result is uploaded to the satellite, and the decompression function is realized on the satellite. The bit file compression can be realized on the ground, without FPGA, and only needs to be realized on the computer with the software developed based on the C language of the present invention. The decompression is realized on the satellite, and the FPGA circuit needs to be designed. The basic circuit block diagram is shown in Figure 11:

电路包括新字符译码模块、压缩字符译码模块、1读1写SRAM16384x8、输出控制模块。当数据输入时,首先要对数据类型进行判定是新字符还是重复字符。是新字符的话,计算新字符长度,并直接输出新字符,同时这些新字符从地址0开始要依次写到SRAM中;是重复字符,判定是何种压缩格式,然后计算回指距离和重复长度,通过回指距离和重复长度从SRAM中取数进行解压缩并输出该数据。由于解压缩数据长度和新字符个数的不确定性,在电路输出部分设计一个输出数据有效的判定信号,该数据为高时输出结果有效。同样,通过设计反馈电路对输入信号进行控制,只有对某段压缩字符解压完成后才能继续输入字符。The circuit includes a new character decoding module, a compressed character decoding module, 1 read and 1 write SRAM16384x8, and an output control module. When data is entered, the first thing to do is to determine whether the data type is a new character or a repeated character. If it is a new character, calculate the length of the new character and output the new character directly. At the same time, these new characters should be written to the SRAM in sequence starting from address 0; if it is a repeated character, determine what compression format it is, and then calculate the anaphoric distance and repetition length. , decompress and output the data by fetching the number from the SRAM through the anaphora distance and repetition length. Due to the uncertainty of the length of the decompressed data and the number of new characters, a judgment signal for valid output data is designed in the output part of the circuit, and the output result is valid when the data is high. Similarly, by designing a feedback circuit to control the input signal, only after the decompression of a certain segment of compressed characters is completed, the characters can continue to be input.

由于不同压缩格式的字符长度和新字符长度以及压缩后数据不确定,各个模块内以及模块之间需要大量的控制逻辑进行判决,判决逻辑随压缩格式数量的增加成数倍增长,译码周期也因此增加,工作频率随之降低,因此压缩格式的减少对解压缩电路的硬件开销、时序开销有很好的优化。Due to the uncertainty of the character length, new character length and compressed data of different compression formats, a large amount of control logic is required in each module and between modules to make decisions. The decision logic increases several times with the increase of the number of compression formats, and the decoding cycle also increases Therefore, the operating frequency decreases with the increase, so the reduction of the compression format has a good optimization on the hardware overhead and timing overhead of the decompression circuit.

SRAM的访问时间随其深度增加而增加,本发明的SRAM深度只有LZO压缩算法SRAM深度的1/3,因此提高了电路最高工作频率。同时SRAM内的数据每16384个时钟周期刷新一次,刷新频率是LZO算法的SRAM刷新频率的三倍,降低了星载电路SEU的风险。The access time of the SRAM increases as its depth increases, and the SRAM depth of the present invention is only 1/3 of the SRAM depth of the LZO compression algorithm, thus increasing the maximum operating frequency of the circuit. At the same time, the data in the SRAM is refreshed every 16384 clock cycles, and the refresh frequency is three times that of the SRAM refresh frequency of the LZO algorithm, which reduces the risk of the onboard circuit SEU.

通过ISE9.2i平台选用xc4vsx55-12ff1148器件对本发明解压缩电路验证,本发明的触发器资源使用率为1%,LUT资源使用率为1.3%,slice的资源使用率为2%,RAM的资源使用率为2%,IO的资源使用率为3%,工作频率为138.1MHz。相较于LZO算法,本发明除了IO资源使用率相当以外,其他器件的资源利用率都在1/3左右,且时钟频率也有很大提升。The decompression circuit of the present invention is verified by selecting xc4vsx55-12ff1148 device on the ISE9.2i platform. The utilization rate of the trigger resource of the present invention is 1%, the LUT resource utilization rate is 1.3%, the slice resource utilization rate is 2%, and the RAM resource utilization rate is 2%. The rate is 2%, the resource usage of IO is 3%, and the operating frequency is 138.1MHz. Compared with the LZO algorithm, in addition to the IO resource utilization rate of the present invention is comparable, the resource utilization rate of other devices is about 1/3, and the clock frequency is also greatly improved.

本发明说明书中未作详细描述的内容属本领域技术人员的公知技术。The content not described in detail in the specification of the present invention belongs to the well-known technology of those skilled in the art.

Claims (4)

1.一种快速低资源开销的改进LZO压缩方法,其特征在于包括如下步骤:1. an improved LZO compression method of fast and low resource overhead, is characterized in that comprising the steps: (1)根据回指距离构建LZO压缩方法对新字符进行压缩的第一压缩格式及对应的第一压缩算法、第二压缩格式及对应的第二压缩算法;所述的第一压缩格式适用且进行压缩的新字符的回指距离小于第二压缩格式适用且进行压缩的新字符的回指距离;(1) Constructing the first compression format and the corresponding first compression algorithm, the second compression format and the corresponding second compression algorithm for compressing the new character by the LZO compression method according to the anaphoric distance; the first compression format is applicable and The anaphora distance of the new character to be compressed is smaller than the anaphora distance of the new character to be compressed to which the second compression format is applicable; (2)记录进行LZO压缩的新字符长度,根据新字符长度、回指距离选择的压缩格式及对应的压缩算法进行LZO压缩;(2) Record the new character length for LZO compression, and perform LZO compression according to the compression format selected by the new character length, the anaphora distance and the corresponding compression algorithm; 所述的第一压缩格式的第一字节t(0)为8bit,最高位为1,t(0)的6、5、4bit记录重复长度的值,3≤重复长度≤8,3、2bit记录回指距离的低2位,1、0bit记录下次输入为新字符且新字符长度≤3时新字符长度的值,最后一字节u记录回指距离后8位;The first byte t(0) of the first compression format is 8 bits, the highest bit is 1, and 6, 5, and 4 bits of t(0) record the value of the repetition length, 3≤repetition length≤8, 3, 2 bits Record the lower 2 bits of the anaphora distance, 1 and 0 bits record the value of the new character length when the next input is a new character and the new character length is less than or equal to 3, and the last byte u records the last 8 bits of the anaphora distance; 所述的第一压缩格式对应的第一压缩算法包括如下步骤:The first compression algorithm corresponding to the first compression format includes the following steps: 步骤1、判断待压缩数据的回指距离,如果待压缩数据的回指距离≤1k且重复长度M_len≥3,转入步骤2,如果待压缩数据的重复长度<3,则不进行压缩;Step 1. Determine the anaphora distance of the data to be compressed. If the anaphora distance of the data to be compressed is less than or equal to 1k and the repetition length M_len ≥ 3, go to step 2. If the repetition length of the data to be compressed is less than 3, no compression is performed; 步骤2、当3≤重复长度M_len≤8时,将M_len-1的值记录在t(0)的6、5、4bit中;Step 2. When 3≤repetition length M_len≤8, record the value of M_len-1 in bits 6, 5, and 4 of t(0); 当8<M_len≤263时,将t(0)的6、5、4bit分别记为0,然后将M_len-8的值记录在第二字节t(1)中;When 8<M_len≤263, record bits 6, 5, and 4 of t(0) as 0, respectively, and then record the value of M_len-8 in the second byte t(1); 当263<M_len≤n*255+263时,n≥1,将t(0)的6、5、4bit分别记为0,将第2字节t(1)至第(n+1)字节t(n)的8bit分别记为0,将M_len-8-n*255的值记录在第(n+2)字节t(n+1)中;When 263<M_len≤n*255+263, n≥1, the 6, 5, and 4 bits of t(0) are recorded as 0 respectively, and the second byte t(1) to the (n+1)th byte The 8 bits of t(n) are respectively recorded as 0, and the value of M_len-8-n*255 is recorded in the (n+2)th byte t(n+1); 步骤3、如果下次输入数据为新字符且新字符长度≤3,则将新字符长度的值记录于t(0)的第1、0bit,如果新字符长度>3时,则将t(0)的最后两bit记为0。Step 3. If the next input data is a new character and the new character length is less than or equal to 3, the value of the new character length is recorded in the 1st and 0th bits of t(0). If the new character length is greater than 3, t(0 ), the last two bits are recorded as 0. 2.根据权利要求1所述的一种快速低资源开销的改进LZO压缩方法,其特征在于:所述的第一压缩格式适用且进行压缩的新字符的回指距离不大于1k,第二压缩格式适用且进行压缩的新字符的回指距离大于1k小于等于16k。2. The improved LZO compression method for fast and low resource overhead according to claim 1, characterized in that: the first compression format is applicable and the anaphora distance of the new character to be compressed is not greater than 1k, and the second compression The anaphora distance of the new characters whose format is applicable and is compressed is greater than 1k and less than or equal to 16k. 3.根据权利要求1或2所述的一种快速低资源开销的改进LZO压缩方法,其特征在于:所述的第二压缩格式第一字节t(0)的高两位分别置为0、1,t(0)的5、4、3、2、1、0bit记录重复长度的值,4≤重复长度≤65的值,倒数第二字节u的7、6、5、4、3、2bit分别记录回指距离的低6位,u的1、0bit记录下次输入数据为新字符且新字符长度≤3时新字符长度的值,最后一字节v记录回指距离的后8位。3. The improved LZO compression method for fast and low resource overhead according to claim 1 or 2, characterized in that: the upper two bits of the first byte t(0) of the second compression format are respectively set to 0 , 1, 5, 4, 3, 2, 1, 0 bits of t(0) record the value of the repetition length, 4≤the value of repetition length≤65, 7, 6, 5, 4, 3 of the penultimate byte u , 2bit respectively record the lower 6 bits of the anaphora distance, 1 and 0bit of u record the value of the new character length when the next input data is a new character and the new character length is less than or equal to 3, the last byte v records the last 8 bits of the anaphora distance bit. 4.根据权利要求3所述的一种快速低资源开销的改进LZO压缩方法,其特征在于:所述的第二压缩格式对应的第二压缩算法包括如下步骤:4. The improved LZO compression method for fast and low resource overhead according to claim 3, wherein the second compression algorithm corresponding to the second compression format comprises the following steps: 步骤1、判断待压缩数据的回指距离,当1k<待压缩数据的回指距离≤16k且重复长度M_len≥4时,转入步骤2,当重复长度M_len<4时,不对数据进行压缩;Step 1. Determine the anaphora distance of the data to be compressed, when 1k<the anaphora distance of the data to be compressed≤16k and the repetition length M_len≥4, go to step 2, when the repetition length M_len<4, do not compress the data; 步骤2、当4≤M_len≤65时,将M_len-2的值记录在t(0)的5、4、3、2、1、0bit中;Step 2. When 4≤M_len≤65, record the value of M_len-2 in bits 5, 4, 3, 2, 1, and 0 of t(0); 当65<M_len≤320时,将t(0)的5、4、3、2、1、0bit分别置为0,将M_len-65的值记录在第二字节t(1)中;When 65<M_len≤320, set bits 5, 4, 3, 2, 1, and 0 of t(0) to 0, respectively, and record the value of M_len-65 in the second byte t(1); 当320<M_len≤n*255+320,t(0)的5、4、3、2、1bit位的值分别为0,将第2字节t(1)至第n+1字节t(n)的8bit分别记为0,将M_len-65-n*255的值记录于第n+2字节t(n+1)中;When 320<M_len≤n*255+320, the values of bits 5, 4, 3, 2, and 1 of t(0) are 0, respectively, and the second byte t(1) to the n+1th byte t( The 8 bits of n) are respectively recorded as 0, and the value of M_len-65-n*255 is recorded in the n+2th byte t(n+1); 步骤3、如果下次输入数据为新字符且新字符长度≤3,则将新字符长度的值记录于u的第1、0bit,如果新字符长度>3时,则将u的最后两bit记为0。Step 3. If the next input data is a new character and the new character length is less than or equal to 3, record the value of the new character length in the 1st and 0th bits of u. If the new character length is greater than 3, record the last two bits of u. is 0.
CN201711050579.2A 2017-10-31 2017-10-31 Improved LZO compression method with rapid low resource overhead Active CN107967296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711050579.2A CN107967296B (en) 2017-10-31 2017-10-31 Improved LZO compression method with rapid low resource overhead

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711050579.2A CN107967296B (en) 2017-10-31 2017-10-31 Improved LZO compression method with rapid low resource overhead

Publications (2)

Publication Number Publication Date
CN107967296A CN107967296A (en) 2018-04-27
CN107967296B true CN107967296B (en) 2020-06-09

Family

ID=62000828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711050579.2A Active CN107967296B (en) 2017-10-31 2017-10-31 Improved LZO compression method with rapid low resource overhead

Country Status (1)

Country Link
CN (1) CN107967296B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7417570B2 (en) * 2006-07-31 2008-08-26 Sap Ag Lossless comparative compression and transmission method and system
CN104378119A (en) * 2014-12-09 2015-02-25 西安电子科技大学 Quick lossless compression method for file system data of embedded equipment
CN104410424A (en) * 2014-11-26 2015-03-11 西安电子科技大学 Quick lossless compression method of memory data of embedded device
US9331712B1 (en) * 2015-05-11 2016-05-03 Qualcomm Incorporated Compressed caching in a virtual memory system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7417570B2 (en) * 2006-07-31 2008-08-26 Sap Ag Lossless comparative compression and transmission method and system
CN104410424A (en) * 2014-11-26 2015-03-11 西安电子科技大学 Quick lossless compression method of memory data of embedded device
CN104378119A (en) * 2014-12-09 2015-02-25 西安电子科技大学 Quick lossless compression method for file system data of embedded equipment
US9331712B1 (en) * 2015-05-11 2016-05-03 Qualcomm Incorporated Compressed caching in a virtual memory system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
File compression with LZO algorithm using NVIDIA CUDA architecture;L.Erdodi;《LINDI 2012 4th IEEE International Symposium on Logistics and Industrial Informatics》;20120907;第1-4页 *
高效无损压缩算法的研究与实现;宋秉玺;《中国优秀硕士学位论文数据库》;20141115(第11期);第15-21页 *

Also Published As

Publication number Publication date
CN107967296A (en) 2018-04-27

Similar Documents

Publication Publication Date Title
CN112953550B (en) Data compression method, electronic device and storage medium
CN102970043B (en) A kind of compression hardware system based on GZIP and accelerated method thereof
CN107565971B (en) A data compression method and device
US8407378B2 (en) High-speed inline data compression inline with an eight byte data path
US9479194B2 (en) Data compression apparatus and data decompression apparatus
CN103236847A (en) Multilayer Hash structure and run coding-based lossless compression method for data
JP2014082762A (en) Data compression apparatus and method, and memory system including data compression apparatus
JP2014132750A (en) Data compression method, and apparatus for performing the method
US20200294629A1 (en) Gene sequencing data compression method and decompression method, system and computer-readable medium
US9966971B2 (en) Character conversion
CN106407285A (en) RLE and LZW-based optimized bit file compression and decompression method
CN106656198B (en) A coding method based on LZ77
CN106849956B (en) Compression method, decompression method, apparatus and data processing system
US20180225298A1 (en) Adaptive rate compression hash processing device
CN108873062A (en) A kind of Multi-encoder high-speed seismic data parallel lossless compression method based on FPGA
CN105824574A (en) Memory data storage method
US9979415B2 (en) Data compression apparatus, data decompression apparatus, data compression method, data compression method, and computer readable medium
CN114640354A (en) Data compression method and device, electronic equipment and computer readable storage medium
Kaur Design and Implementation of Lzw data compression algorithm
CN104378119B (en) The fast and lossless compression method of file system of embedded device data
US11552652B2 (en) Systems and methods for lossless compression of tabular numeric data
CN107967296B (en) Improved LZO compression method with rapid low resource overhead
Joseph et al. A Novel Approach of Modified Run Length Encoding Scheme for High Speed Data Communication Application
US9787323B1 (en) Huffman tree decompression
CN110021349B (en) Method for encoding gene data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant