[go: up one dir, main page]

CN104410424B - The fast and lossless compression method of embedded device internal storage data - Google Patents

The fast and lossless compression method of embedded device internal storage data Download PDF

Info

Publication number
CN104410424B
CN104410424B CN201410696377.5A CN201410696377A CN104410424B CN 104410424 B CN104410424 B CN 104410424B CN 201410696377 A CN201410696377 A CN 201410696377A CN 104410424 B CN104410424 B CN 104410424B
Authority
CN
China
Prior art keywords
length
record
new character
remaining
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410696377.5A
Other languages
Chinese (zh)
Other versions
CN104410424A (en
Inventor
宋彬
李慧玲
秦浩
裴远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201410696377.5A priority Critical patent/CN104410424B/en
Publication of CN104410424A publication Critical patent/CN104410424A/en
Application granted granted Critical
Publication of CN104410424B publication Critical patent/CN104410424B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

本发明公开了一种数据处理技术领域嵌入式设备内存数据的快速无损压缩方法,主要解决已有的压缩方法对内存页面压缩速度低的问题。其主要特点是设计了两种适合内存页面数据的压缩格式:一种是首字节记录字符重复长度、偏移距离和新字符长度,从第二个字节开始依次记录剩余的新字符长度、新字符和剩余的偏移距离;第二种是首字节记录压缩格式标志、偏移距离和新字符长度,从第二个字节开始依次记录剩余的新字符长度、新字符、字符重复长度和剩余的偏移距离。本发明与目前的LZO无损压缩方法相比,提高了内存页面数据的压缩与解压缩速度,并得到更好的压缩率,从而提高了嵌入式设备的内存数据存储容量和利用率,可用于存储受限的嵌入式设备。

The invention discloses a fast lossless compression method for embedded device memory data in the technical field of data processing, which mainly solves the problem of low compression speed of memory pages in the existing compression method. Its main feature is to design two compression formats suitable for memory page data: one is to record the character repetition length, offset distance and new character length in the first byte, and record the remaining new character length, The new character and the remaining offset distance; the second is that the first byte records the compression format flag, the offset distance and the new character length, and records the remaining new character length, new character, and character repetition length sequentially from the second byte and the remaining offset distance. Compared with the current LZO lossless compression method, the present invention improves the compression and decompression speed of memory page data, and obtains a better compression rate, thereby improving the memory data storage capacity and utilization rate of embedded devices, and can be used for storage Restricted embedded devices.

Description

嵌入式设备内存数据的快速无损压缩方法Fast Lossless Compression Method for Embedded Device Memory Data

技术领域technical field

本发明属于数据处理技术领域,涉及嵌入式设备内存数据的数据压缩方法,本发明在数据压缩时根据内存数据的特征采用新的数据压缩格式提高了压缩的速度,可用在存储受限的嵌入式设备中。The invention belongs to the technical field of data processing, and relates to a data compression method for embedded device memory data. The invention adopts a new data compression format according to the characteristics of memory data during data compression to improve the compression speed, and can be used in embedded devices with limited storage. in the device.

背景技术Background technique

内存是计算机的重要部件之一,它是与CPU进行沟通的桥梁。计算机中所有程序的运行都是在内存中进行的。内存的性能对计算机的影响很大,而在体积、存储容量受限的嵌入式便携设备中,内存对设备性能和用户体验的影响尤为突出。近些年来,随着移动互联网的发展,嵌入式便携设备如手机、平板电脑等已经成为人们必备的一种通信工具。因此对内存数据进行压缩,挺高内存存储能力和利用率将大大提高设备的整体性能。随着社会发展,信息量不断增长,人们对嵌入式设备的系统性能也提出了更高的要求,如更高的速度、更低的耗能、更小的体积、能存取更多的信息等等。为了达到上面的各种性能要求,人们提出了各种改进的方法。相较于高额的硬件技术的突破,更加快速有效的方法之一就是无损数据压缩技术。如若在嵌入式设备中运用无损数据压缩技术,则可以在相同的硬件存储空间中存取更多的数据,提高内存利用率,降低成本,挺高设备性能和用户体验。鉴于上述技术的各种优点,运用这种简单而廉价的改进嵌入式系统性能的技术,研究无损数据压缩技术是很有必要的。Memory is one of the important parts of the computer, it is a bridge to communicate with the CPU. All programs in a computer run in memory. The performance of the memory has a great impact on the computer, and in embedded portable devices with limited volume and storage capacity, the impact of the memory on the device performance and user experience is particularly prominent. In recent years, with the development of the mobile Internet, embedded portable devices such as mobile phones and tablet computers have become a necessary communication tool for people. Therefore, compressing the memory data, increasing the memory storage capacity and utilization rate will greatly improve the overall performance of the device. With the development of society and the increasing amount of information, people have put forward higher requirements for the system performance of embedded devices, such as higher speed, lower energy consumption, smaller volume, and more information can be accessed wait. In order to meet the various performance requirements above, various improved methods have been proposed. Compared with high-cost hardware technology breakthroughs, one of the faster and more effective methods is lossless data compression technology. If lossless data compression technology is used in embedded devices, more data can be accessed in the same hardware storage space, memory utilization can be improved, costs can be reduced, and device performance and user experience can be improved. In view of the various advantages of the above-mentioned technologies, it is necessary to study lossless data compression technology to use this simple and cheap technology to improve the performance of embedded systems.

Lempel和Ziv于1977年提出了一种高效率的无失真压缩技术,即LZ77无损数据压缩算法,该压缩算法的主要原理是利用较短的标记代表前面出现过的重复字串,标记格式为重复长度,偏移距离,如abcdekabcdeha,则可以编码成abcdek(5,6)ha表示,这样从整体上而言,较短的信息代替较长的信息,从而达到了压缩的效果。1982年,James Storer和Thomas Szymanski在LZ77基础上将算法进行改进提出了LZSS算法,提高了压缩效率。后来Lempel-Ziv-Oberhumer又在LZSS的基础上将算法进行改进提出了LZO算法,极大地提高了压缩编码速度。LZO算法是一种基于字典的无损的数据压缩算法,具有压缩速度快、即时性的特点。该算法根据不同的重复长度和偏移距离设计了五种数据压缩格式,编码端按照匹配对,即重复长度,偏移距离的大小选择某一种压缩格式编码,解码端通过压缩格式的首字节大小区分这五种不同的格式,最大的偏移距离可以达到48K。该方法存在的不足之处是,自内存“分页机制”提出之始,内存页面的默认大小便被设置为4096字节,即4KB。虽然在原则上计算机的内存页面大小是可配置的,但绝大多数的操作系统在实现中仍然采用默认的4KB页面。为便于内存数据管理,应对内存数据采用逐页压缩的方式,而LZO初始设计目的是压缩长度不定的数据,它在压缩内存数据时,只能取得很低的压缩率,不能有效的提高内存利用率,并且压缩与解压缩速度都很慢。因此对于内存数据,用目前的LZO的压缩方式和压缩格式都不能适用。Lempel and Ziv proposed a high-efficiency lossless compression technology in 1977, that is, the LZ77 lossless data compression algorithm. The main principle of the compression algorithm is to use shorter tags to represent the repeated strings that have appeared before, and the tag format is repeated. Length and offset distance, such as abcdekabcdeha, can be coded into abcdek(5,6)ha, so that on the whole, shorter information replaces longer information, thereby achieving the effect of compression. In 1982, James Storer and Thomas Szymanski improved the algorithm on the basis of LZ77 and proposed the LZSS algorithm, which improved the compression efficiency. Later, Lempel-Ziv-Oberhumer improved the algorithm on the basis of LZSS and proposed the LZO algorithm, which greatly improved the compression coding speed. The LZO algorithm is a lossless data compression algorithm based on a dictionary, which has the characteristics of fast compression speed and instantaneity. The algorithm designs five data compression formats according to different repetition lengths and offset distances. The encoding end selects a certain compression format for encoding according to the matching pair, that is, the repetition length and the offset distance. The decoding end uses the first word of the compression format The section size distinguishes these five different formats, and the maximum offset distance can reach 48K. The disadvantage of this method is that since the memory "paging mechanism" was proposed, the default size of the memory page is set to 4096 bytes, that is, 4KB. Although in principle the memory page size of a computer is configurable, most operating systems still use the default 4KB page in implementation. In order to facilitate memory data management, the memory data should be compressed page by page, and the initial design purpose of LZO is to compress data with variable length. When compressing memory data, it can only achieve a very low compression rate and cannot effectively improve memory utilization. rate, and the compression and decompression speeds are very slow. Therefore, for memory data, the current LZO compression method and compression format cannot be applied.

发明内容Contents of the invention

本发明的目的在于克服上述已有技术的不足,提出了一种嵌入式设备内存数据的快速无损压缩方法,以能够更快速的压缩与解压缩内存数据,从而有效提高内存存储能力和利用率。The purpose of the present invention is to overcome the deficiencies of the above-mentioned prior art, and propose a fast and lossless compression method for embedded device memory data, so as to compress and decompress memory data more quickly, thereby effectively improving memory storage capacity and utilization rate.

实现本发明的技术方案是:根据内存页面的数据特征,设计一种适合内存页面的压缩格式,针对内存页面数据进行压缩编码,具体步骤包括如下:The technical solution for realizing the present invention is: according to the data characteristics of the memory page, design a kind of compression format suitable for the memory page, carry out compression coding for the memory page data, and concrete steps comprise as follows:

(1)读取嵌入式设备中内存数据的一个内存页面,即按4KB的页面大小逐页读取内存页面;(1) Read a memory page of the memory data in the embedded device, that is, read the memory page page by page according to the page size of 4KB;

(2)判断所读页面的数据是否为新数据,若所读数据未记录在字典中,则判断为新数据,并把新数据位置记入字典中,继续读取内存页面数据,直到未出现新数据为止;(2) Judging whether the data of the read page is new data, if the read data is not recorded in the dictionary, it is judged as new data, and the new data position is recorded in the dictionary, and the memory page data continues to be read until it does not appear up to new data;

所述字典是根据关键值直接访问的哈希表结构,该关键值是通过哈希函数计算得出;The dictionary is a hash table structure directly accessed according to a key value, which is calculated by a hash function;

(3)对已记录在字典中的所读数据,根据字符重复长度和偏移距离,即字符当前位置与哈希表内记录位置之间的距离,选用不同的压缩格式进行编码:(3) For the read data recorded in the dictionary, according to the character repetition length and offset distance, that is, the distance between the current position of the character and the record position in the hash table, select different compression formats for encoding:

对于字符重复长度小于8,且偏移距离小于等于2KB的内存页面数据,其首字节记录字符重复长度L、偏移距离D和新字符长度S;从第二个字节开始依次记录剩余的新字符长度M、新字符C和剩余的偏移距离N;For the memory page data whose character repetition length is less than 8 and whose offset distance is less than or equal to 2KB, the first byte records the character repetition length L, the offset distance D and the new character length S; start from the second byte to record the remaining The new character length M, the new character C and the remaining offset distance N;

对于不满足字符重复长度小于8,且偏移距离小于等于2KB的内存页面数据,其首字节记录压缩格式标志T、偏移距离D和新字符长度S;从第二个字节开始依次记录剩余的新字符长度M、新字符C、字符重复长度L和剩余的偏移距离N;For the memory page data that does not satisfy the character repetition length less than 8, and the offset distance is less than or equal to 2KB, the first byte records the compression format flag T, the offset distance D and the new character length S; start from the second byte to record sequentially The remaining new character length M, new character C, character repetition length L and remaining offset distance N;

(4)判断编码位置是否为当前读入内存页面结尾,若是,则输出压缩后的数据和数据的长度,并记录结束标志,执行步骤(5),否则,返回步骤(2),继续读入新数据;(4) Determine whether the encoding position is the end of the currently read-in memory page, if so, output the compressed data and the length of the data, and record the end mark, and execute step (5), otherwise, return to step (2) and continue reading new data;

(5)判断当前页面是否为内存数据包的最后一个内存页面,若是,则编码结束,否则,返回步骤(1)读入下一个内存页面。(5) Judging whether the current page is the last memory page of the memory data packet, if so, the encoding ends, otherwise, return to step (1) to read the next memory page.

本发明由于所采用的压缩格式简单,从而提高了内存页面数据的压缩与解压缩速度,并得到更好的压缩率,能够较大幅提高嵌入式设备的内存数据存储容量和利用率。Because the compression format adopted by the present invention is simple, the compression and decompression speed of memory page data is improved, better compression rate is obtained, and the memory data storage capacity and utilization rate of embedded devices can be greatly improved.

测试结果表明:本发明与目前的LZO无损压缩方法相比,其压缩时间提高了14.52%,解压缩时间提高了98.84%,压缩率提高了1.1%。The test results show that: compared with the current LZO lossless compression method, the present invention increases the compression time by 14.52%, the decompression time by 98.84%, and the compression rate by 1.1%.

附图说明Description of drawings

图1是本发明的压缩流程图;Fig. 1 is a compression flowchart of the present invention;

图2是本发明中的压缩格式图。Fig. 2 is a diagram of the compression format in the present invention.

具体实施方式detailed description

下面结合图对本发明作进一步详细描述:Below in conjunction with figure, the present invention is described in further detail:

参照图1,本发明的实现步骤如下:With reference to Fig. 1, the realization steps of the present invention are as follows:

步骤1:从嵌入式设备的内存数据包中读入一个内存页面,即按4KB的页面大小逐页读取内存页面。Step 1: Read a memory page from the memory data packet of the embedded device, that is, read the memory page page by page according to the page size of 4KB.

步骤2:从所读内存页面读入四个字符,做第一次哈希运算,即通过第一个哈希函数计算得出关键值,该哈希函数为目前的LZO无损压缩方法中的第一个哈希函数。Step 2: Read four characters from the read memory page, and perform the first hash operation, that is, calculate the key value through the first hash function, which is the first in the current LZO lossless compression method a hash function.

步骤3:根据步骤2中关键值判断字符的位置是否合法,若合法,则进入步骤4,若不合法,则更新哈希表,该哈希表是根据关键值直接访问的数据结构,再返回步骤2。Step 3: Determine whether the position of the character is legal according to the key value in step 2. If it is legal, go to step 4. If not, update the hash table. The hash table is a data structure directly accessed according to the key value, and then return step2.

所述的合法,是指哈希表中所存的每一个位置都只能根据一个关键值访问。The legality mentioned means that every position stored in the hash table can only be accessed according to a key value.

步骤4:判断当前哈希表所存位置中的字符是否与读入字符相同,若相同,则进入步骤7,若不相同,则进入步骤5。Step 4: Judging whether the character in the current location of the hash table is the same as the read character, if they are the same, go to step 7, if not, go to step 5.

所述的当前哈希表所存位置,是指根据步骤2中关键值直接访问的哈希表所存的一个位置。The current storage location of the hash table refers to a location stored in the hash table directly accessed according to the key value in step 2.

步骤5:用步骤2中得到的关键值做第二次哈希运算,即通过第二个哈希函数计算得出第二个关键值,该哈希函数为目前的LZO无损压缩方法中的第二个哈希函数;再根据第二个关键值判断字符位置是否合法,若合法,则进入步骤6,若不合法,则更新哈希表,返回步骤2。Step 5: Use the key value obtained in step 2 to perform the second hash operation, that is, calculate the second key value through the second hash function, which is the first in the current LZO lossless compression method Two hash functions; then judge whether the character position is legal according to the second key value, if it is legal, go to step 6, if not, update the hash table, and return to step 2.

步骤6:判断该哈希表所存地址中的字符是否与读入字符相同,若相同,则进入步骤7,若不相同,则判定读入字符为新字符C,并更新哈希表,返回步骤2。Step 6: Determine whether the character in the address stored in the hash table is the same as the read character, if it is the same, go to step 7, if not, determine that the read character is a new character C, update the hash table, and return to step 2.

所述的哈希表所存地址,是指根据步骤5中第二个关键值直接访问的哈希表所存的一个地址。The address stored in the hash table refers to an address stored in the hash table directly accessed according to the second key value in step 5.

步骤7:计算新字符长度S、字符重复长度L和偏移距离D,即字符当前位置与哈希表内记录位置之间的距离。Step 7: Calculate the new character length S, the character repetition length L and the offset distance D, that is, the distance between the current position of the character and the record position in the hash table.

该计算方法与目前的LZO无损压缩方法中的计算方法相同。The calculation method is the same as that in the current LZO lossless compression method.

步骤8:判断字符重复长度是否小于8且偏移距离是否小于等于2KB,若是,则执行步骤9;若不是,则执行步骤10。Step 8: Determine whether the character repetition length is less than 8 and whether the offset distance is less than or equal to 2KB, if yes, perform step 9; if not, perform step 10.

步骤9:将字符按压缩格式1的记录规则进行编码。Step 9: Encode the characters according to the recording rules of compression format 1.

参照图2(a),本步骤按照压缩格式1的记录规则进行编码的步骤如下:Referring to Figure 2(a), the steps of encoding in this step according to the recording rules of compression format 1 are as follows:

9.1)首字节的前3位记录字符重复长度L,首字节的第4、第5、第6位记录偏移距离D的后3比特,即每一位记录1比特;9.1) The first 3 bits of the first byte record the character repetition length L, and the 4th, 5th, and 6th bits of the first byte record the last 3 bits of the offset distance D, that is, each bit records 1 bit;

9.2)判断新字符长度S是否大于3,若不是,则首字节的最后2位记录新字符长度S,并从第二个字节开始记录新字符C;若是,则首字节的最后2位记录为0作为标志,并用新字符长度S减3,得到剩余的新字符长度M,再判断剩余的新字符长度M是否大于255,若不是,则记录剩余的新字符长度M,若是,则记录一个字节0,并用剩余的新字符长度M减255,直到剩余的新字符长度小于255,记录该剩余的新字符长度,再记录新字符C;9.2) Determine whether the new character length S is greater than 3, if not, record the new character length S in the last 2 bits of the first byte, and record the new character C from the second byte; if so, then record the new character C in the last 2 bits of the first byte Record the bit as 0 as a sign, and subtract 3 from the new character length S to obtain the remaining new character length M, and then judge whether the remaining new character length M is greater than 255, if not, record the remaining new character length M, if so, then Record a byte 0, and subtract 255 from the remaining new character length M until the remaining new character length is less than 255, record the remaining new character length, and then record the new character C;

9.3)记录新字符完成之后,记录剩余的偏移距离N,该剩余的偏移距离N是偏移距离D的前8比特;然后进入步骤11。9.3) After recording the new character, record the remaining offset distance N, which is the first 8 bits of the offset distance D; then enter step 11.

步骤10:将字符按压缩格式2的记录规则进行编码。Step 10: Encode characters according to the recording rules of compression format 2.

参照图2(b),本步骤按照压缩格式2的记录规则进行编码的步骤如下:Referring to Figure 2(b), the steps of encoding in this step according to the recording rules of compression format 2 are as follows:

10.1)首字节的前2位记录为01作为压缩格式标志T,即第1位记录为0,第2位记录为1,首字节的第3、第4、第5、第6位记录偏移距离D的后4比特,即每一位记录1比特。10.1) The first two digits of the first byte are recorded as 01 as the compression format flag T, that is, the first digit is recorded as 0, the second digit is recorded as 1, and the third, fourth, fifth, and sixth digits of the first byte are recorded The last 4 bits of the offset distance D, that is, each bit records 1 bit.

10.2)判断新字符长度S是否大于3,若不是,则首字节的最后2位记录新字符长度S,即每一位记录1比特,并从第二个字节开始记录新字符;若是,则首字节的最后2位记录为0作为标志,并用新字符长度S减3,得到剩余的新字符长度M,再判断剩余的新字符长度M是否大于255,若不是,则记录剩余的新字符长度M,若是,则记录一个字节0,并用剩余的新字符长度M减255,直到剩余的新字符长度小于255,记录该剩余的新字符长度,再记录新字符C;10.2) Determine whether the new character length S is greater than 3, if not, then record the new character length S in the last 2 bits of the first byte, that is, record 1 bit for each bit, and record the new character from the second byte; if so, Then the last 2 bits of the first byte are recorded as 0 as a sign, and the new character length S is subtracted by 3 to obtain the remaining new character length M, and then judge whether the remaining new character length M is greater than 255, if not, record the remaining new character length Character length M, if so, record a byte 0, and subtract 255 from the remaining new character length M, until the remaining new character length is less than 255, record the remaining new character length, and then record the new character C;

10.3)记录新字符完成之后,判断字符重复长度L是否大于255,若不是,则记录字符重复长度L,若是,则记录一个字节0,并用字符重复长度L减255,直到字符重复长度小于255,记录该字符重复长度;10.3) After recording the new character, judge whether the character repetition length L is greater than 255, if not, record the character repetition length L, if so, record a byte 0, and subtract 255 from the character repetition length L until the character repetition length is less than 255 , record the character repetition length;

10.4)记录字符重复长度完成之后,记录剩余的偏移距离N,该剩余的偏移距离N为偏移距离D的前8比特。10.4) After recording the character repetition length, record the remaining offset distance N, which is the first 8 bits of the offset distance D.

步骤11:判断编码位置是否与当前读入内存页面结尾位置相同,若相同,则输出编码后的数据和数据的长度,并记录结束标志,然后进入步骤12,若不相同,则返回步骤2。Step 11: Determine whether the encoding position is the same as the end position of the currently read-in memory page. If they are the same, output the encoded data and the length of the data, and record the end mark, and then enter step 12. If not, return to step 2.

所述的结束标志,是指记录三个字节的数据,即第1个字节记录为17,第2和第3个字节都记录为0。The end mark refers to recording three bytes of data, that is, the first byte is recorded as 17, and the second and third bytes are both recorded as 0.

步骤12:判断当前内存页面是否为内存数据包的最后一个内存页面,即嵌入式设备的内存数据包中的所有数据是否都已经被读取,若是,则编码结束,若不是,则返回步骤1。Step 12: Determine whether the current memory page is the last memory page of the memory data packet, that is, whether all the data in the memory data packet of the embedded device has been read, if so, the encoding ends, if not, return to step 1 .

下面结合实验对本发明的效果做进一步说明:Effect of the present invention is described further below in conjunction with experiment:

本实验采用C语言来编写发明所提出的压缩方法,通过比较本发明与目前的LZO无损压缩方法对内存页面数据的压缩效果,来说明本发明方法压缩与解压缩速度的优点。LZO是目前最好的无损压缩方法。本实验所采用的内存数据为典型移动设备的4KB大小的内存页面数据,实验使用数据为内存页面数据包,数据包大小为453MB。在VS2010程序开发环境中,分别用本发明和目前的LZO无损压缩方法压缩内存页面数据,实验结果如表1所示:This experiment adopts C language to write the compression method proposed by the invention, by comparing the compression effect of the present invention and the current LZO lossless compression method on the memory page data, the advantages of the compression and decompression speed of the method of the present invention are illustrated. LZO is currently the best lossless compression method. The memory data used in this experiment is the 4KB memory page data of a typical mobile device. The data used in the experiment is the memory page data packet, and the data packet size is 453MB. In the VS2010 program development environment, use the present invention and the current LZO lossless compression method to compress memory page data respectively, and the experimental results are shown in Table 1:

表1Table 1

表1中的时间是整个压缩包所有内存页面的压缩时间与解压缩时间,表格中数据是运行了1000次取平均的结果。由表1可以看出,本发明的压缩率提高了1.1%,同时压缩时间与解压缩时间分别提高了14.52%和98.84%,从而提高了嵌入式内存数据存储容量和利用率。The time in Table 1 is the compression time and decompression time of all memory pages in the entire compressed package, and the data in the table is the average result of running 1000 times. It can be seen from Table 1 that the compression rate of the present invention is increased by 1.1%, and the compression time and decompression time are respectively increased by 14.52% and 98.84%, thereby improving the data storage capacity and utilization rate of the embedded memory.

Claims (5)

1.一种嵌入式设备内存数据的快速无损压缩方法,包括如下步骤:1. A fast lossless compression method for embedded device memory data, comprising the steps: (1)读取嵌入式设备中内存数据的一个内存页面,即按4KB的页面大小逐页读取内存页面;(1) Read a memory page of the memory data in the embedded device, that is, read the memory page page by page according to the page size of 4KB; (2)判断所读页面的数据是否为新数据,若所读数据未记录在字典中,则判断为新数据,并把新数据位置记入字典中,继续读取内存页面数据,直到未出现新数据为止;(2) Judging whether the data of the read page is new data, if the read data is not recorded in the dictionary, it is judged as new data, and the new data position is recorded in the dictionary, and the memory page data continues to be read until it does not appear up to new data; 所述字典是根据关键值直接访问的哈希表结构,该关键值是通过哈希函数计算得出;The dictionary is a hash table structure directly accessed according to a key value, which is calculated by a hash function; (3)对已记录在字典中的所读数据,根据字符重复长度和偏移距离,即字符当前位置与哈希表内记录位置之间的距离,选用不同的压缩格式进行编码:(3) For the read data recorded in the dictionary, according to the character repetition length and offset distance, that is, the distance between the current position of the character and the record position in the hash table, select different compression formats for encoding: 对于字符重复长度小于8,且偏移距离小于等于2KB的内存页面数据,其首字节记录字符重复长度L、偏移距离D和新字符长度S;从第二个字节开始依次记录剩余的新字符长度M、新字符C和剩余的偏移距离N;For the memory page data whose character repetition length is less than 8 and whose offset distance is less than or equal to 2KB, the first byte records the character repetition length L, the offset distance D and the new character length S; start from the second byte to record the remaining The new character length M, the new character C and the remaining offset distance N; 对于不满足字符重复长度小于8,且不满足偏移距离小于等于2KB的内存页面数据,其首字节记录压缩格式标志T、偏移距离D和新字符长度S;从第二个字节开始依次记录剩余的新字符长度M、新字符C、字符重复长度L和剩余的偏移距离N;For the memory page data that does not meet the character repetition length less than 8, and does not meet the offset distance less than or equal to 2KB, its first byte records the compression format flag T, offset distance D and new character length S; start from the second byte Record the remaining new character length M, new character C, character repetition length L and remaining offset distance N in sequence; (4)判断编码位置是否为当前读入内存页面结尾,若是,则输出压缩后的数据和数据的长度,并记录结束标志,执行步骤(5),否则,返回步骤(2),继续读入新数据;(4) Determine whether the encoding position is the end of the currently read-in memory page, if so, output the compressed data and the length of the data, and record the end mark, and execute step (5), otherwise, return to step (2) and continue reading new data; (5)判断当前页面是否为内存数据包的最后一个内存页面,若是,则编码结束,否则,返回步骤(1)读入下一个内存页面。(5) Judging whether the current page is the last memory page of the memory data packet, if so, the encoding ends, otherwise, return to step (1) to read the next memory page. 2.根据权利要求1所述的嵌入式设备内存数据的快速无损压缩方法,其特征在于:步骤(3)所述的首字节记录字符重复长度L、偏移距离D和新字符长度S,按如下规则记录:2. the fast lossless compression method of embedded device memory data according to claim 1, is characterized in that: the first byte record character repetition length L described in step (3), offset distance D and new character length S, Record according to the following rules: 首字节的前3位记录字符重复长度L,L<8;The first 3 digits of the first byte record character repetition length L, L<8; 首字节的第4、第5、第6位记录偏移距离D的后3比特,即每一位记录1比特;The 4th, 5th, and 6th bits of the first byte record the last 3 bits of the offset distance D, that is, each bit records 1 bit; 判断新字符长度S是否大于3,若不是,则首字节的最后2位记录新字符长度S,即每一位记录1比特,若是,则首字节的最后2位记录为0作为标志,并用新字符长度S减3,得到剩余的新字符长度M。Determine whether the new character length S is greater than 3, if not, the last 2 bits of the first byte record the new character length S, that is, each bit records 1 bit, if so, the last 2 bits of the first byte are recorded as 0 as a sign, And subtract 3 from the new character length S to obtain the remaining new character length M. 3.根据权利要求1所述的嵌入式设备内存数据的快速无损压缩方法,其特征在于:步骤(3)所述的从第二个字节开始依次记录剩余的新字符长度M、新字符C和剩余的偏移距离N,按如下规则记录:3. the fast lossless compression method of embedded device memory data according to claim 1, is characterized in that: the described in step (3) starts to record remaining new character length M, new character C successively from second byte and the remaining offset distance N, recorded according to the following rules: 判断剩余的新字符长度M是否大于255,若不是,则记录剩余的新字符长度M,若是,则记录一个字节0,并用剩余的新字符长度M减255,直到剩余的新字符长度小于255,记录该剩余的新字符长度,再记录新字符C;Determine whether the remaining new character length M is greater than 255, if not, record the remaining new character length M, if so, record a byte 0, and subtract 255 from the remaining new character length M until the remaining new character length is less than 255 , record the remaining new character length, and then record the new character C; 记录新字符完成之后,记录剩余的偏移距离N,该剩余的偏移距离N是偏移距离D的前8比特。After the new character is recorded, the remaining offset distance N is recorded, and the remaining offset distance N is the first 8 bits of the offset distance D. 4.根据权利要求1所述的嵌入式设备内存数据的快速无损压缩方法,其特征在于:步骤(3)所述的首字节记录压缩格式标志T、偏移距离D和新字符长度S,按如下规则记录:4. the fast lossless compression method of embedded device memory data according to claim 1, is characterized in that: the first byte record compression format sign T, offset distance D and new character length S described in step (3), Record according to the following rules: 首字节的前2位记录为01作为压缩格式标志T,即第1位记录为0,第2位记录为1;The first two digits of the first byte are recorded as 01 as the compression format flag T, that is, the first digit is recorded as 0, and the second digit is recorded as 1; 首字节的第3、第4、第5、第6位记录偏移距离D的后4比特,即每一位记录1比特;The 3rd, 4th, 5th, and 6th bits of the first byte record the last 4 bits of the offset distance D, that is, each bit records 1 bit; 判断新字符长度S是否大于3,若不是,则首字节的最后2位记录新字符长度S,即每一位记录1比特,若是,则首字节的最后2位记录为0作为标志,并用新字符长度S减3,得到剩余的新字符长度M。Determine whether the new character length S is greater than 3, if not, the last 2 bits of the first byte record the new character length S, that is, each bit records 1 bit, if so, the last 2 bits of the first byte are recorded as 0 as a sign, And subtract 3 from the new character length S to obtain the remaining new character length M. 5.根据权利要求1所述的嵌入式设备内存数据的快速无损压缩方法,其特征在于:步骤(3)所述的从第二个字节开始依次记录剩余的新字符长度M、新字符C、字符重复长度L和剩余的偏移距离N,按如下规则记录:5. the fast lossless compression method of embedded device internal memory data according to claim 1 is characterized in that: the described in step (3) starts to record remaining new character length M, new character C successively from second byte , the character repetition length L and the remaining offset distance N are recorded according to the following rules: 判断剩余的新字符长度M是否大于255,若不是,则记录剩余的新字符长度M,若是,则记录一个字节0,并用剩余的新字符长度M减255,直到剩余的新字符长度小于255,记录该剩余的新字符长度,再记录新字符C;Determine whether the remaining new character length M is greater than 255, if not, record the remaining new character length M, if so, record a byte 0, and subtract 255 from the remaining new character length M until the remaining new character length is less than 255 , record the remaining new character length, and then record the new character C; 记录新字符完成之后,判断字符重复长度L是否大于255,若不是,则记录字符重复长度L,若是,则记录一个字节0,并用字符重复长度L减255,直到字符重复长度小于255,记录该字符重复长度;After recording the new character, judge whether the character repetition length L is greater than 255, if not, record the character repetition length L, if so, record a byte 0, and subtract 255 from the character repetition length L until the character repetition length is less than 255, record The character repeat length; 记录字符重复长度完成之后,记录剩余的偏移距离N,该剩余的偏移距离N为偏移距离D的前8比特。After the character repetition length is recorded, the remaining offset distance N is recorded, and the remaining offset distance N is the first 8 bits of the offset distance D.
CN201410696377.5A 2014-11-26 2014-11-26 The fast and lossless compression method of embedded device internal storage data Expired - Fee Related CN104410424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410696377.5A CN104410424B (en) 2014-11-26 2014-11-26 The fast and lossless compression method of embedded device internal storage data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410696377.5A CN104410424B (en) 2014-11-26 2014-11-26 The fast and lossless compression method of embedded device internal storage data

Publications (2)

Publication Number Publication Date
CN104410424A CN104410424A (en) 2015-03-11
CN104410424B true CN104410424B (en) 2017-06-16

Family

ID=52648025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410696377.5A Expired - Fee Related CN104410424B (en) 2014-11-26 2014-11-26 The fast and lossless compression method of embedded device internal storage data

Country Status (1)

Country Link
CN (1) CN104410424B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106385260B (en) * 2016-09-28 2019-05-21 中电莱斯信息系统有限公司 A kind of FPGA realization system of the LZ lossless compression algorithm based on low delay
CN107967296B (en) * 2017-10-31 2020-06-09 西安空间无线电技术研究所 Improved LZO compression method with rapid low resource overhead
CN110417923B (en) * 2018-04-26 2021-10-29 阿里巴巴集团控股有限公司 DNS message processing method, device and equipment
CN112230032A (en) * 2020-08-03 2021-01-15 青岛鼎信通讯股份有限公司 Electric energy meter data compression and decompression method
CN114244373B (en) * 2022-02-24 2022-05-20 麒麟软件有限公司 LZ series compression algorithm coding and decoding speed optimization method
CN114598329B (en) * 2022-03-18 2023-04-25 电子科技大学 Lightweight lossless compression method for rapid decompression application

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5455576A (en) * 1992-12-23 1995-10-03 Hewlett Packard Corporation Apparatus and methods for Lempel Ziv data compression with improved management of multiple dictionaries in content addressable memory
US7190284B1 (en) * 1994-11-16 2007-03-13 Dye Thomas A Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent
CN103138764A (en) * 2011-11-22 2013-06-05 上海麦杰科技股份有限公司 Method and system for lossless compression of real-time data
CN103236847A (en) * 2013-05-06 2013-08-07 西安电子科技大学 Multilayer Hash structure and run coding-based lossless compression method for data
CN103258030A (en) * 2013-05-09 2013-08-21 西安电子科技大学 Mobile device memory compression method based on dictionary encoding and run-length encoding
CN103618554A (en) * 2013-12-01 2014-03-05 西安电子科技大学 Internal storage page compression method based on dictionary
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device
CN104125458A (en) * 2013-04-27 2014-10-29 展讯通信(上海)有限公司 Lossless stored data compression method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5455576A (en) * 1992-12-23 1995-10-03 Hewlett Packard Corporation Apparatus and methods for Lempel Ziv data compression with improved management of multiple dictionaries in content addressable memory
US7190284B1 (en) * 1994-11-16 2007-03-13 Dye Thomas A Selective lossless, lossy, or no compression of data based on address range, data type, and/or requesting agent
CN103138764A (en) * 2011-11-22 2013-06-05 上海麦杰科技股份有限公司 Method and system for lossless compression of real-time data
CN104125458A (en) * 2013-04-27 2014-10-29 展讯通信(上海)有限公司 Lossless stored data compression method and device
CN103236847A (en) * 2013-05-06 2013-08-07 西安电子科技大学 Multilayer Hash structure and run coding-based lossless compression method for data
CN103258030A (en) * 2013-05-09 2013-08-21 西安电子科技大学 Mobile device memory compression method based on dictionary encoding and run-length encoding
CN103618554A (en) * 2013-12-01 2014-03-05 西安电子科技大学 Internal storage page compression method based on dictionary
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device

Also Published As

Publication number Publication date
CN104410424A (en) 2015-03-11

Similar Documents

Publication Publication Date Title
CN103258030B (en) Based on the mobile device memory compression methods that dictionary and brigade commander are encoded
CN104410424B (en) The fast and lossless compression method of embedded device internal storage data
CN103236847B (en) Based on the data lossless compression method of multilayer hash data structure and Run-Length Coding
CN103326732B (en) The method of compression data, the decompression method of data, encoder
CN105207678B (en) A kind of system for implementing hardware of modified LZ4 compression algorithms
US20200294629A1 (en) Gene sequencing data compression method and decompression method, system and computer-readable medium
WO2019153700A1 (en) Encoding and decoding method, apparatus and encoding and decoding device
JP2017511521A (en) Flash memory compression
JP2014132750A (en) Data compression method, and apparatus for performing the method
CN103152430B (en) A kind of reduce the cloud storage method that data take up room
CN104378119B (en) The fast and lossless compression method of file system of embedded device data
CN107565971A (en) A kind of data compression method and device
CN104142892A (en) Data reading-writing method, data reading-writing device and data reading-writing system
CN106656198B (en) Coding method based on L Z77
CN101483779A (en) Compressing method for two-dimension vector map
CN103618554B (en) Memory pages compression method based on dictionary
US20140258247A1 (en) Electronic apparatus for data access and data access method therefor
CN106688186A (en) Sharing initial dictionaries and huffman trees between multiple compressed blocks in LZ-based compression algorithms
CN103078647A (en) Hardware decoding implementation system and method of LZ77 compression algorithm
CN104021121B (en) A text data compression method, device and server
CN103049388B (en) Compression management method and device for paging memory device
CN103605730A (en) A method and device for compressing XML based on variable-length identification codes
CN106559085A (en) A kind of normal form Hafman decoding method and its device
CN103731154A (en) Data compression algorithm based on semantic analysis
JP2023132713A (en) Data expansion device, memory system, and data expansion method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170616