CN103049388B - Compression management method and device for paging memory device - Google Patents

Info

Publication number
CN103049388B
Authority
CN
China
Prior art keywords
written
page data
dictionary
write command
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210519408.0A
Other languages
Chinese (zh)
Other versions
CN103049388A (en
Inventor
郭丹
梁小庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Jiangbolong Electronics Co Ltd
Original Assignee
Shenzhen Netcom Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Netcom Electronics Co Ltd
Priority to CN201210519408.0A
Publication of CN103049388A
Application granted
Publication of CN103049388B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention applies to the technical field of memories and provides a compression management method and device for a paging storage device. The method comprises: establishing a dictionary in units of pages, the dictionary recording page data, the feature code corresponding to the page data, and the physical page address to which the page data was written; acquiring a write command that contains page data to be written and the logical page address to which it is to be written; acquiring the feature code of the page data to be written and determining whether the dictionary contains an identical feature code; if not, executing the write command and writing the page data according to its logical page address; if so, not executing the write command, and pointing that logical page address to the physical page address of the page data that has the same feature code. The invention effectively addresses the management of storage devices whose minimum storage unit is a page.

Description

Compression management method and device for paging memory device

Technical field

The invention belongs to the technical field of memories, and in particular relates to a compression management method and device for a paging storage device.

Background

Data compression methods in common use are either lossless or lossy. Compression techniques based on statistical models and those based on dictionary models are lossless; multimedia compression, including audio, image, and video compression, is mostly lossy. Storage devices must use lossless compression.

Existing lossless data compression methods include:

1. Dictionary-based data compression, as shown in Table 1:

Index   String
1       Prince
2       Love
3       Princess
4       Forever

Table 1

For example, to compress Yes!PrincelovePrincessforever, the dictionary in Table 1 gives the compressed result:

Yes!&1&2&3&4 (where & is a special symbol; a literal & in the input is output as &&)

Disadvantage: the size and quality of the dictionary directly affect the compression result. A large dictionary is costly, while a small dictionary cannot meet compression requirements. For example, Iamhappytoday! cannot be compressed at all with the dictionary in Table 1.
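The dictionary substitution described above can be sketched as follows. This is a minimal illustration, not the method of any particular compressor: the function name `dict_compress` and the longest-match-first rule are assumptions, and the dictionary entries are written in the same letter case as the example string so that they match.

```python
# Hypothetical sketch of dictionary-based substitution (names are illustrative).
TABLE_1 = {1: "Prince", 2: "love", 3: "Princess", 4: "forever"}


def dict_compress(text, dictionary, marker="&"):
    """Replace dictionary strings with marker+index; escape the marker itself."""
    # Try longer entries first so that "Princess" is preferred over "Prince".
    entries = sorted(dictionary.items(), key=lambda item: -len(item[1]))
    out = []
    i = 0
    while i < len(text):
        if text[i] == marker:
            out.append(marker * 2)  # a literal "&" is emitted as "&&"
            i += 1
            continue
        for index, word in entries:
            if text.startswith(word, i):
                out.append(f"{marker}{index}")
                i += len(word)
                break
        else:
            out.append(text[i])  # no dictionary entry starts here
            i += 1
    return "".join(out)


print(dict_compress("Yes!PrincelovePrincessforever", TABLE_1))
print(dict_compress("Iamhappytoday!", TABLE_1))
```

The second call returns its input unchanged, which is the weakness noted above: text that shares nothing with the dictionary is not compressed at all.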

2. Run-length encoding

The basic principle of run-length encoding is to replace consecutive symbols that share the same value (which together form a "run") with a single symbol value and a run length, so that the encoded length is less than the length of the original data. Whenever the code changes along a row or column of data, the code and the number of times it repeats are recorded once, which compresses the data.

For example: 5555557777733322221111111

The run-length encoding is (5,6)(7,5)(3,3)(2,4)(1,7). Counting only the values and run lengths, the code needs 10 symbols, far fewer than the 25 digits of the original string.

However, not every run-length code is far shorter than the original string. For example, 555555 is 6 characters while (5,6) is 5 characters, so the amount of compression is also a problem. In addition, run-length encoding of random data may produce output larger than the uncompressed input.
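A minimal run-length encoder makes both cases concrete. The helper names `rle_encode` and `rle_text` are illustrative only; the sample string is built from its runs so that the run lengths are explicit.

```python
from itertools import groupby


def rle_encode(text):
    """Return (symbol, run_length) pairs for each run of equal symbols."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_text(text):
    """Render the pairs in the (symbol,count) notation used in the text."""
    return "".join(f"({symbol},{count})" for symbol, count in rle_encode(text))


sample = "5" * 6 + "7" * 5 + "3" * 3 + "2" * 4 + "1" * 7
print(rle_text(sample))

# Data without repeats: every symbol becomes its own pair, so the
# encoded form holds twice as many values as the input has symbols.
print(rle_encode("1234"))
```

For input with no repeated neighbours the pair list is twice the size of the input, which is the expansion on random data mentioned above.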

3. Huffman coding

Existing Huffman-type compression must build a Huffman tree when compressing data, which costs too much space. Moreover, the compressed length varies, which is inconvenient for storage devices whose minimum storage unit is a page. For example, 8k of data may compress to 7.5k; on a storage device whose page size is 8k, the 0.5k left over after compression is essentially unusable.
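The page-granularity argument can be checked with a few lines of arithmetic. The sketch assumes that 8k means 8192 bytes and that a device always allocates whole pages; the function names are illustrative.

```python
PAGE_SIZE = 8 * 1024  # assumed page size in bytes


def pages_needed(n_bytes, page_size=PAGE_SIZE):
    """Whole pages required to hold n_bytes (integer ceiling division)."""
    return -(-n_bytes // page_size)


def wasted_bytes(n_bytes, page_size=PAGE_SIZE):
    """Bytes left unused in the last allocated page."""
    return pages_needed(n_bytes, page_size) * page_size - n_bytes


original = 8 * 1024       # 8k of data
compressed = 7680         # 7.5k after compression
print(pages_needed(original), pages_needed(compressed))
print(wasted_bytes(compressed))
```

Both sizes occupy exactly one page, so compressing 8k to 7.5k frees no page; it only leaves 512 bytes stranded at the end of the page.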

In summary, existing data compression methods have the following problems: (1) computational overhead or storage requirements are too high; (2) the compression effect cannot be guaranteed; (3) the compressed result is inconvenient to manage on storage devices whose minimum storage unit is a page.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a compression management method for a paging storage device, to solve the problems that existing compression methods incur excessive computational overhead or demand too much storage space, cannot guarantee the compression effect, and produce results that are inconvenient to manage on storage devices whose minimum storage unit is a page.

The embodiments of the present invention are implemented as a compression management method for a paging storage device, the method comprising:

Step A: establishing a dictionary in units of pages, wherein the dictionary records page data, the feature code corresponding to the page data, and the physical page address to which the page data was written;

Step B: acquiring a write command, wherein the write command contains page data to be written and the logical page address to which that page data is to be written; acquiring the feature code of the page data to be written, and determining whether the dictionary contains a feature code identical to the acquired one; if not, executing the write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature code.

Another purpose of the embodiments of the present invention is to provide a compression management device for a paging storage device, the device comprising:

a dictionary building unit, configured to establish a dictionary in units of pages, wherein the dictionary records page data, the feature code corresponding to the page data, and the physical page address to which the page data was written;

a processing unit, configured to acquire a write command, wherein the write command contains page data to be written and the logical page address to which that page data is to be written; to acquire the feature code of the page data to be written and determine whether the dictionary contains a feature code identical to the acquired one; if not, to execute the write command and write the page data to be written according to its logical page address; if so, not to execute the write command, and to point the logical page address of the page data to be written to the physical page address of the page data that has the same feature code.

As the above technical solution shows, the minimum unit of data compression in the embodiments of the present invention is a page: data within a page is neither compressed nor decompressed, and data is written and read in units of pages. The compression management method provided by the embodiments is therefore well suited to low-cost paging storage devices or those with limited computing power. Moreover, if the data to be written by the current write command is already recorded in the dictionary, the data is not written again, which improves the write efficiency of the paging storage device, reduces the number of writes, and in turn reduces wear on the device.

Description of drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the compression management method for a paging storage device provided by Embodiment 1 of the present invention;

Fig. 2 is a flow chart of the compression management method for a paging storage device provided by Embodiment 2 of the present invention;

Fig. 3 is a flow chart of the compression management method for a paging storage device provided by Embodiment 3 of the present invention;

Fig. 4 is a structural diagram of the compression management device for a paging storage device provided by Embodiment 4 of the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the invention and do not limit it.

The technical solutions of the present invention are illustrated below through specific embodiments.

Embodiment 1:

Fig. 1 shows the flow of the compression management method for a paging storage device provided by Embodiment 1 of the present invention. The method is described in detail as follows:

In step S101, a dictionary is established in units of pages. The dictionary records information such as page data, the feature code corresponding to the page data, and the physical page address to which the page data was written.

In this embodiment, the paging storage device is a storage device whose minimum storage unit is a page, that is, a device that is erased by block and read and written by page, such as a flash memory device. Page data is the data stored in a page.

In step S102, a write command is acquired. The write command contains information such as the page data to be written and the logical page address to which that page data is to be written.

In step S103, the feature code of the page data to be written is acquired, and it is determined whether the dictionary contains a feature code identical to the acquired one. If the result is "yes", step S105 is executed; if the result is "no", step S104 is executed.

In this embodiment, the feature code of the page data to be written is acquired and compared with all feature codes in the dictionary to determine whether the dictionary contains an identical feature code. If so, the page data to be written by the current write command is already recorded in the dictionary, and step S105 is executed; if not, step S104 is executed.

In this embodiment, the feature code of the page data to be written and the feature codes of the page data in the dictionary are obtained with the same algorithm, preferably a hash algorithm. Other algorithms may also be used; this does not limit the invention.

A hash algorithm computes a message digest of a large piece of data to obtain a feature code for that data. It is mainly used for password verification and data integrity verification, and can also be used to quickly determine whether two pieces of data are identical.

Taking the MD5 hash algorithm as an example, consider the data in Table 2:

Page data             MD5 value
1234567890abcdefg     7206ddfa511b6ab05734b603c1b88be6
2234567890abcdefg     628cb0947d31075fa305699029ddfecc
0234567890abcdefg     df9e94f848b6608dd2af317b693744ba
1234567890abcdefgh    ad91a28a12e83ce4999b23260f7785c3
1234567890abcdef      996ce17f6abc9fe126b57aa5f1d8c92c
1234567890abcdefg     7206ddfa511b6ab05734b603c1b88be6

Table 2

As Table 2 shows, even a tiny change in the page data greatly changes the feature code computed by the hash algorithm, so computing feature codes with a hash algorithm reduces the collision rate.
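The behaviour shown in Table 2 can be reproduced with Python's standard `hashlib` module. This is a small sketch: `feature_code` and `differing_hex_digits` are hypothetical helper names, and the digests it prints are not asserted against the table values here.

```python
import hashlib


def feature_code(page_data: bytes) -> str:
    """MD5 digest of the page data as 32 hexadecimal digits."""
    return hashlib.md5(page_data).hexdigest()


def differing_hex_digits(code_a: str, code_b: str) -> int:
    """Count positions at which two equal-length hex digests differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


base = feature_code(b"1234567890abcdefg")
changed = feature_code(b"2234567890abcdefg")  # only the first byte differs

print(base)
print(changed)
print(differing_hex_digits(base, changed), "of 32 hex digits differ")
print(base == feature_code(b"1234567890abcdefg"))  # same data, same code
```

Identical pages always yield identical codes, which is what lets the dictionary lookup stand in for a full page comparison in the common case.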

In step S104, the write command is executed and the page data to be written is written according to its logical page address, that is, the page data is stored at the physical address corresponding to that logical address.

In step S105, the write command is not executed, and the logical page address of the page data to be written is pointed to the physical page address of the page data that has the same feature code.

In this embodiment, when the page data to be written by the write command has been written before, that is, it already exists in the paging storage device, the write command is not executed. Instead, the logical page address of the page data to be written is pointed to the physical page address of the page data that has the same feature code, and the user is notified that the data has already been written. This improves the write efficiency of the paging storage device, reduces the number of writes, and in turn reduces wear on the device.
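Steps S101 to S105 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the patented implementation: the class `DedupPageStore` and its fields are hypothetical, MD5 is used as the feature code, new pages are added to the dictionary as they are written, physical pages are allocated sequentially, and hash collisions and page reclamation are ignored.

```python
import hashlib


class DedupPageStore:
    """Toy model of a page store that skips writes of already-known pages."""

    def __init__(self):
        self.flash = {}        # physical page address -> page data
        self.l2p = {}          # logical page address -> physical page address
        self.dictionary = {}   # feature code -> physical page address (S101)
        self.next_ppa = 0      # assumed sequential allocation of physical pages
        self.physical_writes = 0

    @staticmethod
    def feature_code(page_data: bytes) -> bytes:
        return hashlib.md5(page_data).digest()

    def write(self, lpa: int, page_data: bytes) -> bool:
        """Handle one write command (S102); return True if flash was written."""
        code = self.feature_code(page_data)          # S103: compute the code
        ppa = self.dictionary.get(code)
        if ppa is not None:
            self.l2p[lpa] = ppa                      # S105: remap only
            return False
        ppa = self.next_ppa                          # S104: perform the write
        self.next_ppa += 1
        self.flash[ppa] = page_data
        self.physical_writes += 1
        self.l2p[lpa] = ppa
        self.dictionary[code] = ppa                  # assumption: dynamic update
        return True

    def read(self, lpa: int) -> bytes:
        return self.flash[self.l2p[lpa]]


store = DedupPageStore()
store.write(0, b"A" * 16)   # new data: written to flash
store.write(1, b"A" * 16)   # duplicate: only the mapping is updated
print(store.physical_writes, store.l2p)
```

After the two writes both logical pages map to the same physical page and only one physical write has taken place, which is the source of the reduced wear described above.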

Note that in this embodiment the page data address recorded in the dictionary is a physical page address, but it may also be a logical page address. In that case, when the page data to be written by the write command has been written before, the logical page address of the page data to be written only needs to be pointed to the logical page address at which the original data with the same feature code was written. In addition, when a logical page address is recorded, it must be converted to a physical page address when data is read.

As another preferred embodiment of the present invention, to reduce the collision rate, the step of acquiring the feature code of the page data to be written in the write command, determining whether the dictionary contains an identical feature code, executing the write command and writing the page data according to its logical page address if not, and, if so, not executing the write command and pointing that logical page address to the physical page address of the page data that has the same feature code, specifically includes:

when the paging storage device supports multiple hash algorithms, computing multiple message digests of the page data to be written in the current write command, obtaining the feature code produced by each digest, and determining whether the dictionary contains feature codes identical to every one of them; if not, executing the current write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature codes, wherein the dictionary records the feature codes produced by the multiple digests of the page data;

or, when the paging storage device supports one hash algorithm, computing multiple message digests of the page data to be written in the current write command at different offsets, obtaining the feature code produced at each offset, and determining whether the dictionary contains feature codes identical to every one of them; if not, executing the current write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature codes, wherein the dictionary records the feature codes produced by the digests of the page data at the different offsets;

or, when the paging storage device supports error checking and correction (ECC) codes, acquiring the ECC code and the message-digest feature code of the page data to be written in the current write command, and determining whether the dictionary contains both that ECC code and that feature code; if not, executing the current write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature code, wherein the dictionary records the ECC code and the message-digest feature code of the page data;

or, acquiring the feature code of the page data to be written in the current write command and, when the dictionary contains an identical feature code, determining whether N bytes of the page data to be written are all identical to the N bytes of the page data that corresponds to that feature code in the dictionary; if not, executing the current write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature code. Here N is an integer greater than zero and less than or equal to the page size of the paging storage device; that is, if the page size is 8k bytes, N is at most 8k bytes. Note that, so as not to reduce compression efficiency, this comparison may be implemented with a hardware comparator.
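The four collision-reduction variants above can each be sketched as a small helper. Everything here is an assumption made for illustration: MD5 and SHA-1 stand in for "multiple hash algorithms", "different offsets" is read as hashing the page from several starting positions, CRC32 from `zlib` stands in for a device ECC code (a real ECC is a different code), and the N compared bytes are taken from the start of the page.

```python
import hashlib
import zlib


def multi_hash_code(page: bytes) -> bytes:
    """Variant 1: concatenate digests from two different hash algorithms."""
    return hashlib.md5(page).digest() + hashlib.sha1(page).digest()


def multi_offset_code(page: bytes, offsets=(0, 1, 2)) -> bytes:
    """Variant 2: one algorithm applied to the page from several offsets."""
    return b"".join(hashlib.md5(page[offset:]).digest() for offset in offsets)


def hash_plus_checksum_code(page: bytes) -> bytes:
    """Variant 3: digest plus a second code (CRC32 as a stand-in for ECC)."""
    return hashlib.md5(page).digest() + zlib.crc32(page).to_bytes(4, "big")


def same_first_n_bytes(page_a: bytes, page_b: bytes, n: int) -> bool:
    """Variant 4: after a feature-code hit, confirm by comparing N raw bytes."""
    return page_a[:n] == page_b[:n]


page = b"1234567890abcdefg"
other = b"2234567890abcdefg"
print(len(multi_hash_code(page)), len(multi_offset_code(page)),
      len(hash_plus_checksum_code(page)))
print(same_first_n_bytes(page, other, 4))
```

In each variant a match is accepted only if every component agrees, so a false match requires simultaneous collisions in all of them; the cost is a longer dictionary entry or an extra comparison per hit.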

Preferably, to further improve compression efficiency when the dictionary is large, the step of acquiring the feature code of the page data to be written in the write command, determining whether the dictionary contains an identical feature code, executing the write command and writing the page data according to its logical page address if not, and, if so, not executing the write command and pointing that logical page address to the physical page address of the page data that has the same feature code, specifically includes:

acquiring the feature code of the page data to be written in the current write command, computing a message digest of the feature code to obtain a secondary digest value, and determining whether the dictionary contains that secondary digest value; if it does, further determining whether the dictionary contains the feature code; if not, executing the current write command and writing the page data to be written according to its logical page address; if so, not executing the write command, and pointing the logical page address of the page data to be written to the physical page address of the page data that has the same feature code, wherein the dictionary records the secondary digest value of the feature code of each page data. The secondary digest value is smaller than the feature code and keeps the collision rate within the allowed range.

In this embodiment, a feature code table and a secondary digest value table are established in advance to record, respectively, the feature codes of the page data and the secondary digest values of those feature codes. Matching starts with the small secondary digest value table, and the large feature code table is compared only when an identical secondary digest value is found. This greatly improves matching efficiency and therefore data compression efficiency.

The following example explains this in detail:

When the dictionary is large, for example 1024 pages with a 32-byte feature code generated for each page by the hash algorithm, the feature code table occupies 1024×32 bytes, and searching a table of that size places high demands on the system hardware. A secondary digest is therefore computed over the feature code table. The resulting value is much smaller than 32 bytes and keeps the collision rate within the allowed range, as shown in Table 3.

Raw data                                                          Feature code value                Secondary digest value
555555555555555555555555555555555555                              130668ae34911a14e4288367b071447a  130668ae
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa                         22d42eb002cefa81e9ad604ea57bc01d  22d42eb0
000000000000000000000000000000000000                              3ea032bf79e8c116b05f4698d5a8e044  3ea032bf
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff  c78068577ec5dc1d06605f291a79174f  c7806857

Table 3

In Table 3, the first 4 bytes of the feature code value (the MD5 value) are taken as the secondary hash value (the secondary digest value), which is used to search quickly for a matching secondary digest value. Only if the dictionary contains a value identical to the secondary digest value of the page data to be written by the current write command is the full feature code value then matched.
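The two-level match can be sketched with a table keyed by the short secondary value (called the secondary excerpt or secondary digest value in the text). The class `TwoLevelIndex` is hypothetical, and following Table 3 the secondary value is simply the first 4 bytes of the MD5 digest rather than a separate hash.

```python
import hashlib
from collections import defaultdict

SECONDARY_LEN = 4  # bytes taken from the front of the feature code

# Table sizes for the 1024-page example in the text:
# full feature codes: 1024 * 32 = 32768 bytes; 4-byte values: 1024 * 4 = 4096.


class TwoLevelIndex:
    """Look up a short secondary value first, then the full feature code."""

    def __init__(self):
        # secondary value -> {full feature code: physical page address}
        self.by_secondary = defaultdict(dict)

    def add(self, code: bytes, ppa: int) -> None:
        self.by_secondary[code[:SECONDARY_LEN]][code] = ppa

    def lookup(self, code: bytes):
        bucket = self.by_secondary.get(code[:SECONDARY_LEN])
        if bucket is None:
            return None          # no secondary match: skip the full comparison
        return bucket.get(code)  # secondary match: compare the full code


index = TwoLevelIndex()
known = hashlib.md5(b"5" * 18).digest()
index.add(known, ppa=7)

print(index.lookup(known))
print(index.lookup(hashlib.md5(b"a" * 20).digest()))
```

Most lookups for new data end at the short comparison, so the larger feature code table is consulted only on the rare occasions when the 4-byte values coincide.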

In this embodiment the minimum unit of data compression is a page: data within a page is neither compressed nor decompressed, and data is written and read in units of pages. The compression management method provided by the embodiments of the present invention is therefore well suited to low-cost paging storage devices or those with limited computing power. Moreover, if the data to be written by the current write command is already recorded in the dictionary, the data is not written again, which improves the write efficiency of the paging storage device, reduces the number of writes, and in turn reduces wear on the device.

实施例二:Embodiment two:

图2示出了本发明实施例二提供的分页存储器件压缩管理方法的实现流程,该方法过程详述如下:FIG. 2 shows the implementation process of the paging storage device compression management method provided by Embodiment 2 of the present invention, and the process of the method is described in detail as follows:

In step S201, a static dictionary is established in units of pages. The static dictionary records page data whose write count is greater than or equal to a first threshold and/or page data that the user cares about, the feature code corresponding to the page data, the physical page address to which the page data is written, and similar information.

In this embodiment, the static dictionary is a dictionary that is fixed in the paging memory device and may not be updated. The static dictionary can be created according to the following conditions:

1. According to the storage capacity of the paging memory device, set a dictionary size (storage capacity) that is a certain proportion (for example 1%) of it. For example, when the storage capacity of the paging memory device is 100M, the dictionary size (storage capacity) is 1M;

2. Take a large number of samples from the main application scope of the paging memory device and derive the best dictionary by statistical analysis; for example, use data whose write count is greater than or equal to the first threshold (10 times) as page data in the dictionary;

3. Set the dictionary content according to the user's own needs (that is, the data the user cares about).
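The three creation conditions above can be sketched in Python as follows. This is only an illustrative sketch: the function name `build_static_dictionary`, the 8 KB `PAGE_SIZE`, the use of MD5 as the feature code, and the choice to place user-selected pages ahead of frequently written ones are assumptions made for the example, and the physical page address of each entry is omitted for brevity.

```python
import hashlib
from collections import Counter

PAGE_SIZE = 8 * 1024      # assumed page size in bytes (hypothetical)
FIRST_THRESHOLD = 10      # example first threshold from the text


def build_static_dictionary(sampled_pages, device_capacity, percent=1, user_pages=()):
    """Return {feature_code: page_data}, capped at `percent` % of the device capacity."""
    # Condition 1: the dictionary may occupy only a fixed share of the device.
    max_entries = (device_capacity * percent // 100) // PAGE_SIZE
    # Condition 2: keep sampled pages written at least FIRST_THRESHOLD times.
    counts = Counter(sampled_pages)
    frequent = [page for page, n in counts.most_common() if n >= FIRST_THRESHOLD]
    dictionary = {}
    # Condition 3: pages the user cares about are considered first (an assumption).
    for page in list(user_pages) + frequent:
        if len(dictionary) >= max_entries:
            break
        dictionary.setdefault(hashlib.md5(page).digest(), page)
    return dictionary
```

With a 100M device and 8 KB pages, the 1% budget in this sketch corresponds to 128 dictionary pages.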

In step S202, a write command is acquired. The write command contains the page data to be written, the logical page address to which the page data is to be written, and similar information;

In step S203, the feature code of the page data to be written is acquired, and it is judged whether a feature code identical to the acquired feature code exists in the dictionary. If the result is "yes", step S205 is executed; if the result is "no", step S204 is executed;

Preferably, it is judged whether a feature code identical to the acquired feature code exists in the static dictionary. If the result is "yes", step S205 is executed; if the result is "no", step S204 is executed;

In step S204, the write command is executed and the page data to be written is written according to its logical page address, that is, the page data to be written is stored at the physical address corresponding to the logical address;

In step S205, the write command is not executed, and the logical page address of the page data to be written is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written.
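Steps S202 to S205 can be illustrated with a small in-memory model. The sketch below is not the patented implementation: the class `PagedStore`, its attribute names, the list that stands in for physical pages, and the use of MD5 as the feature code are all assumptions chosen to keep the example runnable.

```python
import hashlib


class PagedStore:
    """Toy model: a list stands in for physical pages, dicts for the tables."""

    def __init__(self):
        self.physical = []     # index = physical page address
        self.mapping = {}      # logical page address -> physical page address
        self.dictionary = {}   # feature code -> physical page address

    def add_dictionary_page(self, page):
        """Preload one static dictionary entry (S201)."""
        self.physical.append(page)
        self.dictionary[hashlib.md5(page).digest()] = len(self.physical) - 1

    def write(self, logical_addr, page):
        """Handle one write command; return True if a physical write happened."""
        code = hashlib.md5(page).digest()      # S203: feature code of the page
        hit = self.dictionary.get(code)
        if hit is not None:                    # S205: remap, skip the write
            self.mapping[logical_addr] = hit
            return False
        self.physical.append(page)             # S204: execute the write
        self.mapping[logical_addr] = len(self.physical) - 1
        return True

    def read(self, logical_addr):
        return self.physical[self.mapping[logical_addr]]
```

A write whose page matches a dictionary entry only updates the logical-to-physical mapping, so a later read of that logical address still returns the expected page data.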

As another preferred embodiment of the present invention, in order to reduce the collision rate, the step of acquiring the feature code of the page data to be written in the write command and judging whether an identical feature code exists in the dictionary (if not, executing the write command and writing the page data to be written according to its logical page address; if so, not executing the write command and pointing the logical page address of the page data to be written to the physical page address at which the page data having the same feature code was written) specifically includes:

When the paging memory device supports multiple hash algorithms, multiple message digests are computed over the page data to be written in the current write command, and the feature code from each digest is acquired. It is then judged whether the dictionary contains feature codes identical to every one of these feature codes. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the feature codes obtained from the multiple message digests of the page data;

Alternatively, when the paging memory device supports only one hash algorithm, multiple message digests at different offsets are computed over the page data to be written in the current write command, and the feature code from each offset digest is acquired. It is then judged whether the dictionary contains feature codes identical to every one of these feature codes. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the feature codes obtained from the multiple offset digests of the page data;

Alternatively, when the paging memory device supports error checking and correction (ECC) codes, the ECC code of the page data to be written in the current write command and the feature code from its message digest are acquired, and it is judged whether the dictionary contains both this ECC code and this feature code. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the ECC code of the page data and the feature code from its message digest;

Alternatively, the feature code of the page data to be written in the current write command is acquired and, when the dictionary contains an identical feature code, it is judged whether N bytes of the page data to be written are all identical to the N bytes of the page data that corresponds to that feature code in the dictionary. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. Here N is an integer greater than zero and less than or equal to the page size of the paging memory; that is, if the page size of the paging memory is 8k bytes, N is less than or equal to 8k bytes. It should be noted that, so as not to affect data compression efficiency, the above comparison may be implemented with a hardware comparator.
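The four alternatives above can each be sketched as a small helper. Everything here is an assumption made for illustration: MD5 and SHA-1 stand in for "multiple hash algorithms", the offsets `(0, 512)` and the choice to hash the page from each offset to its end are one possible reading of "digests at different offsets", CRC-32 stands in for the device's ECC code (a real ECC is a different code), and the N-byte check compares the first N bytes because the text does not say which N bytes are used.

```python
import hashlib
import zlib


def multi_algorithm_code(page):
    """Feature codes from two different hash algorithms over the same page."""
    return (hashlib.md5(page).digest(), hashlib.sha1(page).digest())


def multi_offset_code(page, offsets=(0, 512)):
    """Feature codes from one hash algorithm applied at several start offsets."""
    return tuple(hashlib.md5(page[offset:]).digest() for offset in offsets)


def ecc_plus_digest_code(page):
    """Pair a checksum (CRC-32 as an ECC stand-in) with the message digest."""
    return (zlib.crc32(page), hashlib.md5(page).digest())


def verify_n_bytes(candidate, stored, n):
    """After a feature-code hit, confirm that the first n bytes really match."""
    return candidate[:n] == stored[:n]
```

In each variant a dictionary hit requires every component of the composite code to match, which is what lowers the chance of treating two different pages as identical.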

Preferably, to further improve data compression efficiency when the dictionary is large, the step of acquiring the feature code of the page data to be written in the write command and judging whether an identical feature code exists in the dictionary (if not, executing the write command and writing the page data to be written according to its logical page address; if so, not executing the write command and pointing the logical page address of the page data to be written to the physical page address at which the page data having the same feature code was written) specifically includes:

The feature code of the page data to be written in the current write command is acquired, and a message digest of the feature code is computed to obtain a secondary digest value. It is judged whether the secondary digest value exists in the dictionary and, if it does, whether the feature code also exists in the dictionary. If either check fails, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If both succeed, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the secondary digest value of the feature code corresponding to the page data. The secondary digest value is shorter than the feature code, and its length is chosen within the range permitted by the collision rate.

In this embodiment, a feature code table and a secondary digest value table are established in advance, recording the feature codes of the page data and the secondary digest values of those feature codes respectively. Matching starts with the small secondary digest value table, and the larger feature code table needs to be compared only when an identical secondary digest value is found. This can greatly improve matching efficiency and, in turn, data compression efficiency.
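The two-table lookup can be sketched as below, assuming MD5 as the feature code and its first 4 bytes as the secondary digest value. The class name `TwoLevelIndex` and its methods are hypothetical. In Python a `dict` lookup is already hashed, so the example only illustrates the order of the checks, not the speed-up a resource-limited controller would see.

```python
import hashlib


class TwoLevelIndex:
    """Secondary digest table consulted before the full feature code table."""

    def __init__(self):
        self.secondary = set()   # 4-byte secondary digest values
        self.codes = {}          # full feature code -> physical page address

    def add(self, page, physical_addr):
        code = hashlib.md5(page).digest()
        self.secondary.add(code[:4])
        self.codes[code] = physical_addr

    def lookup(self, page):
        """Return the physical page address of a matching page, or None."""
        code = hashlib.md5(page).digest()
        if code[:4] not in self.secondary:   # cheap first-stage filter
            return None
        return self.codes.get(code)          # full comparison only on a hit
```

A shorter secondary value makes the first table smaller but produces more false first-stage hits, which is the trade-off the text refers to when it ties the length to the permitted collision rate.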

In this embodiment, whether the page data to be written has been written before is judged against the pre-established static dictionary. Because the static dictionary contains frequently written data, the judgment can be made efficiently.

Embodiment 3:

FIG. 3 shows the implementation flow of the compression management method for a paging memory device provided by Embodiment 3 of the present invention. The method is described in detail as follows:

In step S301, a dynamic dictionary is established in units of pages. The dynamic dictionary records page data, the feature code corresponding to the page data, the physical page address to which the page data is written, and similar information.

In this embodiment, the dynamic dictionary is a dictionary that may be updated.

In step S302, a second threshold (for example 10 times) is set, and the number of times each page data is written to the paging memory device is counted.

In step S303, a write command is acquired. The write command contains the page data to be written and the logical page address to which the page data is to be written.

In step S304, the feature code of the page data to be written is acquired, and it is judged whether a feature code identical to the acquired feature code exists in the dynamic dictionary. If the result is "yes", step S305 is executed; if the result is "no", step S306 is executed.

In step S305, the write command is not executed, and the logical page address of the page data to be written is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written.

In step S306, the write command is executed, and the page data to be written is written according to its logical page address.

In step S307, it is judged whether the write count of the page data to be written has reached the second threshold. If the result is "yes", step S308 is executed; if the result is "no", the current operation ends.

In step S308, the page data to be written is updated into the dynamic dictionary.

Specifically, the page data to be written, its corresponding feature code, the physical page address to which it is written, and similar information are updated into the dynamic dictionary.
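The flow of steps S301 to S308 can be modelled roughly as follows. The class `DynamicStore`, the per-feature-code `Counter`, and the in-memory list of physical pages are assumptions for illustration; the page data of a dictionary entry is reachable here through its recorded physical page address rather than stored a second time.

```python
import hashlib
from collections import Counter

SECOND_THRESHOLD = 10   # example second threshold from the text


class DynamicStore:
    def __init__(self):
        self.physical = []             # index = physical page address
        self.mapping = {}              # logical -> physical page address
        self.dictionary = {}           # feature code -> physical page address
        self.write_counts = Counter()  # feature code -> number of writes seen

    def write(self, logical_addr, page):
        """Handle one write command; return True if a physical write happened."""
        code = hashlib.md5(page).digest()
        self.write_counts[code] += 1                      # S302: count writes
        hit = self.dictionary.get(code)                   # S304: dictionary check
        if hit is not None:
            self.mapping[logical_addr] = hit              # S305: remap only
            return False
        self.physical.append(page)                        # S306: execute write
        physical_addr = len(self.physical) - 1
        self.mapping[logical_addr] = physical_addr
        if self.write_counts[code] >= SECOND_THRESHOLD:   # S307: threshold check
            self.dictionary[code] = physical_addr         # S308: update dictionary
        return True
```

In this model the first ten writes of an identical page are physically written, the tenth write also promotes the page into the dynamic dictionary, and later writes of the same page are absorbed by remapping.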

As another preferred embodiment of the present invention, to prevent the dynamic dictionary from growing ever larger and degrading the efficiency of data compression and searching, the method further includes:

When the storage capacity of the dynamic dictionary reaches a third threshold, invalid entries or entries unused within a preset time are deleted from the dynamic dictionary. An invalid entry is one whose page data has become invalid because other data has been written to the address at which that page data was written. An entry unused within the preset time is one whose page data has not been written to the paging memory device within the preset time.

Alternatively, counting of the page data's writes continues after the write count reaches the second threshold, and when the storage capacity of the dynamic dictionary reaches the third threshold, page data whose write count is less than a fourth threshold is deleted from the dynamic dictionary. For example, once page data consisting entirely of 0x8f has been written 10 times, it is updated into the dynamic dictionary, but the write counts of the page data in the dynamic dictionary continue to be maintained. If the storage capacity of the dynamic dictionary reaches the third threshold while the write count of the all-0x8f page data is less than the fourth threshold (for example 15 times), the all-0x8f page data is deleted from the dynamic dictionary.
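Both pruning strategies can be sketched as filters over the dictionary entries. The `Entry` fields, the function names, and the use of the entry count as a proxy for the dictionary's storage capacity are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    physical_addr: int
    write_count: int
    last_written: float   # time at which this page data was last written
    valid: bool = True    # False once the physical page has been overwritten


def evict_stale(dictionary, now, max_idle):
    """First strategy: drop invalid entries and entries idle longer than max_idle."""
    return {key: e for key, e in dictionary.items()
            if e.valid and now - e.last_written <= max_idle}


def evict_cold(dictionary, fourth_threshold):
    """Second strategy: drop entries written fewer than fourth_threshold times."""
    return {key: e for key, e in dictionary.items()
            if e.write_count >= fourth_threshold}


def maybe_evict(dictionary, third_threshold, policy):
    """Apply the chosen policy only once the dictionary reaches the third threshold."""
    if len(dictionary) < third_threshold:
        return dictionary
    return policy(dictionary)
```

With a fourth threshold of 15, an entry that was promoted after 10 writes and has since reached only 12 writes would be removed by `evict_cold`, mirroring the all-0x8f example.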

As another preferred embodiment of the present invention, in order to reduce the collision rate, the step of acquiring the feature code of the page data to be written in the write command and judging whether an identical feature code exists in the dictionary (if not, executing the write command and writing the page data to be written according to its logical page address; if so, not executing the write command and pointing the logical page address of the page data to be written to the physical page address at which the page data having the same feature code was written) specifically includes:

When the paging memory device supports multiple hash algorithms, multiple message digests are computed over the page data to be written in the current write command, and the feature code from each digest is acquired. It is then judged whether the dictionary contains feature codes identical to every one of these feature codes. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the feature codes obtained from the multiple message digests of the page data;

Alternatively, when the paging memory device supports only one hash algorithm, multiple message digests at different offsets are computed over the page data to be written in the current write command, and the feature code from each offset digest is acquired. It is then judged whether the dictionary contains feature codes identical to every one of these feature codes. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the feature codes obtained from the multiple offset digests of the page data;

Alternatively, when the paging memory device supports error checking and correction (ECC) codes, the ECC code of the page data to be written in the current write command and the feature code from its message digest are acquired, and it is judged whether the dictionary contains both this ECC code and this feature code. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the ECC code of the page data and the feature code from its message digest;

Alternatively, the feature code of the page data to be written in the current write command is acquired and, when the dictionary contains an identical feature code, it is judged whether N bytes of the page data to be written are all identical to the N bytes of the page data that corresponds to that feature code in the dictionary. If not, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If so, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. Here N is an integer greater than zero and less than or equal to the page size of the paging memory; that is, if the page size of the paging memory is 8k bytes, N is less than or equal to 8k bytes. It should be noted that, so as not to affect data compression efficiency, the above comparison may be implemented with a hardware comparator.

Preferably, to further improve data compression efficiency when the dictionary is large, the step of acquiring the feature code of the page data to be written in the write command and judging whether an identical feature code exists in the dictionary (if not, executing the write command and writing the page data to be written according to its logical page address; if so, not executing the write command and pointing the logical page address of the page data to be written to the physical page address at which the page data having the same feature code was written) specifically includes:

The feature code of the page data to be written in the current write command is acquired, and a message digest of the feature code is computed to obtain a secondary digest value. It is judged whether the secondary digest value exists in the dictionary and, if it does, whether the feature code also exists in the dictionary. If either check fails, the current write command is executed and the page data to be written is written according to the logical page address specified in the current write command. If both succeed, the write command is not executed, and the logical page address specified in the current write command is pointed to the physical page address at which the page data having the same feature code as the page data to be written was written. In this case the dictionary records the secondary digest value of the feature code corresponding to the page data. The secondary digest value is shorter than the feature code, and its length is chosen within the range permitted by the collision rate.

In this embodiment, a feature code table and a secondary digest value table are established in advance, recording the feature codes of the page data and the secondary digest values of those feature codes respectively. Matching starts with the small secondary digest value table, and the larger feature code table needs to be compared only when an identical secondary digest value is found. This can greatly improve matching efficiency and, in turn, data compression efficiency.

As another preferred embodiment of the present invention, the dictionary may also contain both a static dictionary and a dynamic dictionary. The page data in the static dictionary is data whose write count is greater than or equal to the first threshold and/or data that the user cares about;

Before the step of acquiring the write command, the method further includes:

setting a second threshold and counting the number of times each page data is written to the paging memory device;

The step of acquiring the feature code of the page data to be written and judging whether a feature code identical to the acquired feature code exists in the dictionary specifically includes:

acquiring the feature code of the page data to be written and judging whether a feature code identical to the acquired feature code exists in the static dictionary; if not, judging from the number of times the page data to be written has been written to the paging memory device whether the page data to be written needs to be updated into the dynamic dictionary.
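One possible reading of the combined static and dynamic arrangement is sketched below. The function `handle_write` and its parameters are hypothetical, MD5 is assumed as the feature code, and the step that also consults the dynamic dictionary for a hit is an assumption carried over from Embodiment 3, since the paragraph above only spells out the static check and the promotion decision.

```python
import hashlib
from collections import Counter


def handle_write(page, static_dict, dynamic_dict, counts, next_physical,
                 second_threshold=10):
    """Return (physical_addr, written) for one write command.

    static_dict / dynamic_dict map feature code -> physical page address;
    counts is a Counter of writes per feature code; next_physical is the
    address the page would occupy if it has to be physically written.
    """
    code = hashlib.md5(page).digest()
    counts[code] += 1
    if code in static_dict:                  # static dictionary checked first
        return static_dict[code], False
    if code in dynamic_dict:                 # assumption: dynamic hits also remap
        return dynamic_dict[code], False
    if counts[code] >= second_threshold:     # frequent enough: promote the page
        dynamic_dict[code] = next_physical
    return next_physical, True
```

Checking the fixed static dictionary first keeps the common pages out of the dynamic dictionary, so the dynamic part only has to track pages that become frequent during use.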

Embodiment 4:

FIG. 4 shows the structure of the compression management apparatus for a paging memory device provided by Embodiment 4 of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.

The compression management apparatus for a paging memory device can be applied in a storage device. It may be a software unit, a hardware unit, or a combined software and hardware unit running in the storage device, or it may be integrated into the storage device as an independent plug-in or run in the application system of the storage device.

The compression management apparatus for a paging memory device includes a dictionary establishing unit 41 and a processing unit 42. The specific functions of the units are as follows:

The dictionary establishing unit 41 is configured to establish a dictionary in units of pages, the dictionary recording page data, the feature code corresponding to the page data, and the physical page address to which the page data is written;

The processing unit 42 is configured to acquire a write command containing page data to be written and the logical page address to which the page data is to be written; to acquire the feature code of the page data to be written and judge whether the dictionary contains a feature code identical to the acquired feature code; if not, to execute the write command and write the page data to be written according to its logical page address; and if so, not to execute the write command and to point the logical page address of the page data to be written to the physical page address at which the page data having the same feature code as the page data to be written was written.

Preferably, the dictionary is a static dictionary, and the page data in the static dictionary is data whose write count is greater than or equal to a first threshold and/or page data that the user cares about.

Preferably, the dictionary is a dynamic dictionary;

Before acquiring the write command, the processing unit 42 is further configured to set a second threshold and count the number of times each page data is written to the paging memory device;

After executing the write command and writing the page data to be written according to its logical page address, the processing unit 42 is further configured to update the page data into the dynamic dictionary when the write count of the page data reaches the second threshold.

Preferably, the dictionary contains a static dictionary and a dynamic dictionary; the page data in the static dictionary is data whose write count is greater than or equal to the first threshold and/or data that the user cares about;

Before acquiring the write command, the processing unit 42 is further configured to set a second threshold and count the number of times each page data is written to the paging memory device;

The processing unit 42 is configured to acquire the feature code of the page data to be written and judge whether a feature code identical to the acquired feature code exists in the static dictionary; if not, to judge from the number of times the page data to be written has been written to the paging memory device whether the page data to be written needs to be updated into the dynamic dictionary.

Further, the processing unit 42 is also configured to delete, when the storage capacity of the dynamic dictionary reaches a third threshold, invalid entries or entries unused within a preset time from the dynamic dictionary. An invalid entry is one whose page data has become invalid because other data has been written to the address at which that page data was written. An entry unused within the preset time is one whose page data has not been written to the paging memory device within the preset time; or,

to continue counting the writes of the page data after the write count of the page data reaches the second threshold and, when the storage capacity of the dynamic dictionary reaches the third threshold, to delete from the dynamic dictionary page data whose write count is less than a fourth threshold.

进一步的,所述处理单元42具体用于,Further, the processing unit 42 is specifically used for:

在所述分页存储器件支持多种哈希算法时,对当前写命令中的待写入页数据进行多次消息摘录,获取每次消息摘录后的特征码,判断所述字典中是否存在与所述每次消息摘录后的特征码都相同的特征码,若否,执行当前写命令,根据当前写命令中待写入页数据写入的逻辑页地址写入所述待写入页数据;若是,不执行所述写命令,将当前写命令中待写入页数据写入的逻辑页地址指向与所述待写入页数据具有相同特征码的页数据写入的物理页地址,其中所述字典中记录有页数据多次消息摘录后的特征码;When the paging storage device supports a variety of hash algorithms, perform multiple message excerpts on the page data to be written in the current write command, obtain the feature code after each message excerpt, and judge whether there is a corresponding feature code in the dictionary The feature code after each message excerpt is the same feature code, if not, execute the current write command, and write the page data to be written according to the logical page address of the page data to be written in the current write command; if , the write command is not executed, and the logical page address of the page data to be written in the current write command is pointed to the physical page address of the page data having the same feature code as the page data to be written, wherein the The dictionary records the feature code after multiple message excerpts of the page data;

or, when the paged memory device supports a single hash algorithm, compute multiple message digests of the page data to be written in the current write command at different offsets and obtain the feature code produced by each digest; determine whether the dictionary contains feature codes identical to all of the feature codes so obtained. If not, execute the current write command and write the page data to be written according to the logical page address given for it in the current write command; if so, do not execute the write command, and point that logical page address to the physical page address at which the page data having the same feature codes was written. In this case the dictionary records the feature codes obtained from the multiple different-offset message digests of the page data;

or, when the paged memory device supports error checking and correcting (ECC) codes, obtain the ECC code of the page data to be written in the current write command together with the feature code produced by message digest; determine whether the dictionary contains both that ECC code and that feature code. If not, execute the current write command and write the page data to be written according to the logical page address given for it in the current write command; if so, do not execute the write command, and point that logical page address to the physical page address at which the page data having the same feature code was written. In this case the dictionary records both the ECC code and the message-digest feature code of the page data;

or, obtain the feature code of the page data to be written in the current write command and, when the dictionary contains a feature code identical to the one obtained, determine whether N bytes of the page data to be written are all identical to N bytes of the page data that corresponds to that feature code in the dictionary. If not, execute the current write command and write the page data to be written according to the logical page address given for it in the current write command; if so, do not execute the write command, and point that logical page address to the physical page address at which the page data having the same feature code was written. Here N is an integer greater than zero and less than or equal to the page size of the paged memory device.
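Three of the comparison modes listed above can be sketched in a few lines. This is a hedged illustration rather than the patented implementation: MD5, SHA-1 and CRC-32 are stand-ins for whatever hash algorithms the device supports, hashing the page from different starting offsets is one assumed reading of "digests at different offsets", and comparing the leading N bytes is an assumed choice of which N bytes to compare. The ECC variant is omitted because it depends on the controller's ECC engine. All function names are hypothetical.

```python
import hashlib
import zlib


def multi_hash_signature(page: bytes) -> tuple:
    """Feature codes from several algorithms; a hit requires all of them to match."""
    return (hashlib.md5(page).digest(), hashlib.sha1(page).digest())


def offset_signature(page: bytes, offsets=(0, 16)) -> tuple:
    """One algorithm applied to the page starting at several offsets (assumed scheme)."""
    return tuple(zlib.crc32(page[o:]) for o in offsets)


def same_leading_bytes(a: bytes, b: bytes, n: int) -> bool:
    """Byte-level confirmation of a feature-code match, with 0 < n <= page size."""
    return a[:n] == b[:n]


def lookup(page, dictionary, signature=multi_hash_signature, confirm_bytes=0):
    """Return the physical page address of a matching entry, or None.

    dictionary maps signature(page) -> (stored page bytes, physical page address).
    """
    hit = dictionary.get(signature(page))
    if hit is None:
        return None                      # no matching feature codes: write normally
    stored, physical = hit
    if confirm_bytes and not same_leading_bytes(page, stored, confirm_bytes):
        return None                      # feature codes collided but the data differs
    return physical
```

A caller that receives `None` executes the write command as usual; otherwise it skips the write and points the logical page address at the returned physical page address.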

Further, the processing unit 42 is specifically configured to:

obtain the feature code of the page data to be written in the current write command, then compute a message digest of that feature code to obtain a secondary digest value; determine whether the dictionary contains the secondary digest value and, if it does, then determine whether the dictionary contains the feature code. If not, execute the current write command and write the page data to be written according to the logical page address given for it in the current write command; if so, do not execute the write command, and point that logical page address to the physical page address at which the page data having the same feature code was written. In this case the dictionary also records the secondary digest value of the feature code corresponding to each page data.
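The two-stage lookup above can be sketched as a short pre-filter in front of the full feature-code comparison. The sketch assumes SHA-1 for the feature code and a 16-bit truncated CRC-32 for the secondary digest value, and it treats a missing secondary digest value as "no match"; these choices, and the class and function names, are illustrative assumptions.

```python
import hashlib
import zlib


def feature_code(page: bytes) -> bytes:
    # Feature code of the page data (SHA-1 is an assumed stand-in).
    return hashlib.sha1(page).digest()


def secondary_digest(code: bytes) -> int:
    # Short digest of the feature code itself; the 16-bit width is arbitrary.
    return zlib.crc32(code) & 0xFFFF


class Dictionary:
    def __init__(self):
        self.by_code = {}        # feature code -> physical page address
        self.secondary = set()   # secondary digest values of all stored codes

    def add(self, page: bytes, physical_page: int) -> None:
        code = feature_code(page)
        self.by_code[code] = physical_page
        self.secondary.add(secondary_digest(code))

    def find(self, page: bytes):
        code = feature_code(page)
        if secondary_digest(code) not in self.secondary:
            return None                  # cheap miss: skip the full comparison
        return self.by_code.get(code)    # full feature-code comparison
```

The short secondary value is cheaper to compare than the full feature code, so most pages that are not in the dictionary can be rejected after a single small lookup.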

In this embodiment, the feature code of the page data may be obtained through a hash algorithm.

The compression management apparatus for a paged memory device provided in this embodiment can be used with the corresponding compression management method described above. For details, see the descriptions of Embodiments 1 to 3 of the compression management method, which are not repeated here.

Those of ordinary skill in the art will understand that the units included in Embodiment 4 above are divided only according to functional logic, and that the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the scope of protection of the present invention.

In summary, the smallest unit of data compression in the embodiments of the present invention is the page: data within a page is neither compressed nor decompressed, and data is written and read in units of pages. The compression management method provided by the embodiments is therefore well suited to paged memory devices with limited computing power or low cost. Moreover, when the data to be written in the current write command is already stored in the dictionary, the method does not write the data again, which improves the write efficiency of the paged memory device, reduces the number of writes, and in turn reduces wear on the device. In addition, by offering several comparison modes, the embodiments can reduce the rate at which different data yields the same feature code during comparison and can improve matching efficiency and data compression efficiency, giving the method good usability and practicality.
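The page-level write path summarized above can be illustrated with a small model. It is a simplified sketch under stated assumptions: SHA-1 stands in for the feature code, a Python list stands in for the physical pages, and every newly written page is entered into the dictionary immediately, whereas the embodiments admit pages to the dynamic dictionary only after a write-count threshold is reached. The class and attribute names are hypothetical.

```python
import hashlib


class PagedDevice:
    def __init__(self):
        self.flash = []           # physical pages; the index is the physical page address
        self.l2p = {}             # logical page address -> physical page address
        self.dictionary = {}      # feature code -> physical page address
        self.physical_writes = 0  # number of pages actually programmed

    def write(self, logical_page: int, data: bytes) -> None:
        code = hashlib.sha1(data).digest()
        physical = self.dictionary.get(code)
        if physical is None:
            # No identical feature code: execute the write command.
            self.flash.append(data)
            physical = len(self.flash) - 1
            self.physical_writes += 1
            self.dictionary[code] = physical
        # In both cases the logical page address ends up pointing at the data.
        self.l2p[logical_page] = physical

    def read(self, logical_page: int) -> bytes:
        return self.flash[self.l2p[logical_page]]
```

Writing the same page content to two logical page addresses programs only one physical page in this model, which is the source of both the space saving and the reduced wear.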

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the invention belongs, any equivalent substitutions or obvious variations made without departing from the concept of the invention, and having the same performance or use, shall be regarded as falling within the scope of patent protection determined by the submitted claims.

Claims (16)

1. A method for compression management of a paged storage device, the method comprising:
step A, establishing a dictionary by taking a page as a unit, wherein page data, feature codes corresponding to the page data and physical page addresses written by the page data are recorded in the dictionary, and secondary extraction values of the feature codes corresponding to the page data are also recorded in the dictionary;
step B, acquiring a write command, wherein the write command comprises page data to be written and a logical page address written by the page data to be written; acquiring feature codes of the page data to be written, and judging whether feature codes identical to the acquired feature codes exist in the dictionary or not; if not, executing the write command, and writing the page data to be written according to the logical page address written by the page data to be written; if so, not executing the write command, and pointing the logical page address written by the page data to be written to a physical page address written by page data with the same feature code as the page data to be written;
the step B specifically comprises the following steps:
acquiring a feature code of page data to be written in a current writing command, extracting a message of the feature code to obtain a secondary extracted value, judging whether the secondary extracted value exists in the dictionary, if so, judging whether the feature code exists in the dictionary, if not, executing the current writing command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current writing command; if so, the write command is not executed, and the logical page address written by the page data to be written in the current write command points to the physical page address written by the page data with the same feature code as the page data to be written.
2. The method of claim 1, wherein the dictionary is a static dictionary, and the page data in the static dictionary is data written a number of times greater than or equal to a first threshold.
3. The method of claim 1, wherein the dictionary is a dynamic dictionary, the dynamic dictionary being a dictionary that is allowed to be updated;
the method further comprises, before the step of obtaining a write command:
setting a second threshold value, and counting the writing times of the page data written into the paged memory device;
after the step of executing the write command and writing the page data to be written according to the logical page address written by the page data to be written, the method further comprises the following steps:
and updating the page data to be written into the dynamic dictionary when the writing times of the page data to be written reach the second threshold value.
4. The method of claim 1, wherein the dictionary comprises a static dictionary and a dynamic dictionary; the page data in the static dictionary is data with the writing times larger than or equal to a first threshold, and the dynamic dictionary is a dictionary allowed to be updated;
the method further comprises, before the step of obtaining a write command:
setting a second threshold value, and counting the writing times of the page data written into the paged memory device;
the step of acquiring the feature codes of the page data to be written and judging whether the feature codes identical to the acquired feature codes exist in the dictionary specifically comprises the following steps:
and acquiring the feature codes of the page data to be written, judging whether the feature codes identical to the acquired feature codes exist in the static dictionary, and if not, judging whether the page data to be written needs to be updated to the dynamic dictionary according to the times of writing the page data to be written into the paging memory device.
5. The method of claim 3 or 4, further comprising:
when the storage capacity of the dynamic dictionary reaches a third threshold value, deleting from the dynamic dictionary an invalid entry or an entry that has not been used within a preset time, wherein the invalid entry is an entry whose page data has become invalid because other data has been written to the write address of that page data, and the entry not used within the preset time is an entry whose page data has not been written into the paged memory device within the preset time.
6. The method of claim 3 or 4, further comprising:
and continuing to count the writing times of the page data after the writing times of the page data reach the second threshold, and deleting the page data with the writing times smaller than a fourth threshold in the dynamic dictionary when the storage capacity of the dynamic dictionary reaches a third threshold.
7. The method of claim 1, wherein step B further comprises:
when the paging memory device supports various hash algorithms, performing message extraction on page data to be written in a current write command for multiple times, acquiring feature codes after message extraction each time, judging whether feature codes identical to the feature codes after message extraction each time exist in the dictionary or not, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein feature codes obtained after extracting a plurality of messages of the page data are recorded in the dictionary;
or when the paging memory device supports a hash algorithm, performing message extraction of different offsets for multiple times on page data to be written in a current write command, acquiring feature codes after message extraction of different offsets each time, judging whether feature codes identical to the feature codes after message extraction of different offsets each time exist in the dictionary or not, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein feature codes obtained by extracting page data from different offset messages for multiple times are recorded in the dictionary;
or, when the paged memory device supports error checking and correcting of ECC codes, acquiring an ECC code of page data to be written in a current write command and a feature code after message extraction, determining whether the ECC code and the feature code after message extraction exist in the dictionary at the same time, if not, executing the current write command, and writing the page data to be written according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein an ECC code of the page data and a feature code after message extraction are recorded in the dictionary;
or acquiring a feature code of page data to be written in the current write command, judging whether N bytes in the page data to be written are all the same as N bytes in page data corresponding to the same feature code in the dictionary when the feature code which is the same as the acquired feature code exists in the dictionary, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein N is an integer which is greater than zero and less than or equal to the size of the page in the paged memory.
8. The method according to any one of claims 1 to 4, wherein the obtaining the feature code of the page data to be written specifically comprises:
and acquiring the feature code of the page data to be written through a Hash algorithm.
9. An apparatus for compression management of a paged storage device, the apparatus comprising:
the dictionary establishing unit is used for establishing a dictionary by taking a page as a unit, wherein page data, feature codes corresponding to the page data and physical page addresses written by the page data are recorded in the dictionary, and secondary extraction values of the feature codes corresponding to the page data are also recorded in the dictionary;
the processing unit is used for acquiring a write command, and the write command comprises page data to be written and a logical page address written by the page data to be written; acquiring feature codes of the page data to be written, and judging whether feature codes identical to the acquired feature codes exist in the dictionary or not; if not, executing the write command, and writing the page data to be written according to the logical page address written by the page data to be written; if so, not executing the write command, and pointing the logical page address written by the page data to be written to a physical page address written by page data with the same feature code as the page data to be written;
the processing unit is specifically configured to,
acquiring a feature code of page data to be written in a current writing command, extracting a message of the feature code to obtain a secondary extracted value, judging whether the secondary extracted value exists in the dictionary, if so, judging whether the feature code exists in the dictionary, if not, executing the current writing command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current writing command; if so, the write command is not executed, and the logical page address written by the page data to be written in the current write command points to the physical page address written by the page data with the same feature code as the page data to be written.
10. The apparatus of claim 9, wherein the dictionary is a static dictionary, and the page data in the static dictionary is data written a number of times greater than or equal to a first threshold.
11. The apparatus of claim 9, wherein the dictionary is a dynamic dictionary, the dynamic dictionary being a dictionary that is allowed to be updated;
the processing unit is further configured to, before the write command is acquired, set a second threshold value to count the number of writes of the page data to the paged memory device;
and after the write command is executed and the page data to be written is written according to the logical page address written by the page data to be written, the processing unit is also used for updating the page data to the dynamic dictionary when the writing frequency of the page data reaches the second threshold value.
12. The apparatus of claim 9, wherein the dictionary comprises a static dictionary and a dynamic dictionary; the page data in the static dictionary is data with the writing times larger than or equal to a first threshold, and the dynamic dictionary is a dictionary allowed to be updated;
the processing unit is further configured to, before the write command is acquired, set a second threshold value to count the number of writes of the page data to the paged memory device;
and the processing unit is used for acquiring the feature codes of the page data to be written, judging whether the feature codes identical to the acquired feature codes exist in the static dictionary, and if not, judging whether the page data to be written needs to be updated to the dynamic dictionary according to the number of times of writing the page data to be written into the paging memory device.
13. The apparatus according to claim 11 or 12, wherein the processing unit is further configured to, when the storage capacity of the dynamic dictionary reaches a third threshold, delete from the dynamic dictionary an invalid entry or an entry that has not been used within a preset time, wherein the invalid entry is an entry whose page data has become invalid because other data has been written to the write address of that page data, and the entry not used within the preset time is an entry whose page data has not been written into the paged memory device within the preset time.
14. The apparatus of claim 11 or 12, wherein the processing unit is further to,
and continuing to count the writing times of the page data after the writing times of the page data reach the second threshold, and deleting the page data with the writing times smaller than a fourth threshold in the dynamic dictionary when the storage capacity of the dynamic dictionary reaches a third threshold.
15. The apparatus of claim 9, wherein the processing unit is specifically to,
when the paging memory device supports various hash algorithms, performing message extraction on page data to be written in a current write command for multiple times, acquiring feature codes after message extraction each time, judging whether feature codes identical to the feature codes after message extraction each time exist in the dictionary or not, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein feature codes obtained after extracting a plurality of messages of the page data are recorded in the dictionary;
or when the paging memory device supports a hash algorithm, performing message extraction of different offsets for multiple times on page data to be written in a current write command, acquiring feature codes after message extraction of different offsets each time, judging whether feature codes identical to the feature codes after message extraction of different offsets each time exist in the dictionary or not, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein feature codes obtained by extracting page data from different offset messages for multiple times are recorded in the dictionary;
or, when the paged memory device supports error checking and correcting of ECC codes, acquiring an ECC code of page data to be written in a current write command and a feature code after message extraction, determining whether the ECC code and the feature code after message extraction exist in the dictionary at the same time, if not, executing the current write command, and writing the page data to be written according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein an ECC code of the page data and a feature code after message extraction are recorded in the dictionary;
or acquiring a feature code of page data to be written in the current write command, judging whether N bytes in the page data to be written are all the same as N bytes in page data corresponding to the same feature code in the dictionary when the feature code which is the same as the acquired feature code exists in the dictionary, if not, executing the current write command, and writing the page data to be written in according to a logical page address written in the page data to be written in the current write command; if so, not executing the write command, and pointing a logical page address written by the page data to be written in the current write command to a physical page address written by the page data with the same feature code as the page data to be written, wherein N is an integer which is greater than zero and less than or equal to the size of the page in the paged memory.
16. The apparatus according to any one of claims 9 to 12, wherein the processing unit is configured to obtain a feature code of the page data to be written by a hash algorithm.
CN201210519408.0A 2012-12-06 2012-12-06 Compression management method and device for paging memory device Active CN103049388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210519408.0A CN103049388B (en) 2012-12-06 2012-12-06 Compression management method and device for paging memory device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210519408.0A CN103049388B (en) 2012-12-06 2012-12-06 Compression management method and device for paging memory device

Publications (2)

Publication Number Publication Date
CN103049388A CN103049388A (en) 2013-04-17
CN103049388B true CN103049388B (en) 2015-12-23

Family

ID=48062035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210519408.0A Active CN103049388B (en) 2012-12-06 2012-12-06 Compression management method and device for paging memory device

Country Status (1)

Country Link
CN (1) CN103049388B (en)

Also Published As

Publication number Publication date
CN103049388A (en) 2013-04-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171124

Address after: 528400 Zhongshan Zhongshan Torch Development Zone, Guangdong Province, No. 191, No. seven

Patentee after: ZHONGSHAN JIANGBOLONG ELECTRONICS CO., LTD.

Address before: 518000, Guangdong, Nanshan District Province, Shenzhen Road, No. 8 financial services technology innovation base, 8 floor, A, B, C, D, E,, F1

Patentee before: Shenzhen jiangbolong Electronic Co., Ltd.

TR01 Transfer of patent right