CN102012849B

CN102012849B - Flash memory-based database restoring method

Info

Publication number: CN102012849B
Application number: CN201010552789A
Authority: CN
Inventors: 孟小峰; 向锂; 范玉雷
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2010-11-19
Filing date: 2010-11-19
Publication date: 2012-10-24
Anticipated expiration: 2030-11-19
Also published as: CN102012849A

Abstract

The invention relates to a method for restoring a database based on flash memory, which is characterized in that it includes a memory, a flash memory, a checkpoint management module, a system startup module and a system shutdown module, and the memory stores data page buffer, log Buffer and log area summary; the log area, data area, page table and bit map used to identify the allocation of flash memory space are stored in the flash memory; the checkpoint management module sets up checkpoints in the log area; the system startup module needs to Scan the log area and merge the obtained logs and data; when the system shutdown module starts, it needs to insert a series of logs in the log area, and perform a series of operations at the same time to complete the system shutdown; the log buffer records a part of the log, and flushes the log written to flash memory. The invention can reduce the erasing of the flash memory and give full play to the advantages of the flash memory. The invention can be widely used in the recovery of system database.

Description

A method of database recovery based on flash memory

技术领域 technical field

本发明涉及一种数据库恢复方法，特别是关于一种基于闪存的数据库恢复方法。The invention relates to a method for restoring a database, in particular to a method for restoring a database based on flash memory.

背景技术 Background technique

随着闪存的广泛使用，越来越多的数据存储在闪存设备之上。闪存数据管理成为了一个新的研究领域。基于闪存的数据库技术面临着巨大挑战，引起了业界越来越多的关注。如果将现有的针对磁盘存储设备设计的数据库技术直接运行在闪存上，在查询处理中读的速度只提高了1～10倍，写入的速度只提高了1～3倍。而事实上，闪存的读写速度为微秒级，是磁盘读写速度的100倍左右。由此可见，现有的面向传统磁盘的数据库系统若直接运行在闪存存储器上，并不能充分发挥闪存相较于磁盘的高速读写带宽，闪存的优异读写性能大打折扣。其原因在于闪存和磁盘两者的物理特性存在巨大的差异。例如，在闪存上进行数据更新时，由于闪存以页为读写单位而且覆盖写之前必须擦除整个块，即使一页中的小部分数据发生更新，为了支持现有数据库的原地更新模式，也需要将该页所在块的数据全部读入内存，然后修改该页数据，再擦除整个块，最后将内存中的整块的数据写回到闪存中。这极大地降低了更新性能，从而使得数据库的更新性能急剧下降，系统的事务吞吐率没有得到应有的提高。With the widespread use of flash memory, more and more data is stored on flash memory devices. Flash memory data management has become a new research field. The flash memory-based database technology is facing great challenges, which has attracted more and more attention from the industry. If the existing database technology designed for disk storage devices is directly run on the flash memory, the reading speed in query processing is only increased by 1 to 10 times, and the writing speed is only increased by 1 to 3 times. In fact, the read and write speed of flash memory is on the order of microseconds, which is about 100 times that of disk. It can be seen that if the existing traditional disk-oriented database system runs directly on the flash memory, it cannot fully utilize the high-speed read and write bandwidth of the flash memory compared with the disk, and the excellent read and write performance of the flash memory is greatly reduced. The reason for this is the huge difference in the physical characteristics of flash and disk. For example, when updating data on flash memory, because flash memory uses pages as read-write units and the entire block must be erased before overwriting, even if a small part of the data in a page is updated, in order to support the in-place update mode of the existing database, It is also necessary to read all the data of the block where the page is located into the memory, then modify the data of the page, then erase the entire block, and finally write the data of the entire block in the memory back to the flash memory. This greatly reduces the update performance, resulting in a sharp decline in the update performance of the database, and the transaction throughput of the system has not been improved as it should.

系统数据库的恢复方法一般有两部分所构成，一是在事务正常处理时能够保留足够的信息可用于故障恢复；二是在故障发生之后所要采取的措施，将数据库恢复到一致性的状态，保持事务的原子性和持久性的状态。这两个方面是紧密相联系的，保持的信息的类型和程度决定了故障发生之后所要采取的步骤的类型和代价的大小。同样，故障发生之后的操作的类型和代价大小也决定着事务正常处理时额外开销的大小。现有的数据库恢复方法有以下几种：(1)基于日志的恢复技术在磁盘中被广泛采用。这种方法产生了很多种不同的记录日志和恢复的办法，有很多种不同的提交协议指导着记录日志和恢复的过程。不同的协议之下，日志记录的设计，日志/数据缓冲区的管理，检查点机制，记录日志和恢复的过程都很不一样。(2)影子页是另外一种有名的事务恢复技术，它不需要显式的回滚操作。影子页技术的主要思想是采用异地更新的更新模式，在事务的生存周期中维护两张页表：当前页表(Current Page Table)和影子页表(Shadow Page Table)。当事务开始时，这两张页表是相同的。影子页表在事务执行过程中从不改变，事务对数据库的更新后的新地址写入当前页表中。当事务提交时，当前页表被写入非易失性存储器中，并且用一个原子操作置换当前页表和影子页表，于是当前页表成为了影子页表，下一个事务执行。如果系统崩溃时当前页表丢失，数据库可以用影子页表进行恢复。(3)架构在操作系统之上的各种系统软件(如数据库管理系统)和应用软件都需要借助事务(Transaction)这样的接口来应对各种故障和处理从灾难中恢复的问题。如果能够在文件系统层面提供事务(Transaction)的接口，将会大大降低其上软件为了各自处理系统故障所付出的冗余代价。The recovery method of the system database generally consists of two parts. One is to retain enough information for fault recovery during normal transaction processing; the other is to take measures after a fault occurs to restore the database to a consistent state and maintain The state of the transaction's atomicity and durability. These two aspects are closely related, and the type and degree of information maintained determine the type and cost of steps to be taken after the failure occurs. Similarly, the type and cost of the operation after the failure also determines the size of the extra overhead during the normal processing of the transaction. The existing database recovery methods are as follows: (1) The log-based recovery technology is widely used in disk. This approach yields many different approaches to logging and recovery, with many different commit protocols guiding the logging and recovery process. Under different protocols, the log record design, log/data buffer management, checkpoint mechanism, log record and recovery process are very different. (2) Shadow pages are another well-known transaction recovery technology that does not require explicit rollback operations. The main idea of shadow page technology is to adopt the update mode of remote update, and maintain two page tables in the life cycle of the transaction: the current page table (Current Page Table) and the shadow page table (Shadow Page Table). When the transaction starts, the two page tables are identical. The shadow page table never changes during transaction execution, and the transaction writes the updated new address of the database into the current page table. When the transaction is committed, the current page table is written into the non-volatile memory, and an atomic operation is used to replace the current page table and the shadow page table, so the current page table becomes the shadow page table, and the next transaction is executed. If the current page table is lost when the system crashes, the database can use the shadow page table to recover. (3) All kinds of system software (such as database management system) and application software based on the operating system need to use interfaces such as transactions to deal with various failures and deal with recovery from disasters. If a transaction (Transaction) interface can be provided at the file system level, it will greatly reduce the redundancy cost paid by the software on it to deal with system failures individually.

很多先前的工作致力于利用写时复制(copy-on-write)和日志技术在磁盘文件系统上提供这样的事务接口。但是，由于写时复制(copy-on-write)技术会使得原本在磁盘上聚簇存放的数据变得十分分散的数据片段(data fragmentation)。在磁盘这样的机械设备上离散的数据的读写效率十分低下。为了解决这个问题，很多工作采用检查点(checkpoint)和定期的清除(cleaning)机制来重组离散数据。研究表明这样的重组开销是十分昂贵的，使得支持事务的文件系统得不偿失。闪存存储器的物理特性很好地适合了事务文件系统的要求，在闪存文件系统之上支持事务接口成为了很好的选择。事务闪存(Tansactional Flash)在闪存文件系统之上提供了事务接口，它采用了基于每页的元数据和圈提交协议来减少额外的提交(commit)记录的性能和空间开销。其主要做法是把每个事务修改过的页用元数据指针连接起来，形成一个圈。这样，如果事务正常提交，其新版本就会在一个圈里，如果一个事务的没有正常提交，其修改后的数据页就不能形成一个圈。圈提交协议用很小的开销在文件系统层面提供了事务接口，但是这个接口不是完善的，而且垃圾回收、回滚和系统恢复也需要复杂度逻辑支持，其开销也相对较大。Much previous work has been devoted to providing such transactional interfaces on disk file systems using copy-on-write and journaling techniques. However, due to the copy-on-write (copy-on-write) technology, the data originally stored in clusters on the disk will become very scattered data fragmentation. The read and write efficiency of discrete data on a mechanical device such as a disk is very low. To solve this problem, many works adopt checkpoint and regular cleaning mechanism to reorganize discrete data. Studies have shown that such reorganization overhead is so expensive that a transaction-enabled file system is not worth the cost. The physical characteristics of the flash memory are well suited to the requirements of the transaction file system, and it is a good choice to support the transaction interface on the flash file system. Transactional Flash (Tansactional Flash) provides a transactional interface on top of the flash file system, which uses per-page metadata and circle commit protocols to reduce the performance and space overhead of additional commit records. The main method is to connect the pages modified by each transaction with metadata pointers to form a circle. In this way, if the transaction is submitted normally, its new version will be in a circle. If a transaction is not submitted normally, its modified data pages cannot form a circle. The circle commit protocol provides a transaction interface at the file system level with a small overhead, but this interface is not perfect, and garbage collection, rollback, and system recovery also require complex logic support, and the overhead is relatively large.

发明内容 Contents of the invention

针对上述问题，本发明的目的是提供一种能减少对闪存擦除操作，并能使闪存充分发挥其高效读写特性的基于闪存的数据库恢复方法。In view of the above problems, the object of the present invention is to provide a flash memory-based database recovery method that can reduce the erasing operations on the flash memory and enable the flash memory to fully exert its high-efficiency read and write characteristics.

为实现上述目的，本发明采取以下技术方案：一种基于闪存的数据库恢复方法，其特征在于：它包括一内存、一闪存、一检查点管理模块、一系统启动模块和一系统关闭模块，所述内存中存储有数据页缓冲区、日志缓冲区和日志区摘要；所述闪存中存储有日志区、数据区、页表和用来标识所述闪存空间分配状况的位示图；所述检查点管理模块在所述日志区中设立检查点；所述系统启动模块启动时需要扫描所述日志区，将获得的日志和数据进行合并；所述系统关闭模块启动时需要在所述日志区中插入一系列日志，同时进行一系列的操作来完成系统关闭；所述日志缓冲区记录一部分日志，并把日志刷写到所述闪存上。To achieve the above object, the present invention adopts the following technical solutions: a flash memory-based database recovery method, characterized in that: it includes a memory, a flash memory, a checkpoint management module, a system startup module and a system shutdown module, the A data page buffer, a log buffer, and a summary of the log area are stored in the memory; a log area, a data area, a page table, and a bitmap for identifying the allocation status of the flash memory space are stored in the flash memory; the checking The point management module sets up a checkpoint in the log area; when the system startup module starts, it needs to scan the log area, and merges the obtained log and data; when the system shutdown module starts, it needs to check in the log area A series of logs are inserted, and a series of operations are performed simultaneously to complete system shutdown; the log buffer records a part of the logs, and writes the logs to the flash memory.

所述检查点管理模块的检查点的建立步骤如下：(1)在日志区中插入一条记录<检查点编号，活动事务列表，开始>来标志这个检查点的开始；(2)等活动事务列表里面的所有活跃事务都已经显式提交或者撤销之后，插入另外一条记录<检查点编号，结束>到日志区中，标志着该检查点的正常结束。The steps for setting up the checkpoint of the checkpoint management module are as follows: (1) insert a record <checkpoint number, active transaction list, start> in the log area to mark the beginning of this checkpoint; (2) wait for the active transaction list After all active transactions in it have been explicitly committed or withdrawn, insert another record <checkpoint number, end> into the log area, marking the normal end of the checkpoint.

所述系统启动模块的启动过程如下：(1)检查日志区的尾部，判断是否有标志正常关机的检查点结束记录<检查点编号，结束>存在；若无该条记录，则判断在关机检查点的活跃事务列表中是否存在提交或者撤销的事务，存在则在日志区的尾部添加一条<事务编号，放弃>记录来显式撤销该事务，然后再插入<检查点编号，结束>记录来标志该检查点的结束；相反，则直接插入<检查点编号，结束>记录来标志该检查点的结束；(2)扫描日志区，在内存中重建日志区摘要。The start-up process of described system start-up module is as follows: (1) check the afterbody of log area, judge whether there is the checkpoint end record <checkpoint numbering of sign normal shutdown, end> to exist; Check whether there is a committed or revoked transaction in the active transaction list of the point, if it exists, add a <transaction number, abandon> record at the end of the log area to explicitly cancel the transaction, and then insert a <checkpoint number, end> record to mark The end of the checkpoint; on the contrary, directly insert the <checkpoint number, end> record to mark the end of the checkpoint; (2) scan the log area and rebuild the log area summary in memory.

所述系统关闭模块启动时的步骤如下：(1)插入标志检查点开始的记录<检查点编号，活动事务列表，开始>，开始系统关闭操作；(2)将数据页缓冲区的数据扫出到闪存的数据区内，并添加相关的日志记录到日志缓冲区中；(3)将页表日志区的日志记录扫出到闪存的日志区中；(4)插入标志检查点结束的记录<检查点编号，结束>。The step when described system shuts down module startup is as follows: (1) insert the record <checkpoint numbering, active transaction list, start> of mark checkpoint start>, start system shutdown operation; (2) sweep out the data of data page buffer into the data area of the flash memory, and add relevant log records to the log buffer; (3) sweep out the log records in the log area of the page table to the log area of the flash memory; (4) insert the record that marks the end of the checkpoint < Checkpoint number, end >.

所述数据页缓冲区的管理是在所述数据页缓冲区存满时，根据替换算法扫出数据页，并在所述日志缓冲区的尾部增加一条页表日志记录<事务编号，逻辑页地址，物理页地址>记录此页逻辑地址和物理地址之间的映射关系。The management of the data page buffer is to sweep out the data page according to the replacement algorithm when the data page buffer is full, and add a page table log record<transaction number, logical page address at the end of the log buffer , physical page address> record the mapping relationship between the logical address and the physical address of this page.

所述日志缓冲区的管理中，所述日志缓冲区和闪存的日志区都要求顺序写入和顺序扫出。In the management of the log buffer, both the log buffer and the log area of the flash memory require sequential writing and sequential sweeping.

查找所述页表中页面地址的步骤如下：(1)查找日志缓冲区，判断是否有成功提交事务修改了待查找的逻辑页面，若有，则返回最近的成功提交事务所标识的物理地址；反之，进入下一步；(2)查找日志区摘要，判断是否有待查找的逻辑页面地址，若有则返回；反之，进入下一步；(3)查找日志区，判断是否有成功提交事务修改了待查找的逻辑页面，若有，则返回最近的成功提交事务所表示的物理地址；反之，进入下一步；(4)在页表里面查找，返回页面地址，结束。The step of searching page address in described page table is as follows: (1) search log buffer, judge whether to have successfully submitted transaction to have revised the logical page to be searched, if so, then return the physical address of recent successful submitted transaction identification; On the contrary, go to the next step; (2) search the log area summary, judge whether there is a logical page address to be searched, and return if so; otherwise, go to the next step; (3) search the log area, and judge whether there is a successfully submitted transaction to modify the pending page address; If there is a logical page to be searched, then return the physical address indicated by the recent successful submission transaction; otherwise, enter the next step; (4) search in the page table, return the page address, and end.

所述日志区存满时，将所述页表和日志区进行合并，所述日志区的大小S_r要满足如下公式：

其中，S_f代表所述闪存的大小，S_p代表所述闪存中页面的大小，S_l代表页表日志记录的大小。When the log area is full, the page table and the log area are merged, and the size S _r of the log area will satisfy the following formula:

Wherein, S _f represents the size of the flash memory, S _p represents the size of the page in the flash memory, and S ₁ represents the size of the page table log record.

本发明由于采取以上技术方案，其具有以下优点：1、本发明由于采用基于闪存的日志结构，充分发挥了闪存的以页为单位的操作特性，同时减少了对闪存的擦除。因此系统的事务吞吐率得到了很大的提高，闪存的优越性能得到充分发挥。2、本发明由于采用基于闪存的数据库恢复方法能够很好地适应闪存存储器的物理特性，在恢复速度和恢复数据方面都优于传统的针对磁盘数据库的恢复方法。3、本发明没有改变传统的内存与闪存的两层存储结构，而且同时支持完整的事务接口，其通用性得到了保证，有利于闪存存储器的快速应用。4、本发明由于是基于磁盘文件系统的应用，可以很方便地运行在闪存存储设备之上。本发明可以广泛应用于系统数据库的恢复中。Because the present invention adopts the above technical scheme, it has the following advantages: 1. Since the present invention adopts a log structure based on flash memory, it fully utilizes the operating characteristics of flash memory in units of pages, while reducing the erasure of flash memory. Therefore, the transaction throughput rate of the system has been greatly improved, and the superior performance of flash memory has been fully utilized. 2. The present invention can adapt well to the physical characteristics of the flash memory due to the database recovery method based on the flash memory, and is superior to the traditional disk database recovery method in terms of recovery speed and data recovery. 3. The present invention does not change the traditional two-layer storage structure of memory and flash memory, and supports a complete transaction interface at the same time, its versatility is guaranteed, and it is beneficial to the rapid application of flash memory. 4. Since the present invention is based on the application of the disk file system, it can be conveniently run on the flash storage device. The invention can be widely used in the recovery of system database.

附图说明 Description of drawings

图1是本发明的整体结构示意图；Fig. 1 is the overall structural representation of the present invention;

图2是本发明的检查点管理模块中检查点建立流程示意图；FIG. 2 is a schematic diagram of a checkpoint establishment process in the checkpoint management module of the present invention;

图3是本发明的系统启动模块启动流程示意图；Fig. 3 is a schematic diagram of the startup process of the system startup module of the present invention;

图4是本发明的系统关闭模块启动流程示意图；Fig. 4 is a schematic diagram of the start-up process of the system shutdown module of the present invention;

图5是本发明查找页表中页面地址流程示意图。Fig. 5 is a schematic diagram of the process of searching the page address in the page table according to the present invention.

具体实施方式 Detailed ways

下面结合附图和实施例对本发明进行详细的描述。The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

本发明采用异地更新的更新模式，用一个页表去管理闪存上的空闲空间。新的页表映射地址并不直接写入页表，而是通过日志的方式记录。通过日志的方式使得对于页表的修改是原子的，以此来获得事务的原子性。The present invention adopts the update mode of remote update, and uses a page table to manage the free space on the flash memory. The new page table mapping address is not directly written into the page table, but is recorded in a log. The modification of the page table is atomic through the log method, so as to obtain the atomicity of the transaction.

如图1所示，本发明包括一内存1、一闪存2、一检查点管理模块3、一系统启动模块4和一系统关闭模块5，其中，内存1中存储有数据页缓冲区6、日志缓冲区7和日志区摘要8；闪存2中存储有日志区9、数据区10、页表11和用来标识闪存2空间分配状况的位示图12；检查点管理模块3主要在日志区9中设立检查点，即插入一条检查点日志，为了能够更高效的完成恢复操作；系统启动模块4启动时需要扫描日志区9，将获得的日志和数据进行合并；系统关闭模块5启动时需要在日志区9中插入一系列日志，同时进行一系列的操作来完成系统关闭；日志缓冲区7记录一部分日志，然后把日志刷写到闪存2上，达到永久存储的目的。本发明根据日志内容进行查找，由于日志内容中有记录数据的逻辑地址和物理地址的映射关系，因此通过查找日志可以找到数据的真实位置。As shown in Figure 1, the present invention includes a memory 1, a flash memory 2, a checkpoint management module 3, a system startup module 4 and a system shutdown module 5, wherein the memory 1 is stored with a data page buffer 6, a log Buffer 7 and log area summary 8; flash memory 2 stores log area 9, data area 10, page table 11 and bitmap 12 used to identify the space allocation status of flash memory 2; checkpoint management module 3 is mainly in log area 9 Set up a checkpoint in the checkpoint, that is, insert a checkpoint log, in order to complete the recovery operation more efficiently; the log area 9 needs to be scanned when the system startup module 4 starts, and the obtained log and data are merged; when the system shutdown module 5 starts, it needs to be in A series of logs are inserted in the log area 9, and a series of operations are performed simultaneously to complete the system shutdown; the log buffer 7 records a part of the logs, and then writes the logs to the flash memory 2 to achieve the purpose of permanent storage. The present invention searches according to the log content, and since the log content has a mapping relationship between the logical address and the physical address of the recorded data, the real location of the data can be found by searching the log.

日志区9对每个页表11的记录均由两列构成，第一列为逻辑页地址，第二列是该逻辑页地址对应的物理页地址，则该页表日志记录有如下两种类型：The record of each page table 11 in the log area 9 is composed of two columns, the first column is the logical page address, and the second column is the physical page address corresponding to the logical page address, then the page table log records have the following two types :

(1)<事务编号，事务状态>；(1) <transaction number, transaction status>;

(2)<事务编号，逻辑页地址，物理页地址>。(2) <transaction number, logical page address, physical page address>.

其中，事务编号是事务的ID编号，是一个事务的唯一标识。事务状态的取值可以是“提交”或者“放弃”。一条<事务编号，逻辑页地址，物理页地址>记录把一个逻辑页地址转化成为一个相应的物理页地址。Wherein, the transaction number is an ID number of a transaction, which is a unique identifier of a transaction. The value of the transaction status can be "commit" or "abandon". A <transaction number, logical page address, physical page address> record converts a logical page address into a corresponding physical page address.

如图2所示，上述实施例中，检查点管理模块3的检查点在日志区9中用来减少维护日志区摘要8，以及在合并页表11和日志区9时所要扫描的页表日志记录数目。当系统被关闭和日志区9足够长时，检查点被插入到日志区9中，则检查点的建立步骤如下：As shown in FIG. 2, in the above-mentioned embodiment, the checkpoint of the checkpoint management module 3 is used in the log area 9 to reduce the maintenance of the log area summary 8, and the page table log to be scanned when the page table 11 and the log area 9 are merged. number of records. When the system is shut down and the log area 9 is long enough, the checkpoint is inserted into the log area 9, and the checkpoint establishment steps are as follows:

(1)在日志区9中插入一条记录<检查点编号，活动事务列表，开始>来标志这个检查点的开始，其中检查点编号是这个检查点的唯一标识，活动事务列表里面包含此时仍在活跃的事务ID编号列表；(1) Insert a record <checkpoint number, active transaction list, start> into the log area 9 to mark the start of this checkpoint, where the checkpoint number is the unique identifier of this checkpoint, and the active transaction list contains List of active transaction ID numbers;

(2)等活动事务列表里面的所有活跃事务都已经显式提交或者撤销之后，插入另外一条记录<检查点编号，结束>到日志区9中，标志着该检查点的正常结束。(2) After all active transactions in the active transaction list have been explicitly committed or withdrawn, insert another record <checkpoint number, end> into log area 9, marking the normal end of the checkpoint.

如图3所示，上述各实施例中，系统启动模块4重新启动时，首先检查上次关机是否正常，此次开机是否是从一次系统崩溃中重新启动，然后做相应的处理；最后必须扫描日志区9重建内存1中的日志区摘要8。系统启动模块4的启动过程如下：As shown in Figure 3, in above-mentioned each embodiment, when system start-up module 4 restarts, at first check whether shutting down last time is normal, whether this starting up restarts from a system crash, then do corresponding processing; Must scan at last Log area 9 rebuilds log area summary 8 in memory 1. The startup process of the system startup module 4 is as follows:

(1)检查日志区9的尾部，判断是否有标志正常关机的检查点结束记录<检查点编号，结束>存在；如果没有该条记录，说明系统是从一次崩溃中重启，则必须完成上次没有完成的检查点过程，即对于那些在关机检查点的活跃事务列表中，但是在崩溃发生时还没有显式提交或者撤销的事务时，则在日志区9的尾部添加一条日志记录<事务编号，放弃>来显式撤销该事务，然后再插入<检查点编号，结束>记录来标志该检查点的结束；反之，若在崩溃发生时显式提交或者撤销的事务，则直接插入<检查点编号，结束>记录来标志该检查点的结束；(1) Check the tail of the log area 9 to determine whether there is a checkpoint end record <checkpoint number, end> that marks a normal shutdown; if there is no such record, it means that the system restarted from a crash, and the last time must be completed The checkpoint process that has not been completed, that is, for those transactions that are in the active transaction list of the shutdown checkpoint but have not been explicitly committed or withdrawn when the crash occurs, add a log record < transaction number at the end of the log area 9 , give up> to explicitly cancel the transaction, and then insert the <checkpoint number, end> record to mark the end of the checkpoint; otherwise, if the transaction is explicitly committed or withdrawn when the crash occurs, directly insert the <checkpoint number, end > record to mark the end of the checkpoint;

(2)扫描日志区9，在内存中重建日志区摘要8。(2) Scan the log area 9 and rebuild the log area summary 8 in memory.

如图4所示，上述各实施例中，系统关闭模块5启动时，数据缓冲区6和日志缓冲区7的所有数据被分别扫出到闪存2上的数据区10和日志区9中。与此同时，一个检查点插入到日志区9尾部以标志此次正常关机。系统关闭模块5启动时的步骤如下：As shown in FIG. 4 , in each of the above embodiments, when the system shutdown module 5 is started, all data in the data buffer 6 and the log buffer 7 are swept out to the data area 10 and the log area 9 on the flash memory 2 respectively. At the same time, a checkpoint is inserted into the end of log area 9 to mark this normal shutdown. The steps when the system shutdown module 5 starts are as follows:

(1)插入标志检查点开始的记录<检查点编号，活动事务列表，开始>，开始系统关闭操作；(1) Insert the record <checkpoint number, active transaction list, start> that marks the start of the checkpoint to start the system shutdown operation;

(2)将数据页缓冲区6的数据扫出到闪存2的数据区10内，并添加相关的日志记录到日志缓冲区7中；(2) sweep out the data of data page buffer 6 in the data area 10 of flash memory 2, and add relevant log record in the log buffer 7;

(3)将页表日志区9的日志记录扫出到闪存2的日志区9中；(3) the log record of page table log area 9 is swept out in the log area 9 of flash memory 2;

(4)插入标志检查点结束的记录<检查点编号，结束>。(4) Insert a record <checkpoint number, end> marking the end of the checkpoint.

上述步骤中，如果系统故障非正常关机，正常关机时的检查点将不完整，在日志区9的末尾则找不到相应的标志检查点正常结束的记录。In the above steps, if the system fails to shut down abnormally, the checkpoint during normal shutdown will be incomplete, and no corresponding record indicating the normal end of the checkpoint will be found at the end of the log area 9 .

上述各实施例中，数据页缓冲区6的管理是在数据页缓冲区6存满时没有足够的空间需要淘汰数据页时，根据替换算法扫出数据页，并在日志缓冲区9的尾部增加一条页表日志记录<事务编号，逻辑页地址，物理页地址>记录此页逻辑地址和物理地址之间的映射关系。In the above-mentioned embodiments, the management of the data page buffer 6 is that when the data page buffer 6 is full and there is not enough space to eliminate the data page, the data page is swept out according to the replacement algorithm and added at the end of the log buffer 9. A page table log record <transaction number, logical page address, physical page address> records the mapping relationship between the logical address and the physical address of this page.

上述各实施例中，日志缓冲区7的管理中，日志缓冲区7和闪存2的日志区9都要求顺序写入和顺序扫出，如果日志缓冲区7扫出的日志中有标识显式提交事务的日志记录<事务编号，放弃>，则要从尾至头扫描日志区9，直到最后一个检查点的开始位置，在日志区摘要8中更新该提交事务修改过的页表地址。In the above-mentioned embodiments, in the management of the log buffer 7, both the log buffer 7 and the log area 9 of the flash memory 2 require sequential writing and sequential sweeping. For the log record <transaction number, abandonment> of the transaction, it is necessary to scan the log area 9 from the end to the beginning until the beginning of the last checkpoint, and update the page table address modified by the committed transaction in the log area summary 8.

如图5所示，上述各实施例中，在查找页表11中页面地址过程时，每次查找一个逻辑页面地址都必须从头到尾扫描整个页表日志文件。如果需要扫描的日志记录过多，这个查找过程有可能成为系统新的瓶颈。为了不必要每次都扫描整个日志文件，把常访问的页表地址在内存1的日志区摘要8中缓冲起来，即日志区摘要8中存放最近最常访问的页面地址。因此，一个逻辑页面的正确及时的物理页面地址就可能存在四个地方：日志缓冲区7、日志区摘要8、日志区9或者页表11之中。所以查找页表11中页面地址的步骤如下：As shown in FIG. 5 , in the above embodiments, when searching for a page address in the page table 11 , the entire page table log file must be scanned from beginning to end each time a logical page address is searched. If there are too many log records to be scanned, this search process may become a new bottleneck of the system. In order not to scan the entire log file every time, the frequently accessed page table address is buffered in the log area summary 8 of memory 1, that is, the log area summary 8 stores the most recently accessed page address. Therefore, the correct and timely physical page address of a logical page may exist in four places: the log buffer 7, the log area summary 8, the log area 9 or the page table 11. So the steps to find the page address in page table 11 are as follows:

(1)查找日志缓冲区7，判断是否有成功提交事务修改了待查找的逻辑页面，如果有，则返回最近的成功提交事务所标识的物理地址；如果没有找到，则进入下一步；(1) search the log buffer 7, and judge whether there is a successful submission transaction to modify the logical page to be searched, if there is, then return the physical address of the recent successful submission transaction identification; if not found, then enter the next step;

(2)查找日志区摘要8，判断是否有待查找的逻辑页面地址，如果有则返回；如果没有找到则进入下一步；(2) Search log area summary 8, judge whether there is the logical page address to be searched, if have then return; If not found then enter next step;

(3)查找日志区9，判断是否有成功提交事务修改了待查找的逻辑页面，如果有，则返回最近的成功提交事务所表示的物理地址；如果没有找到，则进入下一步；(3) Search the log area 9, judge whether there is a successful submission transaction to modify the logical page to be searched, if there is, then return the physical address of the recent successful submission transaction representation; if not found, then enter the next step;

(4)在页表11里面查找，返回页面地址，结束。(4) Search in the page table 11, return the page address, and end.

上述各步骤中，由于日志缓冲区9和日志区摘要8都在内存1中，如果访问的局部性和日志区摘要8的命中区不至于太差的话，页面地址查找过程就可以在前两个步骤之后返回。这样这个查找过程就会足够快，而不至于成为系统的瓶颈。经过实验分析表明，只要花去很小的额外代价，日志区摘要8的命中率就可以达到90％以上。In the above steps, since the log buffer 9 and the log area summary 8 are both in the memory 1, if the locality of access and the hit area of the log area summary 8 are not too bad, the page address lookup process can be performed in the first two Return after the step. In this way, the search process will be fast enough so as not to become a bottleneck of the system. Experimental analysis shows that as long as a small additional cost is spent, the hit rate of the summary 8 in the log area can reach more than 90%.

上述各实施例中，当日志区9存满时，则要将页表11和日志区9进行合并，进而产生新的页表，然后擦除掉日志区9再次存放页表日志记录。定期进行这样的合并过程也可以减少当日志区摘要8不命中时需要扫描的日志区9的页表日志数目，加快页面地址的产生过程。为了防止日志区9在整个闪存2的使用寿命之前达到擦除次数限制而报废，日志区9的大小Sr需要满足如下公式：In the above embodiments, when the log area 9 is full, the page table 11 and the log area 9 will be merged to generate a new page table, and then the log area 9 will be erased to store the page table log records again. Regularly performing such a merging process can also reduce the number of page table logs in the log area 9 that need to be scanned when the log area digest 8 misses, and speed up the page address generation process. In order to prevent the log area 9 from reaching the erasure limit before the service life of the entire flash memory 2 and being scrapped, the size Sr of the log area 9 needs to satisfy the following formula:

${S S}_{r r} &GreaterEqual; &Greater Equal; \frac{{S S}_{f f}}{{S S}_{p p}} {S S}_{l l} - - - - - - ((11))$

其中，S_f标识闪存2的大小，S_p标识闪存2中页面的大小，S_l标识页表日志记录的大小。Wherein, S _f identifies the size of the flash memory 2, S _p identifies the size of the page in the flash memory 2, and S _l identifies the size of the page table log record.

页表11和日志区9的合并过程也可以在后台进行，本发明采用与影子页及闪存转换层中类似的机制：在合并开始时保存页表11的两个版本，例如PT0和PT1；然后在合并的过程中保持PT0不变，在后台合并PT1和日志区9，新的页表写入原来PT1的位置，合并成功之后用一个原子操作修改指向页表11的指针指向PT1使之成为新的有效的页表；然后擦除日志区9，完成页表11和日志区9的合并过程。这样可以有效确保合并后的原子性。The merging process of the page table 11 and the log area 9 can also be carried out in the background, and the present invention adopts a mechanism similar to that in the shadow page and the flash memory translation layer: two versions of the page table 11, such as PT0 and PT1, are preserved when merging begins; then Keep PT0 unchanged during the merge process, merge PT1 and log area 9 in the background, and write the new page table to the original PT1 location. After the merge is successful, use an atomic operation to modify the pointer pointing to page table 11 to point to PT1 to make it a new page table. The valid page table; then erase log area 9, complete the merging process of page table 11 and log area 9. This can effectively ensure the atomicity after the merge.

上述各实施例仅用于说明本发明，其中各步骤都是可以有所变化的，凡是在本发明技术方案的基础上进行的等同变换和改进，均不应排除在本发明的保护范围之外。Above-mentioned each embodiment is only for illustrating the present invention, and wherein each step all can be changed to some extent, all equivalent transformations and improvements carried out on the basis of the technical solution of the present invention, all should not be excluded outside the protection scope of the present invention .

Claims

1. A database recovery method based on flash memory, characterized in that: it includes a memory, a flash memory, a checkpoint management module, a system startup module and a system shutdown module, stored in the memory with data page buffer, Summary of log buffer and log area; log area, data area, page table and bit map used to identify the allocation status of the flash memory space are stored in the flash memory; the checkpoint management module is set up in the log area check point; when the system startup module starts, it needs to scan the log area, and merge the obtained log and data; when the system shutdown module starts, it needs to insert a series of logs in the log area, and simultaneously perform a series of Operation to complete the system shutdown; the log buffer records a part of the log, and writes the log to the flash memory;

The checkpoint establishment steps of the checkpoint management module are as follows; (1) insert a record <checkpoint number, active transaction list, start> in the log area to mark the beginning of this checkpoint; (2) wait for the active transaction list After all active transactions in it have been explicitly committed or revoked, insert another record <checkpoint number, end> into the log area, marking the normal end of the checkpoint;

The start-up process of described system start-up module is as follows: (1) check the afterbody of log area, judge whether there is the checkpoint end record <checkpoint numbering of sign normal shutdown, end> to exist; Check whether there is a committed or revoked transaction in the active transaction list of the point, if it exists, add a <transaction number, abandon> record at the end of the log area to explicitly cancel the transaction, and then insert a <checkpoint number, end> record to mark The end of the checkpoint; on the contrary, directly insert the <checkpoint number, end> record to mark the end of the checkpoint; (2) scan the log area and rebuild the log area summary in memory;

The step when described system shuts down module startup is as follows: (1) insert the record <checkpoint numbering, active transaction list, start> of mark checkpoint start>, start system shutdown operation; (2) sweep out the data of data page buffer into the data area of the flash memory, and add relevant log records to the log buffer; (3) sweep out the log records in the log area of the page table to the log area of the flash memory; (4) insert the record that marks the end of the checkpoint < Checkpoint number, end >.

2. a kind of database restoration method based on flash memory as claimed in claim 1, is characterized in that: the management of described data page buffer is when described data page buffer is full, sweeps out data page according to replacement algorithm, And add a page table log record <transaction number, logical page address, physical page address> at the tail of the log buffer to record the mapping relationship between the page logical address and the physical address.

3. a kind of database recovery method based on flash memory as claimed in claim 1, is characterized in that: in the management of described log buffer, the log area of described log buffer and flash memory all requires sequential writing and order to sweep out .

4. a kind of database recovery method based on flash memory as claimed in claim 1, is characterized in that: the step of searching page address in described page table is as follows:

(1) Search the log buffer to determine whether the logical page to be searched has been modified by successfully submitting the transaction, and if so, return the physical address of the latest successfully submitted transaction identification; otherwise, enter the next step;

(2) Search the summary of the log area, judge whether there is a logical page address to be searched, and return if there is; otherwise, enter the next step;

(3) Search the log area to judge whether there is a successful submitted transaction to modify the logical page to be searched, if so, return the physical address indicated by the recent successful submitted transaction; otherwise, enter the next step;

(4) Search in the page table, return the page address, and end.

5. a kind of database recovery method based on flash memory as claimed in claim 1 or 2, is characterized in that: when described log area is stored full, described page table and log area are merged, and the size S of described log area _r must satisfy the following formula:

{S S}_{r r} &GreaterEqual; &Greater Equal; \frac{{S S}_{f f}}{{S S}_{p p}} {S S}_{l l},,