[go: up one dir, main page]

CN113220490A - Transaction persistence method and system for asynchronous write-back persistent memory - Google Patents

Transaction persistence method and system for asynchronous write-back persistent memory Download PDF

Info

Publication number
CN113220490A
CN113220490A CN202110605179.3A CN202110605179A CN113220490A CN 113220490 A CN113220490 A CN 113220490A CN 202110605179 A CN202110605179 A CN 202110605179A CN 113220490 A CN113220490 A CN 113220490A
Authority
CN
China
Prior art keywords
transaction
log
data
memory
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110605179.3A
Other languages
Chinese (zh)
Other versions
CN113220490B (en
Inventor
陈康
武永卫
姜进磊
任峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110605179.3A priority Critical patent/CN113220490B/en
Publication of CN113220490A publication Critical patent/CN113220490A/en
Application granted granted Critical
Publication of CN113220490B publication Critical patent/CN113220490B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3871Asynchronous instruction pipeline, e.g. using handshake signals between stages

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明公开了一种内存事务持久化方法及其基于此方法构建的持久化事务内存系统,主要涉及系统软件技术领域。该方法的主要特征是在事务提交阶段,构建并持久化日志,在易失型存储器上更新数据。持久化内存上的数据更新由专门线程周期性执行。该线程通过位图机制确定易失型存储器和持久化内存中可能处于不一致状态的缓存块,调用写回原语同步数据,并回收不再需要的日志条目空间。因为持久化内存的读写粒度和写回粒度不同,且实际负载具有空间局部性,故上述方法可有效减少所需的写回原语次数,并提高事务处理效率。对比现有系统,基于此方法实现的系统相对同类实现可显著缩短事务执行时间,同时具有可扩展性。

Figure 202110605179

The invention discloses a memory transaction persistence method and a persistent transaction memory system constructed based on the method, which mainly relate to the technical field of system software. The main feature of this method is that during the transaction commit phase, the log is constructed and persisted, and the data is updated on volatile memory. Data updates on persistent memory are performed periodically by specialized threads. The thread uses the bitmap mechanism to identify cache blocks in volatile and persistent memory that may be in an inconsistent state, calls the write-back primitive to synchronize data, and reclaims space for log entries that are no longer needed. Because the read-write granularity and write-back granularity of persistent memory are different, and the actual load has spatial locality, the above method can effectively reduce the required number of write-back primitives and improve transaction processing efficiency. Compared with the existing systems, the system implemented based on this method can significantly shorten the transaction execution time compared with similar implementations, and at the same time has scalability.

Figure 202110605179

Description

Transaction persistence method and system for asynchronous write-back persistent memory
Technical Field
The invention relates to the technical field of system software, in particular to a transaction persistence method and system for a persistent memory.
Background
For a long time, most software completes efficient processing by means of a memory, and completes data storage by using a magnetic disk. The proposal of persistent memory changes the inherent storage architecture. The persistent memory has the characteristics of persistence, byte addressing, low delay and high bandwidth, allows the data to be directly modified by using a conventional read-write instruction, and effectively absorbs the advantages of the traditional memory and a magnetic disk. With the advent of commercial persistent memory products developed by intel corporation, it is anticipated that more software will be available to make full use of persistent memory. But it is difficult to use the persistent memory correctly because the modification to the persistent memory (note: using instructions such as MOV) is temporarily stored in the Cache (Cache), and the data disappears after power-off; these modifications may be written back to persistent memory ahead of time due to cache eviction mechanisms. In order to ensure the consistency of the data in the persistent memory, it is necessary to reasonably use write-back (such as CLWB) and fence instruction (such as MFENCE) to ensure that the latest data is synchronized in the persistent memory.
Transactional Memory (TM) provides a shared memory concurrency control mechanism that emulates database transactions. A transaction is a piece of code that reads and writes to memory. These read and write operations are logically a separate unit whose intermediate state is not visible to other transactions.
Disclosure of Invention
(I) technical problem to be solved
Conventionally, in order to simplify the programming of the persistent memory, a common method is to modify and expand functions based on the conventional software transactional memory, thereby implementing the persistent transactional memory function. Transactional memory treats a set of memory read and write operations as a whole and implements complete ACID semantics as with database transactions. Typical systems are Mnemosyn [ ASPLOS 11], Dude [ ASPLOS 17], pises [ USENIX ATC 19], and the like. Conventional databases typically append write entries to the log file during commit. At regular intervals, the checkpoint thread takes a snapshot of the current database state (writes all critical data structures in memory to disk) and reclaims logs that are no longer needed. Compared with the traditional database, the transactional memory system generally adopts redo logging technology (redo logging), and is generally divided into three stages: (P1) constructing and persisting the log (including transaction sequence number, modified address and new value), (P2) updating data, (P3) persisting the data updated in the previous step, and then deleting the log. Traditionally, through these three processes, the transaction is considered committed. The concrete points are as follows: (T1) after the commit point, all threads can see the values updated by the transaction; (T2) after the commit point, if the system is down, the values updated by the transaction or subsequent committed transactions can be seen after the necessary recovery process is completed.
Existing architectures implement data persistence in transactional memory systems with cache lines (cache lines) as granularity, which is longer in duration at phase (P3) than at phase (P1) in many application scenarios. Some systems (e.g., Mnemosyne) resort to arranging all three phases described above to be completed sequentially in the commit phase of the transaction, which significantly reduces the processing efficiency of the transaction. Some systems (e.g., DudeTM) separate the (P2) th and (P3) th phases from transaction commit, each by an independent asynchronous thread, which has significant scalability problems, and the nature of the committed transaction (T2) cannot be guaranteed. There are also systems (e.g., pises) that reduce the level of transaction isolation (snapshot isolation) so that read-only transactions are non-blocking in many cases, but that do not address the long persistent memory change times in write transactions and require some degree of modification to the application.
The inventor finds that: in an actual persistent memory system, persistent data occupies most of the overhead relative to persistent logs. The inventors have also noted that memory accesses of a real system conform to the characteristics of spatial and temporal locality, and write-back instructions are implemented at some memory unit granularity (cache block granularity, typically 64 bytes), so it is likely that some cache blocks are written back repeatedly many times, which results in a significant write-back magnification phenomenon.
Thus, the inventors thought: if a certain method is adopted, the flashing requests are combined, so that the flashing amplification can be reduced, and the maximum flashing performance is achieved, thereby shortening the transaction execution time and achieving the performance similar to that of the traditional transaction memory.
The invention aims to solve the problems of transaction execution time extension, throughput rate reduction and poor expandability caused by an inefficient data persistence strategy in a traditional persistent transactional memory system, and provides an efficient, easy-to-use and extensible persistent transactional memory system.
(II) technical scheme
In order to solve the technical problem, the invention provides a persistent memory transaction persistence method based on asynchronous write back, and accordingly, a new persistent transaction memory system (XIHE-TM) is realized. The working method of the system comprises the following steps:
s1, the system includes working thread and recovery thread, the working thread is used to build and persist the log during the transaction submission, the recovery thread checks the transaction submission state according to a predefined time period, and completes the persistence and log recovery work on the data address space.
Initialization is required before the transaction is formally executed. And starting a recovery thread, wherein the function of the recovery thread is to monitor the execution condition of the front-end affairs, and the recovery thread asynchronously executes the work of persistent snapshot and log recovery in batches after a plurality of front-end affairs are detected to complete submission at regular intervals. The detailed steps of these operations are described in the S3 process.
S2, a transaction commit stage adopts a before-Write Log (WAL) paradigm, that is, before a new value is written into a memory address space, a Redo Log (Redo Log) needs to be prepared and persisted. During the transaction commit phase, after all target write addresses are locked and atomic consistency is verified through the transaction, the following steps need to be completed in sequence:
and S2.1, constructing a redo log. For the persistent transactional memory, the address and the new value of each write request need to be extracted to build a log. In addition, to ensure correct recovery, the transaction sequence number needs to be recorded together.
And S2.2, persisting the log obtained in the step S2.1 in a specific area of the memory, and ensuring that the log is persisted. If there is not enough log space, the transaction will continue to wait.
S2.3, writing new data into the specified target address by using a conventional read-write instruction, but not using any persistence primitive.
Finally, this transaction is marked as "committed," allowing the front-end worker thread to begin executing the next transaction.
S3, when the recovery thread detects the committed transaction, the execution effect of the transaction can be persisted and the log space can be recovered through the following steps. In order to improve the performance, a method of processing a plurality of transactions in batches at fixed time intervals is adopted.
S3.1, the recovery thread periodically checks whether a transaction which is submitted and has not been deleted in the log exists, and if not, retries after waiting for a period of time;
and S3.2, the recovery thread reads the addresses corresponding to the transactions and determines a cache block address list needing to be refreshed.
And S3.3, the recovery thread persists the modification of the group of transactions to the persistent memory by calling a write-back (CLWB) instruction for each cache block address obtained in the step S3.2.
And S3.4, safely deleting the logs related to the step S3.1 by the recovery thread. After that, the recycle thread waits for a period of time, and returns to step S3.1 to continue checking whether there is a transaction that is "committed" and the log is not deleted.
Through the process of S2, the transaction is completed in the form of log; the transaction completes the persistence in the form of a data structure on the persistent memory, via the process of S3.
If the system crashes, the next time the system is started, the recovery process needs to be executed, and the steps are as follows:
s4.1, scanning all effective log records in the persistent memory, and arranging the effective log records according to the sequence of increasing transaction sequence numbers;
s4.2, replaying the transactions one by one, namely writing new data into a corresponding address for each record;
s4.3, determining a cache block address list needing to be refreshed in the replay;
s4.4, for each obtained cache block address, through calling a write-back instruction, persisting the modification of the memory in the replay;
s4.5 deletes all logs.
(III) advantageous effects
Compared with the existing transaction memory persistence method and system, the technical scheme provided by the invention has the following advantages:
1. according to the memory transaction persistence method based on asynchronous write back of the cache block, time-consuming data persistence is transferred to the background thread, and meanwhile, a plurality of log entries of the same cache block are merged and written back, so that the transaction submission time of a working thread is shortened, the time during address locking is reduced, the data write back quantity is reduced, the data persistence is prevented from becoming the performance bottleneck of a transaction memory, and high expandability and low persistence overhead of a system are achieved. If the data is in the biased distribution, the invention can further reduce the data write-back requirement and has obvious effect on improving the performance. The system implemented based on the invention achieves persistence and reduces write-back requests by 6.30% -44.05% over various test sets relative to Dude. The throughput rate on the actual test set is 5.82 times higher than that of the Dude TM.
2. The memory transaction persistence method based on asynchronous write back of the cache block has universality, not only can be used for a system realized based on the method, but also can be used for realizing other persistence transaction systems (such as persistence hardware transaction memory, databases and the like) and other persistence data structures (such as B + trees and the like).
3. The asynchronous thread executing the cache write-back operation of the embodiment of the invention does not influence the correctness of the execution of the foreground thread. The write-back instruction mentioned in this specification synchronizes a value in a cache to a corresponding location in a persistent memory. This operation does not invalidate the cache block itself and therefore does not adversely affect the cache hit rate during execution of the transaction. Meanwhile, since the data in the view of the application program is not changed in the process and the processor provides proper cache consistency, the asynchronous thread executing the cache write-back operation according to the embodiment of the invention does not affect the correctness of the foreground thread execution.
Drawings
FIG. 1 shows a schematic overall process diagram of a method for a persistent transaction for persistent memory according to an embodiment of the invention.
Fig. 2 is a main flowchart of a memory transaction persistence method based on asynchronous write-back of cache blocks according to the present invention.
FIG. 3 illustrates a memory allocation and data flow process according to an embodiment of the present invention.
FIG. 4 is pseudo code for the recycle thread backbone flow of the present invention.
FIG. 5 is a graph of throughput versus number of work threads for a system implemented in accordance with the present invention and a comparison system in hash table, red-black tree, and B + tree loads.
FIG. 6 is a data write back per operation versus skewness distribution coefficient per hash table load curve for a system implemented in accordance with the present invention and a comparison system.
FIG. 7 is a table of execution time versus number of threads for a system and comparison system implemented in accordance with the present invention in a Stanford multiprocessor transaction application test Set (STAMP) actual test load.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the drawings and examples (example transactions).
FIG. 1 shows a schematic overall flow diagram of a method for a persistent transaction for persistent memory according to an embodiment of the invention.
As shown in the left-hand box of fig. 1, for each transaction:
a log persistence step, namely constructing and persisting a log on a persisted memory;
a data update step, updating data on volatile memory (including but not limited to processor cache), and then identifying the transaction as committed.
As shown in the right-hand box of fig. 1, data update on the persistent memory for a single transaction is delayed, after a group of transactions is accumulated, data is synchronized to the persistent memory for the group of transactions by the processor, a persistence operation (i.e., persistence to the persistent memory) is performed on the data updated in the foregoing data update step, and the corresponding log is deleted.
Preferably, a worker thread and a recycle thread are established and maintained, the worker thread performs the operations shown in the left-hand box in fig. 1, and the recycle thread performs the operations shown in the right-hand box in fig. 1 as a background thread, for example, the recycle thread checks the transaction execution status according to a predefined time period and completes the persistence and log recycle work on the data address space.
Compared with the existing transaction atomization submission method, the transaction persistence method provided by the embodiment of the invention has the advantages that the data persistence operation on the persistent memory is moved out of the transaction submission stage, so that the problem that most of extra expenses in a persistent memory system are occupied by traditional persistent data is solved to a great extent, and the transaction submission time is shortened. More importantly, the time at which the relevant address space is protected by the transactional write lock. Other transactions, particularly read transactions, need to wait for the previous transaction to release the write lock to obtain committed memory data on the same address space. Therefore, the method can also shorten the commit time of other transactions. In summary, the transaction persistence method of the embodiment of the present invention is significant for improving the transaction commit throughput rate. In addition, the operations of data persistence on the persistent memory are executed in bulk on a set of transactions (generally by a write-back instruction), and by write-back at the granularity of an in-memory unit, i.e. the granularity of a cache block, the problem that the same cache block is repeatedly written back to the persistent memory multiple times in the conventional mode is alleviated or even largely avoided. The method also reduces or avoids the problem that the granularity (8 bytes) of the single-time read-write memory of the user is inconsistent with the actual granularity (64-256 bytes) of the persistent memory. Experimental results prove that the method can achieve the maximum brushing performance, reduce the time consumed by the processor due to brushing, and also has positive significance for improving the transaction submission throughput rate.
It should be noted that Group commit (Group commit) is a common technique for transaction commit optimization in the field of databases, and the idea is to merge actions of multiple transaction log writes and reduce sequential disk writes. In contrast to traditional group commit, the present invention still requires each transaction to persist its own created log separately, with the merged write back being only the data involved in the (P3) th of the three phases of the redo log technique described in the summary of the invention section above. Therefore, unlike group commit, which requires accumulating a certain number of transactions and committing the group of transactions together at the end, the transactions in the present invention commit immediately after completing the two phases (P1), (P2) in the redo log technique described above.
Figure 2 illustrates a more detailed general flowchart of a memory transaction persistence method based on asynchronous write back of cache blocks, according to an embodiment of the invention.
FIG. 3 illustrates a memory allocation and data flow process in an embodiment of the invention.
For ease of understanding, the following description will be made with an example transaction that requires the data for three addresses a, b, and c in persistent memory to be modified to X1, Y3, and Z2, respectively, and must satisfy the atomicity, consistency, isolation, and persistence (ACID properties) of the transaction. This example is intended to illustrate the invention, but not to limit the scope of the invention.
As shown in fig. 2, in step S1, a worker thread and a data persistence thread (hereinafter sometimes referred to as a data persistence thread as a recycle thread) are initialized; then, in the second stage, the working thread executes steps S2.1, S2.2, and S2.3 in the foreground, specifically, in step S2.1, a log is created, in step S2.2, the log is persisted, and in step S2.3, the completion of the data update flag is submitted; in the third phase, the background thread executes steps S3.1, S3.2, S3.3, S3.4, specifically in step S3.1, detects a committed transaction, and in step S3.2, determines a cache block to be flushed; in step S3.3 the cache block is flushed; the log is deleted in step S3.4. It should be noted that there is no sequential relationship between the third phase and the second phase in time, i.e. the two phases may be executed sequentially or in parallel.
FIG. 3 illustrates an example transactional memory allocation and data flow process, according to an embodiment of the invention. As shown in fig. 3, the persistent memory is divided into two regions as needed: the data structure area and the redo log area, wherein (1) the data structure area is visible to the user and can store the data structure, and the user can complete the operation on the data by using a specified transaction interface according to the user's own will; (2) the redo log area contains log records for each completed commit transaction for control by the system.
At the level of the operating system above (including the application and the system implemented in accordance with the present invention), each data structure appears to have only one copy in the address space of the persistent memory. But due to the presence of a volatile Cache (Cache) located in the CPU, two data versions are actually formed: version of data V1 located in persistent memory (lower part of FIG. 3) and version V2 visible to applications (upper part of FIG. 3). When a system failure event occurs, only V1 is retained due to the cache block data loss, which is also the version visible to the application for the next execution (note that it is different from V2 above). If a memory word has just been modified, V2 must be modified accordingly, but V1 may not be modified (and may also be modified). When the processor evicts a cache block (there is no way to know on the operating system whether the cache block is evicted) or actively calls the CLWB instruction, it can be guaranteed that V1 is modified to be consistent with V2. The processes marked in fig. 3, namely the processes marked in (a) and (b), are essentially the processes of explicitly calling CLWB instructions by the system implemented by the invention and ensuring that V1 and V2 are consistent.
Conventional databases often use DRAM as a large-capacity cache to avoid repeatedly reading and writing disks. Therefore, the database module can sense whether the page is positioned in the database cache, actively participate in the processes of allocating and eliminating the page, and ensure the consistency of the page content of the application program view angle and the corresponding page content on the disk. In contrast, after the processor cache based on the write-back mode issues a write memory instruction, the updated data may be stored only in the cache block, and not written into the persistent memory; it is also possible to write back to persistent memory at any later time. This process is not visible to software. The system realized based on the invention can only ensure that a certain cache block is written into the persistent memory by explicitly calling a cache line write-back instruction (CLWB) provided by the processor, and cannot detect a cache block elimination event. Due to the above constraints, the present invention is fundamentally different from the transaction commit method of the conventional database.
The system operation method according to the embodiment of the present invention is described below with reference to fig. 2 and 3.
S1, initialization phase. The number of worker threads, i.e., threads that execute transactions and complete the commit of transactions, is set by the transactional memory user. The system establishes a plurality of recovery threads according to needs, and each recovery thread fixedly processes transactions and logs thereof generated by a plurality of working threads. Each recycle thread periodically checks, on a regular basis, whether there are transactions that have completed committing and are unprocessed. If so, one of the threads performs the process described in step S3 to complete the asynchronous processing stage.
As an example, during the initialization phase, each recycle thread also needs to build a private bitmap (other threads are not visible to avoid synchronization overhead) and all zeroes. It is operated on using conventional read and write instructions. The persistent memory address space is partitioned at the granularity of cache blocks (typically 64 bytes), each block corresponding to a Bit on a bitmap. A separate bitmap is currently allocated for each recycle thread. For example, if the size of the persistent memory address space is 4GB, the bitmap space requirement is 4096/64/8-8 MB, so the storage overhead is acceptable. The bitmap may be stored in conventional memory, such as DRAM, which does not require persistence.
S2, the transaction execution phase, can be further subdivided into two sub-phases. The first phase is from the start of the transaction (TX _ BEGIN) to the COMMIT of the transaction (TX _ COMMIT). The stage adopts the same concurrency control method as the traditional transactional memory. For a transactional Write (TX _ STORE) operation, an address and a new value need to be added to the Write Set (Write Set) located on DRAM. The old value is still retained at the target address at this time. During the second phase transaction COMMIT (TX _ COMMIT), write locks are applied to all write addresses of the first phase to prevent other transactions from reading and writing the same variable; after verifying the atomic consistency of the passed transaction, the following steps are sequentially completed:
s2.1, constructing a redo log obtained during the execution of the transaction. The present system builds a log by extracting the address, new value of each write request, such as for embodiments that require (a, X1), (b, Y3), and (c, Z2) to be extracted from the aforementioned write set. In addition, in order to ensure correct recovery, the transaction sequence number (T11) needs to be recorded together to form a log record (step 1 shown in fig. 3 and indicated by the reference numeral (r)). The transaction sequence number is a 64-bit long unsigned integer. If there is a dependency between the two transactions, i.e. transaction b commits after transaction a and transaction b reads (or writes) the variable written by transaction a, then the sequence number of transaction b must be greater than transaction a. In addition, the sequence of the sequence numbers of all the transactions under the same thread is consistent with the sequence of the commit time. This information helps to achieve proper transaction recovery.
And S2.2, generating a redo log data structure and adding the redo log data structure to a log container to which the current corresponding working thread belongs. When the log container has no empty space to insert a new record, the corresponding working thread waits for the background recovery thread to finish the work of recovering the space. These records will then be persisted. The log container adopts a mode of a circular queue, and a log unit between a head pointer and a tail pointer of the queue is regarded as a valid log unit. The modification and persistence operations of the head pointer and the tail pointer are inseparable and can be realized by Link-and-persistence and other technologies. The redo log generated in this step is incorporated into an effective log unit by modifying the head pointer of the log and writing it back to the persistent memory, thereby indicating that the transaction is persistent (step 2 shown in fig. 3, indicated by a label @).
S2.3, the transaction guarantees persistence after the end of the previous phase, which requires the execution of these write operations, but does not require persistence. After this stage is completed, other transactions can access the updated value of the transaction, thus meeting the condition of transaction commit (T1): after the commit point, all threads can see the values updated by the transaction. At this time, it cannot be guaranteed whether the new value is persistent, but because the log provides sufficient information, even if a crash occurs, we can still recover to the state where the transaction is completely executed by replaying the log, so as to ensure the atomicity of the transaction (step 3 shown in fig. 3, indicated by the reference symbol (c)), that is, all memory write operations of a single transaction can be completely executed. This indicates that the (T2) condition for transaction commit is met: after the commit point, if the system is down, the values updated by the transaction or subsequent committed transactions can be seen after the necessary recovery process is completed. After the above process is finished, the transaction commit is completed.
Each worker thread has a private "last committed transaction sequence number" field, and this variable itself does not require persistence. As mentioned above, the sequence number sequence of all transactions under the same thread is consistent with the commit time, so the field of the sequence number of the latest committed transaction only stores one value, which indicates that the transactions less than or equal to the value in the working thread are all committed, and therefore can be recycled for subsequent processing. This does not represent that transactions less than or equal to the value in other worker threads have all committed. The background recycle thread would interrogate this value to determine which transactions have completed the S2.1-S2.3 steps (i.e., which "committed" transactions are), and recycle the log securely. After step S2.3 is completed, the "last committed transaction sequence number" field of the corresponding worker thread needs to be updated.
S3, when the recovery thread detects the committed transaction, the execution effect of the transaction can be persisted and the log space can be recovered through the following steps. As an example, to improve performance, a method of batch processing a plurality of transactions at fixed time intervals may be employed.
Log records can be safely deleted if the transactions that created them, and the transactions on which they depend (earlier committed transactions with read-write or write-write conflicts) have been persisted. The reason why the latter (earlier committed transactions with read-write or write-write conflicts) is guaranteed is that the persistence and write-back processes are protected by concurrent control, and it is unlikely that a dependent transaction will persist later than the current transaction. It is only necessary to ensure that the former (i.e., the transaction that created these log records has been persisted). This process includes the following three stages.
S3.1, preparation phase. This phase requires the collection of the "most recently committed transaction sequence number" for each target worker thread (lines 2-3 of FIG. 4). Finally, the distinction is done using a Memory Fence (MFENCE) and the next stage to avoid re-ordering (line 4 of FIG. 4).
In one example, each target worker thread has an independent "most recently committed transaction sequence number," indicating that of all transactions initiated by that worker thread, transactions less than or equal to that value are "committed" (transaction sequence numbers within each thread are monotonically increasing in actual run order), so that their logs can be safely deleted in step S3.4.
The "most recently committed transaction sequence number" of all target worker threads is obtained uniformly during the prepare phase because the transaction is still committed normally and is not halted during the entire reclamation process. If a transaction initiates a commit after S3.2 (trace phase) begins, completing the commit before S3.3 (flush phase) ends may result in modification of the data structure located in the cache, but the flush of the corresponding cache block is completed before this process, so the latest value is not written back to the persistent memory, and the log is deleted, thereby compromising consistency. This situation needs to be avoided so we have to determine the transaction range involved before starting the flush.
S3.2, tracking. The recovery thread scans the log container according to the 'latest submitted transaction serial number' of each working thread, and finds the log records of all submitted transactions. Each transaction includes several regular records, containing modified addresses and new values, but only the address information needs to be read at this stage. Unlike the basic approach where a write-back primitive (e.g., CLWB) is called uniformly for each address, the recycle thread sets the value of the corresponding location on the bitmap to 1 (line 9 of fig. 4), indicating that the cache block is about to be flushed. Persistence with the cache block as a granularity not only affects the latest value of the address where the current log record is located, but also flushes nearby addresses back to the persistent memory device. Therefore, multiple modifications to the same cache block can be combined into one time, so that the transaction submission time of a working thread is shortened, the time during address locking is reduced, the data write-back quantity is reduced, the data persistence is prevented from becoming the performance bottleneck of a transactional memory, and the high expandability and low persistence overhead of the system are realized.
S3.3, a flashing stage. The recycle thread retrieves the bitmap and, when a cache block is hit, issues a write back (CLWB) operation on that cache block. Finally, a persistent fence is introduced to ensure that the write-over task has been completed before the next stage.
S3.4, a truncation stage. Because the effects of these transactions have persisted to persisted memory (unless subsequent transactions modify the value in the same location, but the updated data is available from the log and therefore does not compromise the consistency of the data), log records scanned in the last phase can be safely truncated. In one example, by simply modifying the tail pointer and persisting the tail pointer to indicate truncation of the log, the corresponding log unit no longer becomes a valid log unit. In addition, the bitmap needs to be reset to zero out all of it. To this end, the recycle thread may pause and wait for the next cycle, i.e., resume from step S3.1, until the application exits.
Through the process of S2, the transaction is completed in the form of log; the transaction completes the persistence in the form of a data structure on the persistent memory, via the process of S3.
The behavior of the recycle thread is essentially to ensure that the data structure has been written back correctly to the persistent memory device and to securely delete the log. A similar task in conventional databases is snapshot generation, writing the cache on DRAM back to disk. The key differences between the two are as follows: (1) the traditional database writes new data into a database file through file I/O operation with a coarser granularity (such as a page), the transmission data volume is larger, and the time interval is long; the invention synchronizes data to the persistent memory device through the processor instruction with the granularity of the cache block, the process does not need kernel intervention, and the time interval is short; (2) while conventional databases require careful handling of executing transactions during snapshot generation to avoid disrupting database consistency, the present invention allows any write transactions to be committed during the execution of the recycle thread (i.e., the write transaction thread is not blocked by the recycle thread).
If the system is crashed, when the system is started next time, only the data structure on the persistent memory needs to be read, and the effective logs are replayed in sequence according to the serial numbers, so that the consistent data structure can be obtained.
The steps of the recovery process are as follows:
s4.1, in the recovery phase, only the transaction sequence number can be used to determine a serializable order of transactions. The ordering of these transactions in the order of increasing transaction sequence numbers must be correct. The order in which correct execution is accomplished requires that the transaction sequence number is generally relaxed to increment, but the overhead of reconstruction tends to be greater than reordering. It is required that all valid log records be scanned and arranged in the order of increasing transaction sequence numbers in persistent memory.
The transaction sequence number may not be continuous because the corresponding transaction was terminated or the log crashed without having persisted. Both cases do not affect the data structure and can therefore be safely ignored.
And S4.2, replaying one transaction by one transaction according to the sequence of S4.1, namely writing new data into a corresponding address for each record.
And S4.3, determining a cache block address list needing to be refreshed in the playback.
And S4.4, for each cache block address obtained in the step S4.3, persisting the modification to the memory in the step S4.2 by calling a write-back instruction.
Because memory accesses are generally localized in real systems, such optimization can also significantly improve recovery performance.
And S4.5, deleting all logs.
According to the memory transaction persistence method based on asynchronous write back of the cache block, time-consuming data persistence is transferred to the background thread, and meanwhile, a plurality of log entries of the same cache block are merged and written back, so that the transaction submission time of a working thread is shortened, the time during address locking is reduced, the data write back quantity is reduced, the data persistence is prevented from becoming the performance bottleneck of a transaction memory, and high expandability and low persistence overhead of a system are achieved. If the data is in the biased distribution, the invention can further reduce the data write-back requirement and has obvious effect on improving the performance. Finally, the method has universality and can also be used for other transaction processing systems, persistent data structure implementation and the like. In particular, the method provided by the invention can be used for constructing the persistent transactional memory not only on the software transactional memory, but also on the general hardware transactional memory.
To verify the practical use effect of the method proposed by the present invention, the inventor has designed an experimental environment based on the intel alavine memory and the four-processor totalization 72 core, and in order to avoid the influence of the asymmetric memory access (NUMA) effect on the performance, the test is only performed on the fixed processor node. The system (XIHE-TM) realized according to the embodiment of the invention realizes three data structures of a B + tree, a red-black tree and a hash table, respectively tests the writing performance of the system in uniform insertion and Zipf skewed distribution updating of two loads, and transplants Stanford multiprocessor transaction application test Set (STAMP) test software to adapt to the requirement of persistent transaction memory. The log space per worker thread is 2MB by default.
The selected comparison system comprises: volatile implementations (non-persistent transactional memory implementations based on TinySTM PPoPP 08, but nodes are still allocated from persistent memory, representing the theoretical performance upper bound of all persistent transactional memory), dude tm, romulus lr, etc. The data below are all the average values obtained by 10 measurements.
FIG. 5 shows throughput-work thread number curves for XIHE-TM versus the above-described comparison system in hash table, red-black tree, and B + tree loads. Wherein the keys and values are 64-bit integer, the range of the keys is 2M, and the read-write ratio is 2: 8. RomulusLR does not support concurrent execution of two or more write transactions, so that the RomulusLR has no expandability on performance and the throughput rate is lowest; dude's perform better than other implementations when the number of threads is small, sometimes even better than volatile implementations because transaction execution is done on DRAM. However, as the number of threads increases and the number of rebuilt threads is still only 1, the rebuilt threads become a bottleneck of the system, and the scalability of the rebuilt threads is still limited. Compared with the first two jobs, the XIHE-TM has good expandability because adverse effects on overall performance caused by a single recovery thread are avoided, and read-write transaction parallelism and fault consistency are realized by utilizing the existing lock mechanism of a transaction memory. The maximum throughput of XIHE-TM is 3.22 times that of Dude TM and 11.4 times that of Romulus LR. Furthermore, XIHE-TM can also achieve higher throughput if the data distribution is more concentrated.
FIG. 6 is a data write back per operation versus skewing profile coefficient per hash table load curve for a system XIHE-TM and a comparison system implemented according to an embodiment of the invention. Each transaction write operation in Romulus LR requires 2.125 data flushes on average, Dude TM requires 1.43 data flushes on average, and data distribution has a limited effect on the amount of flushes. The number of flushes for XIHE-TM is smaller than for Dude TM, and when the data set distribution, i.e., Zipf distribution coefficient, is increased, the number of flushes for XIHE-TM is further decreased to a minimum of 55% of Dude TM (at this time the skewing distribution coefficient is 0.99). If the log size is further enlarged, namely the flashing period is prolonged, the curve is moved downwards, and at the moment, cache blocks with higher probability are merged, so that the reduction of the flashing number is realized.
FIG. 7 is a table of execution time versus number of threads in each load under the Stanford multiprocessor transaction application test Set (STAMP) for a system and comparison system implemented in accordance with the present invention. The inventors analyzed two test sets, namely Vacation and SSCA 2. In the Vacation test set, because the DudeTM transaction execution is done on DRAM, it is more efficient when the number of cores is less than 8, while XIHE-TM performs better when 16 cores. SSCA2 is not much, primarily computing-based, code that is protected by the transaction mechanism. The processing time of XIHE-TM at 2 cores and above is shorter than that of Dude TM, and the maximum performance increase can be 5.82 times.
The inventors also measured and analyzed the recovery time required under this mechanism. The time required for recovery is related to the number of valid logs stored at the time of failure, which is affected by the runtime state, the maximum capacity of the logs, and so on. Under the default configuration, the time to restore the hash table data structure is approximately 335 milliseconds.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims (14)

1. A method of persisting transactions for persisted memory comprising:
for each transaction, performing:
a log persistence step, namely constructing and persisting a log on a persisted memory;
a data updating step, namely updating data on the volatile memory, and then identifying the transaction as committed;
the data update of a single transaction on the persistent memory is delayed, after a group of transactions is accumulated, the data is synchronized to the persistent memory for the group of transactions by the processor, and the corresponding log is deleted.
2. The method of claim 1, further comprising:
and establishing and maintaining a working thread and a recovery thread, wherein the working thread builds and persists a log during the transaction submission period, and the recovery thread checks the transaction execution state according to a predefined time period and completes the persistence and log recovery work on a data address space.
3. The method of claim 2, building and persisting a log during transaction commit comprising:
the log entry is inserted with the memory write address and its new value in the transaction, and the transaction sequence number information, ensuring that the log is synchronized to the persistent memory.
4. The method of claim 1, the updating data on volatile memory followed by identifying a transaction as committed comprising:
new data is written to the specified target address using read and write instructions and this transaction is marked as "committed," allowing the front-end worker thread to begin executing the next transaction.
5. The method of claim 2, wherein after accumulating a set of transactions, synchronizing, by the processor, data to the persistent memory for the set of transactions, and deleting the corresponding log comprises:
when the recovery thread detects the submitted transaction and the corresponding log is not deleted in one period, reading the addresses corresponding to the transactions, and determining a cache block address list needing to be refreshed; calling a cache write-back instruction only once for each cache block address by taking the cache block as a granularity unit; finally, the recycle thread securely deletes the log involved in this step.
6. The method of claim 2, the persistent memory being divided into two regions: a data structure area and a redo log area; the data structure area is visible to the user and is used for storing the data structure, and the user can complete the operation on the data by using a specified transaction interface according to the user's own intention; the redo log area contains log records for each completed commit transaction for control by the system.
7. The method of claim 2, the identification of whether the transaction was committed being stored in a bitmap, the bitmap being established by the recycle thread and visible only to the recycle thread, the cache block on each persistent memory corresponding to a bit on the bitmap.
8. The method of claim 2, the bitmap being stored in a DRAM.
9. The method of claim 2, after the reclamation thread determines that a cache block is to be flushed, setting a value for a corresponding location on a corresponding bitmap; and resetting the bitmap after the flush.
10. The method of claim 2, wherein the worker thread maintains an identification identifying the most recently committed transaction, and wherein the recycle thread queries the identification to determine which transactions have completed the data update step, i.e. detects committed transactions, and performs a persistence operation on the data of the committed transactions according to a predetermined cycle, and deletes the corresponding log.
11. The method of claim 2, wherein the recycle thread checks the transaction execution status for a predefined period of time and performs the persistence and log recycle operations on the data address space comprises:
a preparation phase, namely collecting the latest submitted transaction identification of each target working thread, and distinguishing by using a memory fence instruction and a next phase to avoid reordering;
setting a bitmap according to the collected latest submitted transaction identifier of each target working thread, and identifying a cache block to be refreshed;
and in the flash stage, the recovery thread retrieves the bitmap, when a certain cache block is hit, a write-back operation on the cache block is sent out, and finally, a persistent barrier is introduced to ensure that the flash task is completed before the next stage.
And a truncation stage, namely truncating the log record scanned in the previous stage, wherein the truncation stage comprises modifying a tail pointer of the log, persisting the tail pointer to represent the truncation of the log, and resetting a bitmap to clear all bitmaps.
12. A method of data recovery on persistent memory, comprising:
scanning all effective log records in a persistent memory, and arranging the effective log records according to the sequence of the transaction sequence number;
replaying transaction by transaction, namely writing new data into a corresponding address for each record;
determining a cache block address list which needs to be refreshed in the playback;
for each obtained cache block address, persisting modifications to the memory in the replay by calling a write-back instruction; and deleting all logs.
13. A computing device having a processor, volatile memory and persistent memory, the processor being operative to perform the method of any of claims 1 to 12.
14. A non-transitory computer readable medium having stored thereon computer instructions which, when executed by a computer, perform the method of any one of claims 1 to 12.
CN202110605179.3A 2021-05-31 2021-05-31 Transaction persistence method and system for asynchronously writing back to persistent memory Active CN113220490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110605179.3A CN113220490B (en) 2021-05-31 2021-05-31 Transaction persistence method and system for asynchronously writing back to persistent memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110605179.3A CN113220490B (en) 2021-05-31 2021-05-31 Transaction persistence method and system for asynchronously writing back to persistent memory

Publications (2)

Publication Number Publication Date
CN113220490A true CN113220490A (en) 2021-08-06
CN113220490B CN113220490B (en) 2024-11-26

Family

ID=77081923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110605179.3A Active CN113220490B (en) 2021-05-31 2021-05-31 Transaction persistence method and system for asynchronously writing back to persistent memory

Country Status (1)

Country Link
CN (1) CN113220490B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237500A (en) * 2021-12-09 2022-03-25 北京美信时代科技有限公司 Method and system for improving writing efficiency through cache transaction
CN115878638A (en) * 2021-09-29 2023-03-31 超聚变数字技术有限公司 Database transaction processing method, electronic equipment and device
TWI812012B (en) * 2021-09-06 2023-08-11 日商鎧俠股份有限公司 information processing device
WO2023159976A1 (en) * 2022-02-28 2023-08-31 华为技术有限公司 Data segmented writing method, data reading method and apparatus
CN117632016A (en) * 2023-11-28 2024-03-01 天翼云科技有限公司 A distributed storage asynchronous data compression method
CN117827688A (en) * 2023-12-01 2024-04-05 天翼云科技有限公司 A log space recovery optimization method for cache
CN119248524A (en) * 2024-12-06 2025-01-03 江苏华库数据技术有限公司 Database storage space management method, device, equipment, medium and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881371A (en) * 2015-05-29 2015-09-02 清华大学 Persistent internal memory transaction processing cache management method and device
CN112214171A (en) * 2020-10-12 2021-01-12 华东师范大学 A non-volatile memory buffer design method for SQLite database
CN112347107A (en) * 2020-11-11 2021-02-09 Oppo(重庆)智能科技有限公司 Data persistence method, mobile terminal and computer-readable storage medium
CN112540931A (en) * 2020-12-16 2021-03-23 华中科技大学 Method and processor for ensuring data breakdown consistency in secure nonvolatile memory

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881371A (en) * 2015-05-29 2015-09-02 清华大学 Persistent internal memory transaction processing cache management method and device
CN112214171A (en) * 2020-10-12 2021-01-12 华东师范大学 A non-volatile memory buffer design method for SQLite database
CN112347107A (en) * 2020-11-11 2021-02-09 Oppo(重庆)智能科技有限公司 Data persistence method, mobile terminal and computer-readable storage medium
CN112540931A (en) * 2020-12-16 2021-03-23 华中科技大学 Method and processor for ensuring data breakdown consistency in secure nonvolatile memory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUN LONG 等: "DP2:Reducing transaction overhead with differential and dual persistency in persistent memory", 《HTTP://OR.NSFC.GOV.CN/HANDLE/00001903-5/417160, 1 April 2018 (2018-04-01) *
陈娟 等: "一种基于微日志的持久性事务内存系统", 计算机研究与发展, no. 09, 15 September 2018 (2018-09-15) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI812012B (en) * 2021-09-06 2023-08-11 日商鎧俠股份有限公司 information processing device
CN115878638A (en) * 2021-09-29 2023-03-31 超聚变数字技术有限公司 Database transaction processing method, electronic equipment and device
CN114237500A (en) * 2021-12-09 2022-03-25 北京美信时代科技有限公司 Method and system for improving writing efficiency through cache transaction
WO2023159976A1 (en) * 2022-02-28 2023-08-31 华为技术有限公司 Data segmented writing method, data reading method and apparatus
CN117632016A (en) * 2023-11-28 2024-03-01 天翼云科技有限公司 A distributed storage asynchronous data compression method
CN117827688A (en) * 2023-12-01 2024-04-05 天翼云科技有限公司 A log space recovery optimization method for cache
CN117827688B (en) * 2023-12-01 2025-03-11 天翼云科技有限公司 Log space recovery optimization method applied to cache
CN119248524A (en) * 2024-12-06 2025-01-03 江苏华库数据技术有限公司 Database storage space management method, device, equipment, medium and program product

Also Published As

Publication number Publication date
CN113220490B (en) 2024-11-26

Similar Documents

Publication Publication Date Title
CN113220490B (en) Transaction persistence method and system for asynchronously writing back to persistent memory
US11379324B2 (en) Persistent memory transactions with undo logging
Arulraj et al. Bztree: A high-performance latch-free range index for non-volatile memory
Arulraj et al. Write-behind logging
EP3493071B1 (en) Multi-version concurrency control (mvcc) in non-volatile memory
US9798630B2 (en) Hardware-supported memory temporal copy
Levandoski et al. High performance transactions in deuteronomy
US5369757A (en) Recovery logging in the presence of snapshot files by ordering of buffer pool flushing
JP3593366B2 (en) Database management method
US7376674B2 (en) Storage of multiple pre-modification short duration copies of database information in short term memory
US5455944A (en) Method for managing logging and locking of page free space information in a transaction processing system
US7266669B2 (en) File system with file management function and file management method
US8700585B2 (en) Optimistic locking method and system for committing transactions on a file system
US7996363B2 (en) Real-time apply mechanism in standby database environments
US8103838B2 (en) System and method for transactional locking using reader-lists
CN110515705B (en) Scalable persistent transactional memory and how it works
US10482013B2 (en) Eliding memory page writes upon eviction after page modification
EP4002148A1 (en) Cloud-native object storage for page-based relational database
US11593352B2 (en) Cloud-native object storage for page-based relational database
CN111752685B (en) Persistent memory transaction commit method under multi-core architecture
CN113722052A (en) Nonvolatile memory updating method based on data double versions
JP3107094B2 (en) Method and apparatus for shortening shared buffer lock period
Du et al. Buffered Persistence in B+ Trees
KR20210058613A (en) Locking method for parallel i/o of a single file in non-volatiel memeroy file system and computing device implementing the same
Du et al. Reconciling Hardware Transactional Memory and Persistent Programming with Buffered Durability

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant