This application claims priority to U.S. Provisional Patent Application No. 61/775,041, entitled HARDWARE-SUPPORTED MEMORY TEMPORAL COPY AND LOGGING, filed March 8, 2013, which is incorporated herein by reference for all purposes.
Detailed Description
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or as a specific component that is manufactured to perform the task. As used herein, the term "processor" refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail, so that the invention is not unnecessarily obscured.
Hardware-supported temporal copy and logging of memory is described. In some embodiments, the hardware support is provided by a hardware component that is separate from the central processing unit. In various embodiments, to support temporal copy, a snapshot is generated based on a known memory state and log information. In various embodiments, the log information is determined based at least in part on an indirect memory representation.
Figure 1A is a block diagram illustrating an embodiment of a system configured to provide hardware-supported temporal copy of memory.
System 100 includes one or more central processing units (CPUs, also referred to as application processors or processors) 102 configured to execute program instructions, one or more caches 104 configured to provide temporary, low-latency storage to CPU 102, and a main memory 108 configured to provide instructions and data to CPU 102. Main memory 108 typically has a greater capacity and a higher latency than cache 104. In some embodiments, the cache is implemented using static random access memory (SRAM) and the main memory is implemented using dynamic random access memory (DRAM). Other implementations are possible. In addition, the system can have secondary storage, such as a disk.
Copies of frequently used data are stored in cache 104. When CPU 102 needs data (for example, when an application requests a particular piece of data from a database), cache 104 is checked first. If the data is not found in cache 104, a cache miss occurs and main memory 108 is checked to locate the data.
In this example, memory controller 106 is configured to manage the flow of data (including instructions) to and from main memory 108, thereby facilitating access to main memory 108 by CPU 102. Memory controller 106 is implemented as a module separate from CPU 102, and the two components need not communicate with each other directly (in other words, they need not have a direct interface or connection). Memory controller 106 and CPU 102 can exchange data via cache 104.
A copy coprocessor (CCP) 110 is configured to cooperate with the CPU to support consistent read and logging functions. As described in greater detail herein, CCP 110 is configured to perform actions such as copying data and providing snapshots. CCP 110 is a hardware component that is separate from CPU 102. The CCP need not have a direct connection (for example, an interface or a bus) to the CPU. In some embodiments, the CCP and the CPU are implemented on separate chips or circuits. In various embodiments, the CCP interfaces with CPU 102 by transferring data to and from memory controller 106 and/or cache 104. In some embodiments, CCP 110 is implemented as a component separate from the memory controller, and the two components communicate with each other via a communication interface. In some embodiments, CCP 110 is integrated with memory controller 106 as part of the memory controller's circuitry.
Data (such as a database or other collection of data) is stored in main memory 108. In some embodiments, specific memory regions are designated to be logged. For example, the operating system can set one or more configuration registers to specify the address and size of a memory region to be logged. Writes to that region are then logged. In this example, an undo log 112 and a redo log 114 are maintained by CCP 110 in main memory 108. For a specific memory region (for example, a memory page at a particular address), the redo log contains the updates that have been performed, that is, the new values written since the last checkpoint. The undo log contains the values that were overwritten by these updates since the last checkpoint (that is, the old values).
In some systems, data is committed frequently but is saved to a backing store (for example, written to a persistent data store such as a disk) less frequently, at specific checkpoints. The redo log allows the committed state to be recovered after a failure as follows: a snapshot of the data state at a checkpoint corresponding to an earlier time is read from the backing store, and the committed changes in the redo log are then applied to the checkpointed state, bringing the data state forward in time to the last committed and logged state. The redo log therefore allows the system to avoid the cost of writing each in-place update out to persistent storage on every commit, while still allowing recovery from a loss of memory state.
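The recovery procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the checkpoint is modeled as a dictionary from address to value, the redo log as a list of (timestamp, address, new value) tuples ordered oldest first, and the name recover is hypothetical rather than part of the described system.

```python
def recover(checkpoint, redo_log):
    """Rebuild the last committed state from a checkpoint plus a redo log.

    checkpoint: dict mapping address -> value as of the checkpoint (assumed model).
    redo_log:   list of (timestamp, address, new_value), oldest entry first.
    """
    state = dict(checkpoint)            # start from the checkpointed snapshot
    for _ts, addr, new_value in redo_log:
        state[addr] = new_value         # roll forward in commit order
    return state


# Hypothetical data: two addresses, three committed updates after the checkpoint.
checkpoint = {0x10: "a", 0x20: "b"}
redo_log = [(1, 0x10, "a1"), (2, 0x20, "b1"), (3, 0x10, "a2")]
recovered = recover(checkpoint, redo_log)
```

Because the checkpoint itself is never modified, the same checkpoint can be rolled forward again after a later failure.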
The undo log is used to provide the state of the data at an earlier time by applying the undo log entries, in reverse order, to a copy of the state at a later time until the state has been "undone" back to what it was at the specified time. A common case for the "later time" is the current time, in which case the known state corresponds to the current state of the database. The undo log facilitates the implementation of atomic transactions (an atomic transaction comprises a set of write operations that must all be committed together or not at all), because conflicts caused by different transactions writing to the same data can be undone.
For example, if a memory region initially stores the value "1" and is subsequently modified to store the value "2", then "1" is stored in the undo log and "2" is stored in the redo log. Given the initial state "1" and the redo log, the later committed state can be determined to be "2". Given the later state "2" and the undo log, the earlier committed state can be determined to be "1".
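The "1" to "2" example can be traced with a minimal sketch. The tuple layout (timestamp, address, value), the dictionary model of memory, and the function name logged_write are assumptions made for illustration; in the described system the logs are maintained by the CCP rather than by software like this.

```python
undo_log = []   # old values, appended in commit order
redo_log = []   # new values, appended in commit order


def logged_write(memory, addr, new_value, timestamp):
    """Record the old and new value of a write, then perform it."""
    undo_log.append((timestamp, addr, memory[addr]))
    redo_log.append((timestamp, addr, new_value))
    memory[addr] = new_value


memory = {0: 1}                       # region initially stores 1
logged_write(memory, 0, 2, timestamp=1)

# Earlier state + redo log -> later committed state.
rolled_forward = {0: 1}
for _ts, addr, value in redo_log:
    rolled_forward[addr] = value

# Later state + undo log (applied newest first) -> earlier committed state.
rolled_back = dict(memory)
for _ts, addr, value in reversed(undo_log):
    rolled_back[addr] = value
```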
In some embodiments, a physical memory such as 108 is presented to the processor using an indirect memory representation. In an indirect memory representation, there is some level of indirection between the physical address issued by the processor and the location of the actual data line (also referred to as a cache line) in physical memory. Detailed examples of such indirect memory representations can be found in U.S. Patent No. 8,407,428 (Attorney Docket No. HICAP001) and U.S. Patent No. 7,650,460 (Attorney Docket No. HICAP003), each of which is incorporated herein by reference in its entirety for all purposes.
Figure 1B is a diagram illustrating an example of an indirect memory representation. In this example, a page in main memory is divided into segments, or lines. Some of these lines store actual data content and are referred to as data lines. Other lines store physical line identifiers (PLIDs) that reference data lines, and are referred to as translation lines or indirect lines. As shown, data lines 152-156 store actual data, and PLIDs P1-P4 are used to assemble the data lines that correspond to the appropriate data in memory. The processor (for example, a CPU) issues a physical address, from which the address of a PLID is computed; that PLID is then used to indirectly access the data line it references. For example, the set of PLIDs P1 and P2 (an indirect line) references the set of data lines 152 and 154, corresponding to data content "ABCD". Another set of PLIDs, P3 and P4, references the set of data lines 156 and 154, corresponding to data content "EFCD". To access data content "ABCD", the processor accesses the physical address of the indirect line containing PLIDs P1 and P2, and these PLIDs are then used to locate the data lines containing the data, that is, the data lines corresponding to P1 and P2. In some embodiments, the memory controller facilitates data access by providing the mapping of PLIDs to data lines. A data structure comprising a set of PLIDs (which reference a corresponding set of physical data lines containing the actual data content) is referred to as an indirect line. A write operation amounts to changing the PLID stored at the position in the translation line that corresponds to the written address to a different PLID, so that a different data line is referenced.
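The Figure 1B example can be mirrored in a small sketch. The two dictionaries below (PLID to data line number, and data line number to content) are an assumed software model of the mapping that the memory controller provides; the contents "AB", "CD", and "EF" follow from the "ABCD" and "EFCD" examples above.

```python
# Assumed model of Figure 1B: three data lines and four PLIDs.
data_lines = {152: "AB", 154: "CD", 156: "EF"}
plid_to_line = {"P1": 152, "P2": 154, "P3": 156, "P4": 154}


def read(indirect_line):
    """Resolve each PLID in an indirect line and concatenate the data."""
    return "".join(data_lines[plid_to_line[plid]] for plid in indirect_line)


first = read(["P1", "P2"])    # data lines 152 and 154
second = read(["P3", "P4"])   # data lines 156 and 154 (154 is shared)
```

Note that data line 154 is stored once but is reachable through both indirect lines.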
In some embodiments, the memory used to store data is organized as an array of fixed-size data lines, each addressed by a PLID. Data lines are reference counted and can be shared. In other words, there can be multiple PLIDs referencing a single data line. The size of a data line is implementation dependent and can differ between embodiments. In some embodiments, data lines are deduplicated (in other words, each data line has unique content, and PLIDs referencing the same data content do so by referencing the same data line). For example, data content "CD" is used by multiple PLIDs but is stored in only a single data line.
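A minimal sketch of a deduplicated, reference-counted line store follows. It assumes, for simplicity, that a PLID is just an index into the line array and that deduplication is done with a content lookup table; the class name LineStore and its methods are hypothetical and do not describe the actual hardware mechanism.

```python
class LineStore:
    """Fixed array of data lines with content deduplication and reference counts."""

    def __init__(self):
        self.lines = []        # lines[plid] -> content
        self.refcount = []     # refcount[plid] -> number of references
        self.by_content = {}   # content -> plid, used for deduplication

    def acquire(self, content):
        """Return a PLID for content, reusing an existing line when possible."""
        plid = self.by_content.get(content)
        if plid is None:
            plid = len(self.lines)
            self.lines.append(content)
            self.refcount.append(0)
            self.by_content[content] = plid
        self.refcount[plid] += 1
        return plid

    def release(self, plid):
        """Drop one reference to a line."""
        self.refcount[plid] -= 1


store = LineStore()
ab = store.acquire("AB")
cd_first = store.acquire("CD")
ef = store.acquire("EF")
cd_second = store.acquire("CD")   # same content, so the same line is shared
```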
In some embodiments, each data line is immutable. In other words, once a data line has been assigned a particular value, it does not change for the duration of the application. If data needs to be written, the indirect line entry that stores the PLID referencing the old data is changed to store a different PLID referencing the new data. For example, an indirect line entry initially stores PLID P1, which references data content AB. If the data content needs to be replaced with EF, the entry is changed to PLID P3.
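The AB to EF example can be sketched as follows, assuming a dictionary of immutable line contents keyed by PLID and a Python list standing in for the indirect line. The helper names are hypothetical. The point illustrated is that a write only swaps a PLID, so an earlier copy of the indirect line still resolves to the old data.

```python
data_lines = {"P1": "AB", "P3": "EF"}   # immutable line contents (assumed model)

indirect_line = ["P1"]                  # entry initially references AB
earlier_copy = list(indirect_line)      # a copy taken before the write


def write(indirect, index, new_plid):
    """A write replaces the PLID in the entry; no data line is modified."""
    indirect[index] = new_plid


def read(indirect):
    return "".join(data_lines[plid] for plid in indirect)


write(indirect_line, 0, "P3")           # replace AB with EF
```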
The techniques described herein are generally applicable to memory presented using an indirect memory representation. Although an indirect memory representation similar to the one shown in Figure 1B is discussed in greater detail below, other indirect memory representations can be used. Figure 1C is a diagram illustrating another example of an indirect memory representation, in which the PLIDs are organized into a directed acyclic graph (DAG).
Consistent Read
Figure 2 is a flowchart illustrating an embodiment of a consistent read process implemented on a system such as 100 of Figure 1A. In this example, process 200 is invoked by the CCP in response to a consistent read request made by the CPU.
At 202, a consistent read request for a snapshot of a memory region as of a specific time is received. The consistent read request includes information about the location of the memory region of interest and the specific point in time for which a snapshot (that is, a copy) of the region is requested. In some embodiments, the consistent read request is an instruction sent from the CPU to the CCP via the memory controller.
At 204, a temporal copy operation is performed.
In some embodiments, both the undo log and the redo log are used by the temporal copy. In some embodiments, the temporal copy operation includes selecting the undo log or the redo log based on context. In some embodiments, the log is selected before the temporal copy operation is invoked, and the selected log is then used by the temporal copy operation. The selection can be made by the CPU, the memory controller, the CCP itself, and so on. As described in greater detail below, the log selection depends on whether the consistent read process is being used to perform an undo operation, to obtain a snapshot of the data in an earlier committed state, or a redo operation, to obtain a snapshot of the data in a later committed state. In some embodiments, the log is selected as specified by the caller; in some embodiments, the log is selected based on the requested time.
The temporal copy operation includes generating a snapshot based on the selected log, a known state of the memory region (for example, an existing snapshot of the region in a committed state), and a timestamp associated with the snapshot. The temporal copy generates a snapshot of the memory region as of the specified time. The generated snapshot of physical memory is provided to the first processor for use by an application executing on the first processor.
Figure 3 is a flowchart illustrating an embodiment of a temporal copy process. Process 300 can be used to implement 204 of process 200. In this example, the temporal copy operation is specified to have the following function interface:
temporalCopy(src, dest, timestamp);
where src and dest correspond to a source memory location (for example, a source buffer location) and a destination memory location (for example, a destination buffer location), respectively. For the time given by the specified timestamp (for example, 11:00 AM on January 12, 2014; 201401121100; and so on), the function generates a buffer at location dest (for example, physical address 0x1000000) that contains the state of the buffer at location src (for example, 0x10001111) as of that time. The memory state of src is known, and the memory state of dest is to be determined. In this function interface, the known state corresponds to the state of src at the current time. In some embodiments, the function interface can provide additional parameters for specifying the state of src at a time other than the current time (such as the time at which src was checkpointed and saved to disk). In some embodiments, the temporal copy function is invoked by the CPU to instruct the CCP to perform the temporal copy.
In some embodiments, the temporal copy is performed on a memory region comprising one or more pages. In some embodiments, the memory region is independent of the page substructure. For example, a memory region can comprise multiple indirect lines of the indirect memory structure (for example, an array of PLIDs). For example, a memory region of 4 kilobytes (the size of a traditional page) can be divided into 64 lines of 64 bytes each. If the size of a PLID is 32 bits, each page uses 4 translation lines, each storing 16 PLIDs, to reference the data lines of the region. In other embodiments, other memory region, data line, and PLID sizes can be used.
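The figures in this example follow from simple arithmetic, reproduced below as a check. The constant names are chosen here for illustration only.

```python
REGION_BYTES = 4096    # 4-kilobyte memory region (traditional page size)
LINE_BYTES = 64        # size of one data line or translation line
PLID_BITS = 32         # size of one PLID

data_lines = REGION_BYTES // LINE_BYTES            # lines needed to hold the data
plids_per_line = (LINE_BYTES * 8) // PLID_BITS     # PLIDs that fit in one line
translation_lines = data_lines // plids_per_line   # lines needed to hold the PLIDs
```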
In some embodiments, the specifications of src and dest each indicate a separate data structure that provides additional information about the source and destination memory regions themselves. For example, in some embodiments, the application specifies src as a virtual address rather than a physical address. In such embodiments, the separate data structure comprises the operating system's virtual memory mapping, which can specify as additional information the file associated with the source region, the logs used to undo and redo changes to the region, and other attributes (such as transactional behavior). Dest can be specified similarly. Operating system software translates the virtual address to a physical memory location, ensures that the physical memory location contains the content associated with the logical content, and further determines from the additional information which log is to be used by the temporal copy. In another embodiment, src is specified as a region in a logical data set. That is, it identifies a logical data unit that may be located at another physical address, or may not be located at any physical address at the specified time. In this case, the software implementing the data set maintains additional information indicating where copies of the logical data are stored (for example, in which checkpoints and caches), the log associated with src, and other configuration parameters that control how the data is instantiated in memory. In some embodiments, the dest parameter is omitted, and the temporal copy returns an indication of the location where the data resulting from the temporal copy is stored.
In this example, at 302, the data in the source memory location is copied to the destination memory location. In embodiments where memory is presented using an indirect memory representation (such as those shown in Figures 1B-1C), the copy operation includes copying the PLIDs in the translation lines. Because the actual data lines referenced by the PLIDs are not copied, the amount of data copied can be considerably smaller than the full data content of the source region, which makes the copy operation very efficient.
At 304, the known timestamp associated with the known state of the source memory location (for example, the current time in the case where the known state is the current state) is compared with the specified timestamp associated with the state to be generated. The result of the comparison is used to select the appropriate log. In some embodiments, the known timestamp (or the position of the corresponding entry in the log) is specified to the CCP before the temporal copy operation. In some embodiments, the temporalCopy function includes one or more additional parameters that specify this information.
If the timestamps are the same (for example, the known state and the specified timestamp both correspond to the current time), the known state is the same as the specified state and there are no changes. Therefore, an unmodified copy of the memory region in its known state is created at 318, and the process terminates at 320.
A known timestamp that is later than the specified timestamp indicates that the earlier state of the data is to be generated by undoing changes made to the data in the source region, and therefore the undo log is selected. Accordingly, at 306, the undo log is scanned to identify committed changes to the source region between the specified time and the known time. In some embodiments, the scan starts at the latest entry in the undo log that is earlier than the known timestamp (or at the end of the log when the current time is used as the known time), and ends when a timestamp earlier than the specified time is reached in the log or when the entire log has been scanned. At 308, the changes are applied to the destination buffer in the following order: the latest change is applied first, so as to undo the changes made to the source buffer between the specified time and the known time. The resulting data in the destination buffer is the desired data as of the specified time. If no change is identified, no change is applied. The process then terminates at 320.
A known timestamp that is earlier than the specified timestamp indicates that the later state of the data is to be generated by reapplying the changes that were committed to the source region after the known state, and therefore the redo log is selected. Accordingly, at 310, the redo log is scanned to identify committed changes to the source region between the known time and the specified time. In some embodiments, the scan starts at the earliest entry in the redo log that is later than the known timestamp, and ends when a timestamp later than the specified time is reached in the log or when the entire log has been scanned. At 312, the changes are applied to the destination buffer in the following order: the earliest change is applied first, so as to reapply the changes made to the source buffer between the known time and the specified time. If no change is identified, no change is applied. The process then terminates at 320.
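Steps 302 through 312 can be summarized in a short sketch. Several things here are assumptions rather than details from the description: the buffers are modeled as Python lists of PLIDs, each log is a list of (timestamp, index, PLID) tuples in ascending timestamp order, a change stamped t is treated as being in effect at time t, and the known timestamp and both logs are passed as extra arguments (the description mentions additional parameters without defining them).

```python
def temporal_copy(src, known_time, specified_time, undo_log, redo_log):
    """Return the PLIDs of src as they were (or will be) at specified_time."""
    dest = list(src)                                 # 302: copy PLIDs only
    if specified_time == known_time:                 # 304 -> 318: unmodified copy
        return dest
    if specified_time < known_time:                  # 306/308: undo, latest first
        for ts, index, old_plid in reversed(undo_log):
            if ts > known_time:
                continue                             # newer than the known state
            if ts <= specified_time:
                break                                # already in effect at target
            dest[index] = old_plid
    else:                                            # 310/312: redo, earliest first
        for ts, index, new_plid in redo_log:
            if ts <= known_time:
                continue                             # already in the known state
            if ts > specified_time:
                break                                # beyond the target time
            dest[index] = new_plid
    return dest


# Hypothetical history: [p1, p2] at time 0; index 0 changes at 10, index 1 at 20.
undo_log = [(10, 0, "p1"), (20, 1, "p2")]
redo_log = [(10, 0, "p5"), (20, 1, "p6")]

back_to_start = temporal_copy(["p5", "p6"], 20, 0, undo_log, redo_log)
back_to_ten = temporal_copy(["p5", "p6"], 20, 10, undo_log, redo_log)
forward = temporal_copy(["p1", "p2"], 0, 20, undo_log, redo_log)
same_time = temporal_copy(["p5", "p6"], 20, 20, undo_log, redo_log)
```

The treatment of entries whose timestamp equals the known or specified time is a design choice of this sketch; the description leaves the boundary handling open.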
In some embodiments, the process optionally determines whether a copy of the memory region as of the specified time already exists. For example, a separate log of the times at which the memory region was checkpointed is kept, and this separate log is used to determine whether a copy exists as of that time; the undo/redo logs are then checked to determine whether there have been further changes relative to the checkpointed snapshot. If a checkpointed snapshot exists and there are no changes, a logical copy of the snapshot is provided, and the process of re-creating the snapshot as described above is not invoked.
In some embodiments, virtual-to-physical address translation information is provided to the CCP, allowing the CCP to support temporal copy using virtual addresses. Log information can further be stored using virtual addresses rather than physical addresses.
Figures 4A-4C are data diagrams illustrating examples of the data and logs used in an example consistent read process. Figure 4A illustrates a data set undergoing changes in a transaction. In this example, the data is stored in a structured memory. Specifically, the memory region stores an indirect line, which stores a set of PLIDs referencing a corresponding set of data lines. Note that the values of the PLIDs can be arbitrary, and are chosen here to reference the first, second, third, and fourth data lines.
At t0 = 11:00, the indirect line stores PLIDs P0, P1, P2, and P3, which reference data lines storing A, E, C, and F, respectively. This is the initially committed state of the memory region at the start of the transaction. There are no entries in the undo or redo log.
At t1 = 11:05, the translation line entry storing PLID P3 is modified to PLID P9, which references D instead of F. Accordingly, the undo log records that at time t1, the entry at offset 3 from the beginning of the line stored PLID P3, and the redo log records that at time t1, the entry at offset 3 from the beginning of the line stores PLID P9.
At t2 = 11:10, the translation line entry storing PLID P1 is modified to PLID P10, which references B instead of E. Accordingly, the undo log adds an entry specifying that at time t2, the entry at offset 2 from the beginning of the line stored PLID P1, and the redo log records that at time t2, the entry at offset 2 from the beginning of the line stores PLID P10. At this point, the transaction is ready to be committed.
In some embodiments, the changes need to be rolled back (possibly because of a conflict with another transaction). Accordingly, in Figure 4B, a later snapshot is used to restore an earlier snapshot. The known time is 11:10 and the specified time is 11:00. The source state is copied to the destination (that is, source PLIDs P0, P10, P2, and P9, which reference data lines containing A, B, C, and D, are copied). The undo log is scanned to determine how to restore the destination set of data lines A, B, C, and D. According to the undo log shown in Figure 4A, the second entry is restored from P10 to P1 (so that the underlying data content B is restored to E), and the fourth entry is restored from P9 to P3 (so that data content D is restored to F). The restoration is performed by obtaining the old-value PLID from the undo log and writing it into the specified translation entry. A destination buffer referencing data lines A, E, C, and F is generated.
In some embodiments, a later state is generated using an earlier checkpointed snapshot. This is illustrated in Figure 4C. The known time is 11:00 and the specified time is 11:10. Source PLIDs P0, P1, P2, and P3 are copied to the destination. The redo log is scanned to reapply the changes to the destination set of data lines A, E, C, and F: the fourth entry changes from P3 to P9 (and the data content changes from F to D), and the second entry changes from P1 to P10 (and the data content changes from E to B). A destination buffer with PLIDs P0, P10, P2, and P9, referencing A, B, C, and D, is generated.
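The rollback of Figure 4B and the roll-forward of Figure 4C can be traced with the data from Figure 4A. This sketch makes a few assumptions: times are written as integers (1100, 1105, 1110), log entries are (timestamp, position, PLID) tuples, and positions are zero-based list indices, so the second and fourth entries are indices 1 and 3. The function names are hypothetical.

```python
CONTENT = {"P0": "A", "P1": "E", "P2": "C", "P3": "F", "P9": "D", "P10": "B"}
UNDO = [(1105, 3, "P3"), (1110, 1, "P1")]    # old PLIDs, in commit order
REDO = [(1105, 3, "P9"), (1110, 1, "P10")]   # new PLIDs, in commit order


def roll_back(src, known, specified):
    """Undo changes made after `specified` up to `known`, latest first."""
    dest = list(src)
    for ts, index, old_plid in reversed(UNDO):
        if specified < ts <= known:
            dest[index] = old_plid
    return dest


def roll_forward(src, known, specified):
    """Reapply changes made after `known` up to `specified`, earliest first."""
    dest = list(src)
    for ts, index, new_plid in REDO:
        if known < ts <= specified:
            dest[index] = new_plid
    return dest


def contents(line):
    return "".join(CONTENT[plid] for plid in line)


fig_4b = roll_back(["P0", "P10", "P2", "P9"], known=1110, specified=1100)
fig_4c = roll_forward(["P0", "P1", "P2", "P3"], known=1100, specified=1110)
```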
In some embodiments, the scanning of the log (306 or 310 of process 300) is performed before the data in the source is copied to the destination. For each page (or subpage), a set of bits corresponding to the data lines is maintained, with each bit corresponding to a line. The set of bits is reset at the known time at which the state of the memory region is known (such as the start of a transaction). If a log record indicates that a particular entry in the indirect line has changed, the corresponding bit is marked. Only source PLIDs that are not marked are copied to the destination. The changes are still applied to derive the desired data lines in the destination. To illustrate using Figures 4B and 4C, a bitmask of 0000 represents entries 0-3 at the start of the transaction. At the end of the transaction, the resulting bitmask is 0101, because the PLIDs referencing the second and fourth data lines have changed. The first and third data lines of the source buffer (PLIDs P0 and P2) are unchanged, and therefore the corresponding bits are not marked. These PLIDs are copied to the corresponding positions in the destination buffer. The second and fourth entries are marked because of the changes recorded in the log, and are not copied to the second and fourth positions of the destination buffer. Instead, only the changes according to the log are copied to the corresponding positions in the destination buffer. In this example, depending on which log is used, either P1 and P3, which reference data lines E and F (Figure 4B), or P10 and P9, which reference data lines B and D (Figure 4C), are copied to the second and fourth positions in the line.
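The bitmask optimization can be sketched as follows, using the Figure 4B and 4C data. The assumptions are that positions are zero-based list indices, that the mask is a list of booleans rendered as a string such as "0101", and that log entries are passed as (position, PLID) pairs already restricted to the relevant time range and listed in the order they should be applied. The function name is hypothetical.

```python
def copy_with_mask(src, log_entries):
    """Scan the log first, then copy only the PLIDs the log does not touch."""
    changed = [False] * len(src)              # mask reset at the known time
    for index, _plid in log_entries:
        changed[index] = True                 # mark entries the log modifies
    # Copy unmarked PLIDs; marked positions are filled from the log instead.
    dest = [None if changed[i] else plid for i, plid in enumerate(src)]
    for index, plid in log_entries:
        dest[index] = plid
    mask = "".join("1" if bit else "0" for bit in changed)
    return mask, dest


# Figure 4B: undo entries, latest change first.
mask_b, dest_b = copy_with_mask(["P0", "P10", "P2", "P9"], [(1, "P1"), (3, "P3")])
# Figure 4C: redo entries, earliest change first.
mask_c, dest_c = copy_with_mask(["P0", "P1", "P2", "P3"], [(3, "P9"), (1, "P10")])
```

Compared with copying everything and then overwriting, this avoids copying source PLIDs that the log is about to replace.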
The CCP may implement other operations as degenerate cases or variants of temporal copy. In some embodiments, the CCP implements a "simultaneous" copy from source to destination (that is, a temporal copy in the case where the specified time is the same as the known time), producing an exact copy while using PLID copying as an optimization relative to actually copying the data. In some embodiments, the CCP implements a "clear" on a memory region as an optimized version of copying from an all-zero source segment. In some embodiments, the CCP may implement a move on a memory region that clears each PLID in the source region as part of moving it to the destination region, thereby avoiding the overhead of reference count changes while simultaneously "clearing" the source region.
Merge-Update Copy
In some embodiments, the CCP is configured to perform an atomic merge-update copy function (also referred to as a merge-update operation). Details of the operation and its implementation are discussed in U.S. Patent Application No. 12/804,901 (Attorney Docket No. HICAP004), which is incorporated herein by reference in its entirety for all purposes. The merge-update operation allows concurrent updates to be merged even when there are conflicts with modifications made by different threads or processes, as long as the conflicts are logically consistent and can be resolved to reach a predictable memory state.
In some embodiments, the updating process or thread maintains a copy of the original data structure at the start of the update operation or logical transaction, and performs its updates on the copy. When the updates are complete, information associated with the original data structure (such as a pointer) is compared with information associated with the current version of the data structure. If both point to the same structure, there are no conflicting updates, and a compare-and-swap (CAS) operation is performed to replace the original version with the new, modified version of the data structure. If, however, the original data structure differs from the current data structure, the updates to the current data structure can be merged into the new modified version, as long as the differences are logically consistent. Logically consistent differences are concurrent modifications, made by different threads or processes, that can be resolved to reach a memory state consistent with the application's semantics. When logically consistent modifications made to a memory structure by multiple threads are merged, the result is as if each thread or process had made its modifications to the memory structure atomically and independently. As explained in greater detail below, there are different ways of determining whether modifications are logically consistent for different types of data. In some embodiments, logical consistency is determined using a logical consistency constraint selected from a set of potential constraints. Once the differences have been merged, the CAS operation is retried. If the differences are not logically consistent, such as when two concurrent processes each seek to add an entry with the same key to a map, the merge-update operation fails and some operations are retried.
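The merge step can be illustrated for a map, using the same-key example from the text. This is a single-threaded sketch under several assumptions: the data structure is a Python dictionary, the pointer comparison and CAS are replaced by a plain equality check, and the consistency constraint chosen here is that any key changed both concurrently and by the updater is a conflict. The names merge_update and MergeConflict are hypothetical.

```python
_MISSING = object()   # marks "key not present" when comparing versions


class MergeConflict(Exception):
    """Raised when concurrent changes are not logically consistent."""


def merge_update(original, current, modified):
    """Merge concurrent changes (original -> current) into `modified`."""
    if current == original:
        return dict(modified)            # no concurrent update; plain swap
    merged = dict(modified)
    for key in set(original) | set(current):
        base = original.get(key, _MISSING)
        theirs = current.get(key, _MISSING)
        ours = modified.get(key, _MISSING)
        if theirs == base:
            continue                     # key untouched by concurrent updates
        if ours != base:
            raise MergeConflict(key)     # both sides changed the same key
        if theirs is _MISSING:
            merged.pop(key, None)        # concurrent delete
        else:
            merged[key] = theirs         # concurrent insert or update
    return merged


# Two updaters add different keys: the changes merge.
merged = merge_update({"a": 1}, {"a": 1, "b": 2}, {"a": 1, "c": 3})

# Two updaters add the same key: the merge fails and must be retried.
try:
    merge_update({"a": 1}, {"a": 1, "k": 2}, {"a": 1, "k": 9})
    conflict_detected = False
except MergeConflict:
    conflict_detected = True
```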
In some embodiments, the entries in the undo/redo log between the start of the current transaction and the current time correspond to updates to the memory region made by separately committed transactions. The CCP is configured to change, while copying the memory region, the lines of the specified region that were changed by these concurrent transactions, as long as those changes do not conflict with updates made by the current transaction. In some embodiments, the CCP is further configured to resolve certain logically consistent conflicts.
Figure 5 is a data diagram illustrating an example of an embodiment of a merge-update copy process. Pseudocode for the merge-update copy is explained below in conjunction with Figure 5.
As shown in Figure 5, at t0 (the initial state), the indirect line in the memory region contains PLIDs P1, P2, and P3, which reference data lines A, B, and C, respectively. Two concurrent transactions each have a copy of a snapshot of the indirect line, and each transaction makes its own set of changes to its copy. For each transaction, taking a snapshot of the initial state involves creating a copy of the indirect line that references the same data lines A, B, and C. Accordingly, changes made by one transaction are not visible to the other.
The first transaction modifies the first position in the indirect line by changing PLID P1 to PLID P4 (changing the referenced data from A to A'), and modifies the third position by changing PLID P3 to P5 (changing the referenced data from C to D). The changes are committed at time t1, and the indirect line formed by P4, P2 and P5 is referred to as the currently committed copy of the state.
Meanwhile second process by changing PLID P2 to PLID P8(And by the data line quoted from B change to
B’)Change the second position in indirect row, and by changing PLID P3 to PLID 9(And the data line that will be quoted from C
Change to E)To change the third place.Not yet submit the change carried out by the second affairs(And therefore, reference is indicated by dotted line),
And the Current transaction that the indirect row formed by P1, P8 and P9 is known as to state copies.In time t2(It is later than t1)Place, second
Affairs need to submit its change.It is carried out by two affairs simultaneously due to changing, which undergoes merging-renewal process.
C-style pseudocode is discussed below. In the pseudocode, the following pointers are initially specified: scp initially points to the first position of the snapshot copy, so *scp initially references the PLID corresponding to data line A; ccp initially points to the first position of the currently committed copy of the state, so *ccp initially references the PLID corresponding to data line A'; and ctp initially points to the first position of the current transaction copy of the state, so *ctp initially references the PLID corresponding to data line A. Incrementing a pointer advances it to reference the PLID of the next line. The pseudocode specifies:
for each position corresponding to a data line in the memory region,
  if *ccp is changed relative to *scp
    if *ctp is equal to *scp // so not changed by the current transaction
      write *ccp to *ctp;
    else
      // handle write-write conflict
      mergedLine =
        lineMergeUpdate(*scp, *ccp, *ctp, mergeCategory);
      if merge failed, return failure;
      write mergedLine to *ctp;
  ++scp; ++ccp; ++ctp.
Referring to Fig. 5, for the first data line, *ccp (PLID P4) is modified relative to *scp (PLID P1), but *ctp (PLID P1) is equal to *scp (PLID P1). The line was therefore changed by only one transaction, and *ccp is written to *ctp (PLID P1 is changed to PLID P4).
For the second data line, *ccp (PLID P2) is not modified relative to *scp (PLID P2). This line, too, was modified by at most one transaction, and *ctp (PLID P8) is left unchanged.
For the third data line, *ccp (PLID P5) is modified relative to *scp (PLID P3), and *ctp (PLID P9) is not the same as *scp (PLID P3). This is referred to as a write-write conflict, because both transactions seek to change the same data. The lineMergeUpdate function is therefore called to determine whether the write-write conflict is logically consistent, and to merge the conflict if it is. The mergeCategory parameter indicates the form of merge to be used.
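The pseudocode and the Fig. 5 walk-through can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the hardware implementation: PLIDs are modeled as dictionary keys, the function names mirror the pseudocode, the "counter" category and the `M{i}` naming of a merged line are hypothetical, and a counter merge is taken to mean applying the current transaction's delta on top of the committed value.

```python
def line_merge_update(lines, s, c, t, merge_category):
    """Try to merge a write-write conflict; return merged content or None."""
    if merge_category == "counter":
        # Assumed counter semantics: add the current transaction's delta
        # (relative to the snapshot) to the committed value.
        return lines[c] + (lines[t] - lines[s])
    return None  # default result: the conflict cannot be resolved


def merge_update(lines, snapshot, committed, current, merge_category=None):
    """Fold committed changes into `current` in place; False means abort."""
    for i, (s, c) in enumerate(zip(snapshot, committed)):
        if c == s:
            continue                    # no committed concurrent change here
        if current[i] == s:
            current[i] = c              # only the other transaction wrote
            continue
        merged = line_merge_update(lines, s, c, current[i], merge_category)
        if merged is None:
            return False                # unresolvable write-write conflict
        new_plid = f"M{i}"              # hypothetical PLID for the merged line
        lines[new_plid] = merged
        current[i] = new_plid
    return True
```

With the Fig. 5 values, the first position is taken from the committed copy, the second is left alone, and the third position (D versus E) makes the call return failure, at which point the caller would abort the uncommitted transaction.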
The default result of lineMergeUpdate is failure (as in the case shown in Fig. 5, where the two different data contents D and E lead to a write-write conflict that is logically inconsistent and cannot be resolved). When lineMergeUpdate fails, the current uncommitted transaction is aborted. Certain other kinds of merge are admissible, however (that is, the write-write conflict is logically consistent). For example, if mergeCategory indicates that the value in the data line is to be treated as a counter, the lineMergeUpdate function determines the difference between the snapshot value and the current transaction value and adds this difference to the counter in the committed line to produce mergedLine, which provides the semantics that resolve the conflict. mergeCategory can also specify particular constraints. For example, in the case of a monotonically increasing counter, if the merged value violates the constraint that the counter value must increase monotonically (such as when the counter is reset by one of the transactions), the merge-update operation fails.
In this example, the current transaction's memory region state is effectively a snapshot of the state created at time t0, changed by the various updates that are to be applied at time t2 (the end time of the current transaction). The merge-update copy effectively incorporates the updates committed to the memory region by other concurrent transactions between time t0 and time t2. Specifically, these updates are merged if they can be merged (that is, if there is no conflict or if the conflict is logically consistent). The merge-update copy function can therefore be implemented as a temporal copy operation on a given region with a known start time t0 and a specified end time t2. The temporal copy operation additionally detects write-write conflicts (for example, by tracking whether the same PLID position is changed in multiple logs from different transactions) and performs a merge operation when possible.
In some embodiments, each redo log entry includes information about the corresponding transaction that made the change, which allows the merge-update copy function to use the redo log to determine the committed modifications and perform the merge-update copy operation.
In some embodiments, merge-update copy is invoked at transaction commit for each memory region modified by the transaction. The redo log is used to detect any commit conflicts, to resolve them when possible, and to abort the transaction when it is not. By contrast, in existing systems a transaction needs to check explicitly whether another transaction has written to the same location in order to detect write-write conflicts, and this check incurs substantial overhead. In a system that implements hardware-supported temporal copy, the redo log can be used to detect write-write conflicts when a transaction is about to commit its changes. In some embodiments, redo log entries include information about which transaction made the change, and when a transaction is about to commit its changes, the applicable entries in the redo log are located and checked to determine whether a conflict exists. Identified conflicts are resolved when possible. If a conflict cannot be resolved, the transaction is aborted.
In some embodiments, the merge-update operation is invoked only when the same page has been changed by both the current transaction and another committed concurrent transaction. If the page was changed by only a single transaction, there is no conflict and no merge is needed. In some embodiments, each physical page includes metadata indicating that it has been changed by multiple transactions, and the operating system uses this metadata to determine whether to invoke the merge-update operation for the physical page.
Log representation
Fig. 6A is a diagram illustrating an embodiment of physical data lines in memory. As shown, the physical memory is divided into subpages. Each subpage includes a preset number of data lines (32 in this example, although other numbers can be used in other embodiments). The start address of a subpage is denoted subpageAddr. A line mask can be used to indicate lines, where each bit in the mask corresponds to a particular line.
In this example, the line mask is a 32-bit value with one bit for each line in the subpage, where the i-th bit in the mask corresponds to the i-th line of the subpage. Initially, the line mask is set to a default value, such as 0. If a line is modified, its corresponding line mask bit is set to 1. A subpage update record (SPUR) with the following fields can therefore be used to represent the location of particular data lines and whether the PLIDs referencing those data lines have been changed:
[subpageAddr, lineMask],
where subpageAddr is the address of the subpage in which the lines are located, and lineMask is the line mask, which includes a set of bits indicating the modification state of the corresponding lines.
The subpage size is determined by the number of bits in lineMask multiplied by the line size. In an embodiment with 64-byte lines and a 32-bit lineMask, the subpage size is 2 kilobytes.
Fig. 6B is a diagram illustrating an embodiment of a log representation based on the data line representation of Fig. 6A. In this example, undo log 602 is represented as a sequence of PLID values corresponding to the data lines that were overwritten. Similarly, redo log 604 is represented as a sequence of PLID values corresponding to the modified or new data lines that were written.
Each PLID is mapped to a corresponding physical data line location. In this example, the physical information is stored in metadata log 606 to save the memory needed for log entries. Referring to Fig. 6A, for each subpage, the metadata log is represented as a sequence of SPURs. In each SPUR, the i-th bit, corresponding to the i-th line of the subpage, is set to a designated value (for example, 1) to indicate that the line has been changed. If a line has been changed, the new PLID is in the redo log and the previous PLID is in the undo log. The same metadata log can therefore serve both the undo log and the redo log.
In some embodiments, the sizes of the subpage address and line mask fields can be further optimized, especially if the SPUR size is allowed to be a non-power-of-two number of bits. The purpose of the optimization is to minimize the amount of data that needs to be scanned to perform undo processing as part of generating a consistent-read block. For example, with an 8-bit mask, each record covers 0.5 kilobytes, so with a 34-bit page address field each SPUR is 42 bits and can address 8 terabytes of memory. With this choice of parameters, the memory bandwidth required for log access is roughly 70 percent of that required with 64-bit SPURs when nearly all updates touch a single line per page. The optimization can be based on statistics of the number of lines expected to be updated per page during operation.
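The figures in this example can be checked with a short calculation. The sketch below is illustrative only; `spur_layout` is a hypothetical helper, and the 64-byte line size is carried over from the earlier embodiment.

```python
LINE_SIZE = 64  # bytes per data line (assumed, from the earlier example)


def spur_layout(mask_bits, addr_bits):
    """Return (bytes covered per record, SPUR bits, addressable bytes)."""
    coverage = mask_bits * LINE_SIZE           # one mask bit per line
    spur_bits = mask_bits + addr_bits
    addressable = (1 << addr_bits) * coverage  # one address per subpage
    return coverage, spur_bits, addressable


coverage, spur_bits, addressable = spur_layout(mask_bits=8, addr_bits=34)
ratio_vs_64_bit = spur_bits / 64               # 42 / 64 = 0.65625
```

An 8-bit mask with a 34-bit address gives 512 bytes per record, 42 bits per SPUR and 2^43 bytes (8 terabytes) of addressable memory. The ratio 42/64 is about 0.66, consistent with the rough 70 percent figure quoted for the single-line-per-page case.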
Additional metadata can be stored in the log by reserving a set of special addresses that indicate that a SPUR stores metadata rather than an actual page data update. For example, a timestamp can be stored by writing a SPUR whose address is reserved to indicate a timestamp and does not correspond to a subpage address (for example, an address in which every bit is set to 1). Such special addresses are also referred to as flags. Other metadata, such as the start of a transaction, the end of a transaction, and so on, can be handled similarly. By reserving a power-of-two block of addresses for each such value, the low-order bits of the page address field can be used to augment the bits of the mask field in order to store large values. For example, by using a block of 256 addresses for the time address, the low-order 8 bits of the page address can augment the mask field to provide 24 bits for the timestamp in a configuration that uses a 16-bit line mask.
The size requirements of these parameters can be reduced by storing them as offsets relative to some base value rather than as absolute values. For example, a timestamp can be stored as an offset relative to an epoch base value. The absolute timestamp can then be the 24-bit offset plus a 24-bit epoch base, for a total of 48 bits of effective timestamp. The epoch base value is updated by writing a SPUR to the log using a special page address corresponding to the epoch register.
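A possible encoding of the timestamp record described above is sketched below. Several details are assumptions made for illustration and are not taken from the specification: the 34-bit address width, placing the reserved 256-address block at the top of the address space, carrying the high 8 bits of the offset in the address and the low 16 bits in the mask, and combining the epoch base and offset by concatenation.

```python
MASK_BITS = 16                       # line mask width in this configuration
ADDR_BITS = 34                       # assumed address field width
TS_BLOCK = (1 << ADDR_BITS) - 256    # assumed base of the reserved block


def encode_timestamp(offset24):
    """Pack a 24-bit timestamp offset into (address, mask) SPUR fields."""
    if not 0 <= offset24 < (1 << 24):
        raise ValueError("offset must fit in 24 bits")
    addr = TS_BLOCK | (offset24 >> MASK_BITS)     # high 8 bits in the address
    mask = offset24 & ((1 << MASK_BITS) - 1)      # low 16 bits in the mask
    return addr, mask


def decode_timestamp(addr, mask, epoch_base24):
    """Rebuild the 48-bit timestamp from a timestamp SPUR and the epoch base."""
    if addr < TS_BLOCK:
        raise ValueError("not a timestamp record")
    offset = ((addr & 0xFF) << MASK_BITS) | mask
    return (epoch_base24 << 24) | offset          # assumed: concatenation
```

Because the reserved block is a power of two in size and aligned, the low 8 address bits are free to carry data without colliding with real subpage addresses.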
With this representation, the CCP maintains pointers into the undo and redo PLID logs and, as each SPUR is read, advances these pointers by the number of PLIDs indicated in the SPUR. The correspondence therefore does not need to be stored explicitly in the log.
The fixed-size SPUR representation also allows the metadata log to be read both backward and forward. The representation also makes it easy for the CCP to generate the undo/redo logs.
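The implicit correspondence between SPUR bits and PLID log entries can be sketched as follows. The sketch assumes that the undo and redo logs hold their PLIDs in the same order as the set bits of successive SPURs, scanned from bit 0 upward; the function name and the tuple layout are hypothetical.

```python
def replay_metadata_log(spurs, undo_plids, redo_plids, line_size=64):
    """Pair each set SPUR bit with the next undo/redo PLID log entry.

    Returns a list of (line_address, old_plid, new_plid) tuples.
    """
    pos = 0                       # shared position in both PLID logs
    changes = []
    for subpage_addr, line_mask in spurs:
        i = 0
        while line_mask >> i:     # stop once no higher bits remain set
            if (line_mask >> i) & 1:
                line_addr = subpage_addr + i * line_size
                changes.append((line_addr, undo_plids[pos], redo_plids[pos]))
                pos += 1          # advance by one PLID per set bit
            i += 1
    return changes
```

Reading the metadata log backward works the same way with the position counter decremented, which is why a fixed-size record is convenient.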
Logging
In some embodiments, one or more regions of an application's physical memory are designated as logged. This can be done by the operating system setting specific configuration registers or memory controller registers to indicate the location and size of such a region. Each write to a logged region then causes the written PLID to be copied to the log area, together with a SPUR stored in the metadata log.
Fig. 7 is a flowchart illustrating an embodiment of a process for generating log information. Process 700 can be performed by the CCP and/or the memory controller.
At 702, a write from the CPU to a logged physical memory region is detected. In some embodiments, a write to the cache or to the actual underlying memory system (for example, main memory) is detected by logic in the memory controller and/or the CCP, which checks the address associated with the write against the regions marked as logged. For a region represented using the indirect memory representation discussed above, the write changes the indirect representation of the physical data line (for example, the content of the PLID, or which data the PLID refers to) but does not change the data content of the data line itself.
At 704, one or more log records associated with the write are recorded. Specifically, the old value of the changed content is recorded in the undo log, the new value is recorded in the redo log, or both values are recorded in their respective logs. In some embodiments, configuration information associated with the region specifies whether to update the undo log, the redo log, or both. Identification information associated with the indirect representation of the changed content and identification information associated with the physical data line are recorded. In some embodiments, the PLID that was modified to reference a different data line is inserted into the next entry at the current (tail) position of the PLID queue in the appropriate log. In addition, a SPUR is generated based on the referenced data line corresponding to the change, and the SPUR is written to the metadata log.
In some embodiments, the set of undo, redo and SPUR records is created whenever a write occurs. Logging on every write can be inefficient, however, because the same memory location may be written many times. For example, if a PLID in an indirect line references first A, then B, then C before the transaction commits, only the value C is relevant for the purpose of tracking the committed memory state. In some embodiments, therefore, log records are not created when the writes occur but when the transaction involving one or more writes is ready to commit. This is accomplished by taking a snapshot of the region as represented with the indirect memory structure.
In some embodiments, taking a snapshot (copy) includes copying the PLIDs that indirectly represent the memory lines in the region of interest. In some embodiments, the indirection in memory access means that a snapshot can be created by copying the PLIDs associated with the region rather than the actual data lines referenced by the PLIDs. In some embodiments, the CCP performs this copy of the PLIDs on request as a degenerate form of temporal copy in which the specified time is the same as the current time; no undo or redo is applied, because nothing is to be changed. The reference counting of lines and the immutability of these shared lines mean that the copied PLIDs constitute a snapshot of the region's state even though the actual data has not been copied.
In some embodiments, a snapshot of the region is taken at the initial state (that is, before the region is modified and logged, at the start of the logging interval). Taking a snapshot of the entire region can be computationally expensive, however. In some embodiments, therefore, a snapshot is taken on demand when the first write is detected. In some embodiments, a snapshot of the entire region is generated on demand when the first write to it is detected; if the region is not modified, there will be no log entries and no snapshot is needed. In some embodiments, snapshots are taken at the granularity of a sub-region (such as a page), so that only the pages actually written during the logging interval are snapshotted. Specifically, the first write to a page is detected and the operating system is notified to create a snapshot of the page. The operating system can invoke the CCP to assist in creating the snapshot. The snapshot of a page is created by copying the PLIDs that reference the page's data lines to a shadow indirect structure for the page. This process is repeated for each page that is written for the first time during the interval of interest, with information about each such write recorded in a snapshot data structure. If each PLID is 32 bits and corresponds to a line size of 64 bytes (512 bits), the amount of data copied to create the snapshot of a page can be as little as 1/16 of the size of the page.
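The page-granularity, on-demand snapshot can be sketched as follows. This is a software model for illustration only; the class and method names are hypothetical, and pages are modeled as lists of PLIDs rather than hardware indirect structures.

```python
PLID_BITS = 32
LINE_BITS = 64 * 8
SNAPSHOT_FRACTION = PLID_BITS / LINE_BITS     # 32 / 512 = 1/16 of page size


class LoggedRegion:
    """Region whose pages are snapshotted lazily, on their first write."""

    def __init__(self, pages):
        self.pages = pages        # page number -> list of PLIDs
        self.shadow = {}          # page number -> PLIDs copied at first write

    def write(self, page, index, new_plid):
        if page not in self.shadow:
            # First write to this page: copy only the PLIDs, not the data.
            self.shadow[page] = list(self.pages[page])
        self.pages[page][index] = new_plid

    def initial_state(self, page):
        # Pages never written need no snapshot; their current PLIDs suffice.
        return self.shadow.get(page, self.pages[page])
```

Only pages that are actually written acquire a shadow copy, and later writes to the same page leave the shadow untouched, so it keeps describing the state at the start of the logging interval.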
Accordingly, a complete snapshot of the state of the region consists of the pages that were explicitly snapshotted as described above, together with the pages of the current state that have not been changed.
In embodiments that support the snapshot technique discussed above, for the pages that have been changed, the CCP can create the redo log of the data lines changed during the logging interval by comparing the PLIDs in the current state of the region with the PLIDs at the corresponding offsets in the initial state snapshot, and transferring the current PLIDs that differ to the log together with identification information (in other words, which data line's referencing PLID has changed). The undo log can be created by the same comparison, saving instead only the corresponding PLIDs from the initial state snapshot. In some embodiments, this operation is performed at the time the transaction is committed.
Taking Figs. 4A-4C as an example, the undo log and redo log can be generated using this technique. Assume that PLIDs P0-P3 reference data lines in the same page. When the first write to this page occurs, a snapshot of the original page is taken, duplicating the PLID values. When the log is to be generated, the PLIDs in the current state of the region are compared with the PLIDs in the snapshot, the current PLIDs that differ from the PLIDs in the initial state snapshot are identified, and the information is saved to the log. In addition, once the redo log has been generated, the undo log can be derived by noting the entries in the redo log and recording their corresponding values in the snapshot. For example, referring to Fig. 4A, at 11:10, assume that the redo log includes an entry (P9) at offset 3 from the start of the line and that the entry at that offset in the initial state snapshot has the value P3; it can then be determined that the undo log includes an entry for the same position that stores the value P3. The undo log records for an indirect line can thus be determined from the redo log records (which include information about the positions of the logged changes) and the old values at the corresponding positions in the initial state snapshot.
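Deriving undo entries from redo entries and the initial state snapshot amounts to a lookup by position. A minimal sketch, assuming redo entries are modeled as (offset, new PLID) pairs and the snapshot as a list of PLIDs (both representations are illustrative, not the log layout itself):

```python
def derive_undo(redo_entries, snapshot):
    """For each logged position, recover the old PLID from the snapshot."""
    return [(offset, snapshot[offset]) for offset, _new_plid in redo_entries]


snapshot = ["P0", "P1", "P2", "P3"]       # PLIDs at the initial state
redo_entries = [(3, "P9")]                # offset 3 now references P9
undo_entries = derive_undo(redo_entries, snapshot)
```

For the example in the text, the redo entry P9 at offset 3 yields an undo entry holding P3 at the same offset.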
An example set of pseudocode for appending log information to the undo and redo logs at commit time is as follows:
for each subpage in the snapshot
  for each line i in the subpage
    if the i-th PLID in the snapshot differs from the i-th PLID in the current subpage
      if redo logging, enqueue the current PLID to the redo log;
      if undo logging, enqueue the snapshot PLID to the undo log;
      set the i-th bit of lineMask in the SPUR for the subpage;
  enqueue the SPUR for the subpage to the metadata log;
end.
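The commit-time pseudocode can be rendered as runnable Python along the following lines. The data model is an assumption made for illustration: the snapshot and the current state are dictionaries from subpage address to a list of PLIDs, and the logs are plain lists. Skipping a SPUR whose mask is empty is a design choice of this sketch, not something the pseudocode states.

```python
def log_at_commit(snapshot, current, log_redo=True, log_undo=True):
    """Compare snapshot and current PLIDs; return (redo, undo, metadata)."""
    redo, undo, metadata = [], [], []
    for subpage_addr, old_plids in snapshot.items():
        line_mask = 0
        new_plids = current[subpage_addr]
        for i, (old, new) in enumerate(zip(old_plids, new_plids)):
            if old != new:
                if log_redo:
                    redo.append(new)       # new PLID goes to the redo log
                if log_undo:
                    undo.append(old)       # old PLID goes to the undo log
                line_mask |= 1 << i        # mark line i as changed
        if line_mask:                      # assumed: omit empty records
            metadata.append((subpage_addr, line_mask))
    return redo, undo, metadata
```

Because only the final PLID of each line is compared with the snapshot, a line written several times within the transaction produces a single log entry.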
In some embodiments, a "modified" flag is maintained for each PLID entry in the indirect memory structure. The flag is set when the corresponding entry is modified and can be cleared under software/hardware control. For example, the "modified" flag can be reset at the end of the transaction or period of interest. An example of a modified flag is described in U.S. Patent Application No. 13/712,878, attorney docket HICAP010, which is incorporated herein by reference in its entirety for all purposes. In these embodiments, the CCP can create the redo log of the changed lines by scanning the PLID entries that define the region and copying to the log only those PLIDs that are flagged as modified.
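A scan driven by per-entry modified flags might look like the sketch below. The entry layout (a dictionary with `plid` and `modified` keys) and the option to clear the flag during the scan are assumptions for illustration; the text only says the flag can be cleared under software/hardware control.

```python
def redo_from_modified_flags(entries, clear=True):
    """Collect (offset, plid) for flagged entries; optionally reset flags."""
    redo = []
    for offset, entry in enumerate(entries):
        if entry["modified"]:
            redo.append((offset, entry["plid"]))
            if clear:
                entry["modified"] = False   # ready for the next interval
    return redo
```

Compared with the snapshot comparison, this approach needs no copy of the initial PLIDs to build the redo log, but by itself it does not provide the old values needed for an undo log.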
In some embodiments in which a snapshot is taken at the start of a transaction, when an indication of the end of the transaction is received (such as a conventional prepare-to-commit indication), the CCP is instructed to generate the redo and undo log information and append it to the redo and undo logs. When this logging is complete, an end-transaction indication that includes the transaction id and a timestamp is written to the metadata log. In some embodiments, a transaction can be aborted. An indication of commit or abort is therefore provided, depending on whether the transaction is committed or aborted. In the latter case, an indication of the start of the transaction's logging is also provided (for example, a timestamp or a log entry number).
In some embodiments, not all data lines are reference counted. For example, when a data line that can effectively be duplicated is encountered in an overflow area of the memory system, the line is copied to a new line location. The PLID associated with the copied line is then stored in the log.
In various embodiments, features beyond the initial log generation and the temporal copy supported by the CCP can be implemented in software with minimal impact on performance. Some of these software-supported features are described below.
In some embodiments, flushing the top-level cache to a lower-level cache or to memory at transaction commit is implemented in software, so that writes to lines made as part of the transaction are logged in a timely manner relative to the completion of the transaction. In some embodiments, the processor can perform this action as part of the commit instruction.
In some embodiments, software running on the CPU sends a start-transaction operation to the CCP at the start of an update (such as a transaction) and an end-transaction operation at its end, to indicate the start and end of the transaction, respectively. On the start-transaction indication, a transaction identifier is allocated and the current timestamp is recorded.
In some embodiments, the CCP serializes the generated log records directly to an external input/output (I/O) device (such as a network) rather than storing the records in memory. Similarly, the CCP can apply redo log records that are received from an I/O device and deserialized directly to a memory region, effectively advancing the memory state of the region in time to the memory state associated with the redo log records. For example, a first compute node (such as a computing device) can efficiently checkpoint its memory state to a second compute node. Specifically, the first compute node checkpoints its memory state by taking a complete snapshot of the state and sending the checkpointed memory state to the second compute node. The first compute node also uses its CCP to generate redo log records and transmits the records to the second node over a network connection. The second node applies these redo log records to the checkpointed state received from the first compute node, so as to maintain an up-to-date copy of the first node's memory state while incurring minimal network and application processing overhead.
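The checkpoint-then-replay flow between two nodes can be modeled in a few lines. This sketch makes several simplifying assumptions: memory state is a list of line contents, redo records carry (offset, new content) pairs rather than PLIDs, and JSON stands in for whatever wire format the I/O path would use. All function names are hypothetical.

```python
import json


def take_checkpoint(region):
    """Complete snapshot of the region's state, to be sent to the replica."""
    return list(region)


def serialize(redo_records):
    """Encode redo records for transmission (JSON chosen for illustration)."""
    return json.dumps(redo_records).encode("utf-8")


def deserialize(payload):
    return [tuple(record) for record in json.loads(payload.decode("utf-8"))]


def apply_redo(replica, redo_records):
    """Advance the replica's state by applying redo records in order."""
    for offset, new_value in redo_records:
        replica[offset] = new_value
    return replica
```

After the initial checkpoint, only the changed lines cross the network, which is what keeps the replica current at low cost.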
In some cases, this efficient network replication technique is used to move a running application from one network host to another, while minimizing the interruption to the running application by copying a checkpointed memory state of the application and thereafter copying only the lines that have changed since the previous checkpoint. In some embodiments, the logging, checkpointing and updating are performed by the CCP into a transmit buffer before network transmission, to ensure that the CCP operation is not flow controlled to match the limits of the network, especially when the network is congested.
Periodically, software can convert portions of the CCP-generated undo, redo and/or metadata logs into its own log format, typically converting the line/page-oriented records into a conventional database form, and then copy the result to persistent storage, such as a database or disk. An example log format has the following fields:
record identifier | transaction id | offset | old data value | new data value
where the fields correspond to the identifier of the record, the transaction that performed the update, the offset of the updated field within the record, the old data value of the field, and the new data value written to the field. This log representation does not use PLIDs, because the data may be stored on secondary storage that does not provide the same physical-level indirect structure for access.
In some embodiments, software maintains the mapping from pages to memory regions over specified times, so that it can determine the binding of a given page to a virtual memory address at the time of modification. For example, if physical page P is recorded as belonging to memory region B in the period from ti to tj, the software can convert the CCP-generated log information into a form that is independent of physical memory addresses, or at least one that is suitable for long-term persistent logging by a database management system. In some embodiments, higher-level software maps physical memory to higher-level data structures and records the mapping information in the log records, so that higher-level applications can more easily use the log records to restore or reconstruct the appropriate data. For example, the software determines that a changed PLID corresponds to an employee record in a company's employee database, and in particular to the years of service field in the employee record. The log record is therefore generated and converted by the software to include information indicating that the change occurred in the employee's years of service field. An application that uses the log can efficiently restore or reconstruct the employee database from the log records and a snapshot of the employee database, by changing the employee's years of service field according to the value in the log record.
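The conversion from line-oriented log entries to the record-oriented format can be sketched as follows. The `LogRecord` fields follow the example format given earlier; everything else is assumed for illustration, including the `field_map` that software would maintain from line address to (record identifier, field offset) and the representation of line contents as a dictionary keyed by PLID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    record_id: str
    txn_id: int
    offset: int
    old_value: object
    new_value: object


def to_db_records(txn_id, changes, field_map, lines):
    """Convert (line_addr, old_plid, new_plid) changes to database records.

    field_map: line address -> (record identifier, field offset); this
    mapping is assumed to be maintained by higher-level software.
    lines: PLID -> line content, used to resolve values so that the
    resulting records no longer depend on PLIDs.
    """
    records = []
    for line_addr, old_plid, new_plid in changes:
        record_id, offset = field_map[line_addr]
        records.append(
            LogRecord(record_id, txn_id, offset, lines[old_plid], lines[new_plid])
        )
    return records
```

In the employee example, a changed PLID at a known line address would resolve to an employee record identifier and the offset of its years of service field, with the old and new field values taken from the referenced lines.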
In some embodiments, the CCP is provided with a logical block (LB) indication corresponding to a given physical page or subpage, and the CCP automatically records this information in the log.
In an embodiment, software manages the portions of the logs that reside in memory and periodically flushes these portions to persistent storage (such as disk or FLASH memory) to provide a persistent copy. When a request for a snapshot is received, the software that manages these logs determines whether the in-memory log buffers hold the data needed to undo back to the requested time or to advance to the requested time. If they do not, the additional log data that is needed is read from its persistent storage location and transferred to main memory so that the operation can be performed.
A hardware implementation of logging and of its support for snapshots avoids the application overhead of performing these actions, including the cost of thrashing the processor caches to access the code and data associated with implementing the logging.
A hardware implementation also reduces the overhead of synchronizing with other application processes as part of log processing (that is, of dealing with contention for the logging data structures). The CCP can support multiple concurrent operations by allowing new copy operations to be issued before previously issued operations complete, making full use of the memory system and thereby avoiding becoming a performance bottleneck beyond the performance limits of the memory system itself.
Hardware-supported temporal copy and logging of memory have been disclosed. The indirect memory representation allows an entire line to be saved in the log at a space and time cost comparable to that of saving a pointer, because a reference to the line is stored in the log rather than the data itself. The indirect memory representation allows memory snapshots to be created in a space-efficient and time-efficient manner by copying references to lines rather than the data itself. In the common case of a "consistent read" at the current time, using this snapshot is more efficient than having to apply undo to the modified state in order to provide the committed state. It also allows snapshots from earlier times to be kept at a lower space cost, reducing the cost of repeated consistent-read transactions.
The technique also avoids intervening on application writes to memory, writes that would otherwise be absorbed in the L1/L2 caches. In other words, it relies only on detecting modifications at the point at which a line is written back from the processor cache, which can be forced, for example, at the end of a round or transaction.
The technique also provides a way of determining which lines to log in the absence of a modified flag, while avoiding writing lines to the log until the end of the logging interval. Deferring the writing of lines to the log until the end of the logging interval avoids multiple log entries resulting from repeated writes to the same line or the same (sub)page, and avoids forcing write-outs from the processor cache.
The technique also allows the log to be simplified in the case of transactions, because the log records associated with a transaction are written only at the end of the transaction. Assuming the transaction commits, the log therefore need not include log information associated with aborted transactions. In other words, the log is written only when the transaction is very likely, if not certain, to commit. (For a transaction that is not distributed, it can be certain.) This is feasible because the snapshot of the state makes undo possible without log support.
The snapshot also allows undo log information to be derived as the difference between the snapshot and the redo log.
Hardware logging also means that changes are guaranteed to be logged even when they are made by relatively untrusted application code. This is because the CCP operates independently of execution on the CPU; even if the application code executes incorrectly, the CCP can still log the information, and it does so without affecting the operation of the CPU.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.