Background Art
A data processing system comprises at least a processing unit, a system memory, and various input/output devices. The processing unit may include a processor core having multiple registers and execution units for executing program instructions. In addition, the processing unit has one or more primary caches (that is, first-level or L1 caches), such as an instruction cache and/or a data cache, which are implemented with high-speed memory. The processing unit may also include additional caches, commonly referred to as secondary caches (that is, second-level or L2 caches), which are used to support the primary caches.
Generally, transmitting data on the system bus from one processing unit to another processing unit or to an input/output device without going through system memory is referred to as intervention. An intervention protocol improves system performance by reducing the number of cases in which system memory must be accessed to satisfy a read or read-with-intent-to-modify (RWITM) request from any processing unit or input/output device within the system.
Briefly, when a read/RWITM request issued by an input/output device is pending, any of the other processing units attached to the data bus that holds the requested data in its cache can source that data to the requesting I/O device. Under a conventional protocol, the processing unit whose cache holds the data waits for a "combined" response from all processing units in the system before issuing the data bus request to source the data from its cache.
At the same time, conventional intervention protocols also provide a "retry" mechanism: any read/RWITM request that could be satisfied by intervention can be interrupted by a "retry" from any processing unit on the system bus. If one processing unit responds with an intervention under properly established rules while another processing unit responds with a "retry," the retry response automatically invalidates the intervention response. As a result, the processing unit containing the data will not issue a data bus request while any retry request from any processing unit is pending on the system bus.
Accordingly, it is desirable to provide an improved data sourcing scheme in which intervention data can be sourced to the requesting input/output device with little interference from "retry" responses issued by other processing units in the data processing system.
Detailed Description of the Embodiments
The present invention is applicable to any data processing system having at least one cache memory. It should also be understood that the features of the present invention are applicable to multiprocessor data processing systems in which each processor has its own primary cache and secondary cache.
Referring now to the drawings, and specifically to Fig. 1, there is depicted a block diagram of a data processing system 10 in which the present invention may be applied. Data processing system 10 comprises a number of central processing units (CPUs) 11a-11n, each of which contains a primary cache. As shown, CPU 11a contains primary cache 12a, and CPU 11n contains primary cache 12n. Each of primary caches 12a-12n may be a sectored cache.
Each of CPUs 11a-11n is coupled to a respective one of secondary caches 13a-13n. Each of secondary caches 13a-13n may also be a sectored cache. CPUs 11a-11n, primary caches 12a-12n, and secondary caches 13a-13n are connected to each other and to a system memory 14 via an interconnect 15. Interconnect 15 can be either a bus or a switch. Also attached to interconnect 15 are intelligent input/output devices 16a-16n. These intelligent input/output devices 16a-16n are capable of initiating data transfers and performing data input and output to system memory 14. Intelligent input/output devices 16a-16n may include various adapters for communicating with other data processing systems over a network (such as an intranet or the Internet).
In accordance with a preferred embodiment of the present invention, a CPU, a primary cache, and a secondary cache, such as CPU 11a, primary cache 12a, and secondary cache 13a depicted in Fig. 1, may collectively be referred to as a processing unit. Although a preferred embodiment of a data processing system is illustrated in Fig. 1, it should be understood that the present invention can be implemented in various system configurations. For example, each of CPUs 11a-11n may have more than two levels of cache.
Referring now to Table I, there are illustrated the coherence responses that a processing unit may issue under a prior-art intervention protocol. After an input/output device in a multiprocessor data processing system issues a read or read-with-intent-to-modify (RWITM) request on the system bus, any processing unit in the system may, after snooping the request, issue one of the responses listed in Table I.
Coherence response | Priority | Definition |
000 | - | Reserved |
001 | 3 | Shared intervention |
010 | - | Reserved |
011 | - | Reserved |
100 | 1 | Retry |
101 | 2 | Modified intervention |
110 | 4 | Shared |
Table I
As depicted in Table I, the coherence responses take the form of a 3-bit snoop response signal, and a definition is given for each response. These signals are encoded to indicate the snoop result after the address tenure. In addition, a priority value is associated with each coherence response so that, when formulating the single response to be returned to all processing units and input/output devices on the system bus, the system logic can determine which coherence response should take precedence. For example, if one processing unit has a shared intervention response (priority 3) and another processing unit has a retry response (priority 1), the processing unit with the retry response takes precedence, so the system logic returns a retry coherence response to the requesting processing unit and to every other processing unit attached to the system bus. This system logic may reside in various components of the system, such as a system control unit or a memory controller.
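The precedence rule above can be sketched as follows. This is an illustrative model, not circuitry from the patent: the encodings and priority numbers come from Table I, while the function name and list-based interface are assumptions for the sake of the example.

```python
# 3-bit snoop response encodings from Table I, mapped to (priority, meaning).
# A lower priority number means higher preference (Retry = 1 beats
# Shared intervention = 3).
RESPONSES = {
    0b001: (3, "shared intervention"),
    0b100: (1, "retry"),
    0b101: (2, "modified intervention"),
    0b110: (4, "shared"),
}

def combined_response(snoop_responses):
    """Return the single response the system logic would broadcast:
    the snooped response with the highest preference (lowest number)."""
    return min(snoop_responses, key=lambda r: RESPONSES[r][0])

# Example from the text: one unit answers shared intervention (priority 3),
# another answers retry (priority 1); retry takes precedence.
assert combined_response([0b001, 0b100]) == 0b100
```

In hardware this selection would be performed combinationally by the system control unit or memory controller, but the ordering it implements is exactly the priority column of Table I.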
Several well-known mechanisms can be used to determine which cache (processing unit) is the "owner" of the requested data and is thus eligible to source it. Under the prior-art MESI protocol, if a cache holds the requested data in the Modified or Exclusive state, that cache contains the only valid copy of the data in the system and is obviously the owner. If, however, a cache holds the requested data in the Shared state, the data must also reside in at least one other cache in the system, so any of two or more caches could source the data. In that case, various selection schemes are available to determine which cache should perform the sourcing.
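The ownership test described above reduces to a check on the MESI state. A minimal sketch, assuming the standard single-letter state names (the helper name is illustrative):

```python
# A cache is the unambiguous owner of a line (and so may source it on its
# own) only in the Modified or Exclusive state; in Shared, at least one
# other cache also holds the line and a selection scheme must pick the
# source; in Invalid, the cache holds no valid copy at all.
def is_sole_owner(state):
    return state in ("M", "E")

assert is_sole_owner("M") and is_sole_owner("E")
assert not is_sole_owner("S") and not is_sole_owner("I")
```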
Referring now to Fig. 2, there is depicted a block diagram of an example data processing system used to illustrate the prior-art sourcing scheme. As shown, intelligent input/output device 24 wishes to issue a read or RWITM request on system bus 23, and the L2 cache of processing unit 21 contains the data requested by input/output device 24. In addition, the L2 cache in processing unit 20 is in the Invalid state, the L2 cache in processing unit 21 is in the Modified state, and the L2 cache in processing unit 22 does not contain the requested data. The respective L2 cache controller of each processing unit will then take a series of actions in order to perform the intervention provided by the prior art.
After input/output device 24 issues the read/RWITM request, the request is snooped from system bus 23 by processing unit 20, processing unit 21, and processing unit 22. Each of processing units 20-22 performs an L2 cache directory lookup to determine whether the requested data resides in its L2 cache. Because processing unit 21 has the requested data, an intervention response is issued by processing unit 21, and a finite state machine within processing unit 21 is dispatched to control the subsequent actions. If the data in the L2 cache of processing unit 21 is in the Modified state, a modified intervention coherence response is issued by processing unit 21. Alternatively, if the data in the L2 cache of processing unit 21 is in the Shared or Exclusive state, a shared intervention coherence response is issued by processing unit 21. Because the L2 cache in processing unit 20 is in the Invalid state and the L2 cache in processing unit 22 does not contain the requested data, processing units 20 and 22 issue a null coherence response.
After the intervention response has been issued, processing unit 21 waits for the combined response. In this example, the combined response essentially comprises the coherence responses from processing unit 21 itself and from processing units 20 and 22 and input/output device 24. If the combined response that is returned is a modified intervention coherence response, processing unit 21 can begin sourcing the requested data from its L2 cache. Under the established intervention protocol, if processing unit 20 and/or processing unit 22 requests a retry for whatever reason, the sourcing must be aborted in response to the "retry" request (that is, the sourcing sequence does not continue). For example, processing unit 22 may be in a snoop-queue-busy condition.
If the data in the L2 cache of processing unit 21 has not been modified since the snoop took effect and is not resident in the L1 cache (i.e., not L1-inclusive), processing unit 21 can begin issuing a system bus request to the system bus arbiter (typically, the requested data must be read into a buffer by the L2 cache controller before the system bus request can begin). Otherwise, the L1 cache of processing unit 21 must be flushed and invalidated (that is, the L1 cache is forced to "push back" any modified data to the L2 cache and to invalidate the copy in the L1 cache) before any system bus request is made. However, if the L1 cache of processing unit 21 is in the Shared state, only the invalidation of the L1 cache is required before making any data bus request.
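The L1 housekeeping decision just described can be summarized in a short sketch. The state letters follow MESI; the function and action names are illustrative, not signals from the patent:

```python
# Before the intervening unit may request the data bus:
#   - L1 copy Modified   -> flush (push back to L2), then invalidate
#   - L1 copy Shared     -> invalidate only
#   - line not in L1     -> no housekeeping; the bus request can start at once
def l1_housekeeping(l1_state):
    if l1_state == "M":
        return ["flush_to_L2", "invalidate_L1"]
    if l1_state == "S":
        return ["invalidate_L1"]
    return []  # "I" or not resident

assert l1_housekeeping("M") == ["flush_to_L2", "invalidate_L1"]
assert l1_housekeeping("S") == ["invalidate_L1"]
assert l1_housekeeping("I") == []
```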
Processing unit 21 then waits for the system bus grant to be returned. The actual sourcing of data to input/output device 24 begins after the data bus grant is received. Once the sourcing is complete, the L2 cache of processing unit 21 transitions from the Modified state to the Shared state for a read request, or to the Invalid state for an RWITM request. There is no state change in the L2 caches of processing units 20 and 22.
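The final state transition in the intervening unit's L2 cache can be stated compactly. A sketch under the source's description (the function name is an assumption):

```python
# After sourcing from a Modified line completes, the intervening L2 cache
# transitions to Shared for a plain read and to Invalid for an RWITM,
# since RWITM transfers the sole valid copy to the requester.
def final_l2_state(request):
    return {"read": "S", "rwitm": "I"}[request]

assert final_l2_state("read") == "S"
assert final_l2_state("rwitm") == "I"
```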
Referring now to Fig. 3, there is depicted a high-level logic flow diagram of sourcing cache data from a processing unit to an input/output device in a data processing system, in accordance with a preferred embodiment of the present invention. The process begins at block 30. As shown in block 31, a read/RWITM request is snooped from the system bus by all processing units in the system. As shown in block 32, an L2 cache directory lookup is performed by each processing unit to determine whether the requested data resides in its L2 cache. As shown in block 33, a null coherence response is issued by all those processing units that do not have the requested data (such as processing units 20 and 22 of Fig. 2), and the process exits at block 99. On the other hand, an intervention coherence response is issued by the processing unit that has the requested data (such as processing unit 21 of Fig. 2), as shown in block 34.
After the intervention coherence response has been issued, the intervening processing unit must perform some cache housekeeping tasks, as shown in block 35. These tasks include: if the copy of the data in the L1 cache has been modified, flushing the copy of the data from the L1 cache of the intervening processing unit and invalidating it; or, if the copy of the data in the L1 cache has not been modified, simply invalidating the copy of the data in the L1 cache of the intervening processing unit.
The requested data is then read from the L2 cache of the intervening processing unit, preferably into a buffer, and the system data bus is requested from the system bus arbiter, as shown in block 36. A determination is made as to whether the system data bus has been granted, as shown in block 37. If the system data bus has not been granted, another determination is made as to whether the combined coherence response has been returned, as shown in block 38. If the combined coherence response has not been returned, the process returns to block 37.
However, if the system bus has been granted, the requested data is sourced by driving it onto the system bus from the intervening processing unit, as shown in block 39. At this point, another determination is made as to whether the combined coherence response has been returned, as shown in block 40. If the combined coherence response has not been returned, the process continues sourcing the requested data onto the system bus while waiting for the combined coherence response to return.
After the combined coherence response has been returned, a determination is made as to whether the combined coherence response is a "Retry," as shown in block 41. If the combined coherence response is a retry, then the system bus request (from block 36) is cancelled if the system bus has not yet been granted, or the sourcing of the requested data is stopped immediately, as shown in block 42. Even if the sourcing has already completed at this point, the result is discarded because of the retry coherence response. On the contrary, if the combined coherence response is not a retry, the sourcing of the requested data continues; if the sourcing has not yet completed, it proceeds until it is complete. Finally, the state of the L2 cache in the intervening processing unit is modified accordingly, as shown in block 43, and the process exits at block 99.
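The flow of Fig. 3 can be compressed into a small sketch. The block numbers appear in the comments; the function signature is an assumption, with `bus_granted` and `combined` standing in for hardware events supplied by the environment. The point the sketch makes is the ordering: the L2 read and bus request (block 36) happen before the combined response is known.

```python
def intervene(has_data, bus_granted, combined):
    """One pass through the Fig. 3 decision logic for a snooped request."""
    if not has_data:
        return "null response"                  # block 33
    # blocks 34-36: issue intervention response, do L1 housekeeping,
    # read the L2 data into a buffer, request the data bus --
    # all before the combined response arrives
    if combined == "retry":                     # block 41
        return "cancel/abort sourcing"          # block 42 (data discarded)
    if bus_granted:
        return "source data, update L2 state"   # blocks 39, 43
    return "wait for grant"                     # blocks 37/38 loop
```

A real snooper would run this as a finite state machine over several bus cycles; the straight-line form above only captures which outcome each combination of conditions selects.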
As has been described, the present invention provides a method for sourcing cache data from a processing unit to an intelligent input/output device in a data processing system. In particular, a novel intervention implementation is disclosed in which the requested data is read from the L2 cache of the intervening processing unit before the combined coherence response is returned.
The present invention has a significant performance advantage over the prior art because the delay between the sampling of a read/RWITM request on the bus and the combined response may be several system bus clock cycles. Because the requested data is allowed to be read from the L2 cache of the intervening processing unit before the combined coherence response is received, the intervention latency is greatly reduced, and the overall performance of the system is therefore improved considerably.
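The saving can be put in rough arithmetic. The cycle counts below are assumptions for illustration only, not figures from the patent: if the combined response trails the snooped request by D bus cycles and the L2 read into the buffer takes R cycles, overlapping the read with the response window hides up to min(R, D) cycles of intervention latency.

```python
def latency_saved(response_delay_cycles, l2_read_cycles):
    """Cycles of L2-read latency hidden by starting the read before the
    combined response arrives (the read cannot hide more than the window)."""
    return min(l2_read_cycles, response_delay_cycles)

# e.g. a 4-cycle combined-response delay fully hides a 3-cycle L2 read,
# while a 2-cycle delay hides only 2 of those 3 cycles
assert latency_saved(4, 3) == 3
assert latency_saved(2, 3) == 2
```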
While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.