Background Art
A data processing system comprises at least a processing unit, a system memory, and various input/output devices. The processing unit may include a processor core having multiple registers and execution units for executing program instructions. In addition, the processing unit has one or more primary caches (that is, first-level or L1 caches), such as an instruction cache and/or a data cache, which are implemented with high-speed memory. The processing unit may also include additional caches, commonly referred to as secondary caches (that is, second-level or L2 caches), which are used to support the primary caches.
Generally, transmitting data on the system bus from one processing unit to another processing unit or to an input/output device without going through system memory is referred to as intervention. An intervention protocol improves system performance by reducing the number of cases in which system memory must be accessed to satisfy a read or read-with-intent-to-modify (RWITM) request from any processing unit or input/output device within the system.
Briefly, when a read/RWITM request issued by an input/output device is pending, any of the other processing units attached to the data bus that holds the requested data in its cache can source that data to the requesting I/O device. Under a conventional protocol, the processing unit whose cache holds the data waits for a "combined" response from all processing units in the system before issuing the data bus request to source the data from its cache.
At the same time, conventional intervention protocols also provide a "retry" mechanism: any read/RWITM request that could be satisfied by intervention can be interrupted by a "retry" from any processing unit on the system bus. If one processing unit responds with an intervention under properly established rules while another processing unit responds with a "retry," the retry response automatically invalidates the intervention response. As a result, the processing unit containing the data will not issue a data bus request while any retry request from any processing unit is pending on the system bus.
Accordingly, it is desirable to provide an improved data sourcing scheme in which intervention data can be sourced to the requesting input/output device with little interference from "retry" responses issued by other processing units in the data processing system.
Detailed Description of the Embodiments
The present invention is applicable to any data processing system having at least one cache memory. It should also be understood that the features of the present invention are applicable to multiprocessor data processing systems in which each processor has its own primary cache and secondary cache.
Referring now to the drawings, and specifically to Fig. 1, there is depicted a block diagram of a data processing system 10 in which the present invention may be applied. Data processing system 10 comprises a number of central processing units (CPUs) 11a-11n, each of which contains a primary cache. As shown, CPU 11a contains primary cache 12a, and CPU 11n contains primary cache 12n. Each of primary caches 12a-12n may be a sectored cache.
Each of CPUs 11a-11n is coupled to a respective one of secondary caches 13a-13n. Each of secondary caches 13a-13n may also be a sectored cache. CPUs 11a-11n, primary caches 12a-12n, and secondary caches 13a-13n are connected to each other and to a system memory 14 via an interconnect 15. Interconnect 15 can be either a bus or a switch. Also attached to interconnect 15 are intelligent input/output devices 16a-16n. These intelligent input/output devices 16a-16n are capable of initiating data transfers and performing data input and output to system memory 14. Intelligent input/output devices 16a-16n may include various adapters for communicating with other data processing systems over a network (such as an intranet or the Internet).
In accordance with a preferred embodiment of the present invention, a CPU, a primary cache, and a secondary cache, such as CPU 11a, primary cache 12a, and secondary cache 13a depicted in Fig. 1, may collectively be referred to as a processing unit. Although a preferred embodiment of a data processing system is illustrated in Fig. 1, it should be understood that the present invention can be implemented in various system configurations. For example, each of CPUs 11a-11n may have more than two levels of cache.
Referring now to Table I, there are illustrated the coherence responses that a processing unit may issue under a prior-art intervention protocol. After an input/output device in a multiprocessor data processing system issues a read or read-with-intent-to-modify (RWITM) request on the system bus, any processing unit in the system may, after snooping the request, issue one of the responses listed in Table I.
Coherence response | Priority | Definition |
000 | - | Reserved |
001 | 3 | Shared intervention |
010 | - | Reserved |
011 | - | Reserved |
100 | 1 | Retry |
101 | 2 | Modified intervention |
110 | 4 | Shared |
Table I
As depicted in Table I, the coherence responses take the form of a 3-bit snoop response signal, and a definition is given for each response. These signals are encoded to indicate the snoop result after the address tenure. In addition, a priority value is associated with each coherence response so that, when formulating the single response to be returned to all processing units and input/output devices on the system bus, the system logic can determine which coherence response should take precedence. For example, if one processing unit has a shared intervention response (priority 3) and another processing unit has a retry response (priority 1), the processing unit with the retry response takes precedence, so the system logic returns a retry coherence response to the requesting processing unit and to every other processing unit attached to the system bus. This system logic may reside in various components of the system, such as a system control unit or a memory controller.
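The precedence rule above can be sketched as follows. This is an illustrative model, not circuitry from the patent: the encodings and priority numbers come from Table I, while the function name and list-based interface are assumptions for the sake of the example.

```python
# 3-bit snoop response encodings from Table I, mapped to (priority, meaning).
# A lower priority number means higher preference (Retry = 1 beats
# Shared intervention = 3).
RESPONSES = {
    0b001: (3, "shared intervention"),
    0b100: (1, "retry"),
    0b101: (2, "modified intervention"),
    0b110: (4, "shared"),
}

def combined_response(snoop_responses):
    """Return the single response the system logic would broadcast:
    the snooped response with the highest preference (lowest number)."""
    return min(snoop_responses, key=lambda r: RESPONSES[r][0])

# Example from the text: one unit answers shared intervention (priority 3),
# another answers retry (priority 1); retry takes precedence.
assert combined_response([0b001, 0b100]) == 0b100
```

In hardware this selection would be performed combinationally by the system control unit or memory controller, but the ordering it implements is exactly the priority column of Table I.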
Several well-known mechanisms can be used to determine which cache (processing unit) is the "owner" of the requested data and is thus eligible to source it. Under the prior-art MESI protocol, if a cache holds the requested data in the Modified or Exclusive state, that cache contains the only valid copy of the data in the system and is obviously the owner. If, however, a cache holds the requested data in the Shared state, the data must also reside in at least one other cache in the system, so any of two or more caches could source the data. In that case, various selection schemes are available to determine which cache should perform the sourcing.
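The ownership test described above reduces to a check on the MESI state. A minimal sketch, assuming the standard single-letter state names (the helper name is illustrative):

```python
# A cache is the unambiguous owner of a line (and so may source it on its
# own) only in the Modified or Exclusive state; in Shared, at least one
# other cache also holds the line and a selection scheme must pick the
# source; in Invalid, the cache holds no valid copy at all.
def is_sole_owner(state):
    return state in ("M", "E")

assert is_sole_owner("M") and is_sole_owner("E")
assert not is_sole_owner("S") and not is_sole_owner("I")
```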
Referring now to Fig. 2, there is depicted a block diagram of an example data processing system used to illustrate the prior-art sourcing scheme. As shown, intelligent input/output device 24 wishes to issue a read or RWITM request on system bus 23, and the L2 cache of processing unit 21 contains the data requested by input/output device 24. In addition, the L2 cache in processing unit 20 is in the Invalid state, the L2 cache in processing unit 21 is in the Modified state, and the L2 cache in processing unit 22 does not contain the requested data. The respective L2 cache controller of each processing unit will then take a series of actions in order to perform the intervention provided by the prior art.
After input/output device 24 issues the read/RWITM request, the request is snooped from system bus 23 by processing unit 20, processing unit 21, and processing unit 22. Each of processing units 20-22 performs an L2 cache directory lookup to determine whether the requested data resides in its L2 cache. Because processing unit 21 has the requested data, an intervention response is issued by processing unit 21, and a finite state machine within processing unit 21 is dispatched to control the subsequent actions. If the data in the L2 cache of processing unit 21 is in the Modified state, a modified intervention coherence response is issued by processing unit 21. Alternatively, if the data in the L2 cache of processing unit 21 is in the Shared or Exclusive state, a shared intervention coherence response is issued by processing unit 21. Because the L2 cache in processing unit 20 is in the Invalid state and the L2 cache in processing unit 22 does not contain the requested data, processing units 20 and 22 issue a null coherence response.
After the intervention response has been issued, processing unit 21 waits for the combined response. In this example, the combined response essentially comprises the coherence responses from processing unit 21 itself and from processing units 20 and 22 and input/output device 24. If the combined response that is returned is a modified intervention coherence response, processing unit 21 can begin sourcing the requested data from its L2 cache. Under the established intervention protocol, if processing unit 20 and/or processing unit 22 requests a retry for whatever reason, the sourcing must be aborted in response to the "retry" request (that is, the sourcing sequence does not continue). For example, processing unit 22 may be in a snoop-queue-busy condition.
If the data in the L2 cache of processing unit 21 has not been modified since the snoop took effect and is not resident in the L1 cache (i.e., not L1-inclusive), processing unit 21 can begin issuing a system bus request to the system bus arbiter (typically, the requested data must be read into a buffer by the L2 cache controller before the system bus request can begin). Otherwise, the L1 cache of processing unit 21 must be flushed and invalidated (that is, the L1 cache is forced to "push back" any modified data to the L2 cache and to invalidate the copy in the L1 cache) before any system bus request is made. However, if the L1 cache of processing unit 21 is in the Shared state, only the invalidation of the L1 cache is required before making any data bus request.
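The L1 housekeeping decision just described can be summarized in a short sketch. The state letters follow MESI; the function and action names are illustrative, not signals from the patent:

```python
# Before the intervening unit may request the data bus:
#   - L1 copy Modified   -> flush (push back to L2), then invalidate
#   - L1 copy Shared     -> invalidate only
#   - line not in L1     -> no housekeeping; the bus request can start at once
def l1_housekeeping(l1_state):
    if l1_state == "M":
        return ["flush_to_L2", "invalidate_L1"]
    if l1_state == "S":
        return ["invalidate_L1"]
    return []  # "I" or not resident

assert l1_housekeeping("M") == ["flush_to_L2", "invalidate_L1"]
assert l1_housekeeping("S") == ["invalidate_L1"]
assert l1_housekeeping("I") == []
```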
Processing unit 21 then waits for the system bus grant to be returned. The actual sourcing of data to input/output device 24 begins after the data bus grant is received. Once the sourcing is complete, the L2 cache of processing unit 21 transitions from the Modified state to the Shared state for a read request, or to the Invalid state for an RWITM request. There is no state change in the L2 caches of processing units 20 and 22.
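The final state transition in the intervening unit's L2 cache can be stated compactly. A sketch under the source's description (the function name is an assumption):

```python
# After sourcing from a Modified line completes, the intervening L2 cache
# transitions to Shared for a plain read and to Invalid for an RWITM,
# since RWITM transfers the sole valid copy to the requester.
def final_l2_state(request):
    return {"read": "S", "rwitm": "I"}[request]

assert final_l2_state("read") == "S"
assert final_l2_state("rwitm") == "I"
```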
Referring now to Fig. 3, there is depicted a high-level logic flow diagram of sourcing cache data from a processing unit to an input/output device in a data processing system, in accordance with a preferred embodiment of the present invention. The process begins at block 30. As shown in block 31, a read/RWITM request is snooped from the system bus by all processing units in the system. As shown in block 32, an L2 cache directory lookup is performed by each processing unit to determine whether the requested data resides in its L2 cache. As shown in block 33, a null coherence response is issued by all those processing units that do not have the requested data (such as processing units 20 and 22 of Fig. 2), and the process exits at block 99. On the other hand, an intervention coherence response is issued by the processing unit that has the requested data (such as processing unit 21 of Fig. 2), as shown in block 34.
After the intervention coherence response has been issued, the intervening processing unit must perform some cache housekeeping tasks, as shown in block 35. These tasks include: if the copy of the data in the L1 cache has been modified, flushing the copy of the data from the L1 cache of the intervening processing unit and invalidating it; or, if the copy of the data in the L1 cache has not been modified, simply invalidating the copy of the data in the L1 cache of the intervening processing unit.
The requested data is then read from the L2 cache of the intervening processing unit, preferably into a buffer, and the system data bus is requested from the system bus arbiter, as shown in block 36. A determination is made as to whether the system data bus has been granted, as shown in block 37. If the system data bus has not been granted, another determination is made as to whether the combined coherence response has been returned, as shown in block 38. If the combined coherence response has not been returned, the process returns to block 37.
However, if the system bus has been granted, the requested data is sourced by driving it onto the system bus from the intervening processing unit, as shown in block 39. At this point, another determination is made as to whether the combined coherence response has been returned, as shown in block 40. If the combined coherence response has not been returned, the process continues sourcing the requested data onto the system bus while waiting for the combined coherence response to return.
After the combined coherence response has been returned, a determination is made as to whether the combined coherence response is a "Retry," as shown in block 41. If the combined coherence response is a retry, then the system bus request (from block 36) is cancelled if the system bus has not yet been granted, or the sourcing of the requested data is stopped immediately, as shown in block 42. Even if the sourcing has already completed at this point, the result is discarded because of the retry coherence response. On the contrary, if the combined coherence response is not a retry, the sourcing of the requested data continues; if the sourcing has not yet completed, it proceeds until it is complete. Finally, the state of the L2 cache in the intervening processing unit is modified accordingly, as shown in block 43, and the process exits at block 99.
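The flow of Fig. 3 can be compressed into a small sketch. The block numbers appear in the comments; the function signature is an assumption, with `bus_granted` and `combined` standing in for hardware events supplied by the environment. The point the sketch makes is the ordering: the L2 read and bus request (block 36) happen before the combined response is known.

```python
def intervene(has_data, bus_granted, combined):
    """One pass through the Fig. 3 decision logic for a snooped request."""
    if not has_data:
        return "null response"                  # block 33
    # blocks 34-36: issue intervention response, do L1 housekeeping,
    # read the L2 data into a buffer, request the data bus --
    # all before the combined response arrives
    if combined == "retry":                     # block 41
        return "cancel/abort sourcing"          # block 42 (data discarded)
    if bus_granted:
        return "source data, update L2 state"   # blocks 39, 43
    return "wait for grant"                     # blocks 37/38 loop
```

A real snooper would run this as a finite state machine over several bus cycles; the straight-line form above only captures which outcome each combination of conditions selects.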
As has been described, the present invention provides a method for sourcing cache data from a processing unit to an intelligent input/output device in a data processing system. In particular, a novel intervention implementation is disclosed in which the requested data is read from the L2 cache of the intervening processing unit before the combined coherence response is returned.
The present invention has a significant performance advantage over the prior art because the delay between the sampling of a read/RWITM request on the bus and the combined response may be several system bus clock cycles. Because the requested data is allowed to be read from the L2 cache of the intervening processing unit before the combined coherence response is received, the intervention latency is greatly reduced, and the overall performance of the system is therefore improved considerably.
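The saving can be put in rough arithmetic. The cycle counts below are assumptions for illustration only, not figures from the patent: if the combined response trails the snooped request by D bus cycles and the L2 read into the buffer takes R cycles, overlapping the read with the response window hides up to min(R, D) cycles of intervention latency.

```python
def latency_saved(response_delay_cycles, l2_read_cycles):
    """Cycles of L2-read latency hidden by starting the read before the
    combined response arrives (the read cannot hide more than the window)."""
    return min(l2_read_cycles, response_delay_cycles)

# e.g. a 4-cycle combined-response delay fully hides a 3-cycle L2 read,
# while a 2-cycle delay hides only 2 of those 3 cycles
assert latency_saved(4, 3) == 3
assert latency_saved(2, 3) == 2
```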
While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.