EP0875030A2 - Multi-port cache memory with address conflict detection - Google Patents
Multi-port cache memory with address conflict detection
- Publication number
- EP0875030A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- port
- bank
- cache
- banks
- ports
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0844—Multiple simultaneous or quasi-simultaneous cache accessing
- G06F12/0846—Cache with multiple tag or data arrays being simultaneously accessible
- G06F12/0851—Cache with interleaved addressing
Definitions
- the bank select signals for each bank are OR'ed together by an OR gate 250 having an output fed into a bank enable input. If no port attempts to access a bank, then the bank is not enabled. Here, bank1 is not being accessed.
- sel_ctrl is asserted during the stall (cycle 1) so as to force sel0 to select port1 during cycle 1. See Figure 3B and Table 1.
- the hit signal, hit1, from port1 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word Y can be outputted through port1 during the next cycle, cycle 2.
- the result of a read operation for port0 is latched on the read bus for port0 by latching circuitry on the bus (not shown).
- data X read from port0 and data Y from port1 appear simultaneously during cycle 2. Because CPU operations are stalled during cycle 1, it appears to the CPU that the dual port cache access occurs simultaneously in a cycle immediately following cycle 0.
- conflict resolution circuitry 226 grants priority access to port0 in case of a conflict.
- the conflict resolution circuitry 226 may grant access to conflicting ports in any order of priority.
- the ports are numbered so that low-numbered ports correspond to those requiring high-priority access, whereas high-numbered ports can wait longer for access.
- There are two selection control signals, shared by all banks, to override priorities of bank conflict resolution: sel_ctrl1, sel_ctrl2. If sel_ctrl1 is asserted, then port 1 is selected. If sel_ctrl2 is asserted, then port 2 is selected. If neither sel_ctrl1 nor sel_ctrl2 is asserted, then port 0 has priority.
- bank conflicts are avoided in the compiler and application software by allocating variables in nearby instructions to addresses in different banks. Thus, it is highly unlikely that the same bank would be addressed in the same cycle. Further, the organization of the address space itself helps to reduce the chance of a bank conflict. By using lower order address bits, e.g., the second bit, to select the bank, adjacent words of the cache block are evenly distributed among all the banks. In this manner, the addressing of adjacent words will result in the addressing of different banks. Because of the locality of reference property, this organization thus reduces the chance of conflict.
- the cache can be organized as an eight-way set-associative cache of eight banks.
- address bits 6-10 act as the set index.
- Each set comprises two rows in each bank.
- Bit 5 selects one of the two rows, and bits 2-4 select the bank.
- the address bits 11-31 are used for the tag comparison.
- Bits 0-1 correspond to the byte within a word.
- the present invention can be applied to a pipelined cache.
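The eight-bank address layout listed above (bits 0-1 byte, bits 2-4 bank, bit 5 row, bits 6-10 set index, bits 11-31 tag) can be sketched as a small field splitter. This is only an illustration: the function name `split_address_8bank` and the dictionary keys are assumptions, not names from the patent. It also shows why consecutive words land in different banks when low-order bits select the bank.

```python
def split_address_8bank(addr: int) -> dict:
    """Split a 32-bit address per the eight-bank example (hypothetical helper)."""
    return {
        "byte": addr & 0x3,                # bits 0-1: byte within a word
        "bank": (addr >> 2) & 0x7,         # bits 2-4: one of eight banks
        "row": (addr >> 5) & 0x1,          # bit 5: one of two rows per set
        "set_index": (addr >> 6) & 0x1F,   # bits 6-10: set index
        "tag": (addr >> 11) & 0x1FFFFF,    # bits 11-31: tag
    }


# Eight consecutive word addresses (0, 4, ..., 28) fall into eight different banks.
banks_for_consecutive_words = [split_address_8bank(4 * w)["bank"] for w in range(8)]
```

Under this layout, a run of adjacent words can be accessed through several ports without any two accesses addressing the same bank.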
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A multi-port cache memory is disclosed. The multi-port cache operates in a microprocessor system, and includes multiple memory banks and multiple ports for enabling accesses to the banks. Conflict detection circuitry detects simultaneous addressing of a first memory bank through a first port and a second port, and stalls microprocessor operations for a predetermined number of clock cycles in response to the detection of simultaneous addressing. Conflict resolution circuitry allows access to the first bank through the first port during the stall, and allows access through the second port after the stall is complete. Generally, the conflict resolution circuitry allows access through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the microprocessor is stalled. One or more of the ports attempting to access the first bank may be allowed access before or after the time the microprocessor is stalled. Each bank is single-ported. The banks have non-overlapping address spaces, and are addressed so that words within a cache block are distributed among multiple banks.
Description
MULTI-PORT CACHE MEMORY WITH ADDRESS CONFLICT DETECTION
The present invention relates to a processing system with a cache memory, and more particularly to a cache having multiple access ports.
A cache is a small, fast memory placed between a processor and main memory in order to reduce the effective time required by a processor to access addresses, instructions or data that are normally stored in main memory. For example, when a processor reads a word from main memory, the word and neighbouring words are read as a block from main memory into the cache. Typically, there is a high probability that the processor will next attempt to access one of the neighbouring words within the block. Because of this locality of reference property, main memory bus traffic is reduced since the processor is likely to engage in subsequent data transactions directly with the cache. Cache accesses take less time than main memory accesses. Consequently, the use of a cache increases processor throughput.
Many modern microprocessors execute multiple instructions within the same processor clock cycle. In some instances, the processor may attempt to execute memory operations simultaneously. In those cases, the processor may require simultaneous access to multiple words stored within cache memory. Accordingly, the cache may include multiple ports, each port for conducting a separate data transaction.
A multi-port cache may be implemented as a single multi-port SRAM. However, such a configuration is very slow in operation and occupies a relatively large chip area. Alternatively, as described in U.S. Patent No. 5,359,557, issued to Aipperspach et al., a dual-port cache may be implemented with two single-port memory arrays, each corresponding to one of the cache ports. The two arrays have the same address space. This cumbersome arrangement requires complex data coherency circuitry to ensure that the arrays store the same data when data is modified at one of the cache ports. Further, the use of two arrays to store redundant copies of the same data occupies an unnecessarily large chip area.
Accordingly, there is a desire to find a smaller, more efficient means of implementing a multi-port cache memory.
The present invention provides a multi-port cache memory. The multi-port cache operates in a microprocessor system, and includes multiple memory banks and multiple ports for enabling accesses to the banks. Conflict detection circuitry detects simultaneous addressing of a first memory bank through a first port and a second port, and stalls microprocessor operations for a predetermined number of clock cycles in response to the detection of simultaneous addressing. Conflict resolution circuitry allows access to the first bank through the first port during the stall, and allows access through the second port after the stall is complete. Generally, the conflict resolution circuitry allows access through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the microprocessor is stalled. One or more of the ports attempting to access the first bank may be allowed access before or after the time the microprocessor is stalled. Each bank is single-ported. The banks have non-overlapping address spaces, and are addressed so that words within a cache block are distributed among multiple banks.
The objects, features and advantages of the present invention will be apparent to one skilled in the art in light of the detailed description in which the following figures provide examples of the structure and operation of the invention:
Figure 1 illustrates a computer system having a multi-port cache of the present invention.
Figure 2 is a block diagram illustrating a processor coupled to a multi-port cache of the present invention.
Figure 3A is a timing diagram illustrating cache timing in the absence of a bank conflict.
Figure 3B is a timing diagram illustrating cache timing in the presence of a bank conflict.
The present invention provides a multi-port cache memory having multiple memory banks. In the following description, numerous details are set forth in order to enable a thorough understanding of the present invention. However, it will be understood by those of ordinary skill in the art that these specific details are not required in order to practice the invention. Further, well-known elements, devices, process steps and the like are not set forth in detail in order to avoid obscuring the present invention.
Figure 1 illustrates a computer system having a multi-port CPU 100, a main memory 102, a main memory interface 104, and a multi-port cache 106 of the present invention. The main memory interface 104 manages the information exchange between the cache 106 and main memory 102 to maintain cache coherency when a CPU access misses the cache or when the CPU writes new data into the cache. The cache 106 is shown as having two ports, although those skilled in the art will recognize that the present invention is easily extended to a cache having any number of ports.
Preferably, the processor is capable of executing multiple parallel operations, and thus may require simultaneous access to more than one word stored within the cache. In another configuration (not shown), separate processors or other agents may each require access to a corresponding cache port.
Figure 2 is a detailed block diagram of a processor 100 coupled to an embodiment of the cache 106 of the present invention. In this example, the cache is a two-way set-associative cache. Unlike the prior art, the cache of the present invention does not employ a dual-port SRAM or redundant single-port arrays that store the same data. Instead, the present invention employs multiple single-port memory banks, where each bank stores data for a non-overlapping address space. Preferably, each bank may be accessed by any of the ports. As long as no two ports attempt to access the same bank, all ports can execute simultaneous accesses to the cache. In the event of a bank conflict, i.e., when two ports attempt to access the same bank, the cache controls the timing of the accesses as described below.
According to the present invention, the CPU 100 can issue multiple accesses to the cache 106, represented as a first address A0 and a second address A1. These addresses correspond to the two ports 201 and 203 of the cache of this example. In this example, the cache itself comprises a first bank 200, bank0, and a second bank 202, bank1. Here, each bank holds eight kilobytes (8 KB) of data, where four bytes comprise one 32-bit word. Thus, each bank stores 2K words. Further, each cache block is two words long, and two blocks comprise one set of the two-way set-associative cache of this example. Those skilled in the art will recognize that the present invention is applicable to other memory configurations, and that, in particular, the number of banks need not necessarily equal the number of ports.
Each bank is coupled to a plurality of read buses 204 through a corresponding tri-state bus driver 206, each read bus 204 corresponding to one of the ports. Each bank is further coupled to a plurality of write buses 208 through a write multiplexer 210, each write bus 208 corresponding to one of the ports.
The read and write buses 204, 208 are coupled to the input/output ports of the CPU 100 (the coupling is not shown to keep the figure simple).
The circuitry for addressing the banks is divided into address circuitry dedicated to a corresponding port and address circuitry common to both ports. In this embodiment, each port is coupled to a dual tag RAM 212, where each tag array 214 corresponds to a way of the two-way set-associative cache. The tag from each array is fed into a corresponding comparator 216, which compares the tag to the tag field of the corresponding port address. The resulting hit signal is passed to a corresponding port input of a hit multiplexer 218 for each bank. The hit signal here is a two-bit "one hot" signal in which at most one bit may take on a logical one value. Each bank also is coupled to a row multiplexer 220 that receives the set index field of each port address. Further, read/write control signals are passed from each CPU port to a corresponding input of a read/write multiplexer (not shown) for each bank to indicate whether a read or write memory operation is to be performed. In one embodiment, a write enable signal from each port is passed to a corresponding input of a write multiplexer. Similarly, a read enable signal from each port is passed to a corresponding input of a read multiplexer. The outputs of the multiplexers are coupled to write enable and read enable inputs, respectively, of the corresponding bank. The read and write multiplexers together are referred to herein as the "read/write multiplexer."
The address circuitry that is common to both ports includes conflict detect circuitry 222 that receives the bank address portion of the port addresses. In this example, each bank address passes through a 1:2 bank decoder 224, which produces a bank select signal in response. For example, if a zero bank address bit represents a selection of bank0, then the bank decoder 224 will output a one from its bank0 output and a zero from its bank1 output. The bank select signal (bd) from each port's decoder is fed into a corresponding conflict resolution circuit 226 for each bank.
The output of the conflict resolution circuitry 226 controls the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer (not shown) for each bank to determine which port will have access to the bank. The conflict resolution circuitry 226 also controls the tri-state drivers 206 for the read buses 204 (Figure 2 assumes active high) and the write bus multiplexers 210 to assure access to the bus corresponding to the selected port.
In one example of the memory organization of the cache of Figure 2, each bank stores 8 KB of data with each word comprising four bytes. Each cache block comprises two words. The memory contains 1K sets with two blocks per set because the cache is a two-way set-associative cache. Bit 2 of the address selects the bank, whereas bits 3-12 select one of the sets. Bits 13-31 of the address are used in the tag comparison to indicate the presence of an addressed block in the cache.
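The address fields and the 1K set count in this two-bank example can be checked with a short sketch. The names `Fields`, `split_address` and `num_sets` are illustrative assumptions, and treating bits 0-1 as the byte offset is inferred from the four-byte word size rather than stated for this example.

```python
from typing import NamedTuple

BYTES_PER_WORD = 4        # one 32-bit word
BANK_BYTES = 8 * 1024     # 8 KB per bank
NUM_BANKS = 2
WORDS_PER_BLOCK = 2
WAYS = 2                  # two-way set-associative


class Fields(NamedTuple):
    byte: int        # bits 0-1 (assumed byte offset within a word)
    bank: int        # bit 2
    set_index: int   # bits 3-12
    tag: int         # bits 13-31


def split_address(addr: int) -> Fields:
    """Split a 32-bit address into the fields of the two-bank example."""
    return Fields(
        byte=addr & 0x3,
        bank=(addr >> 2) & 0x1,
        set_index=(addr >> 3) & 0x3FF,
        tag=(addr >> 13) & 0x7FFFF,
    )


def num_sets() -> int:
    """Derive the set count from the bank size, block size and associativity."""
    total_words = NUM_BANKS * BANK_BYTES // BYTES_PER_WORD
    blocks = total_words // WORDS_PER_BLOCK
    return blocks // WAYS
```

The ten index bits (3-12) address exactly the 1K sets that the capacity arithmetic yields, and the two words of a block differ only in bit 2, so they sit in different banks.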
The operation of the cache of the present invention will be described with respect to the timing diagrams of Figures 3A and 3B. Figure 3A illustrates cache timing where there is no bank conflict. Figure 3B illustrates cache timing with a bank conflict. In both cases, the CPU attempts to perform simultaneous accesses of the cache by issuing an address A0 from a first CPU port 228 and an address A1 from a second CPU port 230. The addresses are respectively received by a first cache port 201 and a second cache port 203 over an internal CPU bus 232. In this example, during cycle 0, bit 2 of each address is fed into the conflict detection circuitry 222 to determine whether both ports are attempting to access the same memory bank. Here, assume that A0[2] = 0 and A1[2] = 1. In that case, the bank address decoder 224 for port0 will output a bank select signal bd[0][0] = 1 to the conflict resolution circuitry 226 for bank0 200 and a bank select signal bd[0][1] = 0 to the conflict resolution circuitry 226 for bank1 202. The bank address decoder 224 for port1 203 will output a bank select signal bd[1][0] = 0 to the conflict resolution circuitry 226 for bank0 200 and a bank select signal bd[1][1] = 1 to the conflict resolution circuitry 226 for bank1 202. In cycle 0, the conflict resolution circuitry 226 determines which port input will be passed by the row multiplexer 220, the hit multiplexer 218 and the read/write multiplexer to each bank, and selects the proper read or write bus to communicate with the bank (depending upon whether a read or write operation is being performed).
For this two-port example, the conflict resolution circuitry 226 implements the following logic equations:
sel[0][i] = bd[0][i] AND NOT (sel_ctrl[1])
sel[1][i] = (NOT (bd[0][i]) AND bd[1][i]) OR sel_ctrl[1]
where the port select signal sel[j][i] gives input port j access to bank i if sel[j][i] = 1. When a bank conflict occurs, the conflict resolution circuitry first allows the lower-numbered port, port0, to access the addressed bank. In that clock cycle, sel_ctrl[1] = 0. In the next cycle, the override signal sel_ctrl[1] takes on a value of 1 to give priority of access to port 1.
In Figure 2, the two-bit signal sel0 represents the two port-select signals for bank0, and the two-bit signal sel1 represents the two port-select signals for bank1. These combined signals select the appropriate port input to the multiplexers. Alternatively, the conflict resolution logic may be implemented by any circuitry that embodies the logic of Table 1.
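The two logic equations above can be exercised with a minimal software model. This is a sketch for checking the equations by hand, not a description of the actual circuit; the helper names `decode_bank` and `port_select` are assumptions introduced here.

```python
from typing import List


def decode_bank(bank_bit: int) -> List[int]:
    """Model of the 1:2 bank decoder: bd[i] is 1 when the port addresses bank i."""
    return [1 if bank_bit == i else 0 for i in range(2)]


def port_select(bd: List[List[int]], sel_ctrl1: int) -> List[List[int]]:
    """Evaluate sel[j][i] for ports j = 0, 1 and banks i = 0, 1.

    sel[0][i] = bd[0][i] AND NOT sel_ctrl[1]
    sel[1][i] = (NOT bd[0][i] AND bd[1][i]) OR sel_ctrl[1]
    """
    sel0 = [bd[0][i] & (1 - sel_ctrl1) for i in range(2)]
    sel1 = [((1 - bd[0][i]) & bd[1][i]) | sel_ctrl1 for i in range(2)]
    return [sel0, sel1]


# No conflict: port0 addresses bank0, port1 addresses bank1.
no_conflict = port_select([decode_bank(0), decode_bank(1)], sel_ctrl1=0)

# Conflict on bank0: port0 is served first, then the override selects port1.
conflict_bd = [decode_bank(0), decode_bank(0)]
first_cycle = port_select(conflict_bd, sel_ctrl1=0)
next_cycle = port_select(conflict_bd, sel_ctrl1=1)
```

Note that with the override asserted the second equation yields sel[1][i] = 1 for every bank i, including a bank that no port addressed, so the equations alone do not decide whether an unaddressed bank is actually accessed.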
TABLE 1
In the table, "x/y" indicates that the port select signal takes on a value of x in one clock cycle followed by a value of y in a subsequent clock cycle. In this example, bd[0][0] = 1 and bd[1][0] = 0, whereas bd[0][1] = 0 and bd[1][1] = 1. Thus, in cycle 0 of Figure 3A:
sel[0][0] = 1
sel[0][1] = 0
sel[1][0] = 0
sel[1][1] = 1
In the absence of a conflict, the sel_ctrl override signal is inoperative. As a result, bank0 is accessible to port0 and bank1 is accessible to port1.
In sum, the conflict resolution circuitry 226 determines which port communicates with each bank. This selection is based upon the bank address field of the port addresses, which is bit 2 in this example. The other bits are used to address a particular word within the banks. Bits 3-12 are the set index fed into the dual tag array for each port. In this example, a set comprises two blocks, with one block in each bank. Bits 13-31 comprise the tag address field that is compared to the tags from the dual tag array 212.
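The field layout described above can be sketched as a small Python helper. The function name `decode_address` is hypothetical, and the code assumes a 32-bit address; only the bit positions (bit 2 for the bank, bits 3-12 for the set index, bits 13-31 for the tag) come from the text.

```python
def decode_address(addr):
    """Split a 32-bit address into the fields of the two-bank example."""
    bank = (addr >> 2) & 0x1          # bit 2: bank address field
    set_index = (addr >> 3) & 0x3FF   # bits 3-12: set index (10 bits)
    tag = (addr >> 13) & 0x7FFFF      # bits 13-31: tag field (19 bits)
    return bank, set_index, tag


# Build an address with tag 5, set index 7, bank 1, then decode it again.
addr = (5 << 13) | (7 << 3) | (1 << 2)
print(decode_address(addr))  # (1, 7, 5)
```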
If the tag comparison results in a hit in one of the arrays, the hit signal selects the word within the block. In case of a cache miss for any one of the ports, the miss is handled by loading the miss block into the cache. Operation resumes as if the miss did not occur, resulting in a hit. For example, if one instruction attempts two simultaneous accesses and one port hits while the other port misses, the miss is first handled. Then, the instruction is restarted, resulting in two hits with the conflict resolution circuitry operating as described herein.
The set index and the hit signal are routed to the correct bank through the multiplexers controlled by the conflict resolution circuitry 226. Assume hits for both port addresses. During cycle 0, the hit signal, hit0, from port0 201 is routed through the hit multiplexers 218 to the hit input of bank0 200, whereas the hit signal, hit1, from port1 203 is routed through the hit multiplexers 218 to the hit input of bank1 202. The data read from or written to port0 201 is represented by X, whereas the data read from or written to port1 203 is represented by Y. During cycle 1, both of these ports are in communication with a bank. Here, X data from port0 201 is read from or written to bank0 200, and Y data from port1 203 is read from or written to bank1 202.
Figure 3B is a timing diagram illustrating the operation of the cache of the present invention in case of a bank conflict. In this example, assume that the second bits of both port addresses equal zero, i.e., both ports attempt to access bank0. In response, the conflict detection circuitry 222 will stall the operations of the CPU 100 in the next cycle, i.e., cycle 1. The mechanism employed by the conflict detection circuitry 222 to stall the CPU can be implemented using circuitry similar to that employed by standard cache control logic to stall the CPU during a cache miss.
In this example A0[2] = A1[2] = 0. Thus, bd[0][0] = 1 and bd[1][0] = 1, whereas bd[0][1] = 0 and bd[1][1] = 0. Accordingly,
sel[0][0] = 1 AND NOT (sel_ctrl1)
sel[0][1] = 0 AND NOT (sel_ctrl1)
sel[1][0] = (NOT (1) AND 1) OR sel_ctrl1
sel[1][1] = (NOT (0) AND 0) OR sel_ctrl1
The bank select signals for each bank are OR'ed together by an OR gate 250 having an output fed into a bank enable input. If no port attempts to access a bank, then the bank is not enabled. Here, bank1 is not being accessed. Consequently, the signal sel[1] (i.e., sel[0][1] and sel[1][1]) for bank1 has no effect. However, both ports are attempting to read from bank0. Assume hits for both port addresses. First, during cycle 0, the hit signal, hit0, from port0 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word X can be output from port0 during cycle 1.
Second, sel_ctrl is asserted during the stall (cycle 1) so as to force sel0 to select port1 during cycle 1. See Figure 3B and Table 1. As a result, during the stall cycle 1, the hit signal, hit1, from port1 is routed through the hit multiplexers 218 to the hit_bank0 input of bank0 so that the data word Y can be output through port1 during the next cycle, cycle 2. Further, during the stall cycle, the result of a read operation for port0 is latched on the read bus for port0 by latching circuitry on the bus (not shown). As a result, data X read from port0 and data Y from port1 appear simultaneously during cycle 2. Because CPU operations are stalled during cycle 1, it appears to the CPU that the dual port cache access occurs simultaneously in a cycle immediately following cycle 0.
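The timing of Figures 3A and 3B can be summarized with a small behavioral model. This is only a sketch under stated assumptions: the function name `access_schedule` and its return values are hypothetical, every port is assumed to hit in the cache, and ports that address the same bank are served one per cycle with the lowest port number first.

```python
from collections import defaultdict


def access_schedule(port_banks):
    """Return (grant_cycle, stall_cycles, data_cycle) for one multi-port access.

    port_banks[j] is the bank addressed by port j. grant_cycle[j] is the cycle
    in which port j is routed to its bank. (Hypothetical model; assumes hits.)
    """
    next_free = defaultdict(int)   # next cycle in which each bank is free
    grant_cycle = []
    for bank in port_banks:        # ports are visited in ascending order
        grant_cycle.append(next_free[bank])
        next_free[bank] += 1
    stall_cycles = max(grant_cycle, default=0)
    data_cycle = stall_cycles + 1  # cycle in which all data is visible
    return grant_cycle, stall_cycles, data_cycle


print(access_schedule([0, 1]))  # ([0, 0], 0, 1): no conflict, data in cycle 1
print(access_schedule([0, 0]))  # ([0, 1], 1, 2): one stall, data in cycle 2
```

The second call reproduces the conflict case: port0 is routed in cycle 0, port1 in the stall cycle 1, and both data words are visible in cycle 2.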
Note that the conflict resolution circuitry 226 grants priority access to port0 in case of a conflict. Those skilled in the art will recognize that the conflict resolution circuitry 226 may grant access to conflicting ports in any order of priority. In the examples described herein, the ports are numbered so that low-numbered ports correspond to those requiring high-priority access, whereas high-numbered ports can wait longer for access.
Further, the conflict resolution circuitry is not limited to resolving conflicts between only two ports. For a cache having K ports, if N ≥ 2 ports attempt to access the same bank, then access may first be given to the lowest numbered port, and the CPU stalled for N-1 cycles to allow access by the remaining conflicting ports in ascending order by port number. For example, for a cache with K=3 ports, for each bank i, there are three bank select signals, bd[0][i], bd[1][i], bd[2][i], one for each port. There are three port select signals sel[0][i], sel[1][i], sel[2][i], indicating that port 0, 1 or 2 is selected to address the bank.
There are two selection control signals, shared by all banks, to override priorities of bank conflict resolution: sel_ctrl1, sel_ctrl2. If sel_ctrl1 is asserted, then port 1 is selected. If sel_ctrl2 is asserted, then port 2 is selected. If neither sel_ctrl1 nor sel_ctrl2 is asserted, then port 0 has priority.
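The priority-with-override rule just described for three ports extends naturally to any port count. The following Python sketch is one way to model it; the name `port_select_k`, the list layout, and the unused `sel_ctrl[0]` placeholder entry are assumptions made for illustration.

```python
def port_select_k(bd, sel_ctrl):
    """Model port selection for K ports (hypothetical helper).

    bd[j][i] is 1 when port j addresses bank i. sel_ctrl[m] is the override
    for port m (m >= 1); sel_ctrl[0] is an unused placeholder.
    """
    num_ports = len(bd)
    num_banks = len(bd[0])
    any_override = any(sel_ctrl[1:])
    sel = [[0] * num_banks for _ in range(num_ports)]
    for i in range(num_banks):
        # Port 0 wins by default unless some override is asserted.
        sel[0][i] = int(bool(bd[0][i]) and not any_override)
        for j in range(1, num_ports):
            # Port j wins if no lower-numbered port addresses bank i,
            # or if its own override is asserted.
            lower_idle = not any(bd[m][i] for m in range(j))
            sel[j][i] = int((lower_idle and bool(bd[j][i])) or bool(sel_ctrl[j]))
    return sel


# Three ports all addressing bank0 of a two-bank cache, over three cycles.
bd = [[1, 0], [1, 0], [1, 0]]
print(port_select_k(bd, [0, 0, 0]))  # [[1, 0], [0, 0], [0, 0]]
print(port_select_k(bd, [0, 1, 0]))  # [[0, 0], [1, 1], [0, 0]]
print(port_select_k(bd, [0, 0, 1]))  # [[0, 0], [0, 0], [1, 1]]
```

Asserting the overrides one at a time in ascending order serves the three conflicting ports in successive cycles, which corresponds to two stall cycles.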
For each bank i:
sel[0][i] = bd0[i] AND NOT (sel_ctrl1 OR sel_ctrl2)
sel[1][i] = (NOT (bd0[i]) AND bd1[i]) OR sel_ctrl1
sel[2][i] = (NOT (bd0[i]) AND NOT (bd1[i]) AND bd2[i]) OR sel_ctrl2
In general, for K ports, where K>3, for each bank i: for each port j (0 ≤ j < K): for each bank select signal bd[j][i] of port j in bank i: for each selection control signal sel_ctrl[m] (m = 1, ..., K-1): the conflict resolution circuitry generates output select signals sel[j][i] as follows:
sel[0][i] = bd[0][i] AND NOT (sel_ctrl[1] OR sel_ctrl[2] OR ... OR sel_ctrl[K-1])
sel[1][i] = (NOT (bd[0][i]) AND bd[1][i]) OR sel_ctrl[1]
sel[2][i] = (NOT (bd[0][i]) AND NOT (bd[1][i]) AND bd[2][i]) OR sel_ctrl[2]
sel[j][i] = (NOT (bd[0][i]) AND NOT (bd[1][i]) AND ... AND NOT (bd[j-1][i]) AND (bd[j][i])) OR sel_ctrl[j]
One can see that a large number of bank conflicts would give rise to many stall cycles that would hinder overall performance. Thus, it is advantageous to limit the number of bank conflicts within the same CPU cycle. According to one embodiment of the present invention, bank conflicts are avoided in the compiler and application software by allocating variables in nearby instructions to addresses in different banks. Thus, it is highly unlikely that the same bank would be addressed in the same cycle. Further, the organization of the address space itself helps to reduce the chance of a bank conflict. By using lower order address bits, e.g., the second bit, to select the bank, adjacent words of the cache block are evenly distributed among all the banks. In this manner, the addressing of adjacent words will result in the addressing of different banks. Because of the locality of reference property, this organization thus reduces the chance of conflict.
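The effect of selecting the bank with low-order address bits can be seen in a short Python sketch. The names `bank_of` and `conflicts` are hypothetical, and the code assumes 4-byte words (so that bit 2 is the lowest word-address bit) and a power-of-two number of banks.

```python
WORD_BYTES = 4  # assumption: 4-byte words, so bits 0-1 are the byte offset


def bank_of(addr, num_banks=2):
    """Bank selected by the low-order word-address bits."""
    return (addr // WORD_BYTES) % num_banks


def conflicts(addr_a, addr_b, num_banks=2):
    """True when two simultaneous accesses address the same bank."""
    return bank_of(addr_a, num_banks) == bank_of(addr_b, num_banks)


# Consecutive words alternate between the two banks.
print([bank_of(a) for a in range(0, 32, 4)])  # [0, 1, 0, 1, 0, 1, 0, 1]
print(conflicts(0x100, 0x104))  # False: adjacent words, different banks
print(conflicts(0x100, 0x108))  # True: two words apart, same bank
```

Under this mapping a compiler or programmer can avoid a stall by pairing accesses whose word addresses differ in the bank bit.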
Although the invention has been described in conjunction with particular embodiments, it will be appreciated that various modifications and alterations may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, the cache can be organized as an eight-way set-associative cache of eight banks. In that configuration, address bits 6-10 act as the set index. Each set comprises two rows in each bank. Bit 5 selects one of the two rows, and bits 2-4 select the bank. The address bits 11-31 are used for the tag comparison. Bits 0-1 correspond to the byte within a word. Further, the present invention can be applied to a pipelined cache.
Claims
1. A microprocessor system with a multi-port cache comprising: a plurality of memory banks; a plurality of ports for enabling accesses to the banks; and conflict detection circuitry for detecting simultaneous addressing of a first memory bank through a first port and a second port, and for stalling processor operations for a predetermined time in response to the detection of simultaneous addressing.
2. The processor system of claim 1, further comprising: conflict resolution circuitry for allowing access to the first memory bank through the first port during the stall and for allowing access to the first memory bank through the second port after the stall is complete.
3. The processor system of claim 1, wherein each bank is single-ported.
4. The processor system of claim 1, wherein the banks are addressed so that words within a cache block are distributed among multiple banks.
5. The processor system of claim 1, wherein the banks have non-overlapping address spaces.
6. A processor system according to Claim 1, 3, 4 or 5, comprising conflict resolution circuitry for allowing access to the first memory bank through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the processor is stalled.
7. A multiport memory comprising a plurality of memory banks; a plurality of ports for enabling accesses to the banks; and conflict detection circuitry for detecting simultaneous addressing of a first memory bank through a first port and a second port, and an output for a signal to stall processor operations for a predetermined time in response to the detection of simultaneous addressing.
8. A multiport memory according to Claim 7, comprising conflict resolution circuitry for allowing access to the first memory bank through the first port during the stall and for allowing access to the first memory bank through the second port after the stall is complete.
9. A multiport memory according to Claim 8, comprising conflict resolution circuitry for allowing access to the first memory bank through ports that are attempting to access the first memory bank in order of ascending priority during successive clock cycles while the processor is stalled.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US71960996A | 1996-09-25 | 1996-09-25 | |
US719609 | 1996-09-25 | ||
PCT/IB1997/001146 WO1998013763A2 (en) | 1996-09-25 | 1997-09-23 | Multiport cache memory with address conflict detection |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0875030A2 (en) | 1998-11-04 |
Family
ID=24890679
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP97940270A Withdrawn EP0875030A2 (en) | 1996-09-25 | 1997-09-23 | Multi-port cache memory with address conflict detection |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0875030A2 (en) |
JP (1) | JP2000501539A (en) |
KR (1) | KR19990071554A (en) |
WO (1) | WO1998013763A2 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0668735B2 (en) * | 1987-02-09 | 1994-08-31 | 日本電気アイシーマイコンシステム株式会社 | Cache memory |
US5276850A (en) * | 1988-12-27 | 1994-01-04 | Kabushiki Kaisha Toshiba | Information processing apparatus with cache memory and a processor which generates a data block address and a plurality of data subblock addresses simultaneously |
JP2822588B2 (en) * | 1990-04-30 | 1998-11-11 | 日本電気株式会社 | Cache memory device |
US5434989A (en) * | 1991-02-19 | 1995-07-18 | Matsushita Electric Industrial Co., Ltd. | Cache memory for efficient access with address selectors |
-
1997
- 1997-09-23 WO PCT/IB1997/001146 patent/WO1998013763A2/en not_active Application Discontinuation
- 1997-09-23 KR KR1019980703828A patent/KR19990071554A/en not_active Application Discontinuation
- 1997-09-23 EP EP97940270A patent/EP0875030A2/en not_active Withdrawn
- 1997-09-23 JP JP10515453A patent/JP2000501539A/en active Pending
Non-Patent Citations (1)
Title |
---|
See references of WO9813763A2 * |
Also Published As
Publication number | Publication date |
---|---|
JP2000501539A (en) | 2000-02-08 |
WO1998013763A2 (en) | 1998-04-02 |
WO1998013763A3 (en) | 1998-06-04 |
KR19990071554A (en) | 1999-09-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): DE FR GB |
| | 17P | Request for examination filed | Effective date: 19981204 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| | 18W | Application withdrawn | Withdrawal date: 20020404 |