US8015361B2 - Memory-centric page table walker - Google Patents
- Publication number
- US8015361B2 (application US11/956,625)
- Authority
- US
- United States
- Prior art keywords
- data
- cache
- page table
- memory
- address
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1009—Address translation using page tables, e.g. page table structures
- G06F12/1018—Address translation using page tables, e.g. page table structures involving hashing techniques, e.g. inverted page tables
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
- G06F2212/1021—Hit rate improvement
Definitions
- The present invention relates to computer memory management, particularly to page tables in such memories, and more particularly to page table walkers.
- Memory addressing in the computer's main memory, i.e. the fast semiconductor storage (RAM) directly connected to the computer processor, conventionally uses paging to implement virtual memory.
- The virtual address space is divided into fixed-size units or blocks called pages. Each page can be mapped to any physical address corresponding to a hardware location available in the system.
- A memory management unit (MMU) operates a selected paging algorithm to determine and maintain the current mappings from virtual to physical addresses using one or more page tables.
- When an address is received from an execution unit in the processor, the MMU translates the virtual address to a physical address using the page tables.
- The page tables are conventionally stored in main memory, and a page table walker is invoked to access the page tables and provide the appropriate translation.
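The translation just described can be modelled with a short sketch. This is a toy single-level Python model for illustration only; real MMUs perform this in hardware, and the table contents, page size, and names here are assumptions, not taken from the patent.

```python
# Toy model of page-based virtual-to-physical translation (illustrative only).
PAGE_SIZE = 4096           # 4 KB pages, one of the sizes mentioned in the text
OFFSET_BITS = 12           # log2(PAGE_SIZE)

# Hypothetical page table: virtual page number (VPN) -> physical frame number (PFN).
page_table = {0x0: 0x0A1B2, 0x1: 0x33C4}

def translate(virtual_addr: int) -> int:
    """Split the address into (VPN, offset) and substitute the mapped frame."""
    vpn = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & (PAGE_SIZE - 1)
    pfn = page_table[vpn]                  # a missing VPN models a page fault
    return (pfn << OFFSET_BITS) | offset

print(hex(translate(0x1123)))              # -> 0x33c4123
```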
- The computer memory management art is always seeking implementations for improving the speed and efficiency of page table walkers.
- The present invention provides an implementation for improving the speed and effectiveness of page table walkers.
- FIG. 1 is a generalized representation of a conventional computer memory system using page tables 101 and a page table walker 102.
- The memory system includes several levels of cache 103-104, a memory management unit 105 for address translation, a system bus 106, a memory controller 107, and main memory (DRAM) 108.
- When the processor 110 executes memory access instructions (e.g. load, store), it presents an "Effective Address" to the L1 data cache 103.
- The Memory Management Unit (MMU) 105 converts the "Effective Address" into the "Physical Address" required for accessing the data (including, in some systems, an intermediate "Virtual Address").
- The SLB (Segment Look-aside Buffer) 111 supports translation from Effective to Virtual Addresses, and the TLB (Translation Look-aside Buffer) 112 supports translation from Virtual to Real Addresses.
- ERAT (Effective-to-Real Address Translation) caches 113 and 114 cache a limited number of previous Effective-to-Real translations in anticipation of their reuse. When a match is found in an ERAT, the translation process within the MMU 105 can be bypassed.
- A similar process occurs when the processor fetches new instructions for execution.
- Once the physical address is determined, it may be used to validate an entry found in the L1 instruction cache 115; if no match is found in the L1 cache 115, the physical address is presented to the L2 cache 104. If there is also no match in the L2 cache 104, the physical address is propagated to the memory subsystem to access the required data.
- A unique address translation is required for each memory page; a page may contain 4 KBytes, 64 KBytes, or other larger amounts of DRAM 108 storage.
- The TLB 112 contains an entry for each of the most recently required translations, but occasionally an address will be presented to the MMU 105 that does not have a matching translation in the TLB 112. When this happens, a TLB miss is declared, and the Page Table Walker 102 is activated to search the complete Page Table stored in DRAM 108.
- The page table walker 102 typically applies a hash function, performs one or more memory accesses, and processes individual PTEs (page table entries) in the resulting data to locate the required PTE.
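The hash-and-scan sequence described above can be sketched as follows. This is a simplified Python model: the modulo hash, the PTEG count, and the table layout are illustrative assumptions, not the PowerPC hashed page table definition.

```python
# Simplified hashed page table: the VPN hashes to a PTE group (PTEG), and the
# group is scanned for a matching entry. Hash function and sizes are toy values.
NUM_PTEGS = 8
PTEG_SIZE = 8                    # e.g. eight PTEs per 128-byte line

# page_table[g] holds up to PTEG_SIZE (vpn, pfn) pairs; None marks empty slots.
page_table = [[None] * PTEG_SIZE for _ in range(NUM_PTEGS)]

def insert_pte(vpn, pfn):
    group = page_table[vpn % NUM_PTEGS]      # toy hash: modulo
    group[group.index(None)] = (vpn, pfn)

def walk(vpn):
    """Hash the VPN, fetch its PTEG, and scan the PTEs for a match."""
    for pte in page_table[vpn % NUM_PTEGS]:
        if pte is not None and pte[0] == vpn:
            return pte[1]
    return None                              # no translation found: page fault

insert_pte(0x5, 0xA0)
insert_pte(0xD, 0xB0)            # 0xD % 8 == 5: lands in the same PTEG
print(hex(walk(0xD)))            # -> 0xb0
```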
- This new PTE is used to complete the required address translation, and the pending memory access then continues as with normal accesses.
- The new PTE displaces another PTE within the TLB 112, based on time since last use.
- An LRU (least recently used) mechanism, similar to that used in caches, determines which previous TLB 112 entry to displace.
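The LRU displacement just described behaves like the following toy Python TLB. The capacity and the software data structure are illustrative; the real TLB is a hardware structure.

```python
from collections import OrderedDict

# Toy TLB with LRU replacement (illustrative; a real TLB is hardware).
class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()        # vpn -> pfn, oldest first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # mark most recently used
            return self.entries[vpn]        # TLB hit
        return None                         # TLB miss: invoke the walker

    def insert(self, vpn, pfn):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the LRU entry
        self.entries[vpn] = pfn
```

For example, with capacity 2, inserting a third translation displaces whichever of the first two was least recently looked up.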
- The page table walker 102 typically retrieves a full cache line of data from the page table 101 in DRAM, even though the required PTE is a fraction of that size. For example, in the PowerPC™ architecture, as many as eight PTEs fit within a 128-byte cache line. Moving eight times the required data across system buses from memory 108 to the MMU 105 results in unproductive power dissipation.
- Each cache line fetched by the page table walker displaces some other cache line in the L2 cache 104, even though it is highly unlikely that the page table data will be used again while it is still in the cache.
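The eight-to-one transfer overhead follows directly from the sizes quoted above for the PowerPC case:

```python
# Bus-transfer overhead of a line-sized page table fetch, using the sizes
# quoted in the text (128-byte line, eight PTEs per line).
CACHE_LINE_BYTES = 128
PTES_PER_LINE = 8
PTE_BYTES = CACHE_LINE_BYTES // PTES_PER_LINE    # 16 bytes actually needed

overhead = CACHE_LINE_BYTES / PTE_BYTES          # 8.0x the required data moved
print(PTE_BYTES, overhead)
```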
- The present invention provides a solution which reduces the undesirable effects described above.
- This invention involves the recognition that, by moving the page table walker from its conventional location in the memory management unit to a location in main memory, i.e. in the main memory controller, many of the above-described effects can be minimized.
- An implementation is provided wherein the processing of requests for data can selectively avoid or bypass the cumbersome caches associated with the data processor.
- The present invention provides a computer system comprising a data processor unit connected to a main memory. The data processor unit includes a memory management unit for controlling the conversion of the address of requested data, received from a processor, into the physical address of said requested data. The main memory includes apparatus for storing the accessed data in pages at the physical addresses, a page table accessed by the memory management unit for converting to said page addresses, and the page table walker for proceeding through the entries in said page table.
- The main memory includes a random access memory (RAM), preferably a DRAM (dynamic random access memory), and a memory controller for controlling said random access memory; the memory controller contains the page table walker.
- The data processor further includes at least one data cache for storing recently requested data, and apparatus in its associated memory management unit for checking received data requests against data stored in said cache.
- The present invention provides apparatus in the memory management unit for selectively bypassing the cache so that a data request is routed directly to said page table walker in the memory controller for address conversion. This selective bypassing involves deciding whether a data request checks for the requested data in the cache, or bypasses said cache and is routed directly to the page table walker for conversion. This decision may be based upon whether there is a flag in the address of the requested data.
- The present invention enables a plurality of such processors connected to one main memory to use the same page table walker in the main memory.
- FIG. 1 shows a generalized view of a conventional main memory and an associated processor unit in the prior art.
- FIG. 2 shows a generalized embodiment of the main memory and an associated processor unit in the present invention.
- FIG. 1, showing the prior art, has been described hereinabove in the background of the invention.
- FIG. 2 shows a generalized embodiment of the present invention.
- The following elements perform the same functions in the embodiment of FIG. 2 that the corresponding items numbered 1xx perform in the prior art embodiment described hereinabove with respect to FIG. 1: Processor Core 210, Data ERAT 213, Instr ERAT 214, L1 Data Cache 203, L1 Instruction Cache 215, L2 Cache 204, System Bus 206, Memory Cntrlr 207, DRAM 208, and Page Table 201. Comparing FIG. 2 with FIG. 1, it can be seen that the Page Table Walker 202 has been removed from the MMU 205 and placed within the Memory Controller 207.
- When a TLB (translation look-aside buffer) 212 "miss" is detected, the MMU (memory management unit) 205 generates a non-cacheable read, using the Virtual Address (or the Effective Address, if there is no SLB (segment look-aside buffer) 211) of the pending memory access as the address of the non-cacheable read.
- This request may be flagged, via a special command code inserted into the data request, as a Page Table only search. This results in the request being routed past the caches, via the NCU 217 and System Bus 206, to the Page Table Walker 202 within the Memory Controller 207 subsystem.
- As in a conventional walker, the virtual address is hashed, a block of memory is accessed, and that data is scanned for a PTE (page table entry) that matches the Virtual Address.
- The matching entry is returned as the response to the request via data line 225, bus 206, data line 222, NCU 217, and data line 221.
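The request routing described above can be sketched in a few lines. This is a software model for illustration only: the command code, class names, and function names (PT_WALK, MemoryController, on_tlb_miss) are hypothetical, not taken from the patent.

```python
# Sketch of the routing described above: a TLB miss produces a request flagged
# with a special command code, which bypasses the caches and is serviced by a
# walker co-located with the memory controller.
PT_WALK = "page-table-only"                  # hypothetical command code

class MemoryController:
    def __init__(self, page_table):
        self.page_table = page_table         # vpn -> pfn, held in "DRAM"

    def service(self, request):
        if request["cmd"] == PT_WALK:
            # Memory-side walk: only the PTE crosses the bus, not a full line.
            return self.page_table.get(request["vpn"])
        raise ValueError("ordinary reads are not modelled in this sketch")

def on_tlb_miss(vpn, controller):
    request = {"cmd": PT_WALK, "vpn": vpn}   # non-cacheable: skips L1/L2
    return controller.service(request)
```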
- The page table walker embodiment shown in FIG. 2 may be adapted to a multi-processor system wherein a single page table is shared among all processors in order to avoid conflicting uses of memory segments. Such an arrangement enables multiple processors to share a single page table walker. Even in large systems with multiple memory controllers, a page table can fit within a single DRAM, and thus the page table walker need only be included within the one memory controller for the DRAM containing the Page Table.
- Processor 210, MMU 205, NCU 217, and all of the caches may be integrated into a semiconductor chip separate from the semiconductor chip incorporating the memory controller 207 and DRAM 208.
- In some systems, a full-function processor may control multiple special purpose processors.
- To avoid the complexity of a full MMU (memory management unit) on each special purpose device, the full-function processor conventionally takes on the responsibility of handling TLB updates on the special purpose devices via appropriate software. This adds significant latency and overhead.
- The present invention may enable these special purpose processors to update their TLBs by using the page table walker in the main memory. This enables the special purpose processors to remain simple, while at the same time avoiding the latency of a software update.
- The page table walker may include an enhanced function to anticipate the need for the next sequential page and complete the page table walk to access the corresponding PTE (page table entry) in advance.
- Such an anticipated PTE could be cached in a single-entry cache within the page table walker. In the case of a page table walker supporting multiple processors, this PTE cache could include one entry for each processor.
- This pre-fetch action could be configured to always acquire the next sequential PTE (e.g. via setting a configuration bit), or it could be triggered by detecting two consecutive page table walks from the same core that have accessed PTEs for sequential pages. It should be noted that fast access to the PTEG (page table entry group) containing the PTE for the next sequential page should be possible most of the time, since the hash used for the page table should place PTEs for sequential pages in sequential PTEG positions.
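The per-core single-entry prefetch cache described above can be sketched as follows. The class name, the dictionary-based structure, and the always-prefetch policy are illustrative assumptions; the patent's walker is hardware and may gate the prefetch on the trigger conditions described above.

```python
# Sketch of the per-core single-entry PTE prefetch: completing a walk for page
# N also fetches the PTE for page N+1 and parks it, so a later walk for N+1
# needs no further table access.
class PrefetchingWalker:
    def __init__(self, page_table):
        self.page_table = page_table     # vpn -> pfn
        self.prefetched = {}             # core_id -> (vpn, pfn), one per core

    def walk(self, core_id, vpn):
        hit = self.prefetched.get(core_id)
        if hit is not None and hit[0] == vpn:
            return hit[1]                # served from the prefetch entry
        pfn = self.page_table.get(vpn)   # normal page table walk
        nxt = self.page_table.get(vpn + 1)
        if nxt is not None:
            self.prefetched[core_id] = (vpn + 1, nxt)   # anticipate next page
        return pfn
```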
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/956,625 US8015361B2 (en) | 2007-12-14 | 2007-12-14 | Memory-centric page table walker |
US12/109,671 US7984263B2 (en) | 2007-12-14 | 2008-04-25 | Structure for a memory-centric page table walker |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/956,625 US8015361B2 (en) | 2007-12-14 | 2007-12-14 | Memory-centric page table walker |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/109,671 Continuation-In-Part US7984263B2 (en) | 2007-12-14 | 2008-04-25 | Structure for a memory-centric page table walker |
Publications (2)
Publication Number | Publication Date |
---|---|
US20090157975A1 US20090157975A1 (en) | 2009-06-18 |
US8015361B2 true US8015361B2 (en) | 2011-09-06 |
Family
ID=40754800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/956,625 Expired - Fee Related US8015361B2 (en) | 2007-12-14 | 2007-12-14 | Memory-centric page table walker |
Country Status (1)
Country | Link |
---|---|
US (1) | US8015361B2 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9256550B2 (en) * | 2012-03-28 | 2016-02-09 | International Business Machines Corporation | Hybrid address translation |
US8938602B2 (en) | 2012-08-02 | 2015-01-20 | Qualcomm Incorporated | Multiple sets of attribute fields within a single page table entry |
US9436616B2 (en) | 2013-05-06 | 2016-09-06 | Qualcomm Incorporated | Multi-core page table sets of attribute fields |
KR102432754B1 (en) * | 2013-10-21 | 2022-08-16 | 에프엘씨 글로벌 리미티드 | Final level cache system and corresponding method |
KR101994952B1 (en) | 2015-03-27 | 2019-07-01 | 후아웨이 테크놀러지 컴퍼니 리미티드 | Data processing method, memory management unit and memory control device |
CN114238176B (en) * | 2021-12-14 | 2023-03-10 | 海光信息技术股份有限公司 | Processor, address translation method for processor and electronic equipment |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5960463A (en) * | 1996-05-16 | 1999-09-28 | Advanced Micro Devices, Inc. | Cache controller with table walk logic tightly coupled to second level access logic |
US6012132A (en) | 1997-03-31 | 2000-01-04 | Intel Corporation | Method and apparatus for implementing a page table walker that uses a sliding field in the virtual addresses to identify entries in a page table |
US6088780A (en) | 1997-03-31 | 2000-07-11 | Institute For The Development Of Emerging Architecture, L.L.C. | Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address |
US20020065989A1 (en) * | 2000-08-21 | 2002-05-30 | Gerard Chauvel | Master/slave processing system with shared translation lookaside buffer |
US20030079103A1 (en) * | 2001-10-24 | 2003-04-24 | Morrow Michael W. | Apparatus and method to perform address translation |
US6741258B1 (en) * | 2000-01-04 | 2004-05-25 | Advanced Micro Devices, Inc. | Distributed translation look-aside buffers for graphics address remapping table |
US20060136680A1 (en) * | 2004-12-17 | 2006-06-22 | International Business Machines Corporation | Capacity on demand using signaling bus control |
US20060224815A1 (en) | 2005-03-30 | 2006-10-05 | Koichi Yamada | Virtualizing memory management unit resources |
US20060259734A1 (en) | 2005-05-13 | 2006-11-16 | Microsoft Corporation | Method and system for caching address translations from multiple address spaces in virtual machines |
US20060277357A1 (en) * | 2005-06-06 | 2006-12-07 | Greg Regnier | Inter-domain data mover for a memory-to-memory copy engine |
US20070038839A1 (en) * | 2005-08-12 | 2007-02-15 | Advanced Micro Devices, Inc. | Controlling an I/O MMU |
US20070038840A1 (en) * | 2005-08-12 | 2007-02-15 | Advanced Micro Devices, Inc. | Avoiding silent data corruption and data leakage in a virtual environment with multiple guests |
US20070168644A1 (en) | 2006-01-17 | 2007-07-19 | Hummel Mark D | Using an IOMMU to Create Memory Archetypes |
US7353445B1 (en) * | 2004-12-10 | 2008-04-01 | Sun Microsystems, Inc. | Cache error handling in a multithreaded/multi-core processor |
US7363491B2 (en) * | 2004-03-31 | 2008-04-22 | Intel Corporation | Resource management in security enhanced processors |
US20080209130A1 (en) * | 2005-08-12 | 2008-08-28 | Kegel Andrew G | Translation Data Prefetch in an IOMMU |
US20090158003A1 (en) * | 2007-12-14 | 2009-06-18 | Sathaye Sumedh W | Structure for a memory-centric page table walker |
- 2007-12-14: application US11/956,625 filed; granted as US8015361B2 (status: Expired - Fee Related)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11449444B2 (en) * | 2017-02-08 | 2022-09-20 | Texas Instruments Incorporated | Apparatus and mechanism to bypass PCIe address translation by using alternative routing |
US12056073B2 (en) | 2017-02-08 | 2024-08-06 | Texas Instruments Incorporated | Apparatus and mechanism to bypass PCIE address translation by using alternative routing |
Also Published As
Publication number | Publication date |
---|---|
US20090157975A1 (en) | 2009-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SATHAYE, SUMEDH W; DAVIS, GORDON T; REEL/FRAME: 020659/0371; SIGNING DATES FROM 20070101 TO 20071213
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20190906 |