
CN114840445B - Memory access method and device - Google Patents


Info

Publication number
CN114840445B
Authority
CN
China
Prior art keywords
address
tlb
cpu core
memory access
physical address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210200142.7A
Other languages
Chinese (zh)
Other versions
CN114840445A (en
Inventor
郭凯杰
罗犇
彭开桓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN202210200142.7A
Publication of CN114840445A
Priority to PCT/CN2023/075635 (WO2023165317A1)
Application granted
Publication of CN114840445B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292 User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G06F12/0882 Page mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A memory access method and device, applied to a CPU core. The method comprises: in response to a memory access request, searching a TLB for the physical address corresponding to the virtual address carried in the memory access request; if the physical address corresponding to the virtual address is not found, sending an address probe request carrying the virtual address, so that a CPU core receiving the address probe request searches its own TLB for the physical address corresponding to the virtual address and, if it is found, returns an address probe response carrying the found physical address; and, in response to the received address probe response, storing the mapping between the virtual address and the physical address carried in the address probe response in the TLB, and performing memory access based on the physical address.

Description

Memory access method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a memory access method and apparatus.
Background
The CPU (Central Processing Unit) is the computation and control core of a computer system and the final execution unit for information processing and program execution. A CPU may include one or more CPU cores. A CPU core typically includes a TLB (Translation Lookaside Buffer), a high-speed hardware buffer that caches mappings between virtual addresses and physical addresses in order to speed up address translation.
For a multi-core CPU, different threads of the same process may run on different CPU cores, and a process or thread may also be scheduled to run on another CPU core. In such scenarios of running across CPU cores, how to speed up address translation has become a technical problem to be solved.
Disclosure of Invention
In view of this, the present specification provides a memory access method and apparatus.
Specifically, the specification is realized by the following technical scheme:
A memory access method for memory access of a computer system, the computer system comprising a central processing unit (CPU), the CPU comprising a plurality of CPU cores, each CPU core comprising a translation lookaside buffer (TLB) for caching mappings between virtual addresses and physical addresses, the method being applied to a CPU core and comprising:
in response to a memory access request, searching the TLB for a physical address corresponding to a virtual address carried in the memory access request;
in the case that the physical address corresponding to the virtual address is not found, sending an address probe request carrying the virtual address, so that a CPU core receiving the address probe request searches its own TLB for the physical address corresponding to the virtual address and, in the case that the corresponding physical address is found, returns an address probe response carrying the found physical address;
in response to the received address probe response, storing the mapping between the virtual address and the physical address carried in the address probe response into the TLB, and performing memory access based on the physical address.
Optionally, the sending an address probe request includes:
broadcasting the address probe request.
Optionally, the sending an address probe request includes:
reading a target core identifier of a target CPU core from a register;
sending the address probe request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core on which another thread of the process to which the thread initiating the memory access request belongs is running.
Optionally, the target CPU core is the CPU core on which the thread initiating the memory access request ran before being scheduled to the present CPU core.
Optionally, the target core identification is written by a scheduler.
Optionally, the method further comprises:
receiving a TLB invalidation instruction, wherein the TLB invalidation instruction is sent after the thread that initiated the memory access request is scheduled to another CPU core;
in response to the TLB invalidation instruction, marking the mapping between the virtual address and the physical address specified by the TLB invalidation instruction as invalidated in the TLB;
the method further comprises:
after receiving an address probe request sent by another CPU core, searching both the valid mappings and the invalidated mappings in the TLB for the physical address corresponding to the virtual address.
Optionally, the method further comprises:
in the case that no address probe response is received, looking up the physical address corresponding to the virtual address based on the process page table in memory.
A memory access device for memory access of a computer system, the computer system comprising a central processing unit (CPU), the CPU comprising a plurality of CPU cores, each CPU core comprising a translation lookaside buffer (TLB) for caching mappings between virtual addresses and physical addresses, the device being applied to a CPU core and comprising:
an address searching unit, configured to search, in response to a memory access request, the TLB for a physical address corresponding to a virtual address carried in the memory access request;
an address probe unit, configured to send an address probe request carrying the virtual address, so that a CPU core receiving the address probe request searches its own TLB for the physical address corresponding to the virtual address and, in the case that the corresponding physical address is found, returns an address probe response carrying the found physical address;
a memory access unit, configured to, in response to the received address probe response, store the mapping between the virtual address and the physical address carried in the address probe response into the TLB, and perform memory access based on the physical address.
Optionally, the address probe unit reads a target core identifier of a target CPU core from a register, and sends the address probe request to the target CPU core based on the target core identifier.
A central processing unit (CPU), comprising a plurality of CPU cores, each CPU core comprising a translation lookaside buffer (TLB) for caching mappings between virtual addresses and physical addresses, the CPU cores being configured to:
in response to a memory access request, search the TLB for a physical address corresponding to a virtual address carried in the memory access request;
in the case that the physical address corresponding to the virtual address is not found, send an address probe request carrying the virtual address, so that a CPU core receiving the address probe request searches its own TLB for the physical address corresponding to the virtual address and, in the case that the corresponding physical address is found, returns an address probe response carrying the found physical address;
in response to the received address probe response, store the mapping between the virtual address and the physical address carried in the address probe response into the TLB, and perform memory access based on the physical address.
By adopting the embodiment, when the CPU core does not store the physical address corresponding to the virtual address in the TLB, the CPU core sends an address detection request to other CPU cores, and the other CPU cores search whether the physical address corresponding to the virtual address is stored in the respective TLB or not, and can add the searched physical address to the address detection response to return. The CPU core may then store the mapping between the physical address and the virtual address in its TLB and perform memory accesses.
By adopting the embodiment, TLB cache sharing among CPU cores can be realized, repeated process page table inquiry by the CPU cores under the scene of running across the CPU cores is greatly reduced, waste of CPU core processing resources is reduced, time consumption of address translation under the condition of TLB Miss is effectively shortened, and IO performance is improved. On the other hand, the technical scheme provided by the specification can be realized based on the existing hardware and the cache detection protocol, does not need to add new hardware, and has low cost and high feasibility.
Drawings
Fig. 1 is a flow chart illustrating a memory access method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a block diagram of a CPU shown in an exemplary embodiment of the present specification.
Fig. 3 is a flow chart illustrating another memory access method according to an exemplary embodiment of the present disclosure.
Fig. 4 is a flow chart illustrating another memory access method according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of a memory access device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.
The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present description. The term "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
A CPU core typically includes an MMU (Memory Management Unit), which translates virtual addresses into the physical addresses required for memory access and may store the mapping between virtual addresses and physical addresses in the TLB of the CPU core.
When an application program running on a CPU core performs a memory access, the MMU first queries the TLB for the physical address corresponding to the virtual address, and then performs the memory access based on the physical address found. If the TLB does not store the physical address corresponding to the virtual address, the MMU needs to look up the physical address based on the process page table in memory.
To isolate the virtual addresses used by different applications, the ASID (Address Space ID, address space identifier) is introduced. After an application is initialized, the operating system may generate an ASID for the application and bind the ASID to the process identifier (PID) of the application. Different applications have different ASIDs, so different applications can use the same virtual address. The TLB stores the mapping among the ASID, the virtual address, and the physical address.
A process is the carrier in which an application runs, and one running application typically corresponds to one process. A thread is the smallest unit of execution; it is contained in a process and is the actual unit of operation within the process. A process may typically include multiple threads, which share the memory space of the application. When a process or thread accesses memory, it uses the ASID corresponding to the process, and the MMU searches the TLB for the physical address corresponding to that ASID and the virtual address.
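The ASID-tagged lookup described above can be sketched as a small software model. This is only an illustration under stated assumptions: the class name `Tlb`, its methods, and the second ASID and physical address are hypothetical, and addresses are treated as whole keys without modelling page offsets.

```python
from typing import Dict, Optional, Tuple


class Tlb:
    """Toy TLB keyed by (ASID, virtual address). Names are illustrative only."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], int] = {}

    def insert(self, asid: int, vaddr: int, paddr: int) -> None:
        # Store the mapping among ASID, virtual address, and physical address.
        self._entries[(asid, vaddr)] = paddr

    def lookup(self, asid: int, vaddr: int) -> Optional[int]:
        # Return the cached physical address, or None on a TLB miss.
        return self._entries.get((asid, vaddr))


tlb = Tlb()
tlb.insert(asid=7, vaddr=0x800000, paddr=0x2000)
# A different application (hypothetical ASID 9) may reuse the same virtual address.
tlb.insert(asid=9, vaddr=0x800000, paddr=0x5000)
```

Because the ASID is part of the key, the two applications do not collide even though they use the same virtual address.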
For multi-core CPUs, there are often many scenarios running across the CPU cores.
For example, a process of a certain application program is scheduled by a scheduler from CPU core 1 to CPU core 2.
For another example, multiple threads of a process run in different CPU cores.
In these scenarios of running across CPU cores, the processes or threads involved belong to the same application and therefore may share the same memory space. Nevertheless, the MMU of each CPU core has to look up the physical address corresponding to the virtual address based on the process page table in memory and then store the mapping in its own TLB. This results in repeated process page table lookups, wastes the processing resources of the CPU cores, and degrades IO performance.
The present specification provides a memory access scheme of a computer system, which can improve IO performance under a cross-CPU core operation scene and save processing resources of the CPU core.
Fig. 1 is a flow chart illustrating a memory access method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the memory access method can be used for memory access of a computer system. The computer system includes a CPU, the CPU includes a plurality of CPU cores, and each CPU core includes a TLB. The method is applied to a CPU core, for example to the MMU of the CPU core, and includes the following steps:
Step 102, in response to a memory access request, searching a physical address corresponding to a virtual address carried by the memory access request in a TLB.
In this specification, a process or a thread running on a CPU core may initiate a memory access request (hereinafter described uniformly as a thread initiating a memory access request, because the thread is the actual unit of operation in the process), and the memory access request typically carries the virtual address to be accessed. In a CPU that supports ASIDs, the memory access request also carries the ASID of the application program; this specification takes a CPU that supports ASIDs as an example.
In response to the memory access request, the MMU of the CPU core may first query the TLB of the CPU core, and query whether the TLB stores the physical address corresponding to the virtual address. For example, the TLB is queried for a physical address corresponding to the ASID and the virtual address specified by the memory access request.
If the corresponding physical address is found in the TLB (TLB Hit), memory access may be performed based on the found physical address.
If the corresponding physical address is not found in the TLB (TLB Miss), step 104 below may be performed.
It should be noted that, in the related art, if the corresponding physical address is not found in the TLB, it is looked up based on the process page table in memory. With the memory access scheme provided in this specification, if the corresponding physical address is not found in the TLB, step 104 below may be executed directly without looking up the physical address in the process page table, or the physical address may be looked up in the process page table in parallel with step 104; this specification does not particularly limit this.
Step 104, sending an address probe request carrying the virtual address, so that a CPU core receiving the address probe request searches its TLB for the physical address corresponding to the virtual address and, in the case where the corresponding physical address is found, returns an address probe response carrying the found physical address.
Based on the lookup result of the foregoing step 102, in the case where the physical address corresponding to the ASID and the virtual address is not cached in the TLB, the CPU core may construct an address probe request and add the ASID and the virtual address to it.
The address probe request can be constructed based on a cache detection protocol (Snoop Protocol), a hardware mechanism for maintaining cache coherence in multi-core processors. Of course, in other examples, the address probe request may be constructed based on other protocols, which is not particularly limited in this specification.
In one example, the CPU core may broadcast the constructed address probe request to all CPU cores.
In another example, the CPU core may also send a constructed address probe request to the designated CPU core. For example, the CPU core may first read a target core identification of a target CPU core from a specified register, and then send the address probe request to the target CPU core based on the target core identification.
The target CPU core may be a CPU core on which another thread of the process to which the requesting thread belongs is running, or the CPU core on which the requesting thread ran before being scheduled to the current CPU core. The target core identification of the target CPU core may be written to the specified register by a scheduler.
In the case of sending the address probe request to a designated CPU core, the target core identification may be added as a parameter to the address probe request after being read from a register.
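The two sending modes described above (broadcast versus a designated target) can be sketched as a recipient-selection function. This is a hedged illustration, not the patented hardware logic: the function name `probe_recipients`, the list-valued register, and the choice to skip the sending core itself are all assumptions made for the example.

```python
from typing import List, Optional


def probe_recipients(
    self_id: int,
    all_cores: List[int],
    target_register: Optional[List[int]],
) -> List[int]:
    """Pick the cores that should receive an address probe request.

    target_register models the specified register written by the scheduler;
    None or an empty list means no target is known, so the request is broadcast.
    Skipping the sender itself is an assumption of this sketch.
    """
    candidates = target_register if target_register else all_cores
    return [core for core in candidates if core != self_id]


# Targeted send: the register of core 12 holds core identification 8.
targeted = probe_recipients(12, [0, 4, 8, 12], [8])
# Broadcast: no target is recorded, so every other core is probed.
broadcast = probe_recipients(12, [0, 4, 8, 12], None)
```

Targeting trades a register write by the scheduler for less bus traffic than a broadcast.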
In this specification, after receiving an address probe request sent by another CPU core, a CPU core may check whether its TLB caches the physical address corresponding to the ASID and the virtual address carried in the address probe request.
In the case where the physical address corresponding to the ASID and virtual address is cached in its TLB, the physical address may be added to the address probe response and returned to the CPU core that sent the address probe request.
In the case that the physical address corresponding to the ASID and the virtual address is not cached in the TLB, an address probe response may also be returned to the CPU core that sent the address probe request, where the address probe response does not carry a physical address.
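The responder-side behaviour can be sketched as follows. This is a minimal model under assumptions: the `ProbeRequest` and `ProbeResponse` types and the `handle_probe` function are hypothetical names, the TLB is a plain dictionary, and a response without a physical address is represented by `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ProbeRequest:
    asid: int
    vaddr: int


@dataclass(frozen=True)
class ProbeResponse:
    # None models a response that does not carry a physical address.
    paddr: Optional[int]


def handle_probe(tlb: Dict[Tuple[int, int], int], req: ProbeRequest) -> ProbeResponse:
    """Look up the probed (ASID, virtual address) pair in the receiver's own TLB."""
    return ProbeResponse(paddr=tlb.get((req.asid, req.vaddr)))


receiver_tlb = {(7, 0x800000): 0x2000}
hit = handle_probe(receiver_tlb, ProbeRequest(asid=7, vaddr=0x800000))
miss = handle_probe(receiver_tlb, ProbeRequest(asid=7, vaddr=0x900000))
```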
Referring to the CPU block diagram shown in fig. 2, the CPU cores are connected by a bus, such as a ring bus (Ring Bus) or a mesh network bus. Address probe requests and address probe responses can be transmitted between the CPU cores over the bus.
It should be noted that the transmission directions of the address probe request/address probe response shown in fig. 2 are only exemplary, and represent that the address probe request and the address probe response are transmitted between the CPU cores, and do not represent an actual transmission path.
Step 106, in response to the received address probe response, storing the mapping between the virtual address and the physical address carried in the address probe response into the TLB, and performing memory access based on the physical address.
In this specification, after receiving an address probe response to the address probe request it sent, the CPU core may extract the physical address from the address probe response and then store the mapping among the physical address, the virtual address, and the ASID in the TLB. Memory access may then be performed based on the physical address.
In a case where no address probe response to the address probe request sent by the CPU core is received, the CPU core may look up the physical address corresponding to the virtual address based on the process page table in memory.
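Steps 102 to 106, together with the page table fallback, can be sketched end to end in software. This is a simplified model under assumptions: TLBs and the process page table are plain dictionaries, probing is a direct read of each peer dictionary rather than a bus transaction, and the function name `translate` and its returned source labels are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (ASID, virtual address)


def translate(
    local_tlb: Dict[Key, int],
    peer_tlbs: List[Dict[Key, int]],
    page_table: Dict[Key, int],
    asid: int,
    vaddr: int,
) -> Tuple[int, str]:
    """Return (physical address, source of the translation)."""
    key = (asid, vaddr)
    # Step 102: search the local TLB.
    if key in local_tlb:
        return local_tlb[key], "tlb_hit"
    # Step 104: probe the TLBs of the other cores.
    for peer in peer_tlbs:
        paddr = peer.get(key)
        if paddr is not None:
            # Step 106: cache the probed mapping locally, then use it.
            local_tlb[key] = paddr
            return paddr, "probe_hit"
    # Fallback: no response carried a physical address, so walk the page table.
    paddr = page_table[key]
    local_tlb[key] = paddr
    return paddr, "page_walk"


page_table = {(7, 0x800000): 0x2000}
tlb_a: Dict[Key, int] = {}
tlb_b: Dict[Key, int] = {}
first = translate(tlb_a, [tlb_b], page_table, 7, 0x800000)
second = translate(tlb_b, [tlb_a], page_table, 7, 0x800000)
third = translate(tlb_b, [tlb_a], page_table, 7, 0x800000)
```

In this run only the first access walks the page table; the second is served by probing the other TLB and the third by the local TLB.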
As can be seen from the above description, in the case where the CPU core in the present specification does not store the physical address corresponding to the virtual address in the TLB, the CPU core sends an address probe request to the other CPU cores, and the other CPU cores find whether the physical address corresponding to the virtual address is stored in the respective TLB, and may add the found physical address to the address probe response to return. The CPU core may then store the mapping between the physical address and the virtual address in its TLB and perform memory accesses.
By adopting the technical scheme provided by the specification, TLB cache sharing among CPU cores can be realized, repeated process page table inquiry by the CPU cores in a scene of running across the CPU cores is greatly reduced, waste of CPU core processing resources is reduced, time consumption of address translation under the condition of TLB Miss is effectively shortened, and IO performance is improved.
On the other hand, the technical scheme provided by the specification can be realized based on the existing hardware and the cache detection protocol, does not need to add new hardware, and has low cost and high feasibility.
The specific implementation of the present specification is described in detail below based on the aforementioned two scenarios running across CPU cores, respectively.
1. Multithreading of the same process runs on different CPU cores
In this specification, multiple threads of the same process share memory space and use the same ASID, namely the ASID bound to the process to which they belong.
Thread      CPU core    ASID
Thread 1    8           7
Thread 2    8           7
Thread 3    12          7
Thread 4    12          7
TABLE 1
Referring to the example of Table 1, assume that a process includes 4 threads, thread 1 to thread 4, where thread 1 and thread 2 run on CPU core 8, thread 3 and thread 4 run on CPU core 12, and threads 1 to 4 all use ASID 7.
Assume that thread 1 performs a memory access to virtual address 0x800000. The MMU of CPU core 8 searches TLB 8, which does not store the physical address corresponding to virtual address 0x800000 and ASID 7, so the MMU looks up the corresponding physical address based on the process page table in memory and stores the mapping among the found physical address, the virtual address, and the ASID in TLB 8. CPU core 8 may then perform the memory access based on the found physical address.
ASID    Virtual address    Physical address
7       0x800000           0x2000
TABLE 2
Also, assuming that the physical address being queried is 0x2000, the TLB 8 may store the TLB entries shown in Table 2, above. It is noted that table 2 is merely an exemplary illustration, and in actual implementations, TLB entries may also include other fields for access rights (read or write), page type, etc.
If thread 3 also needs to perform a memory access, again to virtual address 0x800000, the MMU of CPU core 12 searches TLB 12, which does not store the physical address corresponding to virtual address 0x800000 and ASID 7. In the related art, CPU core 12 would then look up the physical address based on the process page table in memory. To avoid this repeated lookup, with the solution provided in this specification, CPU core 12 may construct an address probe request carrying virtual address 0x800000 and ASID 7.
In one example, CPU core 12 may broadcast the address probe request to all CPU cores over a bus. In a multi-core CPU architecture, the bus can be implemented as a mesh network, which has low latency.
In another example, referring to FIG. 3, CPU core 12 may send the address probe request to CPU core 8 where threads 1-2 belonging to the same process as thread 3 are located.
In this example, the CPU core 12 may first read the core identification 8 of the CPU core 8 from the specified register, and then add the core identification 8 as a parameter to the address probe request as well. Taking the Snoop protocol as an example, the address probe request is sent to a Snoop Agent, and the Snoop Agent may send the address probe request to the CPU core 8 according to the core identifier 8 carried in the address probe request.
The core identification 8 in the register may be written by the scheduler. The scheduler knows all the threads of a process and the CPU core on which each thread runs, and may write the core identifications of the CPU cores running the threads of the process into the specified registers of those CPU cores.
Still taking the case shown in Table 1 as an example, the threads of the process run on two CPU cores, namely CPU core 8 and CPU core 12, so the scheduler may write core identification 8 into the specified register of CPU core 12 and core identification 12 into the specified register of CPU core 8. Of course, the current CPU core need not be excluded: both core identification 8 and core identification 12 may be written into the specified registers of both CPU core 8 and CPU core 12. It is noted that in the example of Table 1 the process runs on two CPU cores; in other examples it may run on 3 or more CPU cores, which is not particularly limited in this specification.
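The scheduler's bookkeeping can be sketched as a function that derives, for each CPU core, the core identifications to write into its specified register. This is an illustration under assumptions: the function name `peer_core_registers`, the dictionary-based thread placement, and the `include_self` flag are hypothetical, and a register is modelled as a sorted list.

```python
from typing import Dict, List, Set


def peer_core_registers(
    thread_to_core: Dict[str, int],
    include_self: bool = False,
) -> Dict[int, List[int]]:
    """Compute the register contents for every core running a thread of the process.

    With include_self=False each core's register lists only the other cores;
    with include_self=True the current core is not excluded.
    """
    cores: Set[int] = set(thread_to_core.values())
    return {
        core: sorted(c for c in cores if include_self or c != core)
        for core in cores
    }


# Thread placement taken from the Table 1 example.
placement = {"thread 1": 8, "thread 2": 8, "thread 3": 12, "thread 4": 12}
registers = peer_core_registers(placement)
registers_with_self = peer_core_registers(placement, include_self=True)
```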
In this specification, referring again to fig. 3, after receiving the address probe request sent by CPU core 12, CPU core 8 searches TLB 8 and finds the physical address 0x2000 corresponding to virtual address 0x800000 and ASID 7. It then adds physical address 0x2000, virtual address 0x800000, and ASID 7 to the address probe response and returns the response to CPU core 12. CPU core 12 may store the mapping among physical address 0x2000, virtual address 0x800000, and ASID 7 in TLB 12, that is, it also forms the TLB entry shown in Table 2. CPU core 12 may then perform the memory access based on physical address 0x2000.
Alternatively, CPU core 8 may add only physical address 0x2000 to the address probe response; this specification does not particularly limit this.
When physical address probing is implemented based on the Snoop protocol, address probe requests and address probe responses are typically forwarded by a Snoop Agent. For example, after receiving the address probe responses returned by different CPU cores, the Snoop Agent may filter out the responses that do not carry a physical address and deduplicate the responses from different CPU cores that carry the same lookup result, so that, for example, a single address probe response carrying the physical address is returned to the CPU core that sent the address probe request.
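The filtering and deduplication performed by the Snoop Agent can be sketched as follows. This is a hedged model: the function name `merge_responses` is hypothetical, each response is reduced to an optional physical address, and returning the first distinct address is an assumption, since the text does not say how conflicting responses would be resolved.

```python
from typing import List, Optional


def merge_responses(responses: List[Optional[int]]) -> Optional[int]:
    """Collapse the address probe responses from several cores into one result.

    Each element is the physical address carried by one response, or None for
    a response that carries no physical address.
    """
    # Filter out responses that do not carry a physical address.
    found = [paddr for paddr in responses if paddr is not None]
    # Deduplicate identical results while preserving arrival order.
    unique = list(dict.fromkeys(found))
    # Assumption of this sketch: forward the first distinct address, if any.
    return unique[0] if unique else None


merged = merge_responses([None, 0x2000, 0x2000])
nothing = merge_responses([None, None])
```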
As can be seen from the above description, in the scenario where multiple threads of the same process run on different CPU cores, when a CPU core does not find the physical address corresponding to a virtual address in its TLB, it may send an address probe request to other CPU cores. Each of the other CPU cores may search its own TLB for the physical address corresponding to the virtual address and, if the physical address is found, add it to an address probe response and return the response. The requesting CPU core may then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
By adopting the technical scheme provided by the specification, TLB cache sharing among CPU cores can be realized. In the scenario where multiple threads of the same process run on different CPU cores, this greatly reduces repeated process page table queries by the CPU cores, reduces the waste of CPU core processing resources, effectively shortens the address translation time in the case of a TLB Miss, and improves IO performance.
On the other hand, the technical scheme provided by the specification can be realized based on the existing hardware and the cache detection protocol, does not need to add new hardware, and has low cost and high feasibility.
2. Process migration
In the related art, a TLB entry may be in one of three states: Valid, State, and Invalid.
Valid indicates that the corresponding TLB entry is valid;
State indicates that the corresponding TLB entry is temporarily invalidated and can be reactivated to the Valid state;
Invalid indicates that the corresponding TLB entry has been destroyed, for example because the memory corresponding to the TLB entry was released when a process was destroyed; a destroyed TLB entry cannot be reactivated.
After a TLB entry is generated, its state is Valid. During process switching, if a process is swapped out, the TLB entries corresponding to the ASID bound to the swapped-out process are set to the temporarily invalidated state. For example, the operating system sends a TLB invalidation instruction after the process is swapped out, where the TLB invalidation instruction specifies the ASID bound to the swapped-out process; based on the TLB invalidation instruction, the TLB entries corresponding to the specified ASID are set from the Valid state to the invalidated state. After the process is switched back in, if the MMU's TLB lookup during a memory access hits an entry in the invalidated state, the state of the hit entry can be set from invalidated back to Valid.
After the process is destroyed, the operating system may send a TLB destroy instruction (TLB Shootdown), where the TLB destroy instruction specifies the ASID bound to the destroyed process. Based on the TLB destroy instruction, the TLB entries corresponding to the specified ASID (including entries in the Valid state and entries in the temporarily invalidated state) may be completely destroyed, for example deleted, so that they are in the Invalid state.
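The state transitions described above can be illustrated with a small model. This is a sketch under assumptions: the enum member `INVALIDATED` is a name chosen here for the temporarily invalidated state, a destroyed entry is modelled by deleting it, and the `Tlb` class and its methods are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class EntryState(Enum):
    VALID = "valid"
    INVALIDATED = "invalidated"  # temporarily invalid; may be reactivated


class Tlb:
    """Toy TLB keyed by (ASID, virtual address); destroyed entries are deleted."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[int, int], List] = {}

    def insert(self, asid: int, vaddr: int, paddr: int) -> None:
        # A newly generated entry starts in the Valid state.
        self.entries[(asid, vaddr)] = [paddr, EntryState.VALID]

    def invalidate(self, asid: int) -> None:
        # TLB invalidation instruction: Valid -> temporarily invalidated.
        for (entry_asid, _), entry in self.entries.items():
            if entry_asid == asid:
                entry[1] = EntryState.INVALIDATED

    def shootdown(self, asid: int) -> None:
        # TLB destroy instruction: remove entries regardless of their state.
        for key in [k for k in self.entries if k[0] == asid]:
            del self.entries[key]

    def lookup(self, asid: int, vaddr: int) -> Optional[int]:
        # A hit on an invalidated entry reactivates it to Valid.
        entry = self.entries.get((asid, vaddr))
        if entry is None:
            return None
        entry[1] = EntryState.VALID
        return entry[0]

    def state(self, asid: int, vaddr: int) -> Optional[EntryState]:
        entry = self.entries.get((asid, vaddr))
        return None if entry is None else entry[1]


tlb = Tlb()
tlb.insert(asid=7, vaddr=0x800000, paddr=0x2000)
tlb.invalidate(asid=7)                          # process swapped out
state_after_swap_out = tlb.state(7, 0x800000)
paddr_after_swap_in = tlb.lookup(7, 0x800000)   # process switched back in
state_after_swap_in = tlb.state(7, 0x800000)
tlb.shootdown(asid=7)                           # process destroyed
paddr_after_destroy = tlb.lookup(7, 0x800000)
```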
In this specification, the scheduler may perform process scheduling based on the load condition of each CPU core, for example, a process running on a first CPU core may be scheduled to run on a second CPU core. Here, scheduling a process generally refers to scheduling all threads under the process.
In the related art, in the case of process scheduling, the operating system sends a TLB destroy instruction to the first CPU core, so as to completely destroy the TLB entries in the TLB of the first CPU core that correspond to the ASID bound to the process. After the process is scheduled to the second CPU core, the second CPU core still needs to perform address translation based on the process page table in memory when performing a memory access.
To avoid such repeated queries, with the technical scheme provided by the specification, in the case of process scheduling, on the one hand the operating system sends a TLB invalidation instruction instead of a TLB destroy instruction, so as to avoid completely destroying the relevant TLB entries. On the other hand, the second CPU core may construct an address probe request in the case of a TLB Miss, requesting other CPU cores to assist in the physical address lookup.
Referring to fig. 4, in a process scheduling scenario, the memory access method provided in the present disclosure may include the following steps:
In step 402, the first CPU core receives a TLB invalidation instruction, and sets the TLB entries in the first TLB that are bound to the scheduled-out process to the invalid state.
In this embodiment, after a process is scheduled out of the first CPU core, unlike the related art, the operating system does not send a TLB destroy instruction to the first CPU core, but sends a TLB invalidate instruction to the first CPU core, where an ASID to which the scheduled out process is bound is specified.
In response to the TLB invalidation instruction, the first CPU core sets a TLB entry in the first TLB (i.e., the TLB of the first CPU core) corresponding to the ASID from a Valid state to an invalid state.
In other words, by adopting the technical scheme provided by the specification, during process scheduling, the TLB entries in the CPU core where the process is originally located are not thoroughly destroyed, but are put into a temporary invalid state.
In step 404, the second CPU core, in response to a memory access request of the scheduled-in process, searches the second TLB for the physical address corresponding to the virtual address.
In this embodiment, when the process scheduled to run on the second CPU core, or a thread under that process, performs a memory access, a memory access request is initiated to the second CPU core. The second CPU core first looks up, in the second TLB (i.e., the TLB of the second CPU core), the physical address corresponding to the ASID and the virtual address. If the corresponding physical address is found (TLB Hit), the memory access may be performed based on that physical address. If the corresponding physical address is not found (TLB Miss), step 406 below may be performed.
In step 406, the second CPU core sends an address probe request to the first CPU core if the physical address is not found.
Based on the query result of the foregoing step 404, the second CPU core may construct an address probe request without finding the physical address, and add the virtual address to be accessed and the ASID to which the process is bound to the address probe request.
The second CPU core sends the address probe request. For example, the address probe request may be broadcast and sent, or the address probe request may be sent to the CPU core where the process is located before being scheduled, i.e., the first CPU core.
In this embodiment, the construction and transmission of the address detection request may refer to the specific implementation process of the foregoing embodiment, which is not described herein in detail.
It should be noted that, in the case where the second CPU core sends the address probe request to the first CPU core based on the first core identifier of the first CPU core in the register, the first core identifier in the specified register may be written by the scheduler after the process is scheduled.
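The choice between a targeted probe and a broadcast can be sketched as follows. This is a simplified model, assuming a per-core register that holds at most one target core identifier; the names `Core`, `target_core_register`, `schedule_migration` and `probe_targets` are hypothetical.

```python
from typing import Dict, List, Optional


class Core:
    def __init__(self, core_id: int) -> None:
        self.core_id = core_id
        # Hypothetical register holding the identifier of the core to probe.
        self.target_core_register: Optional[int] = None


def schedule_migration(cores: Dict[int, Core], src: int, dst: int) -> None:
    """Scheduler side: after moving the process from src to dst, record the
    previous core in the register of the destination core."""
    cores[dst].target_core_register = src


def probe_targets(cores: Dict[int, Core], requester: int) -> List[int]:
    """Requester side: probe only the registered core if one is recorded,
    otherwise broadcast to every other core."""
    target = cores[requester].target_core_register
    if target is not None:
        return [target]
    return [cid for cid in sorted(cores) if cid != requester]


cores = {cid: Core(cid) for cid in (0, 1, 2, 3)}
broadcast_targets = probe_targets(cores, requester=2)  # nothing recorded yet
schedule_migration(cores, src=0, dst=2)                # process moves 0 -> 2
unicast_targets = probe_targets(cores, requester=2)
```

A targeted probe keeps interconnect traffic low, while the broadcast path still works when no previous core has been recorded.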
In step 408, the first CPU core responds to the address detection request and searches the first TLB for the physical address corresponding to the virtual address.
In this embodiment, the first CPU core searches, in response to the address probe request, TLB entries in a valid state and an invalid state in the first TLB, so as to perform a query of a physical address.
If the query hits a TLB entry in the valid state, this most likely indicates that different threads of the same process are running on different CPU cores, that is, the thread initiating the memory access request on the second CPU core and some thread running on the first CPU core belong to the same process.
If the query hits a TLB entry in the invalid state, this most likely indicates a process migration scenario, that is, the process originally ran on the first CPU core and was then migrated to the second CPU core by the scheduler.
In other words, for the CPU core that receives the address probe request, when querying the TLB entry, it is required to query both the TLB entry in the valid state and the TLB entry in the invalid state, and after the query hits, the physical address that is found can be returned, and the CPU core does not need to pay attention to a specific application scenario.
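Steps 402 and 408 on the first CPU core can be sketched together as below. The model is an assumption-laden simplification: entries are stored as (physical address, state) tuples, the state strings are chosen here for illustration, and ASID 9 with its addresses is made-up sample data, not from the specification.

```python
from typing import Dict, Optional, Tuple

VALID, INVALIDATED = "valid", "invalidated"

# Assumption: (ASID, virtual address) -> (physical address, state)
Tlb = Dict[Tuple[int, int], Tuple[int, str]]


def apply_invalidate(tlb: Tlb, asid: int) -> None:
    """Step 402: mark every entry of the given ASID as temporarily invalid."""
    for key, (paddr, _) in list(tlb.items()):
        if key[0] == asid:
            tlb[key] = (paddr, INVALIDATED)


def answer_probe(tlb: Tlb, asid: int, vaddr: int) -> Optional[int]:
    """Step 408: search entries in both the valid and the invalid state.
    The entry state is deliberately ignored, so the responder does not need
    to know which scenario produced the probe."""
    entry = tlb.get((asid, vaddr))
    return None if entry is None else entry[0]


first_tlb: Tlb = {
    (7, 0x800000): (0x2000, VALID),  # process that is about to migrate
    (9, 0x400000): (0x5000, VALID),  # made-up entry of a process that stays
}
apply_invalidate(first_tlb, asid=7)

hit_invalidated = answer_probe(first_tlb, 7, 0x800000)  # migration case
hit_valid = answer_probe(first_tlb, 9, 0x400000)        # shared-process case
miss = answer_probe(first_tlb, 7, 0x900000)
```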
In step 410, the first CPU core adds the found physical address to the address probe response and returns the address probe response to the second CPU core.
In step 412, after receiving the address probe response, the second CPU core stores the mapping relationship between the physical address and the virtual address in the second TLB, and performs a memory access based on the physical address.
In this embodiment, the implementation of steps 410-412 may be described with reference to the previous embodiments.
In this embodiment, if the second CPU core does not receive the address probe response carrying the physical address, for example, the second CPU core does not receive the address probe response carrying the physical address within a preset duration, the second CPU core may perform the query of the physical address based on the process page table of the memory.
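Steps 404 to 412, together with the page table fallback, can be summarized in one illustrative routine. This is a sketch, not the hardware behaviour: peers are modelled as callables that return a physical address or `None`, a timeout is modelled as every peer returning `None`, page faults are not modelled, and caching the page table result in the TLB is an assumption based on ordinary TLB refill behaviour. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Key = Tuple[int, int]                               # (ASID, virtual address)
ProbeHandler = Callable[[int, int], Optional[int]]  # peer core's probe lookup


def translate(local_tlb: Dict[Key, int],
              peers: Iterable[ProbeHandler],
              page_table: Dict[Key, int],
              asid: int,
              vaddr: int) -> Tuple[int, str]:
    """Return (physical address, source of the translation)."""
    key = (asid, vaddr)
    if key in local_tlb:                    # step 404: local TLB hit
        return local_tlb[key], "tlb-hit"
    for probe in peers:                     # steps 406-410: probe other cores
        paddr = probe(asid, vaddr)
        if paddr is not None:
            local_tlb[key] = paddr          # step 412: install the mapping
            return paddr, "probe"
    paddr = page_table[key]                 # fallback: process page table
    local_tlb[key] = paddr
    return paddr, "page-table"


page_table = {(7, 0x800000): 0x2000, (7, 0x801000): 0x3000}
first_tlb = {(7, 0x800000): 0x2000}
second_tlb: Dict[Key, int] = {}
peers = [lambda asid, vaddr: first_tlb.get((asid, vaddr))]

r1 = translate(second_tlb, peers, page_table, 7, 0x800000)  # served by probe
r2 = translate(second_tlb, peers, page_table, 7, 0x800000)  # now a local hit
r3 = translate(second_tlb, peers, page_table, 7, 0x801000)  # no peer has it
```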
As can be seen from the above description, in the scenario of process migration, when a CPU core does not find the physical address corresponding to a virtual address in its TLB, it may send an address probe request to other CPU cores. Each of the other CPU cores may search its own TLB for the physical address corresponding to the virtual address and, if the physical address is found, add it to an address probe response and return the response. The requesting CPU core may then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
By adopting the technical scheme provided by the specification, TLB cache sharing among CPU cores can be realized. In the process migration scenario, this greatly reduces repeated process page table queries by the CPU cores, reduces the waste of CPU core processing resources, effectively shortens the address translation time in the case of a TLB Miss, and improves IO performance.
On the other hand, the technical scheme provided by the specification can be realized based on the existing hardware and the cache detection protocol, does not need to add new hardware, and has low cost and high feasibility.
Corresponding to the foregoing embodiments of the memory access method, the present disclosure further provides embodiments of the memory access device.
Embodiments of the memory access device of the present specification may be applied in a CPU core of a computer system, the CPU core including a TLB for caching mappings between virtual addresses and physical addresses. Referring to fig. 5, the memory access device 500 includes an address searching unit 501, an address detecting unit 502, a memory access unit 503 and a status marking unit 504.
The address searching unit 501 responds to a memory access request and searches a physical address corresponding to a virtual address carried by the memory access request in the TLB;
The address detection unit 502, when no physical address corresponding to the virtual address is found, sends an address detection request, where the address detection request carries the virtual address, so that a CPU core that receives the address detection request searches its TLB for the physical address corresponding to the virtual address, and returns an address detection response when the corresponding physical address is found, where the address detection response carries the found physical address;
The memory access unit 503 responds to the received address detection response, stores the mapping relationship between the physical address and the virtual address carried in the address detection response into the TLB, and performs memory access based on the physical address.
Optionally, the address detection unit 502 reads a target core identifier of a target CPU core from a register, and sends an address detection request to the target CPU core based on the target core identifier.
Optionally, the address detection unit 502 broadcasts and sends an address detection request.
Optionally, the target CPU core is a CPU core where other threads in the process to which the thread that initiates the memory access request belongs are located.
Optionally, the target CPU core is a CPU core where a thread that initiates the memory access request is located before being scheduled to the present CPU core.
Optionally, the target core identification is written by a scheduler.
Optionally, the device further comprises:
a state marking unit 504, configured to receive a TLB invalidation instruction, where the TLB invalidation instruction is sent after the thread that initiated the memory access request is scheduled to another CPU core;
and, in response to the TLB invalidation instruction, to mark the mapping relationship between the virtual address and the physical address specified by the TLB invalidation instruction in the TLB as being in the invalid state;
In addition:
The address searching unit 501, after receiving an address detection request sent by another CPU core, searches the TLB for the physical address corresponding to the virtual address among the mapping relationships in the valid state and in the invalid state.
Optionally, the address lookup unit 501, in a case that the address probe response is not received, looks up a physical address corresponding to the virtual address based on a process page table of the memory.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present description. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
Corresponding to the foregoing embodiments of the memory access method, the present disclosure further provides a CPU core, where the CPU core includes a translation lookaside buffer (TLB), the TLB is configured to cache mapping relationships between virtual addresses and physical addresses, and the CPU core is configured to:
in response to a memory access request, search the TLB for the physical address corresponding to the virtual address carried by the memory access request;
in a case that the physical address corresponding to the virtual address is not found, send an address probe request, where the address probe request carries the virtual address, so that a CPU core receiving the address probe request searches its TLB for the physical address corresponding to the virtual address and, in a case that the corresponding physical address is found, returns an address probe response carrying the found physical address;
in response to the received address probe response, store the mapping relationship between the physical address and the virtual address carried in the address probe response into the TLB, and perform a memory access based on the physical address.
Optionally, the sending an address probe request includes:
broadcasting the address probe request.
Optionally, the sending an address probe request includes:
reading a target core identifier of a target CPU core from a register;
sending the address probe request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core where other threads in the process to which the thread that initiates the memory access request belongs are located.
Optionally, the target CPU core is a CPU core where a thread that initiates the memory access request is located before being scheduled to the present CPU core.
Optionally, the target core identification is written by a scheduler.
Optionally, the CPU core is further configured to:
receive a TLB invalidation instruction, where the TLB invalidation instruction is sent after the thread initiating the memory access request is scheduled to another CPU core;
in response to the TLB invalidation instruction, mark the mapping relationship between the virtual address and the physical address specified by the TLB invalidation instruction in the TLB as being in the invalid state;
The CPU core is further configured to:
after receiving an address probe request sent by another CPU core, search the TLB for the physical address corresponding to the virtual address among the mapping relationships in the valid state and in the invalid state.
Optionally, the CPU core is further configured to:
in a case that the address probe response is not received, look up the physical address corresponding to the virtual address based on the process page table in memory.
Corresponding to the foregoing embodiments of the memory access method, the present disclosure further provides a computer readable storage medium having a computer program stored thereon, where the program, when executed by a CPU core, implements the steps of:
in response to a memory access request, searching the TLB for the physical address corresponding to the virtual address carried by the memory access request;
in a case that the physical address corresponding to the virtual address is not found, sending an address probe request, where the address probe request carries the virtual address, so that a CPU core receiving the address probe request searches its TLB for the physical address corresponding to the virtual address and, in a case that the corresponding physical address is found, returns an address probe response carrying the found physical address;
in response to the received address probe response, storing the mapping relationship between the physical address and the virtual address carried in the address probe response into the TLB, and performing a memory access based on the physical address.
Optionally, the sending an address probe request includes:
broadcasting the address probe request.
Optionally, the sending an address probe request includes:
reading a target core identifier of a target CPU core from a register;
sending the address probe request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core where other threads in the process to which the thread that initiates the memory access request belongs are located.
Optionally, the target CPU core is a CPU core where a thread that initiates the memory access request is located before being scheduled to the present CPU core.
Optionally, the target core identification is written by a scheduler.
Optionally, the steps further include:
receiving a TLB invalidation instruction, where the TLB invalidation instruction is sent after the thread initiating the memory access request is scheduled to another CPU core;
in response to the TLB invalidation instruction, marking the mapping relationship between the virtual address and the physical address specified by the TLB invalidation instruction in the TLB as being in the invalid state;
The steps further include:
after receiving an address probe request sent by another CPU core, searching the TLB for the physical address corresponding to the virtual address among the mapping relationships in the valid state and in the invalid state.
Optionally, the steps further include:
in a case that the address probe response is not received, looking up the physical address corresponding to the virtual address based on the process page table in memory.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing description of the preferred embodiments is provided for the purpose of illustration only, and is not intended to limit the scope of the disclosure, since any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.

Claims (7)

1. A memory access method for memory access in a computer system, wherein the computer system comprises a central processing unit (CPU), the CPU comprises a plurality of CPU cores, the CPU core comprises a translation lookaside buffer (TLB), the TLB is used to cache a mapping relationship between a virtual address and a physical address, and the method is applied to the CPU core and comprises:
in response to a memory access request, searching the TLB for a physical address corresponding to a virtual address carried in the memory access request;
in a case where the physical address corresponding to the virtual address is not found, sending an address detection request, wherein the address detection request carries the virtual address, so that the CPU core that receives the address detection request searches its TLB for the physical address corresponding to the virtual address and, in a case where the corresponding physical address is found, returns an address detection response carrying the found physical address;
in response to the received address detection response, storing the mapping relationship between the physical address and the virtual address carried in the address detection response in the TLB, and performing memory access based on the physical address;
wherein the sending an address detection request comprises:
reading a target core identifier of a target CPU core from a register;
sending the address detection request to the target CPU core based on the target core identifier;
wherein the target CPU core is the CPU core where other threads in the process to which the thread initiating the memory access request belongs are located, or the target CPU core is the CPU core where the thread initiating the memory access request was located before being scheduled to the current CPU core.

2. The method according to claim 1, wherein the sending an address detection request comprises:
broadcasting the address detection request.

3. The method according to claim 1, wherein the target core identifier is written by a scheduler.

4. The method according to claim 1, further comprising:
receiving a TLB invalidation instruction, wherein the TLB invalidation instruction is sent after the thread initiating the memory access request is scheduled to another CPU core;
in response to the TLB invalidation instruction, marking the mapping relationship between the virtual address and the physical address specified by the TLB invalidation instruction in the TLB as being in an invalid state;
the method further comprising:
after receiving an address detection request sent by another CPU core, searching the TLB for the physical address corresponding to the virtual address among the mapping relationships in the valid state and in the invalid state.

5. The method according to claim 1, further comprising:
in a case where the address detection response is not received, searching for the physical address corresponding to the virtual address based on the process page table in memory.

6. A memory access device for memory access in a computer system, wherein the computer system comprises a central processing unit (CPU), the CPU comprises a plurality of CPU cores, the CPU core comprises a translation lookaside buffer (TLB), the TLB is used to cache a mapping relationship between a virtual address and a physical address, and the device is applied to the CPU core and comprises:
an address search unit, which, in response to a memory access request, searches the TLB for a physical address corresponding to a virtual address carried in the memory access request;
an address detection unit, which, in a case where the physical address corresponding to the virtual address is not found, sends an address detection request, wherein the address detection request carries the virtual address, so that the CPU core that receives the address detection request searches its TLB for the physical address corresponding to the virtual address and, in a case where the corresponding physical address is found, returns an address detection response carrying the found physical address;
a memory access unit, which, in response to the received address detection response, stores the mapping relationship between the physical address and the virtual address carried in the address detection response in the TLB, and performs memory access based on the physical address;
wherein the address detection unit sending the address detection request comprises: reading a target core identifier of a target CPU core from a register; and sending the address detection request to the target CPU core based on the target core identifier;
wherein the target CPU core is the CPU core where other threads in the process to which the thread initiating the memory access request belongs are located, or the target CPU core is the CPU core where the thread initiating the memory access request was located before being scheduled to the current CPU core.

7. A central processing unit (CPU), wherein the CPU comprises a plurality of CPU cores, the CPU core comprises a translation lookaside buffer (TLB), the TLB is used to cache a mapping relationship between a virtual address and a physical address, and the CPU core is configured to:
in response to a memory access request, search the TLB for a physical address corresponding to a virtual address carried in the memory access request;
in a case where the physical address corresponding to the virtual address is not found, send an address detection request, wherein the address detection request carries the virtual address, so that the CPU core that receives the address detection request searches its TLB for the physical address corresponding to the virtual address and, in a case where the corresponding physical address is found, returns an address detection response carrying the found physical address;
in response to the received address detection response, store the mapping relationship between the physical address and the virtual address carried in the address detection response in the TLB, and perform memory access based on the physical address;
wherein the sending the address detection request comprises: reading a target core identifier of a target CPU core from a register; and sending the address detection request to the target CPU core based on the target core identifier;
wherein the target CPU core is the CPU core where other threads in the process to which the thread initiating the memory access request belongs are located, or the target CPU core is the CPU core where the thread initiating the memory access request was located before being scheduled to the current CPU core.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210200142.7A CN114840445B (en) 2022-03-02 2022-03-02 Memory access method and device
PCT/CN2023/075635 WO2023165317A1 (en) 2022-03-02 2023-02-13 Memory access method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210200142.7A CN114840445B (en) 2022-03-02 2022-03-02 Memory access method and device

Publications (2)

Publication Number Publication Date
CN114840445A CN114840445A (en) 2022-08-02
CN114840445B true CN114840445B (en) 2025-01-28

Family

ID=82561573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210200142.7A Active CN114840445B (en) 2022-03-02 2022-03-02 Memory access method and device

Country Status (2)

Country Link
CN (1) CN114840445B (en)
WO (1) WO2023165317A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840445B (en) * 2022-03-02 2025-01-28 阿里巴巴(中国)有限公司 Memory access method and device
CN116049051A (en) * 2023-02-22 2023-05-02 联想(北京)有限公司 Data transmission method and electronic equipment
CN116644006B (en) * 2023-07-27 2023-11-03 浪潮电子信息产业股份有限公司 A memory page management method, system, device, equipment and computer medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017190266A1 (en) * 2016-05-03 2017-11-09 华为技术有限公司 Method for managing translation lookaside buffer and multi-core processor

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6105113A (en) * 1997-08-21 2000-08-15 Silicon Graphics, Inc. System and method for maintaining translation look-aside buffer (TLB) consistency
US7188229B2 (en) * 2004-01-17 2007-03-06 Sun Microsystems, Inc. Method and apparatus for memory management in a multi-processor computer system
CN102662726B (en) * 2012-04-01 2015-07-29 龙芯中科技术有限公司 The analogy method of virtual machine and computer equipment
WO2013186694A2 (en) * 2012-06-11 2013-12-19 Stefanos Kaxiras System and method for data classification and efficient virtual cache coherence without reverse translation
US10922137B2 (en) * 2016-04-27 2021-02-16 Hewlett Packard Enterprise Development Lp Dynamic thread mapping
US10296465B2 (en) * 2016-11-29 2019-05-21 Board Of Regents, The University Of Texas System Processor using a level 3 translation lookaside buffer implemented in off-chip or die-stacked dynamic random-access memory
US11030117B2 (en) * 2017-07-14 2021-06-08 Advanced Micro Devices, Inc. Protecting host memory from access by untrusted accelerators
US11392508B2 (en) * 2017-11-29 2022-07-19 Advanced Micro Devices, Inc. Lightweight address translation for page migration and duplication
US11270201B2 (en) * 2017-12-29 2022-03-08 Intel Corporation Communication optimizations for distributed machine learning
CN112965921B (en) * 2021-02-07 2024-04-02 中国人民解放军军事科学院国防科技创新研究院 TLB management method and system in multi-task GPU
CN114064524A (en) * 2021-11-22 2022-02-18 浪潮商用机器有限公司 Server, method and device for improving performance of server and medium
CN114840445B (en) * 2022-03-02 2025-01-28 阿里巴巴(中国)有限公司 Memory access method and device

Also Published As

Publication number Publication date
CN114840445A (en) 2022-08-02
WO2023165317A1 (en) 2023-09-07

Similar Documents

Publication Publication Date Title
CN114840445B (en) Memory access method and device
TWI531912B (en) Processor having translation lookaside buffer for multiple context compute engine, system and method for enabling threads to access a resource in a processor
US6647466B2 (en) Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
US8074026B2 (en) Scatter-gather intelligent memory architecture for unstructured streaming data on multiprocessor systems
US20200057729A1 (en) Memory access method and computer system
US7363462B2 (en) Performing virtual to global address translation in processing subsystem
KR102448124B1 (en) Cache accessed using virtual addresses
US8285969B2 (en) Reducing broadcasts in multiprocessors
US7290116B1 (en) Level 2 cache index hashing to avoid hot spots
US7765381B2 (en) Multi-node system in which home memory subsystem stores global to local address translation information for replicating nodes
TWI245969B (en) Access request for a data processing system having no system memory
US8417915B2 (en) Alias management within a virtually indexed and physically tagged cache memory
US20130080709A1 (en) System and Method for Performing Memory Operations In A Computing System
US20250199963A1 (en) Memory access method and input/output memory management unit
CN115168248B (en) Cache memory supporting SIMT architecture and corresponding processor
US20030115402A1 (en) Multiprocessor system
CN115048142A (en) Cache access command processing system, method, device, equipment and storage medium
US7360056B2 (en) Multi-node system in which global address generated by processing subsystem includes global to local translation information
CN115934367A (en) Buffer processing method, snoop filter, multiprocessor system, and storage medium
US20100332763A1 (en) Apparatus, system, and method for cache coherency elimination
JP5976225B2 (en) System cache with sticky removal engine
CN111273860B (en) Distributed memory management method based on network and page granularity management
US11741017B2 (en) Power aware translation lookaside buffer invalidation optimization
US20210397560A1 (en) Cache stashing system
US20240045805A1 (en) Core-aware caching systems and methods for multicore processors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant