KR20250060052A - Method and apparatus for mapping memory for beam search

Info

Publication number: KR20250060052A
Application number: KR1020230180607A
Authority: KR
Inventors: 한정호
Original assignee: 리벨리온 주식회사
Priority date: 2023-10-25
Filing date: 2023-12-13
Publication date: 2025-05-07
Anticipated expiration: 2043-12-13
Also published as: KR102909802B1

Abstract

A memory mapping method and apparatus for beam search are provided. A memory mapping method according to one embodiment of the present invention is a memory mapping method for beam search in a computing system that uses virtual-to-physical address mapping to access physical memory through virtual addresses. The method comprises: when copying a first array defined in the virtual address space to a second array, changing the mapping information for the portion of the second array that would store the copied data so that it points to the physical memory area mapped to the first array.

Description

{Method and apparatus for mapping memory for beam search}

The present invention relates to a memory mapping method and apparatus for beam search.

Recently, as the inference performance of natural language processing (NLP) models has improved, NLP is being applied in a wide range of fields. OpenAI's GPT-3, offered as the ChatGPT service, has drawn considerable attention because it generates natural-sounding sentences in response to questions or instructions entered by a person, much as a person with expert knowledge would answer, and it is being applied in many areas.

In GPT, a representative NLP algorithm, when a user enters the sentence “It is nice,” a preliminary computation is performed to obtain the information needed to score each candidate for the next word. The result of this preliminary computation is applied to every word registered in the vocabulary set to calculate a score for each word, and the word with the highest score is selected and output.

For example, when scores are computed for every word given the sentence “It is nice,” the word “to” receives the highest score and is output. To predict the word that can follow “to,” the word “to” just output is converted to a token embedding and the same process is repeated, outputting the next word “meet”; repeating the process again generates “you” from “meet.” This process is repeated until the token that marks the end of the sentence is selected as the output token.
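The word-by-word loop described above can be sketched as follows. This is a minimal illustration only: `TOY_SCORES` is a hypothetical lookup table standing in for the model's scoring step, and `<eos>` is an assumed name for the end-of-sentence token.

```python
# Toy next-token scores keyed by the previous token. This is a hypothetical
# stand-in for the model's scoring step, not real GPT output.
TOY_SCORES = {
    "nice": {"to": 0.6, "day": 0.3, "<eos>": 0.1},
    "to": {"meet": 0.7, "see": 0.2, "<eos>": 0.1},
    "meet": {"you": 0.8, "them": 0.1, "<eos>": 0.1},
    "you": {"<eos>": 0.9, "to": 0.1},
}


def greedy_decode(prompt, max_steps=10):
    """Repeatedly append the single highest-scoring token until <eos>."""
    tokens = list(prompt)
    for _ in range(max_steps):
        scores = TOY_SCORES[tokens[-1]]
        best = max(scores, key=scores.get)
        if best == "<eos>":
            break
        tokens.append(best)
    return tokens


print(greedy_decode(["It", "is", "nice"]))
```

With this toy table the loop extends "It is nice" with "to", "meet", and "you" before the end-of-sentence token is selected.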

Because generating the output sentence one word at a time in this way does not take into account the influence of words that come after the current word, the completed sentence can be worse than one generated by considering all possible combinations. Even if the word with the highest score is selected at each position, the score obtained by that choice may be a local maximum rather than the global maximum.

However, generating a sentence by evaluating every possible combination increases the amount of computation to an impractical level. For example, GPT-2, whose parameter values are publicly available, has 50,257 entries in its vocabulary set. Outputting even a single word therefore requires calculating a score for each of the 50,257 entries and then selecting the one with the highest score.
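To make the growth concrete, the following sketch compares the number of distinct word sequences an exhaustive search would have to consider with the number of scores computed when only one word is kept per step. The function names are illustrative; only the vocabulary size comes from the text.

```python
VOCAB_SIZE = 50_257  # GPT-2 vocabulary size cited in the text


def exhaustive_candidates(sentence_length):
    """Number of distinct token sequences of the given length."""
    return VOCAB_SIZE ** sentence_length


def greedy_evaluations(sentence_length):
    """Scores computed when only the best token is kept at each step."""
    return VOCAB_SIZE * sentence_length


for n in (1, 2, 5):
    print(n, exhaustive_candidates(n), greedy_evaluations(n))
```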

For this reason, the beam search method is used: instead of producing a single output word at each generation step, it keeps the B words with the highest scores as output candidates.

With beam search, rather than outputting only one word for each word position, the B highest-scoring words in the vocabulary set are output; for each of the B sentences following different paths, B words are generated at the next word position, and the B sentences with the highest cumulative scores are retained as output candidates.

That is, when beam search with beam width B is applied, at each word generation step the B highest-scoring words that can be selected at the current position are chosen as candidates from each of the B previously selected candidate sentences, and the B highest-scoring sentences among the resulting B² candidates are selected as the output sentences. The amount of data that must be stored thus grows in proportion to the square of the beam width.
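One beam-search step as described above can be sketched as follows. This is an illustration under stated assumptions: `next_scores` is a hypothetical callable standing in for the model, `toy_scores` is invented data, and cumulative scores are kept as sums of log-probabilities, which is one common choice rather than something the text prescribes.

```python
import heapq
import math


def beam_step(beams, next_scores, beam_width):
    """One beam-search step.

    beams: list of (tokens, cumulative_log_prob) pairs.
    next_scores: hypothetical callable returning {token: probability}
                 for a token sequence (stands in for the model).
    """
    candidates = []
    for tokens, logp in beams:
        scores = next_scores(tokens)
        # Keep the B best continuations of this beam.
        top = heapq.nlargest(beam_width, scores.items(), key=lambda kv: kv[1])
        for tok, p in top:
            candidates.append((tokens + [tok], logp + math.log(p)))
    # Up to B*B candidates exist here; keep the best B overall.
    return heapq.nlargest(beam_width, candidates, key=lambda c: c[1])


def toy_scores(tokens):
    # Invented probabilities keyed by the last token only.
    table = {
        "nice": {"to": 0.5, "day": 0.4, "and": 0.1},
        "to": {"meet": 0.6, "see": 0.3, "be": 0.1},
        "day": {"today": 0.9, "for": 0.05, "to": 0.05},
    }
    return table[tokens[-1]]


beams = [(["It", "is", "nice"], 0.0)]
beams = beam_step(beams, toy_scores, 2)
beams = beam_step(beams, toy_scores, 2)
for tokens, logp in beams:
    print(" ".join(tokens), round(math.exp(logp), 2))
```

In this toy data the greedy first choice "to" does not lead to the best two-word continuation, which is the local-maximum effect that keeping B candidates is meant to reduce.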

Conventionally, when beam search is applied, the feature data required for each selected sentence is stored separately in memory. This makes the software easy to implement, but even when several candidate sentences contain a portion made up of the same sequence of words, the feature data for that portion is not shared; each copy is stored at an independent location in memory, which results in high memory bus bandwidth usage for memory reads and writes.

To mitigate this problem, a linked-list data structure can be used to manage, for each word, the feature data added by that word. With this approach, however, software that implements parallel processing across multiple processors becomes complicated, and because the data is stored in a layout different from the data structures supported by general-purpose DMA (Direct Memory Access), fetching the data from external memory using general-purpose DMA becomes complicated or subject to implementation restrictions.

In a computing system that uses virtual addresses, memory is divided into fixed-size blocks (pages), and the virtual address space is mapped to the physical address space in units of these blocks. The page number obtained when the virtual address space is divided into pages is called the VPN (virtual page number), and the page number obtained when physical memory is divided into pages is called the PPN (physical page number).

When as many memory regions as the beam width are used in the virtual address space, each region must be mapped to actual memory. When the computing system attempts a read or write using an address in the virtual address space, the VPN-to-PPN mapping table is consulted to convert the given virtual address into the start address of the corresponding page in physical memory; the address offset within the page is then used to form the address at which physical memory is actually accessed, and the requested operation is performed.
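The translation just described can be sketched in a few lines. The 4096-byte page size and the dictionary used as the VPN-to-PPN mapping table are assumptions for illustration; the text does not fix either.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; the text does not fix one


def translate(vaddr, vpn_to_ppn):
    """Translate a virtual address using a VPN-to-PPN table (a dict here)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = vpn_to_ppn[vpn]            # mapping-table lookup
    return ppn * PAGE_SIZE + offset  # page start address + in-page offset


table = {0: 7, 1: 3}
# 0x1010 lies in VPN 1 at offset 0x10, and VPN 1 maps to PPN 3.
print(hex(translate(0x1010, table)))
```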

In the conventional technology, even when multiple beams have portions of feature data with common values, the same values are copied into different locations of physical memory. Memory space and memory bus bandwidth are therefore used inefficiently.

An object of the present invention is to provide a memory mapping method and apparatus for beam search that, when NLP with beam search is processed in a computing system using a virtual address space, eliminate the inefficiency of copying the same data multiple times for pages that hold identical data across a plurality of candidate sentences.

Another object of the present invention is to provide a memory mapping method and apparatus for beam search that minimize the degradation of NLP processing performance caused by the bandwidth limits of external memory when feature data is stored in the external memory during beam search.

A memory mapping method according to one embodiment of the present invention is a memory mapping method for beam search in a computing system that uses virtual-to-physical address mapping to access physical memory through virtual addresses. The method comprises: when copying a first array defined in the virtual address space to a second array, changing the mapping information for the portion of the second array that would store the copied data so that it points to the physical memory area mapped to the first array.

When mapping from virtual addresses to physical memory is performed in units of pages and the start address of the first array is restricted to a multiple of the page size, the memory mapping method according to one embodiment of the present invention includes: for pages mapped to the first array whose contents are identical, updating only the VPN-to-PPN mapping information; and, for a page in which any part of the data differs, reading the values from the physical page mapped to the first array and copying them to the physical page mapped to the second array.

When mapping from virtual addresses to physical memory is performed in units of pages and the start address of the first array is restricted to a multiple of the page size, the memory mapping method according to one embodiment of the present invention may include: for the pages mapped to the first array other than the last page, updating only the VPN-to-PPN mapping information; and, for the last page, updating only the VPN-to-PPN mapping information if the page is completely filled with data, and otherwise reading only the data actually stored in the last page from the physical page mapped to the first array and copying it to the physical page mapped to the second array.

The computing system may include a PPN information table that stores the number of VPNs mapped to each PPN. The memory mapping method according to one embodiment of the present invention may include: when one of the arrays copied to a specific PPN is no longer used, decrementing by 1 the number of VPNs mapped to that PPN in the PPN information table; and, when the count reaches 0 as a result, adding the PPN to a free PPN list that manages pages not currently in use.

When the computing system supports multiple page sizes, the memory mapping method according to one embodiment of the present invention may include: when the last page of the first array is configured as a large page but holds only a small amount of data, converting the page mapped to the first array into small pages and copying the data to a page allocated to the second array.

When the computing system supports multiple page sizes, the memory mapping method according to one embodiment of the present invention may include: when an operation applying NLP (Natural Language Processing) beam search is performed in the computing system, allocating a relatively small page for the last page in the process of copying the feature data generated in the context phase as many times as there are beams.

When the computing system supports multiple page sizes, the memory mapping method according to one embodiment of the present invention may include: if a supported page size is equal to, or a multiple of, the size of the feature data added each time a word is generated in the generation phase, selecting and using that page size.

In a computing system that uses virtual-to-physical address mapping to access physical memory through virtual addresses, a memory mapping apparatus for beam search according to one embodiment of the present invention may, when copying a first array defined in the virtual address space to a second array, change the mapping information for the portion of the second array that would store the copied data so that it points to the physical memory area mapped to the first array.

In one embodiment, the memory mapping apparatus includes a VPN-to-PPN mapping table for storing VPN-to-PPN mapping information. When mapping from virtual addresses to physical memory is performed in units of pages and the start address of the first array is restricted to a multiple of the page size, the memory mapping apparatus updates only the VPN-to-PPN mapping information for pages mapped to the first array whose contents are identical, and, for a page in which any part of the data differs, reads the values from the physical page mapped to the first array and copies them to the physical page mapped to the second array.

In one embodiment, the memory mapping apparatus includes a PPN information table that stores the number of VPNs mapped to each PPN, and a free PPN list that manages pages not currently in use. When one of the arrays copied to a specific PPN is no longer used, the memory mapping apparatus decrements by 1 the number of VPNs mapped to that PPN in the PPN information table, and when the count reaches 0 as a result, the memory mapping apparatus may add the PPN to the free PPN list.

In a computing system that supports multiple page sizes, when the last page of the first array is configured as a large page but holds only a small amount of data, the memory mapping apparatus according to one embodiment of the present invention may convert the page mapped to the first array into small pages and copy the data to a page allocated to the second array.

In a computing system that supports multiple page sizes, when an operation applying NLP (Natural Language Processing) beam search is performed, the memory mapping apparatus according to one embodiment of the present invention may allocate a relatively small page for the last page in the process of copying the feature data generated in the context phase as many times as there are beams.

In a computing system that supports multiple page sizes, the memory mapping apparatus according to one embodiment of the present invention may, if a supported page size is equal to, or a multiple of, the size of the feature data added each time a word is generated in the generation phase, select and use that page size.

In one embodiment, the first array may be the feature data generated in the context phase, and the second array may be the feature data of the generation phase.

According to one embodiment of the present invention, unnecessary memory reads and writes can be reduced by updating only the information in the mapping table that translates virtual addresses into physical addresses.

According to one embodiment of the present invention, when feature data is stored in external memory during beam search, shareable data is handled by updating page table mapping information without accessing the external memory, and the external memory is accessed only for regions that cannot be shared. This minimizes the degradation of NLP processing performance caused by the bandwidth limits of the external memory.

According to one embodiment of the present invention, because the slowdown of an AI accelerator chip caused by external memory bandwidth limits can be minimized, the amount of data the AI accelerator chip can process per unit time is improved.

FIG. 1 is a block diagram showing the hardware configuration of an AI accelerator (100) that uses a virtual address space.
FIG. 2 shows an example in which B memory regions in the virtual address space are mapped to actual physical memory when beam search is applied to GPT in a conventional computing system.
FIG. 3 is a diagram showing a memory operating method according to the present invention.
FIG. 4 shows the difference between the existing approach and the method proposed in the present invention when resource usage and mapping information are managed in units of pages in a computing system that supports a virtual address space.
FIG. 5 is a flowchart showing an operation of managing the number of VPNs mapped to each physical page in one embodiment of the present invention.
FIG. 6 is a diagram for explaining how the page size is variably adjusted so that data can be shared efficiently when feature data is copied to a sequence information table in one embodiment of the present invention.

Hereinafter, some embodiments of the present disclosure are described in detail with reference to exemplary drawings. In assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present disclosure, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the present disclosure.

In describing the components of embodiments according to the present disclosure, labels such as first, second, i), ii), a), and b) may be used. These labels serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. When a part of the specification is said to 'include' or 'comprise' a component, this means that other components may additionally be included, not that they are excluded, unless explicitly stated otherwise. Terms such as 'unit' and 'module' used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

The description set forth below, together with the accompanying drawings, is intended to describe exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced.

In a computing system that uses virtual-to-physical address mapping to access physical memory through virtual addresses, when a first array defined in the virtual address space is copied to a second array, the present invention does not read the values stored in the first array from memory and store them in the memory mapped to the second array. Instead, it changes the mapping information for the portion of the second array that would store the copied data so that it points to the physical memory area mapped to the first array. This approach can be configured to run only when the following condition holds: after the copy, the portion of the second array copied from the first array is not modified when the second array is used.
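The remapping idea in the preceding paragraph can be sketched as follows. The dictionaries modelling the mapping table and physical memory, and the name `copy_by_remap`, are hypothetical; the sketch assumes the shared pages are only read afterwards, as the paragraph requires.

```python
def copy_by_remap(vpn_to_ppn, src_vpns, dst_vpns):
    """'Copy' the first array to the second by repointing mapping entries.

    No payload bytes are read or written; only mapping entries change.
    This is valid only while the shared pages are treated as read-only.
    """
    for src, dst in zip(src_vpns, dst_vpns):
        vpn_to_ppn[dst] = vpn_to_ppn[src]


# Hypothetical state: VPNs 0-1 hold the first array, VPNs 4-5 the second.
physical = {10: b"page-A", 11: b"page-B", 20: b"", 21: b""}
mapping = {0: 10, 1: 11, 4: 20, 5: 21}

copy_by_remap(mapping, [0, 1], [4, 5])
print(mapping)                # VPNs 4 and 5 now point at PPNs 10 and 11
print(physical[mapping[4]])   # the second array reads the shared data
```

After the call, PPNs 20 and 21 are no longer referenced by any VPN in this toy model and could be returned to a pool of unused pages.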

In one embodiment, when mapping from virtual addresses to physical memory is performed in units of pages and the start address of the first array is restricted to a multiple of the page size, only the VPN-to-PPN mapping information is updated for pages mapped to the first array whose contents are identical, and for a page in which any part of the data differs, the values may be read from the physical page mapped to the first array and copied to the physical page mapped to the second array.

When mapping from virtual addresses to physical memory is performed in units of pages and the start address of the first array is restricted to a multiple of the page size (that is, when the data is page-aligned), in one embodiment of the present invention only the VPN (virtual page number) to PPN (physical page number) mapping information is updated for the pages mapped to the first array other than the last page. For the last page, if it is completely filled with data, only the VPN-to-PPN mapping information is updated. If the last page is not completely filled, only the data actually stored in it is read from the physical page mapped to the first array and physically copied to the physical page mapped to the second array.
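A minimal sketch of the page-aligned case follows. The function name, the list-of-PPNs representation of each array's mapping, and the dictionary of `bytearray` pages are assumptions made for illustration, and a 4-byte page is used only to keep the example small.

```python
def copy_aligned(src_ppns, dst_ppns, data_len, page_size, memory):
    """Return the new PPN list for the second array (page-aligned case).

    src_ppns / dst_ppns: physical pages currently mapped to each array.
    memory: dict PPN -> bytearray, a hypothetical model of physical memory.
    """
    full_pages, tail = divmod(data_len, page_size)
    new_ppns = list(dst_ppns)
    # Completely filled pages: update the mapping only, no data movement.
    for i in range(full_pages):
        new_ppns[i] = src_ppns[i]
    # Partially filled last page: copy just the valid bytes.
    if tail:
        src = memory[src_ppns[full_pages]]
        dst = memory[dst_ppns[full_pages]]
        dst[:tail] = src[:tail]
    return new_ppns


memory = {
    0: bytearray(b"abcd"), 1: bytearray(b"efgh"), 2: bytearray(b"ij\x00\x00"),
    5: bytearray(4), 6: bytearray(4), 7: bytearray(4),
}
new_map = copy_aligned([0, 1, 2], [5, 6, 7], data_len=10, page_size=4, memory=memory)
print(new_map, bytes(memory[7]))
```

Here the two full pages are shared by remapping, and only the two valid bytes of the last page are physically copied.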

On the other hand, when the start address of the first array is not restricted to a multiple of the page size, in one embodiment of the present invention the data of the first and last pages mapped to the first array is copied directly to the physical pages mapped to the second array, and only the VPN-to-PPN mapping information is updated for the remaining pages.
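For the unaligned case, the per-page decision can be sketched as a simple classification. The function name and the 256-byte page size in the usage line are illustrative, and the sketch additionally assumes that the second array starts at the same in-page offset as the first so that the middle pages line up; the text does not state this.

```python
def plan_unaligned_copy(start_addr, length, page_size):
    """Classify each page spanned by the first array as 'copy' or 'remap'.

    Follows the rule in the text: the first and last pages are copied
    directly, and the pages in between only get a mapping update.
    """
    first = start_addr // page_size
    last = (start_addr + length - 1) // page_size
    plan = {}
    for vpn in range(first, last + 1):
        plan[vpn] = "copy" if vpn in (first, last) else "remap"
    return plan


print(plan_unaligned_copy(start_addr=100, length=1000, page_size=256))
```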

Depending on the embodiment, a table that stores the number of VPNs mapped to each PPN may be included. When one of the arrays copied to a specific PPN is no longer used, the number of VPNs mapped to that PPN in the table is decremented by 1. When the count reaches 0 as a result, the PPN is added to the free PPN list that manages pages not currently in use.
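The counting scheme above amounts to reference counting per physical page. A minimal sketch, with the class and method names chosen here for illustration:

```python
from collections import Counter


class PpnInfoTable:
    """Tracks how many VPNs map to each PPN, plus a free-PPN list."""

    def __init__(self):
        self.ref_count = Counter()
        self.free_ppns = []

    def map_vpn(self, ppn):
        """Record that one more VPN now points at `ppn`."""
        self.ref_count[ppn] += 1

    def release(self, ppn):
        """Called when one array mapped to `ppn` is no longer used."""
        self.ref_count[ppn] -= 1
        if self.ref_count[ppn] == 0:
            del self.ref_count[ppn]
            self.free_ppns.append(ppn)  # page can be reused


table = PpnInfoTable()
table.map_vpn(10)   # first array maps PPN 10
table.map_vpn(10)   # second array shares PPN 10
table.release(10)   # one user gone; page still in use
table.release(10)   # last user gone; page is freed
print(table.free_ppns)
```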

The computing system according to the present invention may support multiple page sizes. In that case, when the last page of the first array is configured as a large page but holds only a small amount of data, the page mapped to the first array can be converted into small pages and the data copied to a page allocated to the second array. As the valid data stored in the second array grows during subsequent operations, this increases the number of completely filled pages and keeps the amount of data in the partially filled last page small, which minimizes the data-copy overhead when the second array is later copied to a third array.
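The benefit of re-expressing a partly filled large page as small pages can be illustrated with simple arithmetic. This sketch is an interpretation, not the patent's procedure: it assumes that completely filled small pages can be shared by a mapping update (as in the page-aligned case) and that only the remaining tail needs a physical copy. The 64 KiB and 4 KiB sizes are example values.

```python
def split_last_page(valid_bytes, large_size, small_size):
    """Re-express a partly filled large page as small pages.

    Returns (full_small_pages, tail_bytes_to_copy, small_pages_per_large).
    """
    if large_size % small_size != 0:
        raise ValueError("large page must be a multiple of the small page")
    full_small, tail = divmod(valid_bytes, small_size)
    return full_small, tail, large_size // small_size


# 5000 valid bytes in a 64 KiB page: kept as one large page, all 5000 bytes
# sit in a partially filled page. Split into 4 KiB pages, one page is full
# and only the remaining tail sits in a partially filled page.
print(split_last_page(5000, 64 * 1024, 4 * 1024))
```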

When a computing system supporting multiple page sizes performs an operation that applies beam search to improve NLP (Natural Language Processing) performance, in one embodiment of the present invention, in the process of copying the feature data generated in the context phase as many times as the number of beams, large pages can be allocated to the leading feature data and small pages to the trailing feature data, according to the size of the feature data generated in the context phase.

In a computing system supporting multiple page sizes, one embodiment of the present invention can be configured so that, if a page size is available that equals the size of the feature data added each time a word is generated in the generation phase, or a multiple of that size, that page size is selected and used.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 shows the hardware configuration of an AI accelerator (100) that uses a virtual address space. The AI accelerator (100) includes a processor (110), a TLB (Translation Lookaside Buffer) (130) that stores information for translating virtual addresses to physical addresses, and an MMU (Memory Management Unit) (120) that translates a virtual address from the processor (110) into a physical address by referring to the TLB (130). The AI accelerator (100) further includes an operation unit (150) that performs various computational operations, and a DMA (Direct Memory Access) unit (140) that allows the operation unit (150) to access a memory (200) directly.

When the processor (110) attempts to access the memory (200), such as DRAM, using a virtual address, the MMU (120) translates the virtual address into a physical address by referring to the TLB (130). This allows the processor (110) to perform read/write operations on the memory (200). The DMA unit (140) and the operation unit (150) within the AI accelerator (100) access the memory (200) directly using physical addresses.
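
The translation step itself can be illustrated with a minimal sketch. The 4 KiB page size and the dictionary standing in for the mapping information are assumptions made for illustration; the specification does not prescribe either.

```python
PAGE_SIZE = 4096  # assumed page size; the specification does not fix one

def translate(vpn_to_ppn, vaddr):
    """Translate a virtual address into a physical address via a VPN-to-PPN map."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = vpn_to_ppn[vpn]  # a KeyError here would correspond to an unmapped page
    return ppn * PAGE_SIZE + offset

mapping = {0: 7, 1: 3}                 # hypothetical mapping: VPN 0 -> PPN 7, VPN 1 -> PPN 3
paddr = translate(mapping, 4096 + 20)  # VPN 1, offset 20
```

Only the page number is replaced; the offset within the page is carried over unchanged.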

The feature data for the individual beams may contain portions that have identical values. Examples include the feature data (K, V) generated in the context phase, and the case in which multiple beams share the same word sequence while sentences are generated in the generation phase.

In one embodiment of the present invention, in a computing system that uses virtual addresses and maps them to physical pages on a page-by-page basis, pages having identical values are not given physically separate pages with the data copied into them. Instead, only the VPN-to-PPN mapping information is updated so that different virtual pages point to the same physical page, which minimizes the amount of read/write traffic to external memory.

FIG. 2 shows an example in which, when beam search is applied to GPT in a conventional computing system, B memory regions in the virtual address space are mapped to actual physical memory. Pages shown in gray in the physical memory are pages that hold identical data across the B arrays (va_0 to va_(B-1)). For the last page of each array, only the portion of the page that holds data is hatched. Because the B beams hold different data in that portion, the gray background is drawn in different shades.

The situation shown in FIG. 2 corresponds either to the initial stage at which the context phase ends and the generation phase begins in GPT, or to the case in which the B words currently selected during the generation phase all continue the same word sequence. The portions of the physical memory pages that contain data are hatched and shaded, and the shading intensity differs to indicate that each beam holds different data.

In conventional systems, even when multiple beams share portions of the feature data with common values, the same values are copied and kept in separate regions of the actual physical memory, as shown in FIG. 2. As a result, memory space and memory bus bandwidth are used inefficiently.

FIG. 3 is a diagram showing a memory operating method according to the present invention. As illustrated in FIG. 3, in the present invention, pages holding identical data are shared among multiple beams during the beam search operation for GPT, thereby reducing memory space and memory bus bandwidth usage. That is, when storage space for B beams is in use in the virtual address space, the mapping table is set so that, for the portions holding data common to the B beams, the VPN-to-PPN mapping points to the same physical page, which uses memory space efficiently.

In FIG. 3, when the feature data stored in memory for the first beam, va_0, is divided into pages, the portions that have the same values as the other beams and can therefore be shared are shown in gray in the physical memory, and pages that cannot be shared with other beams are shown with a hatched area.

In the example of FIG. 3, three pages of va_0 are identical to those of va_1, ..., va_(B-1), but in the last page of va_0 some data has the same values as the other beams and some data has different values. In the present invention, in a computing system using a virtual address space, pages holding identical values can be shared among multiple beams by modifying the VPN-to-PPN mapping information during page-level mapping. Accordingly, when all data within one page block can be shared, the page is shared among the beams by modifying the mapping information. When even part of the data within a page differs, as in the last page of va_0, the page cannot be shared as a unit, so each beam stores that data in a physically separate page.

For the second beam in FIG. 3, va_1, all pages except the last one are mapped to the physical pages used by va_0, and only the very last page, which holds different values, is mapped to a different page in physical memory. The same mapping is applied up to va_(B-1).
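
The saving in physical pages implied by FIG. 2 and FIG. 3 can be estimated with a short sketch. The beam count of 4 is an assumed value for illustration; the page counts follow the FIG. 3 example of three shared pages plus one private last page per beam.

```python
def physical_pages(num_beams, pages_per_beam, shared_pages):
    """Return (conventional, shared) physical page counts.

    conventional: every beam keeps its own copy of every page (as in FIG. 2).
    shared: the common pages are stored once and only the remaining pages
            are kept per beam (as in FIG. 3).
    """
    conventional = num_beams * pages_per_beam
    shared = shared_pages + num_beams * (pages_per_beam - shared_pages)
    return conventional, shared

conventional, shared = physical_pages(num_beams=4, pages_per_beam=4, shared_pages=3)
```

Under these assumptions the conventional layout needs 16 physical pages, while the shared layout needs 7.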

In one embodiment, the last page can be handled as follows: if the last page is filled with data up to the page size, only the VPN-to-PPN mapping information is updated; otherwise, only the data stored in the last page is read from the physical page mapped to the first array and copied to the physical page mapped to the second array.
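
A minimal sketch of this copy procedure for a page-aligned first array is given below. The tiny page size, the dictionary-based mapping table, and the `bytearray` model of physical pages are all illustrative assumptions. Releasing the physical pages that the second array held before remapping is omitted here.

```python
PAGE_SIZE = 16  # tiny assumed page size, chosen only to keep the example readable

def copy_array(src_vpns, dst_vpns, valid_bytes, vpn_to_ppn, phys):
    """Copy the first array to the second; return the number of bytes physically copied."""
    full_pages, tail = divmod(valid_bytes, PAGE_SIZE)
    # Completely filled pages: update only the VPN-to-PPN mapping.
    for s, d in zip(src_vpns[:full_pages], dst_vpns[:full_pages]):
        vpn_to_ppn[d] = vpn_to_ppn[s]
    # Partially filled last page: copy only the bytes that hold data.
    if tail:
        s, d = src_vpns[full_pages], dst_vpns[full_pages]
        phys[vpn_to_ppn[d]][:tail] = phys[vpn_to_ppn[s]][:tail]
    return tail

vpn_to_ppn = {0: 100, 1: 101, 2: 102, 10: 200, 11: 201, 12: 202}
phys = {ppn: bytearray(PAGE_SIZE) for ppn in vpn_to_ppn.values()}
phys[100][:] = b"A" * PAGE_SIZE
phys[101][:] = b"B" * PAGE_SIZE
phys[102][:5] = b"CCCCC"  # last page holds only 5 valid bytes

copied = copy_array([0, 1, 2], [10, 11, 12], 2 * PAGE_SIZE + 5, vpn_to_ppn, phys)
```

Here two pages are shared by remapping and only the 5 valid bytes of the last page are physically copied.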

FIG. 4 shows the difference between the conventional method and the method according to an embodiment of the present invention when resource usage and mapping information are managed on a page basis in a computing system supporting a virtual address space.

FIG. 4(a) shows the information used for conventional page management. With physical memory divided into pages, this information consists of a free PPN list (420), which manages the indices of pages that are not yet in use, and a VPN-to-PPN mapping table (410), which indicates the physical page to which each virtual page is mapped. This information is typically managed by the operating system (OS). When a user program accesses memory, the address is translated into a physical address through the VPN-to-PPN mapping table (410). To speed up this translation, the hardware may include a memory management unit (MMU).

When a user program requests memory in order to use data in the virtual address space, the OS maps a physical page from the free PPN list (420) to the requested virtual address. When the user program no longer uses that memory and the page is freed, the OS deletes the mapping information and returns the indices of the physical pages that were in use to the free PPN list (420), so that they are available for subsequent memory allocation requests.

FIG. 4(b) shows the information used for page table management according to the method of the present invention. The free PPN list (420) and the VPN-to-PPN mapping table (410) are used for the same purposes as in the conventional method, and a PPN information table (PPN info table) (430) is additionally used to manage the number of virtual pages mapped to each physical page.

When multiple VPNs are mapped to a single PPN, the physical page must remain marked as in use as long as at least one VPN is still mapped to it, even if a memory free request arrives for some of the mapped VPNs. For this purpose, the PPN information table (430) tracks, for each physical page, the number of VPNs mapped to it. That is, as illustrated in FIG. 5, when a memory free request arrives for mapped VPNs (step S10), the number of VPNs mapped to the corresponding physical page in the PPN information table (430) is decreased by the number of VPNs for which the request was made (step S20). If the number of VPNs mapped to the physical page becomes 0 as a result ('Yes' in step S30), the physical page is added to the free PPN list (420) (step S40).
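
Steps S10 to S40 amount to reference counting of physical pages and can be sketched as follows. The function name and the dictionary and list representations of tables 410, 430 and list 420 are assumptions for illustration, and the sketch frees one VPN per call.

```python
def free_vpn(vpn, vpn_to_ppn, ppn_info, free_ppn_list):
    """Handle a memory free request for one VPN; return True if its PPN was released."""
    ppn = vpn_to_ppn.pop(vpn)      # S10: free request arrives for a mapped VPN
    ppn_info[ppn] -= 1             # S20: decrease the mapped-VPN count for that PPN
    if ppn_info[ppn] == 0:         # S30: is no VPN mapped to the page any more?
        del ppn_info[ppn]
        free_ppn_list.append(ppn)  # S40: return the page to the free PPN list
        return True
    return False

vpn_to_ppn = {10: 5, 20: 5}  # two virtual pages share physical page 5
ppn_info = {5: 2}            # PPN info table: PPN -> number of mapped VPNs
free_ppn_list = []

first = free_vpn(10, vpn_to_ppn, ppn_info, free_ppn_list)
second = free_vpn(20, vpn_to_ppn, ppn_info, free_ppn_list)
```

The first request leaves page 5 in use because another VPN still maps to it; only the second request returns it to the free list.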

FIG. 6 shows how the page size is adjusted variably so that data can be shared efficiently when feature data is copied to a sequence information table using the method proposed in the present invention. In GPT, feature data is generated in the context phase according to the user's question, and the output sentence answering the question is generated in the generation phase. In GPT-3, the feature data generated per word (token) corresponds to one row in each of the K and V matrices, and the data for one row is approximately 8 kB. Because the user's entire question can be received and processed at once in the context phase, it may be advantageous to allocate memory using a large page size in order to raise the TLB (Translation Lookaside Buffer) hit rate of the MMU (Memory Management Unit) during VPN-to-PPN translation.

In the generation phase, by contrast, only one word is generated at a time and its feature data is appended. To share information among multiple beams, using pages sized to match pad_K and pad_V, the feature data generated for each word, raises the probability that data can be shared and reduces physical data copying. To this end, in one embodiment of the present invention, if a page size is available that equals the size of the feature data added each time a word is generated in the generation phase, or a multiple of that size, that page size can be selected and used.
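
The page-size selection rule can be sketched as below. Preferring the smallest qualifying size is an assumption made here, since the specification only says that a matching page size can be selected; the list of supported sizes is likewise hypothetical.

```python
KIB = 1024

def select_page_size(supported_sizes, feature_size):
    """Return the smallest supported page size that equals feature_size or is a
    multiple of it, or None if no supported size qualifies."""
    candidates = [s for s in supported_sizes if s % feature_size == 0]
    return min(candidates) if candidates else None

supported = [4 * KIB, 16 * KIB, 64 * KIB, 2048 * KIB]  # hypothetical page sizes
chosen = select_page_size(supported, feature_size=8 * KIB)
```

With an 8 KiB feature row per word and these assumed page sizes, the 16 KiB page is chosen, so each page fills after exactly two generated words.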

In FIG. 6, when the system transitions to the generation phase with memory allocated in large pages during the context phase, the very last page is converted into relatively small pages unless it is completely filled with feature data, so that physical pages can be shared efficiently through the VPN-to-PPN mapping according to the method of the present invention. Empty pages that are allocated to the array but do not yet hold data can also be converted into small pages.
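
The effect of converting a partially filled large page into small pages can be sketched as follows. The function name and the example sizes (a 64 KiB large page, 8 KiB small pages, 20 KiB of valid data) are assumptions chosen for illustration.

```python
KIB = 1024

def split_large_page(large_size, small_size, valid_bytes):
    """Classify the small pages obtained by splitting one large page.

    Returns (full, partial, empty, tail_bytes): the number of completely filled
    small pages (shareable by remapping), the number of partially filled pages
    (0 or 1), the number of empty pages, and the bytes held by the partial page.
    """
    if large_size % small_size != 0:
        raise ValueError("large page size must be a multiple of the small page size")
    if not 0 <= valid_bytes <= large_size:
        raise ValueError("valid_bytes must fit in the large page")
    total = large_size // small_size
    full, tail_bytes = divmod(valid_bytes, small_size)
    partial = 1 if tail_bytes else 0
    return full, partial, total - full - partial, tail_bytes

full, partial, empty, tail_bytes = split_large_page(64 * KIB, 8 * KIB, 20 * KIB)
```

Without the conversion, all 20 KiB in the large last page would have to be copied. After it, two small pages can be shared by remapping and only 4 KiB remains to be copied.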

In addition, when a first array defined in the virtual address space is copied to a second array, if the last page of the first array is set to a large page but holds only a small amount of data, the system can be configured to convert the page mapped to the first array into small pages and then copy the data to the page allocated to the second array. With this configuration, as the valid data stored in the second array grows through subsequent operations, the number of completely filled pages increases and the amount of data in the partially filled last page decreases, which minimizes the data-copy overhead when the second array is later copied to a third array.

According to one embodiment of the present invention, when feature data is copied each time the beam information changes during NLP inference with beam search, the amount of data actually read from and written to memory is minimized. This alleviates the problem of AI accelerator performance being limited by memory bus bandwidth.

In GPT-3, one row of each of the K and V matrices is about 8 kB in size. Given the user's question, feature data is generated in the context phase for the corresponding number of words (K and V matrices with as many rows as there are words). In the generation phase, one row is appended to each of the K and V matrices every time a word is generated, so the size of the feature data to be copied keeps growing as the total number of tokens increases.

Conventionally, all feature data is physically copied. Because the maximum number of tokens supported by GPT-3 is 2049 to 8001, the maximum feature data size grows to 32 to 125 MB (for the K and V matrices combined), and in the worst case the copy is performed as many times as the beam width.
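
The stated range can be reproduced with a short calculation. Treating 1 kB as 1024 bytes and 1 MB as 2^20 bytes is an assumption made here; with that reading the figures in the text follow directly from the 8 kB row size.

```python
ROW_BYTES = 8 * 1024  # about 8 kB per row of K or V, as stated for GPT-3

def kv_megabytes(num_tokens):
    """Combined size of the K and V matrices, in units of 2**20 bytes."""
    return 2 * num_tokens * ROW_BYTES / 2**20

sizes = {tokens: kv_megabytes(tokens) for tokens in (2049, 8001)}
```

Rounded to whole megabytes, the two results are 32 and 125, matching the range given above.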

In one embodiment of the present invention, data is copied only for the very last page. Even as the total number of tokens increases during the generation phase, in the worst case only the data in the last page is copied. Memory bus bandwidth usage can therefore be reduced substantially compared with the conventional method, and the NLP throughput that the AI accelerator can process per unit time can be increased.

Each component of the device or method according to the present invention may be implemented in hardware, in software, or in a combination of hardware and software. In addition, the function of each component may be implemented in software, with a microprocessor executing the software function corresponding to each component.

Various implementations of the systems and techniques described herein can be realized as digital electronic circuits, integrated circuits, FPGAs (field programmable gate arrays), ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations can include implementation as one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose or general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) include instructions for the programmable processor and are stored on a "computer-readable recording medium."

A computer-readable recording medium includes any type of recording device that stores data readable by a computer system. Such a medium may be a non-volatile or non-transitory medium, such as a ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, or storage device, and may further include a transitory medium such as a data transmission medium. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Although the flowcharts in this specification describe the processes as being executed sequentially, this merely illustrates the technical idea of one embodiment of the present disclosure. In other words, a person having ordinary skill in the art to which an embodiment of the present disclosure belongs could, without departing from the essential characteristics of that embodiment, apply various modifications and variations, such as changing the order described in the flowcharts or executing one or more of the processes in parallel. The flowcharts in this specification are therefore not limited to a chronological order.

The above description merely illustrates the technical idea of the present embodiment, and a person having ordinary skill in the art to which the present embodiment belongs could make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be interpreted as falling within the scope of rights of the present embodiment.

100 AI accelerator,
110 processor,
120 MMU (Memory Management Unit),
130 TLB (Translation Lookaside Buffer),
140 DMA (Direct Memory Access),
150 operation unit,
200 external memory.

Claims

A memory mapping method for beam search in a computing system that uses a mapping from virtual addresses to physical addresses to access physical memory using virtual addresses, the method comprising:
a step of, when copying a first array defined in a virtual address space to a second array, changing the mapping information for the portion of the physical memory mapped to the second array that would store the data to be copied, so that it points to the physical memory area mapped to the first array.

In claim 1,
Mapping from virtual addresses to physical memory is done on a page-by-page basis.
In the case where the start address value of the first array is limited to a multiple of the page size, only the VPN to PPN mapping information is updated for pages that have the same value among the pages mapped to the first array, and in the case where there is a part of data that is different in one page, a step of reading the value from the physical page mapped to the first array and copying it to the physical page mapped to the second array.
A memory mapping method for beam search, comprising:

In claim 1,
Mapping from virtual addresses to physical memory is done on a page-by-page basis.
If the starting address value of the first array is limited to a multiple of the page size,
For pages except the last page among the pages mapped to the first array, only the VPN to PPN mapping information is updated.
For the last page, if the last page has data as full as the page size, only the mapping information from VPN to PPN is updated, and if not, only for the data stored in the last page, a step of reading the value from the physical page mapped to the first array and copying it to the physical page mapped to the second array.
A memory mapping method for beam search, comprising:

In claim 2 or 3,
The above computing system includes a PPN information table that stores information about the number of VPNs mapped to one PPN,
A step of decreasing the number of VPNs mapped to the specific PPN in the PPN information table by 1 for arrays that are no longer used among the arrays copied to the specific PPN,
A memory mapping method for beam search, comprising the step of adding the PPN to a free PPN list that manages pages that are not currently in use when the number of VPNs mapped to the PPN becomes 0 as a result of the above reduction.

In claim 2 or 3, the computing system supports various page sizes,
A memory mapping method for beam search, comprising the step of converting the size of a page mapped to the first array into a page of a small size and copying the data to a page allocated to the second array when the last page of the first array is set to a large-sized page but the size of data contained in the last page is small.

In claim 2 or 3, the computing system supports various page sizes,
A memory mapping method for beam search, comprising a step of allocating a page of a relatively small size to the last page in the process of copying feature data generated at the context stage as many times as the number of beams when performing an operation applying beam search of NLP (Natural Language Processing) in the above computing system.

In claim 2 or 3,
The above computing system supports various page sizes,
A memory mapping method for beam search, comprising a step of selecting and using a page size that is the same as or a multiple of the size of feature data added for each word generated in the generation phase.

In claim 2 or 3,
The first array is the feature data generated in the context stage.
The second array is the feature data from the generation stage.
Memory mapping method for beam search.

A memory mapping device for beam search in a computing system that uses a mapping from virtual addresses to physical addresses to access physical memory using virtual addresses,
When copying a first array defined in a virtual address space to a second array, the memory mapping device changes the mapping information for the portion of the physical memory mapped to the second array that would store the data to be copied, so that it points to the physical memory area mapped to the first array.
Memory mapping device for beam search.

In claim 9,
The above memory mapping device has a VPN to PPN mapping table for storing mapping information from a VPN to a PPN.
Mapping from virtual addresses to physical memory is done on a page-by-page basis.
In the case where the start address value of the first array is limited to a multiple of the page size, the memory mapping device updates only the VPN to PPN mapping information for pages that have the same value among the pages mapped to the first array, and in the case where there is a part of data that is different in one page, the value is read from the physical page mapped to the first array and copied to the physical page mapped to the second array.
Memory mapping device for beam search.

In claim 9,
The above memory mapping device has a VPN to PPN mapping table for storing mapping information from a VPN to a PPN.
Mapping from virtual addresses to physical memory is done on a page-by-page basis.
The above memory mapping device updates only the VPN to PPN mapping information for pages except the last page among the pages mapped to the first array.
For the last page, if the last page has data as full as the page size, only the mapping information from VPN to PPN is updated, otherwise, only for the data stored in the last page, the values are read from the physical page mapped to the first array and copied to the physical page mapped to the second array.
Memory mapping device for beam search.

In claim 10 or 11,
The above memory mapping device includes a PPN information table that stores information about the number of VPNs mapped to one PPN, and a free PPN list that manages pages that are not currently in use.
The above memory mapping device decreases the number of VPNs mapped to the specific PPN in the PPN information table by 1 for arrays that are no longer used among the arrays copied to the specific PPN.
As a result of the above reduction, when the number of VPNs mapped to the PPN becomes 0, the memory mapping device adds the PPN to the free PPN list.
Memory mapping device for beam search.

In claim 10 or 11,
The above computing system supports various page sizes,
If the last page of the first array is set to a large-sized page, but the size of the data contained in the last page is small, the memory mapping device converts the size of the page mapped to the first array to a small-sized page and copies the data to the page allocated to the second array.
Memory mapping device for beam search.

In claim 10 or 11,
The above computing system supports various page sizes,
When performing an operation applying beam search of NLP (Natural Language Processing) in the above computing system, the memory mapping device allocates a page of a relatively small size to the last page in the process of copying the feature data generated in the context stage as many times as the number of beams.
Memory mapping device for beam search.

In claim 10 or 11,
The above computing system supports various page sizes,
If a page size exists that is the same as, or a multiple of, the size of the feature data added each time a word is generated in the generation phase, the above memory mapping device selects and uses that page size.
Memory mapping device for beam search.

In claim 10 or 11,
The first array is the feature data generated in the context stage.
The second array is the feature data from the generation stage.
Memory mapping device for beam search.