
CN120705075A - Data storage method, data reading method and graphics processor - Google Patents

Data storage method, data reading method and graphics processor

Info

Publication number
CN120705075A
Authority
CN
China
Prior art keywords
data
page data
graphics processor
memory
compressed
Prior art date
Legal status
Pending
Application number
CN202510787347.3A
Other languages
Chinese (zh)
Inventor
郁晨
Current Assignee
Hygon Information Technology Co Ltd
Original Assignee
Hygon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hygon Information Technology Co Ltd filed Critical Hygon Information Technology Co Ltd
Priority to CN202510787347.3A
Publication of CN120705075A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 Address translation
    • G06F 12/1009 Address translation using page tables, e.g. page table structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30098 Register arrangements
    • G06F 9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F 9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/60 Memory management
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M 7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M 7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M 7/3086 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M 7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M 7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract


An embodiment of the present invention provides a data storage method, a data reading method, and a graphics processor. The data storage method is applied to a graphics processor and includes: obtaining page data to be compressed; compressing the page data with a predetermined target compression algorithm to obtain compressed page data, where the target compression ratio of the target compression algorithm determines the amount of data that can be stored in the graphics processor memory; storing the portion of the compressed page data equal to that storable amount in the graphics processor memory, storing the portion that exceeds it in previously allocated non-graphics-processor memory, and recording the storage address. The allocated non-graphics-processor memory is memory allocated on a processor other than the graphics processor, and its storage capacity is determined at least by the target compression ratio. The data storage method provided by the embodiment of the present invention improves the efficiency of data storage and access in graphics processor memory.

Description

Data storage method, data reading method and graphics processor
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data storage method, a data reading method and a graphics processor.
Background
With the increasing demand for computing resources in high performance computing (High Performance Computing, HPC) and deep learning (Deep Learning, DL), the memory capacity of graphics processors (Graphics Processing Unit, GPU) is becoming a bottleneck that limits performance improvement.
GPU memory expansion techniques are a class of approaches that address the memory bottleneck by adding physical memory, sharing memory, or optimizing memory management policies. One existing GPU memory expansion technique reduces the footprint of data in GPU memory by compressing data between the GPU's data cache and its memory; however, the compressed page data may produce memory fragmentation, which degrades memory access performance. In this context, how to provide a technical solution that improves the data storage and access efficiency of a graphics processor is a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The embodiment of the invention provides a data storage method, a data reading method and a graphics processor, which are used for improving the data storage and access efficiency of the graphics processor.
In a first aspect, an embodiment of the present invention provides a data storage method, applied to a graphics processor, including:
acquiring page data to be compressed;
compressing the page data to be compressed by using a predetermined target compression algorithm to obtain compressed page data, wherein the target compression ratio of the target compression algorithm is used to determine the amount of data storable in the graphics processor memory;
storing, of the compressed page data, the page data whose amount equals the storable amount into the graphics processor memory, storing the page data whose amount exceeds the storable amount into allocated non-graphics-processor memory, and recording the storage address of that page data in the allocated non-graphics-processor memory;
wherein the allocated non-graphics-processor memory is memory allocated on a processor external to the graphics processor, and the storage capacity of the allocated non-graphics-processor memory is determined based at least on the target compression ratio.
In a second aspect, an embodiment of the present application provides a data reading method, applied to a graphics processor, including:
acquiring a data reading request, wherein the data reading request comprises a data reading address;
when the page data corresponding to the data reading address is determined to be compressed page data that is stored partly in the graphics processor memory and partly in allocated non-graphics-processor memory, acquiring from the graphics processor memory, based on the data reading address, the page data whose amount equals the storable amount, and acquiring from the allocated non-graphics-processor memory, based on the storage address in the data reading request, the page data whose amount exceeds the storable amount, thereby obtaining the compressed page data, wherein the compressed page data was stored by the data storage method of the first aspect;
determining the decompression algorithm for the compressed page data from the target compression algorithm, and obtaining decompressed page data;
and reading data from the decompressed page data.
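The read path above can be sketched in Python. The memory maps, the addresses, and the `read_page` helper are hypothetical illustrations, and `zlib` stands in for the decompressor matching whatever target compression algorithm stored the page:

```python
import zlib

def read_page(read_addr: int, gpu_memory: dict, host_memory: dict,
              address_record: dict, storable_amount: int) -> bytes:
    """Reassemble a compressed page stored partly in GPU memory and partly in
    allocated non-GPU memory, then decompress it with the decompressor that
    matches the target compression algorithm."""
    compressed = gpu_memory[read_addr][:storable_amount]  # part equal to the storable amount
    host_addr = address_record.get(read_addr)             # recorded storage address, if any
    if host_addr is not None:
        compressed += host_memory[host_addr]              # part exceeding the storable amount
    return zlib.decompress(compressed)

# Illustrative round trip: compress a page, split it across the two memories, read it back.
page = b"ab" * 4096
blob = zlib.compress(page)
storable = len(blob) // 2                                 # pretend this is the storable amount
gpu_memory = {0x1000: blob[:storable]}
host_memory = {0x9000: blob[storable:]}
address_record = {0x1000: 0x9000}
restored = read_page(0x1000, gpu_memory, host_memory, address_record, storable)
```

Here `restored` equals the original page, mirroring the steps of the second aspect: fetch both parts, decompress, then serve reads from the decompressed page.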
In a third aspect, an embodiment of the present application provides a graphics processor, including:
a compressor, configured to acquire page data to be compressed and to compress it with a predetermined target compression algorithm to obtain compressed page data;
a memory controller, configured to store, of the compressed page data, the page data whose amount equals the storable amount into the graphics processor memory, store the page data whose amount exceeds the storable amount into allocated non-graphics-processor memory, and record the storage address of that page data in the allocated non-graphics-processor memory;
wherein the allocated non-graphics-processor memory is memory allocated on a processor external to the graphics processor, and the storage capacity of the allocated non-graphics-processor memory is determined based at least on the target compression ratio.
The data storage method provided by the embodiment of the invention is applied to a graphics processor. Page data to be compressed is first acquired and compressed with a predetermined target compression algorithm to obtain compressed page data, and the target compression ratio of the target compression algorithm is used to determine the amount of data storable in the graphics processor memory. Because the compressed page data obtained from the compression may fail to meet the amount determined by the target compression ratio (the storable amount), in the embodiment of the invention the page data in a compressed page whose amount equals the storable amount can be stored into the graphics processor memory, while the page data whose amount exceeds the storable amount is stored into the allocated non-graphics-processor memory and its storage address is recorded. Since the capacity of the allocated non-graphics-processor memory is determined at least based on the target compression ratio of the target compression algorithm, the compressed page data can be stored completely in one pass.
Therefore, the method not only ensures that the data stored in the graphics processor memory has no gaps, but also avoids having the graphics processor re-apply, through page re-allocation, for the page data that exceeds the storable amount, improving data storage efficiency in the graphics processor memory. When the compressed page data stored in the allocated non-graphics-processor memory is needed later, it can be obtained directly from the recorded storage address. On the basis of reducing the capacity occupied in the graphics processor memory and expanding the memory capacity of the graphics processor, the extra cost of storing compressed page data through page re-allocation is avoided, achieving the aim of improving data storage efficiency in the graphics processor memory.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a data reading method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a graphics processor according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an implementation process of a data storage method according to an embodiment of the present invention.
Detailed Description
As described in the Background section, with the growing demand for data processing, the demand on the memory capacity of graphics processors (GPUs) used to accelerate data operations keeps increasing, and GPU memory capacity is becoming a bottleneck that limits performance improvement. Taking the currently popular large language models (Large Language Model, LLM) as an example, when performing mixed precision training, a single model copy requires about 1260GB of GPU memory, whereas the memory capacity of a single GPU used for LLM training is typically tens to hundreds of GB. Multiple GPUs are therefore required to deploy and train the model together; for example, deploying a single 1260GB LLM copy requires about 16 80GB GPUs.
However, introducing multiple GPUs for training brings many difficulties. Although model parallel techniques such as pipeline parallelism (Pipeline Parallelism) and tensor parallelism (Tensor Parallelism) successfully solve the problem of model partitioning, they introduce additional requirements for high interconnect bandwidth and complex scheduling of computation and communication resources.
Also, several factors make it difficult to expand the physical capacity of GPU memory.
Since the memory chips of a GPU must be packaged inside the GPU, physical space limits capacity: taking high bandwidth memory (High Bandwidth Memory, HBM) as an example, an existing GPU can accommodate only a certain number of HBM stacks (HBM Stacks), and the memory capacity of each HBM stack is fixed, so the physical capacity of GPU memory is limited by physical space.
Moreover, the expansion of GPU physical memory capacity lags behind the growth in memory capacity demand, so demand consistently outpaces what the physical memory can supply; to guarantee GPU computing performance, the physical memory capacity is best settled during the design and development stage of the GPU, which makes later expansion impractical.
Memory compression can reduce memory footprint and is widely used in CPU systems: a compression algorithm is applied to data stored in memory to reduce its occupation, delaying the performance degradation a CPU system suffers when memory runs short. Memory compression techniques are equally applicable to GPUs.
An existing GPU memory expansion technique reduces the footprint of data in GPU memory by compressing data between the GPU's data cache and its memory. However, since data is stored in GPU memory in a preset format, after compression part of the memory space occupied by a page no longer holds data, so compressed page data can produce memory fragmentation. To keep the compressed page data stored in order, the memory manager in the GPU then applies for memory pages again through page re-allocation in order to store the compressed page data. Page re-allocation not only occupies a certain amount of bandwidth but also increases memory access latency, which degrades data storage and access efficiency.
In order to solve the above problems, the present invention provides a data storage method, a data reading method, and a graphics processor, so as to improve the data storage and access efficiency in the GPU memory.
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a data storage method according to an embodiment of the present invention. Referring to Fig. 1, the data storage method is applied to a graphics processor and includes the following steps.
Step S110: acquire page data to be compressed.
Data generated while the GPU runs a program cannot be held entirely in a cache (e.g., an L1 cache) inside a GPU computing core and must be stored in GPU memory for subsequent access. The page data to be compressed may be data that is generated by a GPU computing core and does not fit in the computing core's cache.
After the page data to be compressed is acquired, step S120 is executed.
Step S120: compress the page data to be compressed using the predetermined target compression algorithm to obtain compressed page data.
The target compression ratio of the target compression algorithm is used to determine the amount of data that can be stored in the graphics processor memory.
For the page data to be compressed, compression can be performed with the predetermined target compression algorithm, which reduces the footprint of the page data in the graphics processor memory and thereby expands the effective memory capacity of the graphics processor.
In some embodiments, to determine a target compression algorithm that improves compression performance on the page data to be compressed, the following steps are performed before step S110.
Step S100: acquire debug page data.
Before the graphics processor runs a program, the program needs to be tested against a test data set to analyze its performance. The debug page data is the data generated when the graphics processor runs the program on the test data set. After the debug page data is obtained, step S101 is executed.
Step S101: compress the debug page data multiple times using a plurality of candidate compression algorithms; for each candidate compression algorithm, determine the compression ratio achieved on each compression of the debug page data, and obtain that algorithm's average compression ratio on the debug page data from the ratios determined each time.
The data generated can vary considerably with the type of program running on the graphics processor, and different candidate compression algorithms perform differently on different data. The debug page data can therefore be compressed multiple times with multiple candidate algorithms to obtain each algorithm's average compression ratio on the debug page data, so that a compression algorithm suited to the program can later be selected as required to improve graphics processor performance.
The candidate compression algorithms are common lossless compression algorithms, such as Huffman coding, LZ77 (a sliding-window compression algorithm), and DEFLATE (a compression algorithm combining LZ77 with Huffman coding), which are widely used in file compression and data transmission and can fully recover the original data.
After the average compression ratio of the debug page data under each candidate algorithm is determined, step S102 is executed.
Step S102: select the candidate compression algorithm with the largest average compression ratio as the target compression algorithm, and take its average compression ratio as the target compression ratio.
Selecting the candidate algorithm with the largest average compression ratio further reduces the data storage pressure on the graphics processor memory, improving the effective expansion of its capacity. In some other embodiments, the target compression algorithm may be selected according to other performance requirements; for example, the algorithm with the highest compression speed whose average compression ratio still meets the storage requirements of the graphics processor memory may be chosen as the target compression algorithm.
In some embodiments, after step S102 is performed, the target compression ratio is saved for use in subsequent steps.
It will be appreciated that in some embodiments the target compression algorithm may be set directly, for example by using the graphics processor's default compression algorithm or a specified compression algorithm. In that case steps S100 to S102 need not be executed, which simplifies the logic of the data storage method and improves its execution efficiency.
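As a sketch of steps S100 to S102, the loop below scores candidate algorithms by their average compression ratio on debug pages and keeps the best one. Python standard-library codecs stand in for the candidates (`zlib` implements DEFLATE, one of the examples above); the debug pages and helper names are illustrative:

```python
import bz2
import lzma
import zlib

def average_compression_ratio(compress, debug_pages) -> float:
    """Mean ratio of uncompressed size to compressed size over the debug pages."""
    ratios = [len(page) / len(compress(page)) for page in debug_pages]
    return sum(ratios) / len(ratios)

def select_target_algorithm(debug_pages, candidates):
    """Step S102: pick the candidate with the largest average compression ratio;
    that average becomes the target compression ratio."""
    scored = {name: average_compression_ratio(fn, debug_pages)
              for name, fn in candidates.items()}
    target_name = max(scored, key=scored.get)
    return target_name, scored[target_name]

# Hypothetical debug pages: 4 KB pages of repetitive content, which compress well.
debug_pages = [bytes([i % 16]) * 4096 for i in range(8)]
candidates = {"deflate": zlib.compress, "lzma": lzma.compress, "bz2": bz2.compress}
name, target_ratio = select_target_algorithm(debug_pages, candidates)
```

The same scaffolding also supports the alternative selection policy mentioned above: scoring candidates by compression speed and filtering by a minimum acceptable average ratio.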
With continued reference to fig. 1, after step S120 is performed, step S130 is continued to be performed.
Step S130: store, of the compressed page data, the page data whose amount equals the storable amount into the graphics processor memory, store the page data whose amount exceeds the storable amount into the allocated non-graphics-processor memory, and record the storage address of that page data in the allocated non-graphics-processor memory.
The allocated non-graphics-processor memory is memory allocated on a processor external to the graphics processor, and its storage capacity is determined based at least on the target compression ratio.
The graphics processor memory is the global memory of the graphics processor executing the data storage method, while the allocated non-graphics-processor memory is memory allocated on a processor other than that graphics processor, for example the memory of a CPU. In some embodiments, the graphics processor has a direct connection to the non-graphics processor to speed up subsequent access to the data stored in the two memories; for example, the graphics processor and the non-graphics processor may be directly connected through a high bandwidth interconnect technology such as NVLink or CXL to improve graphics processor performance.
After the page data to be compressed is compressed with the predetermined target compression algorithm, the resulting compressed pages may differ in size. If a compressed page were stored directly into the graphics processor memory, its amount of data might exceed the storable amount, forcing the memory manager in the graphics processor to re-allocate pages of graphics processor memory, which would harm the efficiency with which the graphics processor stores and accesses data.
Meanwhile, each compression algorithm achieves its own compression ratio; that is, the predetermined target compression algorithm has a corresponding target compression ratio. After the page data to be compressed is compressed with the target algorithm, the amount of data in the compressed page is at least equal to the storable amount and may exceed it. It can be understood that, because the amount of data in the compressed page is at least the storable amount, slicing the compressed page data at the storable amount wastes no space.
Therefore, storing the portion of the compressed page data whose amount equals the storable amount into the graphics processor memory avoids page re-allocation by the memory manager and improves data storage and access efficiency. Besides that portion, the compressed page data also contains page data exceeding the storable amount; this portion is stored into the allocated non-graphics-processor memory so that the compressed page data is stored as a whole, preserving the storage and access performance of the graphics processor without re-applying for page memory through page re-allocation.
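A minimal sketch of step S130, assuming the page is split at the storable amount. The list- and dict-based memories and the `store_compressed_page` helper are stand-ins for the graphics processor memory, the allocated non-GPU memory, and the memory controller's address record:

```python
def store_compressed_page(compressed: bytes, storable_amount: int,
                          gpu_memory: list, host_memory: dict,
                          next_host_addr: int) -> dict:
    """Split a compressed page: the first `storable_amount` bytes go to GPU
    memory; any overflow goes to allocated non-GPU (host) memory, and its
    storage address is recorded so the page can later be reassembled."""
    gpu_part, overflow = compressed[:storable_amount], compressed[storable_amount:]
    gpu_memory.append(gpu_part)
    record = {"gpu_index": len(gpu_memory) - 1, "host_addr": None}
    if overflow:  # the actual compression fell short of the target ratio
        host_memory[next_host_addr] = overflow
        record["host_addr"] = next_host_addr
    return record

gpu_mem, host_mem = [], {}
# A 4 KB page with target ratio 2 gives a storable amount of 2048 bytes;
# here the page only compressed to 2600 bytes, so 552 bytes overflow to host memory.
rec = store_compressed_page(b"\x01" * 2600, 2048, gpu_mem, host_mem, next_host_addr=0)
```

Because the GPU-resident part is always exactly the storable amount, the GPU memory stays densely packed with no gaps between stored pages.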
In some embodiments, after step S120 yields the compressed page data, the data storage method further includes determining the corresponding actual compression ratio from the compressed page data. The actual compression ratio gives the actual compressed data amount of the compressed page data.
Specifically, in some embodiments the actual compression ratio is the ratio of the uncompressed data amount of the page data to be compressed to the actual compressed data amount of the compressed page data. For example, if the uncompressed amount is 4KB and the actual compressed amount is 2KB, the actual compression ratio is 4KB / 2KB = 2.
After the actual compression ratio of the compressed page data is determined, it is compared with the target compression ratio.
If the actual compression ratio is smaller than the target compression ratio, the actual compressed data amount is larger than the storable amount, and step S130 is executed: the page data whose amount equals the storable amount is stored into the graphics processor memory, the page data whose amount exceeds the storable amount is stored into the allocated non-graphics-processor memory, and the storage address of that page data in the allocated non-graphics-processor memory is recorded.
If the actual compression ratio is greater than or equal to the target compression ratio, the actual compressed data amount is at most the storable amount: no page data exceeds the storable amount, the graphics processor memory can hold the entire compressed page, and the compressed page data is stored directly into the graphics processor memory. Comparing the actual compression ratio with the target compression ratio therefore stores each compressed page completely and in the appropriate place, improving data storage efficiency.
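The ratio comparison can be condensed into a small worked example using the text's numbers (a 4 KB page and a target ratio of 2). Taking the storable amount to be the page size divided by the target compression ratio is one reading of how the target ratio determines it; the `placement` helper itself is illustrative:

```python
def placement(page_size: int, compressed_size: int, target_ratio: float):
    """Decide where a compressed page goes. An actual compression ratio at or
    above the target means the page fits entirely in GPU memory; a ratio below
    the target means the part beyond the storable amount overflows to non-GPU
    memory. Returns (storable_amount, bytes_in_gpu, bytes_in_host)."""
    storable = int(page_size / target_ratio)
    actual_ratio = page_size / compressed_size
    if actual_ratio >= target_ratio:
        return storable, compressed_size, 0          # stored directly in GPU memory
    return storable, storable, compressed_size - storable  # split storage

# 4 KB page, target ratio 2, compressed to 2 KB: all of it stays in GPU memory.
storable, in_gpu, in_host = placement(4096, 2048, 2.0)
# Same page compressed only to 3 KB: 2 KB in GPU memory, 1 KB overflows to host.
storable2, in_gpu2, in_host2 = placement(4096, 3072, 2.0)
```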
It can be seen that the data storage method provided by the embodiment of the invention is applied to a graphics processor, and is used for firstly obtaining page data to be compressed, and compressing the page data to be compressed by utilizing a predetermined target compression algorithm to obtain compressed page data, wherein the target compression rate of the target compression algorithm is used for determining the storable data amount in the memory of the graphics processor. Because the compressed data obtained after the compression processing of the page data to be compressed may have the situation that the data volume (storable data volume) determined by the target compression rate of the target compression algorithm does not meet the predetermined requirement, in the embodiment of the invention, for one compressed page data, page data with the data volume equal to the storable data volume in the compressed page data can be stored into the internal memory of the graphics processor, and meanwhile, page data with the data volume exceeding the storable data volume in the compressed page data is stored into the internal memory of the applied non-graphics processor, and the storage address of the applied non-graphics processor is recorded, and the internal memory of the applied non-graphics processor is determined at least based on the target compression rate of the target compression algorithm, so that the compressed page data can be stored completely at one time. 
Therefore, the method ensures that the data stored in the graphics processor memory has no gaps, and it also avoids the graphics processor having to re-apply, through page reallocation, for the page data that exceeds the storable data amount, thereby improving the efficiency of data storage in the graphics processor memory. When the compressed page data stored in the applied non-graphics processor memory is needed later, it can be obtained directly from the recorded storage address. On the basis of reducing the capacity occupied in the graphics processor memory and expanding the effective memory capacity of the graphics processor, this avoids the extra overhead of storing compressed page data through page reallocation, and thus achieves the goal of improving the efficiency of data storage in the graphics processor memory.
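For illustration only, the split-storage decision described above can be sketched as follows. The function name, the use of bytearrays as flat memory, and the 4096-byte page size are assumptions made for this sketch, not details from this disclosure:

```python
PAGE_SIZE = 4096  # assumed uncompressed page size in bytes

def store_compressed_page(compressed, gpu_mem, cpu_mem, target_rate):
    """Split one compressed page between GPU memory and the applied
    non-graphics processor (CPU) memory.

    Returns the CPU-side storage offset to record in the page table
    entry, or None when the page fits in the GPU-side storable amount.
    """
    storable = PAGE_SIZE // target_rate        # storable data amount
    actual_rate = PAGE_SIZE / len(compressed)  # actual compression rate
    if actual_rate >= target_rate:
        gpu_mem.extend(compressed)             # fits entirely in GPU memory
        return None
    gpu_mem.extend(compressed[:storable])      # storable portion to GPU
    offset = len(cpu_mem)                      # offset in applied CPU memory
    cpu_mem.extend(compressed[storable:])      # overflow portion to CPU
    return offset                              # the recorded storage address
```

A page that reaches the target rate is written whole to GPU memory; any page that falls short is split, and only the overflow is placed in the applied non-graphics processor memory.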
In some embodiments, after the target compression algorithm is determined and before step S110 is performed, the data storage method further comprises the following steps.
Obtain the available capacity of the graphics processor memory. The available capacity of the graphics processor memory included in the graphics processor is the maximum memory capacity that a program running on the graphics processor can apply for, so that in subsequent steps the capacity usable by the program can be expanded to the maximum extent.
After the step of obtaining the available capacity of the graphics processor memory is performed, the data storage method further comprises determining, based on the available capacity, an application capacity on the non-graphics processor memory to serve as the applied non-graphics processor memory, where the application capacity is the available capacity multiplied by the difference between the target compression rate and a first value.
The application capacity on the non-graphics processor memory is the maximum memory capacity that a program running on the graphics processor can apply for on the non-graphics processor memory. To ensure that the applied non-graphics processor memory can accommodate all data amounts exceeding those determined by the target compression rate, the application capacity equals the available capacity multiplied by the difference between the target compression rate and a first value, where the first value may be 1. In this way, the non-graphics processor memory will not cause data storage or access anomalies due to insufficient capacity before the graphics processor memory is fully occupied.
In some specific embodiments, the applied non-graphics processor memory is further determined according to a pre-configured non-graphics processor memory capacity threshold, so as to avoid occupying excessive capacity of the non-graphics processor memory, which would affect the performance of the non-graphics processor.
For example, suppose that after the target compression algorithm is determined, its target compression rate is 4 and the actual capacity of the graphics processor memory is 40 GB. A memory space of 40 GB is then applied for on the graphics processor memory, and a memory space of 40 GB x (4 - 1) = 120 GB is applied for on the non-graphics processor memory (e.g. CPU memory), so as to ensure the normal running of the program. If, in this example, the operating system or the program is preconfigured with a non-graphics processor memory capacity threshold of 100 GB, then a memory space of 100 GB is applied for on the non-graphics processor memory instead of 120 GB.
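The capacity rule above can be sketched as follows (capacities in GB). The function name and the optional threshold parameter are illustrative assumptions:

```python
def applied_cpu_capacity(gpu_available_gb, target_rate, threshold_gb=None):
    """Capacity to apply for on the non-graphics processor (e.g. CPU)
    memory: the GPU available capacity times (target compression rate
    minus the first value 1), optionally capped by a pre-configured
    threshold to avoid over-occupying the non-GPU memory."""
    capacity = gpu_available_gb * (target_rate - 1)
    if threshold_gb is not None:
        capacity = min(capacity, threshold_gb)
    return capacity
```

With the figures from the example, `applied_cpu_capacity(40, 4)` gives 120 GB, and adding the 100 GB threshold caps the request at 100 GB.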
In some embodiments, the non-graphics processor memory is page-locked memory, so that data stored in it is not swapped out to the hard disk. Because data in page-locked memory is never swapped to the hard disk, the performance of graphics processor accesses to data in the non-graphics processor memory is guaranteed.
In some specific embodiments, after the step of taking the average compression rate of the target compression algorithm as the target compression rate, the following steps are further performed.
Store a base address of the applied non-graphics processor memory. Combined with the storage address, the base address indicates the storage location, in the applied non-graphics processor memory, of the page data exceeding the storable data amount.
After the target compression rate is stored, it can simply be read back whenever the actual compression rate needs to be compared with it, which improves the efficiency of the data storage method. Likewise, after the base address of the applied non-graphics processor memory is stored, memory accesses to compressed page data stored there can be performed quickly and accurately according to the base address.
In some embodiments, the actual compression rate and the storage address are recorded in a page table entry (PTE) of the compressed page data, and a page compression flag of the compressed page data is also recorded in the page table entry. The page compression flag indicates whether the page data has been compressed, and the actual compression rate is the ratio of the uncompressed data amount of the page data to be compressed to its actual compressed data amount after compression.
By recording the page compression flag, the actual compression rate, and the storage address, compressed page data stored in the graphics processor memory and the applied non-graphics processor memory can later be read based on these three fields, improving the data access performance of the graphics processor.
In some embodiments, in the page table entry of the compressed page data, the page compression flag, the actual compression rate, and the storage address (i.e. the offset address within the applied non-graphics processor memory) occupy 1 bit, 3 bits, and 16 bits, respectively.
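A sketch of packing and unpacking these fields might look as follows. Only the field widths come from the text; the field order within the entry is an assumption for illustration:

```python
def pack_pte_fields(compressed, rate_code, offset):
    """Pack the page compression flag (1 bit), actual compression rate
    code (3 bits), and CPU-side offset address (16 bits) into one value.
    The flag-rate-offset ordering is assumed, not specified."""
    assert 0 <= rate_code < (1 << 3) and 0 <= offset < (1 << 16)
    return (int(compressed) << 19) | (rate_code << 16) | offset

def unpack_pte_fields(entry):
    """Inverse of pack_pte_fields: returns (flag, rate_code, offset)."""
    return bool((entry >> 19) & 0x1), (entry >> 16) & 0x7, entry & 0xFFFF
```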
To further improve the data access performance of the graphics processor, in some embodiments, after the step S120, before the step of determining the corresponding actual compression rate based on the compressed page data, the data storage method further includes storing the compressed page data into a compressed data cache of the graphics processor.
The compressed data cache reduces the frequency of accesses to the GPU/CPU DRAM and thereby improves overall system performance. Reading compressed page data from the applied non-graphics processor memory in CPU DRAM requires address indexing, which involves accessing the TLB (Translation Lookaside Buffer) and the page table, and obtaining and using the base address of the applied non-graphics processor memory. Caching the compressed page data directly in the compressed data cache therefore reduces the access overhead of these GPU/CPU DRAM accesses.
When the computing core in the graphic processor accesses the compressed data cache, the access speed is faster than that of the graphic processor memory, so that the data access performance of the graphic processor can be improved by preferentially using the compressed data cache to store the compressed page data; and when the compressed data cache cannot accommodate the compressed page data, namely the compressed page data needs to be stored into a memory, continuing to execute the step of comparing the actual compression rate with the target compression rate, thereby further improving the data storage and reading performance of the graphics processor.
The embodiment of the invention also provides a data reading method which is applied to the graphic processor, and referring to FIG. 2, the data reading method comprises the following steps.
Step S201, a data read request is acquired, the data read request including a data read address.
A program running on the GPU sends a data read request through a computing core of the GPU, so as to obtain data from a cache or memory of the GPU.
After the data reading request is acquired, step S202 is performed.
Step S202: when it is determined that the page data corresponding to the data read address is compressed page data, and the compressed page data is stored in both the graphics processor memory and the applied non-graphics processor memory, acquire, in the graphics processor memory, the page data whose data amount equals the storable data amount based on the data read address, and acquire, in the applied non-graphics processor memory, the page data whose data amount exceeds the storable data amount based on the storage address in the data read request, so as to obtain the compressed page data.
The compressed page data is data stored by the data storage method according to any one of the foregoing embodiments, and the storable data amount is determined by a target compression rate of a target compression algorithm adopted by the data storage method.
If the page data corresponding to the data read address is compressed page data, for example the page compression flag of its page table entry records that the page has been compressed and the page table entry records a storage address in the applied non-graphics processor memory, this indicates that the compressed page data corresponding to the data read address is stored in both the graphics processor memory and the applied non-graphics processor memory. The page data can then be read from the two memories respectively and combined into the compressed page data.
In some specific embodiments, when it is determined that the page table entry of the compressed page data corresponding to the data read address records a storage address of the applied non-graphics processor memory, the method further includes the steps of obtaining a base address of the applied non-graphics processor memory, and obtaining a physical address of the non-graphics processor memory based on the base address and the storage address.
For example, the storage address in the applied non-graphics processor memory is an address offset (offset address) of the compressed page data within that memory. The base address of the applied non-graphics processor memory therefore needs to be obtained, and the physical address in the non-graphics processor memory is computed from the storage address on the basis of the base address. This physical address is the actual physical location where the compressed page data is stored in the non-graphics processor memory.
If the page table entry of the compressed page data corresponding to the data read address does not record the storage address of the applied non-graphics processor memory, the compressed page data corresponding to the data read address is only stored in the graphics processor memory, and only the compressed page data corresponding to the data read address is required to be read from the graphics processor memory in the follow-up process.
In some specific embodiments, the step S202 includes the steps of:
The steps comprise: acquiring, in the graphics processor memory, the page data whose data amount equals the storable data amount according to the data read address; acquiring, in the applied non-graphics processor memory, the page data corresponding to the physical address of the non-graphics processor memory, which is the data exceeding the storable data amount; and obtaining the compressed page data based on the page data acquired in the graphics processor memory and the page data acquired in the applied non-graphics processor memory.
The compressed page data is stored partly in the graphics processor memory and partly in the applied non-graphics processor memory: the portion whose data amount equals the storable data amount is in the graphics processor memory, and the portion whose data amount exceeds the storable data amount is in the applied non-graphics processor memory. The complete compressed page data therefore has to be read from both memories before it can subsequently be decompressed to obtain the decompressed page data.
After the compressed page data is obtained, step S203 is continuously executed.
Step S203: determine a decompression algorithm for the compressed page data by using the target compression algorithm, and obtain decompressed page data.
The target compression algorithm is the compression algorithm used to produce the compressed page data, so a corresponding decompression algorithm can be determined from the compression scheme of the target compression algorithm in order to decompress the compressed page data.
After the decompressed page data is obtained, step S204 is continued.
Step S204: read data from the decompressed page data.
After the compressed page data is decompressed, the graphics processor can read the data directly and access it as required by the program.
It can be seen that, in the data reading method provided by the embodiment of the present invention, the data read address of the data read request is first obtained. When the corresponding page data is determined to be compressed page data stored in both the graphics processor memory and the applied non-graphics processor memory (the compressed page data having been compressed by the predetermined target compression algorithm, with the portion whose data amount equals the storable data amount stored in the graphics processor memory and the portion exceeding it stored in the applied non-graphics processor memory), the corresponding page data is read from the graphics processor memory and the applied non-graphics processor memory respectively, according to the data read address and the storage address, to obtain the compressed page data. The compressed page data is then decompressed with the target compression algorithm to obtain the page data to be compressed that the graphics processor needs to access, thereby expanding the memory capacity of the graphics processor.
In some embodiments, after performing step S201, further comprising:
judging whether the data read address hits in the cache of the graphic processor;
If yes, determining that page data corresponding to a data reading address is not compressed, and reading data requested by the data reading request from a cache hit by the data reading address;
if not, determining the page data corresponding to the data reading address as the compressed page data.
The cache of the graphics processor may be a level one cache (L1 cache) or a level two cache (L2 cache) or a compressed data cache of the graphics processor.
When the data read address hits in the L1 cache or the L2 cache, it is indicated that the page data corresponding to the data read address is not compressed, and the page data can be directly accessed in the L1 cache or the L2 cache, so as to execute the data read request.
The compressed data cache can hold a certain amount of compressed page data. When the data read address misses the level-one and level-two caches but hits the compressed data cache, the compressed page data is present in the compressed data cache and can be read quickly from it. This avoids indexing the graphics processor memory and the applied non-graphics processor memory by the address of the compressed page data, and avoids accessing the TLB and page table of the graphics processor memory, reducing the overhead of accessing the graphics processor memory and the non-graphics processor memory and improving the data reading efficiency of the graphics processor.
When the compressed data cache also misses, the physical address of the applied non-graphics processor memory is computed from the data read address, the storage address, and the base address in the manner described above, and the compressed page data is then read according to that physical address and the data read address.
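The full read priority (L1/L2 cache, then the compressed data cache, then the split GPU/CPU storage) can be sketched as follows. The container shapes (dicts keyed by page address, bytearrays as flat memory), the page table tuple layout, the 4096-byte page size, and the `decompress` callback are all illustrative assumptions:

```python
PAGE_SIZE = 4096  # assumed uncompressed page size in bytes

def read_page(addr, l1, l2, comp_cache, page_table, gpu_mem, cpu_mem,
              base, target_rate, decompress):
    """Resolve one data read in the priority order described above."""
    for cache in (l1, l2):
        if addr in cache:                          # hit: page not compressed
            return cache[addr]
    if addr in comp_cache:                         # hit: whole compressed page
        return decompress(comp_cache[addr])
    # Miss everywhere: assemble the split page from both memories.
    gpu_off, cpu_off, tail_len = page_table[addr]  # from the page table entry
    storable = PAGE_SIZE // target_rate            # GPU-resident portion size
    head = gpu_mem[gpu_off:gpu_off + storable]
    tail = cpu_mem[base + cpu_off:base + cpu_off + tail_len]
    return decompress(head + tail)
```

A cache hit short-circuits before any base-plus-offset address computation is needed, which is the overhead saving described above.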
The embodiment of the invention also provides a graphics processor. Referring to fig. 3, the graphic processor 1 includes:
The compressor 11 is used for acquiring page data to be compressed, and compressing the page data to be compressed by utilizing a predetermined target compression algorithm to obtain compressed page data, wherein the target compression rate of the target compression algorithm is used for determining the storable data quantity in the memory of the graphics processor;
The memory controller 12 is configured to store, in the graphics processor memory, page data having a data size equal to the storable data size, out of the compressed page data, page data having a data size exceeding the storable data size, into the applied non-graphics processor memory, and record a storage address of the page data in the applied non-graphics processor memory;
Wherein the applied non-graphics processor memory is applied memory in a processor external to the graphics processor, and the storage capacity of the applied non-graphics processor memory is determined based at least on the target compression rate.
Optionally, the memory controller 12 is further configured to, according to the obtained data read address of the data read request, obtain, when it is determined that the page data corresponding to the data read address is compressed page data and the compressed page data is stored in the graphics processor memory 14 and the applied non-graphics processor memory 21, page data with a data amount equal to the storable data amount in the graphics processor memory 14 based on the data read address, and obtain, in the applied non-graphics processor memory 21, page data with a data amount exceeding the storable data amount based on the storage address in the data read request;
The compressor 11 is further configured to determine a decompression algorithm for the compressed page data by using a target compression algorithm, and obtain decompressed page data;
the graphics processor 1 further comprises:
And the data reading module is used for reading the data from the decompressed page data.
Optionally, the graphics processor 1 further includes:
The compression rate comparison module is used for determining a corresponding actual compression rate based on the compressed page data, and comparing the actual compression rate with the target compression rate, wherein the actual compression rate can calculate the actual compression data amount of the compressed page data;
If the actual compression rate is smaller than the target compression rate, determining that the actual compression data amount is larger than the storable data amount, storing page data with the data amount equal to the storable data amount in the graphic processor memory in the compressed page data, storing the page data with the data amount exceeding the storable data amount in the applied non-graphic processor memory, and recording the storage address of the page data in the applied non-graphic processor memory;
And if the actual compression rate is greater than or equal to the target compression rate, determining that the actual compressed data volume is less than or equal to the storable data volume, and storing the compressed page data into the memory of the graphics processor.
Optionally, the graphics processor 1 further includes:
The performance analysis module is configured to acquire debug page data and compress it multiple times with several candidate compression algorithms. For each candidate compression algorithm, the compression rate of each compression of the debug page data is determined, and the average compression rate over the debug page data is obtained from the rates determined each time. The candidate compression algorithm corresponding to the maximum average compression rate is selected as the target compression algorithm, and the average compression rate of the target compression algorithm is taken as the target compression rate.
The performance analysis module may provide a target compression algorithm and a target compression rate.
For example, the performance analysis module may be executed by:
First, the performance analysis module continuously scans the debug page data that is generated by GPU computation during the current performance analysis stage and temporarily stored in the GPU memory system;
This debug page data is then fed continuously into the compressor 11, which integrates various mainstream compression algorithms. All compression algorithms are applied, and the compression rate of each algorithm is computed.
Finally, the compression rates over multiple iterations are averaged per compression algorithm. The compression algorithm with the highest mean compression rate is selected as the compression algorithm actually applied in the method, i.e. the target compression algorithm, and the corresponding mean compression rate is rounded and then written into the target compression rate register for subsequent use.
Optionally, the graphics processor 1 further includes:
A memory application module, configured to obtain the available capacity of the graphics processor memory and to determine, based on the available capacity, an application capacity on the non-graphics processor memory as the applied non-graphics processor memory, where the application capacity is the storage capacity of the applied non-graphics processor memory and equals the available capacity multiplied by the difference between the target compression rate and a first value;
a target compression ratio register 121 for storing the target compression ratio.
The memory application module can be triggered by the performance analysis module to execute the application of the non-graphic processor memory, namely the initialization flow of the capacity expansion system.
The implementation process of the memory application module comprises the following steps:
firstly, detecting and acquiring actual parameters in a current host, including:
1) The available capacity of the GPU DRAM;
2) The available capacity of the CPU DRAM;
3) The system environment variable configured by the upper-level user, i.e. the upper-limit threshold of the memory capacity that the memory application module may occupy in the CPU DRAM;
Then, the detected parameter configuration and the data stored in the target compression rate register (the target compression rate) are taken as input to finalize the current initialization parameters of the GPU:
1) A final target compression rate;
2) The memory size applied on the CPU side.
For example, if the available capacity of the GPU DRAM is 40 GB and the finalized target compression rate is x3, the memory size applied for on the CPU side is 80 GB (40 GB x (3 - 1)); if the available capacity of the GPU DRAM is 50 GB and the finalized target compression rate is x4, the memory size applied for on the CPU side is 150 GB (50 GB x (4 - 1));
Then, page-locked memory (pinned memory) of the calculated size is applied for on the CPU side. Page-locked memory prevents pages from being swapped out to the hard disk on the CPU side. At the same time, the base address (memory head address) of the applied non-graphics processor memory is recorded into a newly added register on the GPU, i.e. the base address register of the fallback memory;
Finally, by writing the designated hardware registers, the compressed data cache, the TLB, and the MMU on the GPU hardware are enabled, and the applied non-graphics processor memory together with the graphics processor memory is used to store the compressed page data completely.
Optionally, the graphics processor 1 further comprises a compressed data buffer 13 for storing the compressed page data.
Optionally, the graphics processor 1 further includes a base address register 122;
The base address register 122 is used for storing a base address of the applied non-graphics processor memory, and the base address is used in combination with the storage address to indicate a storage location of page data exceeding a storable data amount in the applied non-graphics processor memory.
It can be seen that, in the graphics processor 1 provided by the embodiment of the present invention, the computing core 10 generates the page data to be compressed, and the compressor 11 compresses it with a predetermined target compression algorithm to obtain compressed page data, where the target compression rate of the target compression algorithm determines the storable data amount in the graphics processor memory. Because the actual compressed data amount may exceed the storable data amount determined by the target compression rate of the predetermined target compression algorithm, in the embodiment of the present invention, for one piece of compressed page data, the portion whose data amount equals the storable data amount is stored into the graphics processor memory 14, while the portion whose data amount exceeds the storable data amount is stored into the applied non-graphics processor memory 21 and its storage address there is recorded. The applied non-graphics processor memory 21 is determined at least based on the target compression rate of the target compression algorithm, so the compressed page data can be stored completely in one pass.
Therefore, the graphics processor 1 ensures that the data stored in the graphics processor memory 14 has no gaps, and avoids re-applying, through page reallocation, for the page data that exceeds the storable data amount, which improves the data storage efficiency of the graphics processor memory 14. When the compressed page data is needed later, the part stored in the applied non-graphics processor memory 21 can be obtained directly from the recorded storage address. On the basis of reducing the capacity occupied in the graphics processor memory 14 and expanding the memory capacity of the graphics processor 1, the extra overhead of storing compressed page data through page reallocation is avoided, achieving the goal of improving the data storage efficiency of the graphics processor memory 14.
Further, when a computing core in the graphics processor needs to access the compressed page data, the memory controller of the graphics processor first obtains the data read address of the data read request. When the corresponding page data is determined to be compressed page data stored in both the graphics processor memory and the applied non-graphics processor memory, the page data is read from the two memories respectively according to the data read address and the storage address (the portion whose data amount equals the storable data amount from the graphics processor memory, and the portion exceeding it from the applied non-graphics processor memory) to obtain the compressed page data. The compressed page data is then decompressed with the target compression algorithm to obtain the page data to be compressed that the graphics processor needs to access, realizing the memory capacity expansion of the graphics processor.
For convenience in describing the functional implementation of the graphics processor provided in the embodiment of the present invention, please refer to fig. 4, and fig. 4 is a schematic diagram illustrating a process of implementing the data storage method provided in the embodiment of the present invention.
In FIG. 4, four pages of data to be compressed, PageA, PageB, PageC, and PageD, are compressed, and the target compression rate is illustrated as x4.
As shown in FIG. 4, after each page of data to be compressed is compressed by the predetermined target compression algorithm in the compressor, compressed page data is obtained, and each page yields an actual compression rate:
The actual compression rate of PageA after compression is x1;
The actual compression rate of PageB after compression is x1.33;
The actual compression rate of PageC after compression is x2;
The actual compression rate of PageD after compression is x4.
The actual compression rate of each compressed page is then compared with the target compression rate, and each piece of compressed data whose rate is below the target compression rate x4 is stored partly in the graphics processor memory and partly in the applied non-graphics processor memory (i.e. the CPU fallback memory in FIG. 4).
As shown in FIG. 4, the actual compression rates of PageA, PageB, and PageC are smaller than the target compression rate, so for each of them the compressed page data whose amount equals the data amount determined by the target compression rate (the storable data amount) is stored in the GPU memory, and the compressed page data exceeding the storable data amount is stored in the CPU fallback memory (the applied non-graphics processor memory).
The actual compression rate of PageD equals the target compression rate, so its compressed page data can be stored directly in the GPU memory.
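Assuming 4 KB pages (a size FIG. 4 does not specify), the split works out as follows; under target rate x4, the storable amount per page is 4 / 4 = 1 KB:

```python
PAGE_KB = 4          # assumed uncompressed page size in KB
TARGET_RATE = 4      # target compression rate x4 from FIG. 4
STORABLE_KB = PAGE_KB / TARGET_RATE  # per-page amount kept in GPU memory

actual_rates = {"PageA": 1.0, "PageB": 1.33, "PageC": 2.0, "PageD": 4.0}
# Kilobytes of each compressed page that spill over into the applied
# CPU (fallback) memory:
overflow_kb = {page: max(PAGE_KB / rate - STORABLE_KB, 0.0)
               for page, rate in actual_rates.items()}
```

Under these assumed sizes, PageA spills 3 KB, PageC spills 1 KB, and PageD spills nothing, consistent with the figure's description.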
The foregoing describes several embodiments of the present invention. The alternatives presented by the various embodiments may be combined and cross-referenced with each other without conflict, thereby extending the range of possible embodiments, all of which are considered embodiments disclosed by the present invention.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention should therefore be determined by the appended claims.

Claims (20)

1. A data storage method, applied to a graphics processor, comprising:
acquiring page data to be compressed;
compressing the page data to be compressed using a predetermined target compression algorithm to obtain compressed page data, wherein a target compression ratio of the target compression algorithm is used to determine the amount of data storable in graphics processor memory;
storing, among the compressed page data, page data whose data amount equals the storable data amount into the graphics processor memory, storing page data whose data amount exceeds the storable data amount into applied non-graphics-processor memory, and recording its storage address in the applied non-graphics-processor memory;
wherein the applied non-graphics-processor memory is memory requested on a processor other than the graphics processor, and the storage capacity of the applied non-graphics-processor memory is determined at least based on the target compression ratio.

2. The data storage method according to claim 1, wherein after the step of obtaining the compressed page data, the data storage method further comprises:
determining a corresponding actual compression ratio based on the compressed page data, wherein the actual compressed data amount of the compressed page data can be calculated from the actual compression ratio;
comparing the actual compression ratio with the target compression ratio;
if the actual compression ratio is smaller than the target compression ratio, determining that the actual compressed data amount is greater than the storable data amount, and executing the step of storing, among the compressed page data, page data whose data amount equals the storable data amount into the graphics processor memory, storing page data whose data amount exceeds the storable data amount into the applied non-graphics-processor memory, and recording its storage address in the applied non-graphics-processor memory;
if the actual compression ratio is greater than or equal to the target compression ratio, determining that the actual compressed data amount is less than or equal to the storable data amount, and storing the compressed page data into the graphics processor memory.

3. The data storage method according to claim 1, wherein before the step of acquiring the page data to be compressed, the method further comprises:
acquiring debugging page data;
compressing the debugging page data multiple times using a plurality of candidate compression algorithms;
for each candidate compression algorithm, determining the compression ratio of each compression of the debugging page data using that candidate compression algorithm, and obtaining an average compression ratio of the debugging page data based on the compression ratios so determined;
selecting the candidate compression algorithm corresponding to the maximum of the average compression ratios as the target compression algorithm, and taking the average compression ratio of the target compression algorithm as the target compression ratio.

4. The data storage method according to claim 3, wherein after the step of selecting the compression algorithm corresponding to the maximum of the average compression ratios as the target compression algorithm and taking the average compression ratio of the target compression algorithm as the target compression ratio, the method further comprises:
obtaining the available capacity of the graphics processor memory;
determining, based on the available capacity, a requested capacity on the non-graphics-processor memory as the applied non-graphics-processor memory, the requested capacity being the storage capacity of the non-graphics-processor memory;
wherein the requested capacity is determined by multiplying the available capacity by the difference between the target compression ratio and a first value.

5. The data storage method according to claim 4, wherein the applied non-graphics-processor memory is page-locked memory.

6. The data storage method according to claim 4, wherein after the step of taking the average compression ratio of the target compression algorithm as the target compression ratio, the method further comprises:
saving the target compression ratio;
and saving a base address of the applied non-graphics-processor memory, the base address being combined with the storage address to indicate the storage location, in the applied non-graphics-processor memory, of the page data exceeding the storable data amount.

7. The data storage method according to claim 2, wherein the actual compression ratio and the storage address are recorded in a page table entry of the compressed page data, the page table entry further recording a page compression flag of the compressed page data;
wherein the page compression flag is used to indicate whether the compressed page data has been compressed, and the actual compression ratio is the ratio of the uncompressed data amount of the page data to be compressed to the actual compressed data amount of the compressed page data.

8. The data storage method according to any one of claims 2-6, wherein after the step of compressing the page data to be compressed using the predetermined target compression algorithm to obtain the compressed page data, and before the step of determining the corresponding actual compression ratio based on the compressed page data, the method further comprises:
storing the compressed page data into a compressed data cache of the graphics processor.

9. A data reading method, applied to a graphics processor, comprising:
obtaining a data read request, the data read request including a data read address;
when it is determined that the page data corresponding to the data read address is compressed page data and the compressed page data is stored in graphics processor memory and applied non-graphics-processor memory, acquiring, in the graphics processor memory based on the data read address, page data whose data amount equals the amount of data storable in the graphics processor memory, and acquiring, in the applied non-graphics-processor memory based on the storage address in the data read request, page data whose data amount exceeds the storable data amount, to obtain the compressed page data; the compressed page data being data stored by the data storage method according to any one of claims 1-8, and the storable data amount being determined by the target compression ratio of the target compression algorithm employed by the data storage method;
determining a decompression algorithm for the compressed page data using the target compression algorithm, and obtaining decompressed page data;
reading data from the decompressed page data.

10. The data reading method according to claim 9, wherein after the step of obtaining the data read request, the method further comprises:
when it is determined that the page data corresponding to the data read address is compressed page data, determining whether the page table entry of the compressed page data corresponding to the data read address records a storage address in the applied non-graphics-processor memory;
if yes, determining that the compressed page data corresponding to the data read address is stored in the graphics processor memory and the applied non-graphics-processor memory;
if no, determining that the compressed page data corresponding to the data read address is stored in the graphics processor memory.

11. The data reading method according to claim 10, wherein when it is determined that the page table entry of the compressed page data corresponding to the data read address records a storage location in the applied non-graphics-processor memory, the method further comprises:
obtaining a base address of the applied non-graphics-processor memory;
obtaining a physical address of the non-graphics-processor memory based on the base address and the storage address.

12. The data reading method according to claim 11, wherein the step of acquiring, in the graphics processor memory based on the data read address, page data whose data amount equals the storable data amount, and acquiring, in the applied non-graphics-processor memory based on the storage address in the data read request, page data exceeding the storable data amount, to obtain the compressed page data, comprises:
in the graphics processor memory, acquiring, according to the data read address, page data whose data amount equals the storable data amount;
in the applied non-graphics-processor memory, acquiring the page data corresponding to the physical address of the non-graphics-processor memory, the acquired compressed page data being the data exceeding the storable data amount;
obtaining the compressed page data based on the page data acquired in the graphics processor memory and the page data acquired in the applied non-graphics-processor memory.

13. The data reading method according to claim 12, wherein after the step of obtaining the data read request, the method further comprises: determining whether the data read address hits a cache of the graphics processor;
if yes, determining that the page data corresponding to the data read address is uncompressed, and reading the data requested by the data read request from the cache hit by the data read address;
if no, determining that the page data corresponding to the data read address is compressed page data.

14. A graphics processor, comprising:
a compressor configured to acquire page data to be compressed and compress the page data to be compressed using a predetermined target compression algorithm to obtain compressed page data, wherein a target compression ratio of the target compression algorithm is used to determine the amount of data storable in graphics processor memory;
a memory controller configured to store, among the compressed page data, page data whose data amount equals the storable data amount into the graphics processor memory, store page data whose data amount exceeds the storable data amount into applied non-graphics-processor memory, and record its storage address in the applied non-graphics-processor memory;
wherein the applied non-graphics-processor memory is memory requested on a processor other than the graphics processor, and the storage capacity of the applied non-graphics-processor memory is determined at least based on the target compression ratio.

15. The graphics processor according to claim 14, wherein the memory controller is further configured to, according to the data read address of an obtained data read request, when it is determined that the page data corresponding to the data read address is compressed page data and the compressed page data is stored in the graphics processor memory and the applied non-graphics-processor memory, acquire, in the graphics processor memory based on the data read address, page data whose data amount equals the amount of data storable in the graphics processor memory, and acquire, in the applied non-graphics-processor memory based on the storage address in the data read request, page data whose data amount exceeds the storable data amount;
the compressor is further configured to determine a decompression algorithm for the compressed page data using the target compression algorithm, and obtain decompressed page data;
the graphics processor further comprising:
a data reading module configured to read data from the decompressed page data.

16. The graphics processor according to claim 15, further comprising: a compression ratio comparison module configured to determine a corresponding actual compression ratio based on the compressed page data and compare the actual compression ratio with the target compression ratio, wherein the actual compressed data amount of the compressed page data can be calculated from the actual compression ratio;
if the actual compression ratio is smaller than the target compression ratio, determine that the actual compressed data amount is greater than the storable data amount, store, among the compressed page data, page data whose data amount equals the storable data amount into the graphics processor memory, store page data whose data amount exceeds the storable data amount into the applied non-graphics-processor memory, and record its storage address in the applied non-graphics-processor memory;
if the actual compression ratio is greater than or equal to the target compression ratio, determine that the actual compressed data amount is less than or equal to the storable data amount, and store the compressed page data into the graphics processor memory.

17. The graphics processor according to claim 16, further comprising:
a performance analysis module configured to acquire debugging page data; compress the debugging page data multiple times using a plurality of candidate compression algorithms; for each candidate compression algorithm, determine the compression ratio of each compression of the debugging page data using that candidate compression algorithm, and obtain an average compression ratio of the debugging page data based on the compression ratios so determined; and select the candidate compression algorithm corresponding to the maximum of the average compression ratios as the target compression algorithm, taking the average compression ratio of the target compression algorithm as the target compression ratio.

18. The graphics processor according to claim 17, further comprising:
a memory request module configured to obtain the available capacity of the graphics processor memory and determine, based on the available capacity, a requested capacity on the non-graphics-processor memory as the applied non-graphics-processor memory, the requested capacity being the storage capacity of the non-graphics-processor memory, wherein the requested capacity is determined by multiplying the available capacity by the difference between the target compression ratio and a first value;
a target compression ratio holding module configured to save the target compression ratio.

19. The graphics processor according to any one of claims 15-18, further comprising: a compressed data cache configured to store the compressed page data.

20. The graphics processor according to claim 19, further comprising: a target compression ratio register and a base address register;
the target compression ratio register being configured to save the target compression ratio;
the base address register being configured to save the base address of the applied non-graphics-processor memory, the base address being combined with the storage address to indicate the storage location, in the applied non-graphics-processor memory, of the page data exceeding the storable data amount.
CN202510787347.3A 2025-06-12 2025-06-12 Data storage method, data reading method and graphics processor Pending CN120705075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510787347.3A CN120705075A (en) 2025-06-12 2025-06-12 Data storage method, data reading method and graphics processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510787347.3A CN120705075A (en) 2025-06-12 2025-06-12 Data storage method, data reading method and graphics processor

Publications (1)

Publication Number Publication Date
CN120705075A true CN120705075A (en) 2025-09-26

Family

ID=97110340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510787347.3A Pending CN120705075A (en) 2025-06-12 2025-06-12 Data storage method, data reading method and graphics processor

Country Status (1)

Country Link
CN (1) CN120705075A (en)

Similar Documents

Publication Publication Date Title
CN107250991B (en) Transparent hardware-assisted memory decompression
JP6069031B2 (en) Computer and memory management method
US10572378B2 (en) Dynamic memory expansion by data compression
KR100734823B1 (en) Method and apparatus for morphing memory compressed machines
US12493422B2 (en) Resource allocation method and apparatus
CN105393228B (en) Method, device and user equipment for reading and writing data in flash memory
JP7438246B2 (en) Hardware-based memory compression
US12242376B2 (en) Paging in thin-provisioned disaggregated memory
CN113360093B (en) Memory system and device
US20190138446A1 (en) Compressed pages having data and compression metadata
US10482021B2 (en) Priority-based storage and access of compressed memory lines in memory in a processor-based system
CN115249057A (en) System and computer-implemented method for graph node sampling
US20130205071A1 (en) Compressed cache storage acceleration
CN110442594A (en) A kind of Dynamic Execution method towards Spark SQL Aggregation Operators
US12386786B2 (en) Data processing method and apparatus
CN120705075A (en) Data storage method, data reading method and graphics processor
JP6243884B2 (en) Information processing apparatus, processor, and information processing method
CN106991058B (en) Method and device for processing pre-fetched files
WO2023030173A1 (en) Method for managing dynamic library and corresponding apparatus
US10977176B2 (en) Prefetching data to reduce cache misses
CN113253947B (en) A deduplication method, apparatus, device and readable storage medium
KR102879266B1 (en) Memory device using compressed zones and operating method thereof
US11914527B2 (en) Providing a dynamic random-access memory cache as second type memory per application process
US20250291676A1 (en) Method for reducing a time-to-ready time in client storage drives without a capacitor during ungraceful shutdown
CN110674057A (en) A data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination