Detailed Description
The technical solutions of the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The technical terms involved in the embodiments of the present application will be described below.
1. System file read/write
Currently, system file reads and writes are classified into a buffered input-output (Buffer Input Output, BIO) mode and a direct input-output (Direct Input Output, DIO) mode.
The BIO mode establishes a cache in memory, so that user-space file reads and writes do not directly trigger disk access; instead, the file content is read from and written to the cache. The DIO mode requires no cache to be built in memory, and the driver reads the file directly from the disk. In general, the read/write efficiency of the BIO mode is higher than that of the DIO mode.
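For illustration, the following is a minimal user-space sketch of the two modes on Linux; the O_DIRECT flag, the 4096-byte alignment, and the helper names are assumptions made here for clarity, not part of the embodiments.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096 /* typical logical block size; O_DIRECT needs alignment */

/* BIO: a normal read that goes through the in-memory cache (page cache). */
static ssize_t read_buffered(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    ssize_t n = (fd >= 0) ? read(fd, buf, len) : -1;
    if (fd >= 0)
        close(fd);
    return n;
}

/* DIO: O_DIRECT bypasses the page cache and reads straight from disk;
 * buffer, offset, and length must be block-aligned. */
static ssize_t read_direct(const char *path, size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, ALIGN, len))
        return -1;
    int fd = open(path, O_RDONLY | O_DIRECT);
    ssize_t n = (fd >= 0) ? read(fd, buf, len) : -1;
    if (fd >= 0)
        close(fd);
    free(buf);
    return n;
}
```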
2. Direct memory access buffer (DMA-BUF)
Typically, DMA-BUF memory is a specialized memory that can be shared among user space, kernel space, and DMA devices.
User space is where applications use memory. It has limited system privileges; its memory is managed by the kernel memory-management subsystem; memory is accessed through user virtual addresses (each process has its own process address space); physical memory is usually committed only when the process actually reads or writes it; and most of this memory (stack pages and file pages) can be reclaimed. In user space, DMA-BUF memory can map its actual physical pages into the process address space (UserVaddr), so that the memory is accessed with zero copies; memory that the DMA-BUF has mapped into the process address space cannot be actively reclaimed by the system.
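As an illustration of this user-space mapping, the sketch below mmaps an already-exported DMA-BUF file descriptor into the process address space (UserVaddr); how the fd is obtained (for example, from a DMA heap or a driver ioctl) is platform-specific and assumed here.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Map a DMA-BUF fd into this process: the returned pointer (UserVaddr)
 * addresses the same physical pages the kernel and the DMA device use,
 * so the memory is accessed with zero copies. */
void *map_dmabuf_to_uservaddr(int dmabuf_fd, size_t size)
{
    void *uservaddr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_SHARED, dmabuf_fd, 0);
    return uservaddr == MAP_FAILED ? NULL : uservaddr;
}
```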
Kernel space is where drivers and the core kernel subsystems run. It has higher privileges, interfaces with the hardware, and provides services to upper-layer applications. DMA-BUF memory may be mapped into the kernel address space (Kvaddr), so that the shared memory is accessed in a zero-copy manner.
Typically, a device accesses memory through the CPU, but this approach consumes CPU computing power. For example, if every file read and write requires CPU participation, the CPU may have to wait for a file read to complete before running other applications, and if the number of reads is large, the throughput of the system is affected. A DMA device is a special device that can access memory directly. DMA-BUF memory may be mapped, by way of an input/output memory management unit (Input/Output Memory Management Unit, IOMMU), to a virtual address (iova) accessible to the DMA device.
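On the device side, the standard Linux DMA-BUF kernel API can express this mapping; the sketch below (kernel code, error handling trimmed) attaches a device to a DMA-BUF and maps it for DMA access, yielding a scatter-gather table whose addresses are the iova the device uses. The surrounding driver context is assumed.

```c
#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Attach `dev` to the dma-buf and map it for device access through the
 * IOMMU; the returned sg_table holds the DMA (I/O virtual) addresses. */
static struct sg_table *map_for_device(struct dma_buf *dmabuf,
                                       struct device *dev)
{
    struct dma_buf_attachment *attach = dma_buf_attach(dmabuf, dev);

    if (IS_ERR(attach))
        return ERR_CAST(attach);
    return dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
}
```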
Through these three mappings, the same physical DMA-BUF memory is shared across the three contexts and accessed in a zero-copy manner, so that service processing is completed efficiently.
Currently, artificial intelligence (Artificial Intelligence, AI) models require multiple devices (e.g., a central processing unit (Central Processing Unit, CPU) and AI processing units (AI Processing Unit, APU)) to cooperate in processing, and thus the cache memory used for reading model files in the AI domain is typically DMA-BUF memory.
3. Other terms
The terms "first", "second" and the like in the description and in the claims are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein. The objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The terms "at least one," "at least one," and the like in the description and in the claims, mean that they encompass any one, any two, or a combination of two or more of the objects. For example, at least one of a, b, c (item) may represent "a", "b", "c", "a and b", "a and c", "b and c" and "a, b and c", wherein a, b, c may be single or plural. Similarly, the term "at least two" means two or more, and the meaning of the expression is similar to the term "at least one".
The memory application method, memory application device, electronic device, and readable storage medium provided by the embodiments of the application are described in detail below through specific embodiments and their application scenarios with reference to the accompanying drawings.
The memory application method provided by the embodiments of the application can be applied to scenarios in which a driver of an electronic device and/or a device needs to read data.
Assume that a driver of an electronic device and/or a device needs to read a model file of an AI large model. In the related art, as shown in fig. 1, the user space of the electronic device may send a memory application request corresponding to the model file to the system of the electronic device according to the data size of the model file, so as to apply for a cache memory (for example, a DMA-BUF memory) whose memory size matches that data size. Then, once the DMA-BUF memory application is completed, the system may use mmap/read calls to read the model file into the DMA-BUF memory in a user-space zero-copy scheme, so that the system may share the DMA-BUF memory with the driver and the DMA device for AI word-output processing. However, the model file of an AI large model may be large; for example, a 3B model is about 1.X GB and a 7B model is about 3.X GB. A GB-level memory application takes about 60 ms when it is fast, and may take 700 ms to 1.X s when memory is insufficient; and with mmap/read calls, reading the model file into the DMA-BUF memory in the user-space zero-copy scheme must wait until the DMA-BUF memory application is completed. As a result, applying for a DMA-BUF memory whose size matches the data size of the model file takes a long time, and reading the model file into the DMA-BUF memory also takes a long time. In addition, since the memory size of the DMA-BUF memory is large, the electronic device may release it quickly, so that when the driver and/or the device needs to read the data again, it may again have to apply for the DMA-BUF memory for a long time. Consequently, the electronic device takes a long time to read data, and data reading is inefficient.
In the embodiments of the present application, however, the system of the electronic device may receive a memory application request for requesting a DMA-BUF memory used to cache the model file of the AI large model, and execute a DMA-BUF memory application operation according to that request. Once a cache memory block of the DMA-BUF memory has been applied to, the system may read a first portion of the model file into that cache memory block, where the data size of the first portion matches the memory size of the cache memory block. It can be understood that, once the electronic device has applied to a cache memory block of the DMA-BUF memory with a smaller memory size, the first portion of the model file can be read into that block; the data can be read into the DMA-BUF memory without waiting for the whole DMA-BUF memory application to complete. The waiting time required for reading the model file of the AI large model is therefore reduced, the time consumed by the electronic device for reading data is reduced, and the data reading efficiency of the electronic device is improved.
It should be noted that the foregoing takes a driver of an electronic device and/or a device reading a model file of an AI large model as an example; in practical applications, the driver of the electronic device and/or the device may also read other data, which is not limited by the embodiments of the present application.
In the memory application method provided by the embodiments of the application, the execution subject may be a memory application device, an electronic device, or a functional module or entity in the electronic device. In the embodiments of the application, the memory application method provided by the embodiments is described taking an electronic device executing the method as an example.
Fig. 2 shows a flowchart of a memory application method according to an embodiment of the present application. As shown in fig. 2, the memory application method provided in the embodiment of the present application may include the following steps 101 and 102.
Step 101, under the condition that a memory application request corresponding to first data is received, the electronic device executes a memory application operation according to the memory application request.
In some embodiments of the present application, the memory application request may be used to request the cache memory corresponding to the first data.
In some embodiments of the present application, the cache memory may be a DMA-BUF memory.
In some embodiments of the present application, the first data may be a model file of an AI large model, or image data, or video data, or page data. Of course, the first data may be other data, which is not limited in the embodiment of the present application.
In some embodiments of the present application, the total data size of the first data may be greater than or equal to a preset data size. It is understood that the total data size of the first data is large. Of course, the total data size of the first data may also be smaller than the preset data size, that is, the total data size of the first data may also be smaller, which is not limited in the embodiment of the present application.
In some embodiments of the present application, the memory application request may include a memory size of the cache memory. The memory size of the cache memory is matched with the data size of the first data.
It should be noted that the above "matching" is understood to mean that the two are the same, or that the difference between the two is less than or equal to a preset value.
In some embodiments of the present application, in a case where the driver of the electronic device needs to read the first data, the user space of the electronic device may send a memory application request to the system of the electronic device, so that the system of the electronic device may receive the memory application request. The driver may include at least one of a user-space driver, a kernel-space driver, a DMA device driver, and the like.
The user-space driver may include a camera driver; of course, the user-space driver may also include other drivers, which are not exhaustively listed in the embodiments of the present application. The kernel-space driver may include a direct rendering manager (Direct Rendering Manager, DRM) driver, although it may also include other drivers, which are not exhaustively listed here. The DMA device may include a graphics processing unit (Graphics Processing Unit, GPU), an APU, and the like; of course, the DMA device may also include other devices, which the embodiments of the present application do not exhaustively list.
Step 102, under the condition of applying to the first cache memory block, the electronic device reads a first portion of the first data to the first cache memory block, wherein the data amount of the first portion matches the memory size of the first cache memory block.
In some embodiments of the present application, the electronic device may determine, in real time, whether the first cache memory block has been applied to during the process of executing the memory application operation, and may read the first portion of the first data to the first cache memory block if it determines that the first cache memory block has been applied to.
Under the condition that the free memory of the electronic device is larger than or equal to the memory size carried in the memory application request, the system of the electronic device can apply for the cache memory directly from the free memory. Under the condition that the free memory of the electronic device is smaller than the memory size carried in the memory application request, the system of the electronic device can first reclaim at least part of the occupied memory, and then apply for the cache memory from the reclaimed memory and the free memory.
The memory size of the first cache memory block may be a preset memory size or a memory size determined according to the data amount of the first data. The system of the electronic device may count the memory size of the applied cache memory in real time, and in case that this memory size matches the preset memory size (or the memory size determined according to the data amount of the first data), the system of the electronic device may determine that the first cache memory block has been applied to.
The first portion may be the portion of the first data corresponding to the first cache memory block. Here, the data start position of the first portion, referred to below as the first data location, is the data position corresponding to the first cache memory block. The first data location may be the data start position of the first data, or another data position in the first data.
In the case that the first cache memory block is the first cache memory block applied to by the system of the electronic device, the first data location may be the data start position of the first data. In the case where the first cache memory block is the i-th cache memory block applied to by the system of the electronic device, where i is a positive integer greater than 1, the first data location may be another data position in the first data.
For example, assuming that the first cache memory block is the first cache memory block applied to by the system of the electronic device and that its memory size is 128 MB, the first data location of the first portion is the data start position of the first data, and the first portion is the 0 to 128 MB portion of the first data.
As a further example, assuming that the first cache memory block is the second cache memory block applied to by the system of the electronic device, that the previously applied block is 128 MB, and that the memory size of the first cache memory block is 128 MB, the first data location of the first portion is another data position of the first data (for example, the 128 MB data position), and the first portion is the 128 MB to 256 MB portion of the first data.
The following will illustrate a specific scheme for the electronic device to apply for the first cache memory block.
In some embodiments of the present application, as shown in fig. 3 in conjunction with fig. 2, before the step 102, the memory application method provided in the embodiment of the present application may further include the following steps 201 and 202.
Step 201, the electronic device records the applied cache memory block to the first list.
In some embodiments of the present application, the first list may be, for example, a linked list. Of course, the first list may also be another type of list, which is not limited in the embodiments of the present application.
Step 202, determining the cache memory block recorded in the first list as the first cache memory block under the condition that the memory size of the cache memory block recorded in the first list is matched with the preset memory size.
In some embodiments of the present application, the preset memory size may be a memory size set by a user, a default memory size, an agreed memory size, or a memory size determined according to the data amount of the first data.
Under the condition that the preset memory size is a memory size determined according to the data amount of the first data, the electronic device can determine the preset memory size according to the data reading rate of the electronic device and the data amount of the first data. The smaller the data reading rate of the electronic device, the smaller the preset memory size; and the larger the data amount of the first data, the larger the preset memory size.
In some embodiments of the present application, the predetermined memory size may be 128MB, 256MB, or 512MB. Of course, the preset memory size may be other memory sizes, which is not limited in the embodiment of the present application.
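Purely as an illustration of how such a preset chunk size could be derived — the formula, constants, and function name below are assumptions made here, not a policy stated by the embodiments — one might scale the size with the read rate and the data amount and clamp it to the sizes listed above:

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/* Hypothetical heuristic: faster storage and larger files get larger
 * chunks, clamped to the 128 MB..512 MB examples mentioned above. */
static uint64_t preset_chunk_size(uint64_t read_rate_mb_per_s,
                                  uint64_t total_bytes)
{
    uint64_t by_data = total_bytes / 8;             /* ~8 pipeline stages */
    uint64_t by_rate = read_rate_mb_per_s * MB / 4; /* ~250 ms of reading */
    uint64_t chunk = by_data < by_rate ? by_data : by_rate;

    if (chunk < 128 * MB)
        chunk = 128 * MB;
    if (chunk > 512 * MB)
        chunk = 512 * MB;
    return chunk;
}
```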
In some embodiments of the present application, the system of the electronic device may determine, in real time, a size relationship between a memory size of a cache memory block in the first list and a preset memory size, and in a case where the memory size of the cache memory block in the first list matches the preset memory size, the electronic device may determine the cache memory block as the first cache memory block, and determine to apply to the first cache memory block.
In some embodiments of the present application, after the electronic device determines the cache memory block recorded in the first list as the first cache memory block, the electronic device may empty the cache memory block recorded in the first list, so as to record other cache memory blocks in a subsequent step, and determine other cache memory blocks.
In this way, the electronic device can store the applied cache memory blocks in the first list so as to accurately count the memory size of the applied cache memory, and, under the condition that this memory size matches the preset memory size, accurately determine the cache memory recorded in the first list as the first cache memory block; the accuracy with which the electronic device determines the first cache memory block can therefore be improved.
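A kernel-style sketch of steps 201 and 202 follows; struct page and list_head are Linux kernel types, while chunk_builder and dispatch_chunk() are names invented here to stand for the first list and for handing a completed first cache memory block onward.

```c
#include <linux/list.h>
#include <linux/mm_types.h>

struct chunk_builder {
    struct list_head pages;  /* the first list: pages applied so far */
    size_t accumulated;      /* memory size recorded in the first list */
    size_t chunk_limit;      /* the preset memory size, e.g. 128 MB */
};

/* Hypothetical hook: hand the recorded pages onward as one chunk. */
void dispatch_chunk(struct list_head *pages, size_t bytes);

/* Step 201: record each newly applied page in the first list.
 * Step 202: once the recorded size matches the preset size, treat the
 * recorded pages as the first cache memory block and empty the list. */
static void record_applied_page(struct chunk_builder *cb, struct page *pg)
{
    list_add_tail(&pg->lru, &cb->pages);
    cb->accumulated += PAGE_SIZE;

    if (cb->accumulated >= cb->chunk_limit) {
        dispatch_chunk(&cb->pages, cb->accumulated);
        INIT_LIST_HEAD(&cb->pages);
        cb->accumulated = 0;
    }
}
```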
In some embodiments of the present application, as shown in fig. 4 in conjunction with fig. 2, the above step 102 may be specifically implemented by the following step 102a.
In step 102a, under the condition of applying for the first cache memory block, the electronic device reads the first portion to a virtual storage space corresponding to a first virtual address, where the first virtual address corresponds to the first cache memory block.
In some embodiments of the present application, the first virtual address may be a kernel address, and the virtual storage space may specifically be a kernel storage space.
In some embodiments of the present application, the electronic device may determine the first virtual address according to a physical address of the first cache memory block.
In some embodiments of the application, as shown in fig. 5 in conjunction with fig. 4, the above step 102a may be specifically implemented by the following step 102a1.
In step 102a1, under the condition of applying to the first cache memory block, the electronic device reads, through a data processing thread, the first portion to the virtual storage space corresponding to the first virtual address according to the first data location of the first portion in the first data and the data amount of the first portion.
In some embodiments of the present application, the data processing thread may be an IO processing thread.
In some embodiments of the present application, the electronic device may first submit the first task to the data processing thread, so that the data processing thread may read the first portion to the virtual storage space corresponding to the first virtual address according to the first task, the first data location of the first portion in the first data, and the data amount of the first portion.
Wherein the first task may carry a first data location and a first portion of the data volume.
In some embodiments of the present application, when the data processing thread receives the first task and exits the sleep state, the data processing thread may obtain the first portion from the first data on the disk according to the first data location and the data amount of the first portion carried in the first task.
Thus, the electronic device can accurately acquire the corresponding first part from the first data according to the first data position and the data quantity of the first part through the data processing thread.
In summary, the electronic device can read the first portion directly into the virtual storage space corresponding to the first virtual address, and thereby read the first portion of the first data on the disk directly into the first cache memory block. The electronic device does not need to first read the first portion into another memory and then copy it into the first cache memory through the CPU; this avoids wasting memory resources and, at the same time, avoids the CPU computing power that copying the first portion from another memory into the first cache memory would consume.
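The following kernel-style sketch shows what such a data processing (IO) thread might look like; io_queue, io_queue_pop(), and io_task_done() are invented names for the task hand-off, while kthread_should_stop() and kernel_read() are real kernel helpers. The read lands directly in the chunk's kernel mapping, with no intermediate copy of the portion.

```c
#include <linux/fs.h>
#include <linux/kthread.h>

struct io_task {
    struct file *file;  /* the first data, e.g. a model file on disk */
    loff_t offset;      /* the first data location of this portion */
    size_t len;         /* the data amount of this portion */
    void *kvaddr;       /* first virtual address of the cache memory block */
};

struct io_queue;                                  /* hypothetical task queue */
struct io_task *io_queue_pop(struct io_queue *q); /* sleeps while empty */
void io_task_done(struct io_task *t);

static int data_processing_thread(void *arg)
{
    struct io_queue *q = arg;

    while (!kthread_should_stop()) {
        struct io_task *t = io_queue_pop(q);

        if (!t)
            continue;
        /* Read the portion straight into the chunk's kernel mapping. */
        kernel_read(t->file, t->kvaddr, t->len, &t->offset);
        io_task_done(t);
    }
    return 0;
}
```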
In some embodiments of the application, as shown in fig. 6 in conjunction with fig. 4, before the electronic device reads the first portion to the virtual storage space corresponding to the first virtual address in the step 102a, the memory application method provided in the embodiment of the present application may further include the following step 301, and the step 102a may be specifically implemented by the following step 102a2.
In step 301, under the condition of applying for the first cache memory block, the electronic device maps the physical address of the first cache memory block to the virtual storage space, so as to obtain a first virtual address.
It should be noted that, for the description of mapping the physical address of the first cache memory block to the virtual storage space by the electronic device, reference may be made to the specific description of the related art, which is not repeated in the embodiments of the present application.
Step 102a2, the electronic device reads the first portion to a virtual storage space corresponding to a first virtual address, where the first virtual address corresponds to the first cache memory block.
In the related art, the system of the electronic device maps the physical address of the first cache memory block into the user address space. When the first portion is read, because the DMA-BUF memory is designed to be PFN-based and provides no page-based access method to the upper layer, the system has to apply for an additional PAGE CACHE memory block, first read the first portion into the PAGE CACHE block, and then copy the contents of the PAGE CACHE block into the first cache memory block through the CPU. When the total data size of the first data is greater than or equal to the preset data size, that is, when the first data is large, this wastes memory resources on the one hand, and on the other hand the copy from the PAGE CACHE block into the first cache memory block wastes CPU computing power and increases CPU power consumption.
In the embodiment of the present application, however, the system of the electronic device may directly map the physical address of the first cache memory block into the virtual storage space (i.e., the kernel address space) to obtain the first virtual address. In the kernel address space, the system can manage the DMA-BUF memory in page form, so the electronic device can read the first portion of the first data on the disk directly into the first cache memory block in the DIO manner. In the related art, by contrast, the physical address of the first cache memory block is mapped into the user address space, which can only expose the DMA-BUF PFNs; after the first portion is read into the user address space, the consistency of the first portion in the corresponding first cache memory block may not be guaranteed. For example, the cache memory block corresponding to the user address space may be reclaimed by the system, or memory migration may occur due to memory management policies, so that the cache memory block corresponding to the user address space changes. Access therefore has to use the BIO method; that is, in the related art, a PAGE CACHE memory block must be applied for first, and its contents then copied into the first cache memory block. In the embodiment of the present application, the system of the electronic device avoids applying for the additional PAGE CACHE memory block, so that memory resources are not wasted, the CPU computing power spent on copying is reduced, and CPU power consumption is reduced.
In this way, the electronic device can map the physical address of the first cache memory block into the virtual storage space to accurately obtain the first virtual address, so that in the subsequent step the first portion can be accurately stored into the virtual storage space corresponding to the first virtual address. In addition, the virtual storage space can be the kernel address space; that is, the electronic device can map the physical address of the first cache memory block into the kernel address space to obtain the first virtual address, so that in the subsequent step the first portion on the disk can be stored directly into the kernel address space without applying for PAGE CACHE. This avoids wasting memory resources and, at the same time, avoids the CPU computing power that copying data from PAGE CACHE into the cache memory would consume.
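On the kernel side, step 301 can be expressed with the standard vmap() primitive, which builds a contiguous kernel virtual mapping over the chunk's possibly discontiguous physical pages; the function name below is ours and error handling is trimmed.

```c
#include <linux/vmalloc.h>

/* Map the chunk's physical pages into the kernel address space; the
 * returned pointer is the first virtual address (Kvaddr) that the data
 * processing thread reads the portion into. */
static void *map_chunk_to_kvaddr(struct page **pages, unsigned int nr_pages)
{
    return vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
}
```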
The embodiments of the application provide a memory application method in which, upon receiving a memory application request corresponding to first data, an electronic device executes a memory application operation according to the request, and, upon applying to a first cache memory block, reads a first portion of the first data into the first cache memory block, the data amount of the first portion matching the memory size of the first cache memory block. Once the electronic device has applied to a first cache memory block with a smaller memory size, a first portion whose data amount matches the block's memory size is read into it; the first data can be read into the cache memory without waiting for the whole cache memory application to complete. The waiting time required for reading the first data is thus reduced, the time the electronic device spends reading data is reduced, and the data reading efficiency of the electronic device is improved.
Moreover, in a case where the electronic device has released the cache memory, the electronic device can read the first portion into the first cache memory block as soon as it applies to the first cache memory block of the cache memory; that is, the electronic device can quickly read back this portion of the first data, reducing the time needed to read the portion of the first data into the cache memory again.
In some embodiments of the application, the first data further includes a second portion, the first portion being adjacent to the second portion. The memory application method provided by the embodiment of the present application may further include the following step 103.
Step 103, under the condition of applying for the second cache memory block, the electronic device reads the second portion to the second cache memory block, wherein the data size of the second portion is matched with the memory size of the second cache memory block.
In some embodiments of the present application, the first portion being adjacent to the second portion may be understood as the data end position of the first portion being the data start position of the second portion.
Illustratively, assuming that the first portion is 0-128 MB of the first data, the second portion may be 128 MB-300 MB of the first data.
In some embodiments of the present application, the data amount of the second portion may be the same as or different from the data amount of the first portion.
In some embodiments of the present application, the memory size of the first cache memory block is the same as or different from the memory size of the second cache memory block.
For example, the memory size of the first cache memory block may be 64MB and the memory size of the second cache memory block may be 128MB.
In some embodiments of the present application, the memory size of the first cache memory block and the memory size of the second cache memory block are determined according to a data size of the first data.
For example, if the data size of the first data is 10G, the memory size of the first cache memory block may be 244MB, and the memory size of the second cache memory block may be 344MB. Of course, the memory sizes of the first cache memory block and the second cache memory block may also be other values, which are not limited herein.
The electronic device may determine a memory size of the first cache memory block and a memory size of the second cache memory block according to at least one of a rate at which the electronic device reads the data and a data amount of the first data, respectively.
Taking the first cache memory block as an example, the memory size of the first cache memory block may be positively correlated with the rate at which the electronic device reads data, i.e., the smaller the rate at which the electronic device reads data, the smaller the memory size of the first cache memory block. The memory size of the first cache memory block may be positively correlated with the data size of the first data, i.e., the smaller the data size of the first data, the smaller the memory size of the first cache memory block.
Therefore, the electronic device can determine the memory size of the first cache memory block and the memory size of the second cache memory block according to the data amount of the first data, that is, the electronic device can flexibly configure the memory size of the first cache memory block and the memory size of the second cache memory block, so that the flexibility of reading the data of the electronic device can be improved.
It should be noted that, for the description of the electronic device reading the second portion to the second cache memory block, reference may be made to the specific description of the electronic device reading the first portion to the first cache memory block in the above embodiment, and the embodiments of the present application are not repeated here.
In some embodiments of the present application, the electronic device may read the second portion to the second cache memory block when the second cache memory block is applied in the process of reading the first portion to the first cache memory block.
In some embodiments of the present application, in the process of the electronic device reading the second portion to the second cache memory block, the electronic device may read a third portion to a third cache memory block when applying to the third cache memory block, where the data size of the third portion matches the memory size of the third cache memory block. The third portion is adjacent to the second portion.
It will be appreciated that the electronic device may repeatedly execute the above step 103 while reading the second portion into the second cache memory block, so as to read different portions of the first data into the applied cache memory blocks at the same time, thereby reading the whole first data into the cache memory. That is, while the electronic device is reading the previous portion into the previously applied cache memory block, if it applies to another cache memory block, it may read the current portion into that other cache memory block at the same time.
That is, the memory application method provided by the embodiments of the application changes the memory application and data reading of the related art from serial to pipelined parallel. On the one hand, reading of the portions of the first data can begin without waiting until the entire cache memory application is completed, which reduces the waiting time before reading starts; on the other hand, the first data does not need to be read serially, since several portions can be read at the same time, which reduces the time needed to read the first data. In summary, compared with the related art, the time for reading the first data is reduced from roughly "cache memory application time + first data reading time" to roughly "time to apply to the first cache memory block + MAX(remaining cache memory application time, data reading time) + time to read the last portion", so the total time consumed in reading the first data is reduced; that is, the efficiency with which the electronic device reads the first data is improved.
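The comparison can be written compactly as follows, under our own simplifying assumption of n equal chunks with per-chunk application time t_a and per-chunk read time t_r (notation ours, not the application's):

```latex
T_{\text{serial}} = n\,t_a + n\,t_r, \qquad
T_{\text{pipelined}} \approx t_a + \max\bigl((n-1)\,t_a,\;(n-1)\,t_r\bigr) + t_r
\;\le\; T_{\text{serial}}.
```

When reads dominate, the pipelined total approaches t_a + n·t_r; that is, the application time of all but the first chunk is hidden behind the IO time.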
In summary, the electronic device can read the second portion into the second cache memory block while reading the first portion into the first cache memory block; that is, the electronic device can read a plurality of portions of the first data into a plurality of cache memory blocks at the same time, so that the time required for reading the first data into the cache memory can be reduced and the efficiency with which the electronic device reads data can be improved.
The following will illustrate a specific scheme of the memory application method provided in the embodiment of the present application by using a complete example.
As shown in fig. 7, assume that the DMA devices (e.g., a GPU and an APU) of an electronic device need to read the model file of an AI large model, which includes a weight file. The ioctl program may therefore trigger the user space to send a memory application request corresponding to the model file to the system of the electronic device, the request being used to apply for the DMA-BUF memory (i.e., the cache memory in the foregoing embodiments), so that the system receives the request and executes the DMA-BUF memory application operation accordingly. In the process of applying for the DMA-BUF memory, the system may record the applied cache memory blocks in the first list and, when the memory size recorded in the first list matches the preset memory size, determine the recorded blocks as the first cache memory block (for example, chunkpage1). At this point, the system may submit task 1 (i.e., the first task in the above embodiments) to data processing thread 1, so that thread 1 directly obtains the first portion (for example, chunk1) from the first data on the disk and stores chunk1 directly into chunkpage1. While submitting task 1 to thread 1, the system may empty the first list, record the newly applied cache memory blocks into it, and, when the recorded memory size again matches the preset memory size, determine the recorded blocks as the second cache memory block (for example, chunkpage2). The system may then submit task 2 to data processing thread 2, so that thread 2 directly obtains the second portion (for example, chunk2) from the first data on the disk and stores chunk2 directly into chunkpage2. While submitting task 2 to thread 2, the system may again empty the first list, record the applied blocks, and, when the recorded memory size matches the preset memory size, determine the recorded blocks as the third cache memory block (for example, chunkpage3). The system may then submit task 3 to data processing thread 3, so that thread 3 directly obtains the third portion (for example, chunk3) from the first data on the disk and stores chunk3 directly into chunkpage3. And so on, until the entirety of the first data is stored in the DMA-BUF memory.
It can be appreciated that while the system of the electronic device is reading chunk1 into chunkpage1, it can apply for chunkpage2 and read chunk2 into chunkpage2, and likewise apply for chunkpage3 and read chunk3 into chunkpage3. That is, the system can apply for the cache memory and read the first data into it in a pipelined parallel manner, so that the time consumed by the driver of the electronic device and/or the device for reading the first data can be reduced.
Moreover, the data processing threads can read the first data on the disk directly into the cache memory, without first reading it into PAGE CACHE and then copying it from PAGE CACHE into the cache memory through the CPU. This avoids wasting memory resources and, at the same time, avoids the CPU computing power that copying data from another memory into the cache memory would consume.
The following will specifically describe the differences between the memory application method provided by the embodiment of the present application and the memory application method in the related art by comparing the memory application method in the related art with the memory application method provided by the embodiment of the present application.
In the related art, assume that the DMA devices (e.g., a GPU and an APU) of an electronic device need to read the model file of an AI large model, which includes a weight file, so the user space applies for the DMA-BUF memory. As shown in fig. 8, the ioctl program may trigger the user space to send a memory application request corresponding to the model file to the system of the electronic device, the request being used to apply for the DMA-BUF memory (i.e., the cache memory in the foregoing embodiments), so that the system receives the request and executes the DMA-BUF memory application operation accordingly. Only after the DMA-BUF memory application is completed can the user space open the first data (e.g., the model file of the AI large model) and obtain the user address space (UserVaddr) by mapping the physical address of the DMA-BUF memory. The user space must then first apply for PAGE CACHE, read the model file of the AI large model on the disk into the PAGE CACHE, and copy the model file from the PAGE CACHE into the user address space (UserVaddr) through the CPU, thereby completing the reading of the model file into the DMA-BUF memory.
As can be seen from fig. 8 and the above description, in the related art, the first data read must wait until the DMA-BUF memory application is completed, and an additional PAGE CACHE application is required during file reading.
In an embodiment of the present application, by contrast, assume again that the DMA devices (e.g., a GPU and an APU) of the electronic device need to read the model file of the AI large model, including the weight file, so the user space applies for the DMA-BUF memory. As shown in fig. 9, the ioctl program may trigger the user space to send a memory application request corresponding to the model file to the system of the electronic device, the request being used to apply for the DMA-BUF memory (i.e., the cache memory in the foregoing embodiments), so that the system receives the request and executes the DMA-BUF memory application operation accordingly. In the process of applying for the DMA-BUF memory, the system may record the applied cache memory pages in the first list, and, when the memory size of the pages recorded in the first list matches the preset memory size (the chunk limit), determine the recorded pages as the first cache memory block and record the memory size of the first cache memory block and the corresponding first data location in the first data (e.g., in the model file of the AI large model). The system can then empty the first list while submitting task 1 to data processing thread 1, and record the subsequently applied cache memory pages into the first list. If data processing thread 1 is in the sleep state, the system waits for its sleep to end; once thread 1 exits the sleep state, the system can directly read, through thread 1, the first portion corresponding to the first cache memory block into the first cache memory block. The electronic device repeats these steps until the model file of the AI large model has been read into the DMA-BUF memory.
It can be understood that, while the system of the electronic device reads the first portion into the first cache memory block, the system of the electronic device may also apply for the second cache memory block and read the second portion into the second cache memory block, that is, the system of the electronic device may apply for the DMA-BUF memory in a pipeline parallel manner and read the model file of the AI large model into the DMA-BUF memory, so that the driving of the electronic device and/or the time consumed for the device to read data may be reduced.
The following describes the beneficial effects of the memory application method provided by the embodiments of the application in a no-load scenario and a heavy-load scenario.
In the no-load scenario, there is no memory pressure, and the problems of the related art are mainly reflected in the memory pressure and the power consumption caused by the PAGE CACHE memory application and the CPU copy.
In the heavy-load scenario, multiple applications are usually open and the memory pressure is high. The problems of the related art then include, in addition to the memory and power consumption pressure of the no-load scenario, the CPU power consumption pressure caused by memory reclamation.
In the memory application method provided by the embodiments of the application, however, the key optimization for the no-load scenario is that the DIO mode eliminates the intermediate-state memory (i.e., the PAGE CACHE memory) and the CPU copy. As for the memory reclamation effect in the heavy-load scenario, the asynchronous application/IO with chunk-level synchronization allows the time consumed by the memory application to hide the IO time, so the total time is shorter than that of a purely serial memory application plus IO.
Current project tests show that, in the no-load scenario, the time for reading the model file of an AI 7B large model can be reduced from 2 s to at best 1 s, with an average of 1.2 s. As shown in fig. 10, taking a camera device reading the model file of an AI large model as an example: if the data size of the model file is 500 MB, the memory application method of the related art needs 1283.5 ms to read it, whereas the method provided by the embodiments of the application needs only 309.0 ms; if the data size is 1.2 GB, the related art needs 1819.6 ms, whereas the method of the embodiments of the application needs only 620.9 ms; and if the data size is 1.6 GB, the related art needs 2379.2 ms, whereas the method of the embodiments of the application needs only 751.3 ms.
In the memory application method provided by the embodiments of the application, the execution subject may be a memory application device. In the embodiments of the present application, the memory application device provided by the embodiments is described taking a memory application device executing the memory application method as an example.
Fig. 11 is a schematic structural diagram of a memory application device according to an embodiment of the present application. As shown in fig. 11, the memory application device 50 provided in the embodiment of the present application may include an application module 51 and a reading module 52.
The application module 51 is configured to execute a memory application operation according to the memory application request when receiving the memory application request corresponding to the first data. The reading module 52 is configured to read a first portion of the first data to the first cache memory block when the application module 51 applies to the first cache memory block, where the data size of the first portion matches the memory size of the first cache memory block.
The embodiments of the application provide a memory application device. Once the memory application device has applied to a first cache memory block with a smaller memory size, a first portion of the first data whose data amount matches the memory size of the first cache memory block is read into the first cache memory block, without waiting for the whole cache memory application to complete. The waiting time for reading the first data is thus reduced, the time the memory application device spends reading data is reduced, and the data reading efficiency of the memory application device is improved.
In one possible implementation, the first data further includes a second portion, and the first portion is adjacent to the second portion. The reading module 52 is further configured to read the second portion to the second cache memory block when the application module 51 applies to the second cache memory block, where the data size of the second portion matches the memory size of the second cache memory block.
In one possible implementation, the memory size of the first cache memory block is the same as or different from the memory size of the second cache memory block.
In one possible implementation manner, the memory size of the first cache memory block and the memory size of the second cache memory block are determined according to the data amount of the first data.
In one possible implementation, the reading module 52 is specifically configured to read the first portion to a virtual storage space corresponding to a first virtual address, where the first virtual address corresponds to a first cache memory block.
In one possible implementation manner, the memory application device 50 provided in the embodiment of the present application may further include a mapping module. The mapping module is configured to map the physical address of the first cache memory block to the virtual storage space before the reading module 52 reads the first portion to the virtual storage space corresponding to the first virtual address, so as to obtain the first virtual address.
In one possible implementation manner, the reading module 52 is specifically configured to read, by the data processing thread, the first portion to the virtual storage space corresponding to the first virtual address according to the first data location of the first portion in the first data and the data amount of the first portion.
In one possible implementation manner, the memory application device 50 provided in the embodiment of the present application may further include a recording module and a determining module. The recording module is used for recording the applied cache memory block to the first list. The determining module is used for determining the cache memory block recorded in the first list as the first cache memory block under the condition that the memory size of the cache memory block recorded in the first list by the recording module matches the preset memory size.
The memory application device in the embodiments of the application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile internet device (Mobile Internet Device, MID), an augmented reality (Augmented Reality, AR)/virtual reality (Virtual Reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), a netbook, or a personal digital assistant (Personal Digital Assistant, PDA), or may be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (Personal Computer, PC), a television (Television, TV), an automated teller machine, a self-service machine, or the like; the embodiments of the present application are not specifically limited in this respect.
The memory application device in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, a Linux operating system, or other possible operating systems, and the embodiment of the application is not limited specifically.
The memory application device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 2 to 10, and in order to avoid repetition, a detailed description is omitted here.
In some embodiments of the present application, as shown in fig. 12, an embodiment of the application further provides an electronic device 60, which includes a processor 61 and a memory 62. The memory 62 stores a program or instructions that can run on the processor 61, and the program or instructions, when executed by the processor 61, implement the steps of the above embodiments of the memory application method and achieve the same technical effects; to avoid repetition, details are not described here again.
The electronic device in the embodiments of the application includes both mobile electronic devices and non-mobile electronic devices.
Fig. 13 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to, a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for powering the various components, and that the power source may be logically coupled to the processor 110 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 13 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than shown, or may combine certain components, or may be arranged in different components, which are not described in detail herein.
The processor 110 is configured to execute a memory application operation according to a memory application request when receiving a memory application request corresponding to first data, and read a first portion of the first data to a first cache memory block when applying to the first cache memory block, where a data size of the first portion matches a memory size of the first cache memory block.
The embodiments of the application provide an electronic device. Under the condition that the electronic device applies to a first cache memory block with a smaller memory size, a first portion whose data amount matches the memory size of the first cache memory block is read into the first cache memory block; the first data can be read into the cache memory without waiting for the whole cache memory application to complete. The waiting time required for reading the first data is thus reduced, the time consumed by the electronic device for reading data is reduced, and the data reading efficiency of the electronic device is improved.
In some embodiments of the application, the first data further includes a second portion, the first portion being adjacent to the second portion.
The processor 110 is further configured to read the second portion to the second cache memory block if the second cache memory block is applied to, where the data size of the second portion matches the memory size of the second cache memory block.
In some embodiments of the present application, the processor 110 is specifically configured to read the first portion to a virtual storage space corresponding to a first virtual address, where the first virtual address corresponds to a first cache memory block.
In some embodiments of the present application, the processor 110 is further configured to map the physical address of the first cache memory block to the virtual storage space to obtain the first virtual address before reading the first portion to the virtual storage space corresponding to the first virtual address.
In some embodiments of the present application, the processor 110 is specifically configured to read, by the data processing thread, the first portion to the virtual storage space corresponding to the first virtual address according to the first data location of the first portion in the first data and the data amount of the first portion.
In some embodiments of the present application, the processor 110 is further configured to record the requested cache memory block to the first list, and determine the cache memory block recorded in the first list as the first cache memory block if the memory size of the cache memory block recorded in the first list matches the preset memory size.
It should be appreciated that in embodiments of the present application, the input unit 104 may include a graphics processor (graphics processing unit, GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g. a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes at least one of a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
Memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a first memory area storing programs or instructions and a second memory area storing data, wherein the first memory area may store an operating system, and application programs or instructions (such as a sound playing function and an image playing function) required for at least one function, and the like. Further, the memory 109 may include volatile memory or nonvolatile memory, or both. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable programmable ROM (Erasable PROM, EPROM), an electrically erasable programmable ROM (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchronous link dynamic random access memory (Synch Link DRAM, SLDRAM), or a direct memory bus random access memory (Direct Rambus RAM, DRRAM). Memory 109 in embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 110 may include one or more processing units, and optionally the processor 110 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiments of the application also provide a readable storage medium, on which a program or instructions are stored. When the program or instructions are executed by a processor, the processes of the above embodiments of the memory application method are implemented and the same technical effects can be achieved; to avoid repetition, details are not described here again.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, which comprises a processor and a communication interface, wherein the communication interface is coupled with the processor, and the processor is used for running programs or instructions to realize the processes of the memory application method embodiment, and can achieve the same technical effects, so that repetition is avoided, and the description is omitted here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the processes of the embodiments of the memory application method described above, and achieve the same technical effects, and for avoiding repetition, a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; the functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk, an optical disk) and including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.