
CN121029323A - Accelerator equipment and control methods for accelerator equipment - Google Patents

Accelerator equipment and control methods for accelerator equipment

Info

Publication number
CN121029323A
CN121029323A (application CN202511556607.2A)
Authority
CN
China
Prior art keywords
acceleration
unit
target
function
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511556607.2A
Other languages
Chinese (zh)
Inventor
张德闪
郭振华
李军
王丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202511556607.2A priority Critical patent/CN121029323A/en
Publication of CN121029323A publication Critical patent/CN121029323A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F9/5044 Allocation of resources to service a request, the resource being a machine, considering hardware capabilities
    • G06F9/505 Allocation of resources to service a request, the resource being a machine, considering the load
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application discloses an accelerator device and a control method for the accelerator device, relating to the field of computer technology. The accelerator device is divided into a static area and a dynamic area: the static area holds a virtual function device state information table and an acceleration unit state information table, and the dynamic area contains at least one acceleration function unit. On receiving a user request, the accelerator traverses the acceleration unit state information table to determine a target acceleration function unit from the at least one acceleration function unit, traverses the virtual function device state information table to determine a target virtual function device matching the target acceleration function unit, and binds the target acceleration function unit to the target virtual function device. This resolves the low utilization caused by fixed binding between acceleration function units and virtual functions: a single acceleration function unit can serve, in a time-shared manner, acceleration tasks from multiple virtual functions, improving the utilization of the acceleration function unit.

Description

Accelerator device and control method of accelerator device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an accelerator apparatus and a control method of the accelerator apparatus.
Background
In data centers, accelerator devices are used to accelerate a wide variety of applications. To allow a single device to be shared by multiple virtual machines while maintaining high performance, SR-IOV (Single Root I/O Virtualization) technology is generally used: the accelerator device creates one physical function device (PF, Physical Function) and multiple virtual function devices (VF, Virtual Function). The VFs are independent of one another and can each be allocated to a different virtual machine, so that multiple virtual machines share the accelerator device.
In the related art, acceleration function units (AFUs) and virtual function devices are in a static one-to-one binding relationship, and each acceleration function unit serves only one specific VF. This leaves acceleration function units underutilized: when the VF to which an AFU is bound has no acceleration task to execute, the AFU sits completely idle, yet other VFs cannot use it because their access paths are isolated.
Disclosure of Invention
The application provides an accelerator device and a control method for the accelerator device, which at least solve the problem that when the virtual function fixedly bound to an acceleration function unit has no acceleration task, the corresponding acceleration function unit is completely idle, leading to low utilization of the acceleration function unit.
The present invention provides an accelerator device whose field programmable gate array (FPGA) is divided into a static area and a dynamic area, the static area comprising a virtual function device state information table and an acceleration unit state information table, and the dynamic area comprising at least one acceleration function unit.
The acceleration unit state information table is configured to store state information of the at least one acceleration function unit within the dynamic area, and the virtual function device state information table is configured to store state information of at least one virtual function device.
The accelerator is configured to traverse the acceleration unit state information table according to a received user request to determine a target acceleration function unit from the at least one acceleration function unit, traverse the virtual function device state information table to determine a target virtual function device matching the target acceleration function unit, and bind the target acceleration function unit to the target virtual function device.
The invention also provides a control method for the accelerator device, applied to the accelerator device described above, the method comprising:
determining whether a user request has been received;
if a user request is received, traversing the acceleration unit state information table to determine a target acceleration function unit from the at least one acceleration function unit based on the user request;
and traversing the virtual function device state information table to determine a target virtual function device matching the target acceleration function unit, and binding the target acceleration function unit to the target virtual function device.
According to the method and device, a target acceleration function unit is determined from the at least one acceleration function unit by traversing the acceleration unit state information table according to the received user request, a matching target virtual function device is determined by traversing the virtual function device state information table, and the two are bound. This solves the problem that an acceleration function unit sits completely idle, and is therefore underutilized, whenever the virtual function fixedly bound to it has no acceleration task.
Drawings
For a clearer description of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a related art accelerator principle;
FIG. 2 is a block schematic diagram of an accelerator apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an accelerator apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an acceleration functional unit module according to an embodiment of the present invention;
FIG. 5 is a flowchart of a control method of an accelerator apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the present invention.
It should be noted that, in the description of the present invention, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "first," "second," and the like in this specification are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order.
Before describing the accelerator device of the embodiments of the present application, the system architecture of a conventional accelerator device is described. As shown in FIG. 1, the architecture is divided into two parts, a static area and a dynamic area. The dynamic area contains acceleration function units (AFUs) implementing various functions such as compression, encryption, and decryption; its functions can be updated at run time through partial reconfiguration. The static area implements SR-IOV, creating a PF and multiple VFs, and adds a mapping module that dynamically combines VFs with acceleration units of different functions. In this design, each acceleration function unit is bound to a VF; because SR-IOV requires the VFs to be isolated and independent of one another, an acceleration function unit can serve only the VF it is bound to. When that fixedly bound virtual function has no acceleration task, the acceleration function unit sits completely idle, so its utilization is low. To solve this problem, the present application provides an accelerator device that allows a single acceleration function unit (AFU) to bind to multiple virtual function devices (VFs), so that a single AFU can serve, in a time-shared manner, acceleration tasks issued by multiple VFs, improving AFU utilization.
The present invention will be further described in detail below with reference to the drawings and detailed description for the purpose of enabling those skilled in the art to better understand the aspects of the present invention.
An embodiment of the present invention provides an accelerator apparatus, and fig. 2 is a block diagram of an example of an accelerator apparatus according to an embodiment of the present invention.
As shown in FIG. 2, the accelerator device 10 is divided into a static area 100 and a dynamic area 200; the static area 100 includes a virtual function device state information table and an acceleration unit state information table, and the dynamic area 200 includes at least one acceleration function unit.
The accelerator device is a device for accelerating specific types of computing tasks; an acceleration function is a specific task or function executed by the accelerator, such as compression, encryption, or decryption. The static area is an area whose logic functions are fixed while computing tasks run; the dynamic area is an area whose logic functions can be dynamically replaced through partial reconfiguration while computing tasks run. The acceleration unit state information table is configured to store state information of the at least one acceleration function unit in the dynamic area, and the virtual function device state information table is configured to store state information of at least one virtual function device.
The accelerator traverses the acceleration unit state information table according to a received user request to determine a target acceleration function unit from the at least one acceleration function unit, traverses the virtual function device state information table to determine a target virtual function device matching the target acceleration function unit, and binds the target acceleration function unit to the target virtual function device.
It should be noted that the state information of a virtual function device includes at least one of: a virtual function device identification number, the number of interrupts it can support, the access space capacity of the registers it contains, a priority weight, and its binding state with an acceleration function unit. The state information of an acceleration function unit includes at least one of: an acceleration unit index number, an acceleration unit function identification number, the space occupied by at least one set of context configuration registers, the maximum number of contexts supported, the number of assignable contexts, the assigned context numbers, and an interrupt number.
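The fields listed above amount to one record per table row. A minimal Python sketch of the two table entries follows; the field names and the `free_contexts` helper are illustrative choices, not terms from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VFState:
    """One row of the virtual function device state information table."""
    vf_id: int                       # virtual function device identification number
    num_interrupts: int              # number of interrupts the VF can support
    bar_space_bytes: int             # access space capacity of the VF's registers
    priority_weight: int             # priority weight
    bound_afu: Optional[int] = None  # binding state: index of bound AFU, or None

@dataclass
class AFUState:
    """One row of the acceleration unit state information table."""
    afu_index: int                   # acceleration unit index number
    function_id: str                 # acceleration unit function identification number
    ctx_reg_bytes: int               # space occupied by one set of context configuration registers
    max_contexts: int                # maximum number of contexts supported
    assigned_contexts: List[int] = field(default_factory=list)  # assigned context numbers
    irq_number: int = 0              # interrupt number

    @property
    def free_contexts(self) -> int:
        """Number of assignable contexts remaining."""
        return self.max_contexts - len(self.assigned_contexts)
```

The "number of assignable contexts" field is derived here rather than stored, which keeps the two counts from drifting apart.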
Specifically, as shown in FIG. 3, when the system receives an acceleration service request from a user, it first traverses the acceleration unit state information table according to the acceleration function type specified in the request and selects a target acceleration function unit that has the requested function and still has available resources. The system then traverses the virtual function device state information table to find a target virtual function device that is currently idle and not bound to any acceleration function unit. Once a matching target acceleration function unit and target virtual function device have been determined, the system establishes a binding between them: it updates the records in both state information tables and sets up an access path at the hardware level, so that the target virtual function device (VF) can access and configure the independent context register set allocated in the target acceleration function unit (AFU) through the VF's base address register (BAR), achieving exclusive, isolated use of the acceleration function.
In the related art, a VF typically forms a fixed one-to-one binding with an AFU; when the VF has no computing task, the bound AFU cannot be used by other VFs that need it, even though it is idle, so hardware resources sit wasted for long periods. Embodiments of the invention remove this static-binding restriction by introducing two dynamic management tables, the acceleration unit state information table and the virtual function device state information table, into the static area. The system can dynamically assign the independent contexts of one AFU to different VFs according to real-time user demand; that is, one AFU can be time-multiplexed by multiple VFs. For example, when the task of VF1 completes, the context it occupied is released, and the system can immediately assign another free context of the AFU to a newly requesting VF2. Because the AFU's hardware resources spend most of their time scheduled to execute tasks, rather than being forced idle by a binding relationship, the utilization of the whole accelerator's hardware is fundamentally improved, effectively solving the low utilization caused by static resource allocation in the traditional architecture.
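The traverse-then-bind flow described above can be sketched in a few lines of Python. This is a behavioral model, not the patent's implementation; the dictionary keys and the `bind_request` name are hypothetical:

```python
def bind_request(function_type, afu_table, vf_table):
    """Traverse the AFU state table for a unit with the requested function and a
    free context, then the VF state table for an unbound VF; bind them."""
    # Step 1: find a target AFU with the requested function and spare capacity.
    afu = next((a for a in afu_table
                if a["function_id"] == function_type
                and len(a["assigned"]) < a["max_contexts"]), None)
    if afu is None:
        return None  # no AFU of this type has a free context
    # Step 2: find a target VF that is not yet bound to any acceleration unit.
    vf = next((v for v in vf_table if v["bound_afu"] is None), None)
    if vf is None:
        return None
    # Step 3: bind: allocate a free context number and update both table rows.
    ctx = min(set(range(afu["max_contexts"])) - set(afu["assigned"]))
    afu["assigned"].append(ctx)
    vf["bound_afu"] = afu["index"]
    return (vf["vf_id"], afu["index"], ctx)

afu_table = [{"index": 0, "function_id": "compress", "max_contexts": 2, "assigned": []}]
vf_table = [{"vf_id": 1, "bound_afu": None}, {"vf_id": 2, "bound_afu": None}]
print(bind_request("compress", afu_table, vf_table))  # (1, 0, 0)
print(bind_request("compress", afu_table, vf_table))  # (2, 0, 1): same AFU, second context
```

The second call illustrates the key point of the scheme: both VFs end up bound to the same AFU, each holding its own context number, rather than each VF requiring a dedicated unit.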
According to one embodiment of the present invention, the accelerator apparatus further comprises a register access interface, an interrupt interface, and a memory access interface.
The register access interface is configured to receive a first access request sent by a virtual function device and map it to the configuration module of the corresponding acceleration function unit.
The interrupt interface is configured to receive an interrupt request sent by an acceleration function unit, map it to the corresponding virtual function device, and deliver the interrupt.
The memory access interface is configured to receive a second access request from an acceleration function unit and map it to the target memory storage space.
Specifically, as shown in FIGS. 3 and 4, the static area 100 implements SR-IOV, creating a PF and multiple VFs, and adds several modules and table structures to support dynamically combining acceleration units with the VFs. The dynamic area 200 contains the implemented acceleration function units, each of which has three interfaces: a register access interface, an interrupt interface, and a memory access interface (DDR). These enable the host to access acceleration unit registers, acceleration units to send interrupts to the host, and acceleration units to access the device's DDR memory space. To allow a single acceleration function unit (AFU) to bind to multiple VFs and serve, in a time-shared manner, acceleration tasks issued by those VFs, this application restructures the AFU architecture into three modules, as shown in FIG. 4.
In operation, when a VF bound to an AFU needs to configure task parameters, it initiates a register write to its BAR-mapped address space; this write is received as a "first access request" via the register access interface. The interface module looks up the current VF-to-AFU binding in the static area's mapping table and redirects the request to the VF's exclusive context configuration register set in the target AFU's configuration module, completing the parameter write. When the AFU's execution unit finishes a task, it sends an interrupt request through the interrupt interface; the request carries the identification of the target VF, and the interrupt interface delivers the interrupt signal precisely to that VF, notifying it that the task is complete and the result can be read. During task execution, if the AFU needs to read pending data from host memory or write results back, it initiates a "second access request" through the memory access interface, which translates the logical address in the request into a physical address and transfers the data to the target memory storage space via a DMA (direct memory access) mechanism, achieving efficient data movement.
In the related art, the communication path between a VF and an AFU is single and static, making dynamic binding and multi-VF sharing difficult to support. Here, first, the register access interface lets multiple VFs safely multiplex the register resources of the same AFU without configuration conflicts. Second, the interrupt interface ensures that task-completion notifications are delivered exactly to the VF that issued the request, avoiding interrupt confusion and preserving each virtual machine's independence and response latency. Finally, the memory access interface gives the AFU direct access to external memory, avoiding repeated copying of data between the CPU and the accelerator and significantly reducing processing latency and CPU overhead.
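The routing performed by the register access and interrupt interfaces reduces to a pair of table lookups maintained at bind time. The following sketch models that behavior (class and method names are illustrative, not from the patent):

```python
class StaticAreaRouter:
    """Routes VF register writes to AFU contexts and AFU interrupts back to VFs."""

    def __init__(self):
        self.vf_to_ctx = {}  # vf_id -> (afu_index, context number), set at bind time
        self.ctx_regs = {}   # (afu_index, ctx) -> {register offset: value}

    def bind(self, vf_id, afu_index, ctx):
        """Record the binding so both directions of routing can be resolved."""
        self.vf_to_ctx[vf_id] = (afu_index, ctx)
        self.ctx_regs[(afu_index, ctx)] = {}

    def register_write(self, vf_id, offset, value):
        """'First access request': a VF write to its BAR space lands in the
        exclusive context register set of the AFU context it is bound to."""
        key = self.vf_to_ctx[vf_id]
        self.ctx_regs[key][offset] = value

    def deliver_interrupt(self, afu_index, ctx):
        """Interrupt interface: map a completing (AFU, context) back to its VF."""
        for vf_id, key in self.vf_to_ctx.items():
            if key == (afu_index, ctx):
                return vf_id  # the VF to signal 'task complete'
        return None

router = StaticAreaRouter()
router.bind(vf_id=1, afu_index=0, ctx=0)
router.bind(vf_id=2, afu_index=0, ctx=1)
router.register_write(1, offset=0x10, value=0xABCD)  # isolated from VF 2's context
print(router.deliver_interrupt(0, ctx=1))  # 2
```

Because each VF resolves to its own `(afu_index, ctx)` pair, two VFs writing the same register offset never collide, which is the conflict-freedom property claimed for the register access interface.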
According to one embodiment of the invention, the acceleration functional unit comprises a configuration module, a scheduling module and an execution module.
The configuration module comprises at least one set of context configuration registers, and each set includes a start command register configured to launch the corresponding acceleration task upon receiving a set signal.
The scheduling module comprises a status register and a scheduling unit; the status register is configured to indicate whether each context configuration register set has been configured, and the scheduling unit is configured to perform the corresponding scheduling actions based on a preset scheduling policy.
The execution module comprises a context register and an execution unit; the context register is configured to record the context sequence number currently scheduled for execution, and the execution unit is configured to read the context configuration register information corresponding to that sequence number from the configuration module and execute the function task of the acceleration function unit.
It should be noted that the preset scheduling policy includes at least one of a fixed-priority scheduling policy, a round-robin scheduling policy, and a weighted round-robin scheduling policy, and that the maximum number of context configuration register sets is determined by the maximum number of virtual function devices the corresponding acceleration function unit can support binding.
According to one embodiment of the invention, the scheduling module further comprises a pointer register.
The pointer register is configured to record the last scheduling position, so that when the preset scheduling policy is round-robin, the search starts from the position recorded by the pointer register.
According to one embodiment of the invention, the scheduling module further comprises at least one weight register.
The weight register is configured to characterize the importance of the corresponding context configuration register set.
Specifically, the configuration module includes N sets of context configuration registers, where N is the maximum number of VFs the AFU can support binding; each set is the register list this AFU needs in order to execute. Every set has the same number and order of registers and must include a start command register. The VF sets the start command register to notify the scheduling module that this set of context configuration registers has been fully configured and is eligible for AFU execution. When the execution module reads the set to execute the AFU function, the start command register is cleared, indicating that the task has started and preventing repeated execution.
The scheduling module comprises at least one status register and a scheduling unit. The status register is at least N bits wide; each bit indicates whether the corresponding set of context configuration registers in the configuration module has been configured and is eligible for AFU execution. A bit value of 1 means the corresponding context configuration register set is configured and executable; 0 means it is not executable. The scheduling unit checks the status register for any bit set to 1 and, if one exists, schedules the execution module to execute the AFU task. The scheduling unit may apply several different policies: (1) fixed priority, scanning from the low bit to the high bit and taking the first 1, so lower bits have higher priority; (2) round-robin, adding a pointer register to record the last scheduled position and searching for the next position from there; (3) weighted round-robin, adding a Weight register to each set of context configuration registers in the configuration module and, when scheduling, selecting from all contexts whose status bit is 1 the context with the largest weight register value.
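The three policies can be sketched over the N-bit status register modeled as a list of ready bits. This is a behavioral sketch of the selection logic, not RTL, and the function names are illustrative:

```python
def fixed_priority(status):
    """Policy 1: scan from low bit to high bit; the lowest set bit wins."""
    for i, ready in enumerate(status):
        if ready:
            return i
    return None

def round_robin(status, pointer):
    """Policy 2: resume scanning just past the last scheduled position
    recorded in the pointer register."""
    n = len(status)
    for step in range(1, n + 1):
        i = (pointer + step) % n
        if status[i]:
            return i
    return None

def weighted(status, weights):
    """Policy 3: among all ready contexts, pick the one whose weight
    register holds the largest value."""
    ready = [i for i, r in enumerate(status) if r]
    return max(ready, key=lambda i: weights[i]) if ready else None

status = [0, 1, 0, 1]                  # contexts 1 and 3 are ready
print(fixed_priority(status))          # 1
print(round_robin(status, 1))          # 3 (continues past position 1)
print(weighted(status, [5, 2, 9, 7]))  # 3 (weight 7 > weight 2)
```

Note the trade-off the patent's policy list implies: fixed priority can starve high-numbered contexts, round-robin is fair but uniform, and weighted round-robin lets important VFs be served more often.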
The execution module comprises at least one context register and an execution unit. The context register records the context sequence number currently scheduled for execution. The execution unit reads the context configuration register information corresponding to that sequence number from the configuration module and executes the AFU function task, reading data through the DDR access interface during execution. When the task finishes, the VF is notified of completion through the interrupt interface, and the scheduling module is notified as well. On receiving the notification, the scheduling module clears the corresponding bit of the status register and then schedules the next task.
In actual execution, after a virtual function device (VF) finishes writing the parameters of the context configuration registers bound to it, it sets the corresponding start command register, which in turn sets the corresponding bit in the status register to 1, marking that context as ready to execute. The scheduling unit of the scheduling module continuously polls the status register and determines the next task to execute according to a preset scheduling strategy (such as fixed priority, round-robin, or weighted round-robin). Under the round-robin strategy, the scheduling unit starts scanning from the last scheduled position recorded in the pointer register, which avoids repeatedly checking already-processed entries and improves scheduling efficiency; under weighted round-robin, execution opportunities are allocated in proportion to the values in the weight registers. After a task is selected, the scheduling unit writes its context sequence number into the context register of the execution module and triggers the execution unit. The execution unit reads all configuration information corresponding to that context sequence number from the configuration module, executes the corresponding acceleration task according to that configuration, and notifies the corresponding VF through the interrupt mechanism on completion, while the scheduling unit clears the ready flag in the status register to release the context resource for subsequent tasks.
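One mark-ready/dispatch round as described above can be sketched as follows. The class layout and names are hypothetical stand-ins for the hardware registers, shown here using the round-robin strategy:

```python
class Scheduler:
    """Toy model of the scheduling module: ready bitmap + pointer register."""
    def __init__(self, n):
        self.n = n
        self.status = 0        # N-bit status register (bit i: context i ready)
        self.last = n - 1      # pointer register: last scheduled position

    def mark_ready(self, ctx):
        # Models a VF writing the start command register for its context.
        self.status |= 1 << ctx

    def dispatch_one(self, execute):
        # Round-robin scan starting just after the last scheduled position.
        for off in range(1, self.n + 1):
            ctx = (self.last + off) % self.n
            if self.status >> ctx & 1:
                execute(ctx)                # execution unit runs the task
                self.status &= ~(1 << ctx)  # clear the ready flag on completion
                self.last = ctx             # remember position for next round
                return ctx
        return None                         # nothing ready
```

Marking contexts 0 and 2 ready and dispatching twice executes them in order 0 then 2, after which the status register is empty again.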
Thus, whereas the AFU in the related art is usually designed for a single task flow and cannot perceive or process concurrent requests from multiple VFs, so that an independent AFU instance must be allocated to each VF and resources are wasted, the embodiment of the invention enables one AFU to manage N groups of independent context configurations simultaneously, each context corresponding to the independent task environment of one VF. When the tasks of several VFs become ready in succession, the status register uniformly marks their ready states, and the scheduling unit arbitrates and schedules according to the chosen strategy. For example, under the round-robin strategy, cyclic scanning via the pointer register ensures that each VF's tasks obtain a fair execution opportunity; under weighted round-robin, a critical-service VF can be configured with a higher weight to obtain more frequent scheduling, meeting differentiated requirements. Under the control of the scheduling unit, the execution unit can rapidly switch contexts and execute the tasks of different VFs, improving the utilization of the AFU. The embodiment of the invention therefore allows the physical resources of one AFU to be efficiently shared by multiple VFs through an internal time-division multiplexing mechanism, significantly improving the processing capacity per unit of hardware resource and the utilization of the accelerator as a whole.
Here, each module in the static area 100 will be described.
(1) The virtual function device state information may be as shown in Table 1; it stores the state information of all virtual function devices VF contained in the static area 100. Table 1 is the virtual function device state information table.
TABLE 1
Each row represents the state information of one virtual function device VF and specifically comprises: (1) a VF identification number, used to identify the corresponding VF; (2) the number of MSI-X interrupts supported by the VF; (3) the size of the register access space contained in the VF — host requests to this space are routed to the register access interface of the acceleration unit bound to the VF and then access that acceleration unit's registers, so this space must be no smaller than the context register space of the acceleration functional unit bound to the VF; (4) an optional priority weight — if VF priorities are supported, different VFs are given different priority weights, a larger weight value indicating a higher priority; and (5) whether the VF is bound to an acceleration functional unit, where 0 represents unbound and 1 represents bound.
(2) The acceleration unit state information may be as shown in Table 2; it stores the state information of all acceleration functional units contained in the dynamic area 200. Table 2 is the acceleration unit state information table.
TABLE 2
Each row represents the state information of one acceleration unit and specifically comprises: (1) an acceleration unit index number, identifying the number of the acceleration unit in the dynamic area; (2) an acceleration unit function identification number in UUID format, identifying the function of the acceleration unit; (3) the space occupied by each group of context configuration registers, explaining the register space used when the acceleration unit executes its function; (4) the maximum supportable number of contexts, indicating the number of groups of context configuration registers contained in the acceleration unit's configuration module, which is also the maximum number of VFs that can be bound; (5) the number of allocatable contexts, indicating the number of contexts still available for binding to VFs; (6) the allocated context numbers, a bitmap in which each bit represents one context — for example 10101010b indicates that contexts 1, 3, 5 and 7 are bound to VFs while contexts 0, 2, 4 and 6 remain to be allocated; and (7) the interrupt number, indicating the number of interrupts needed by the acceleration unit.
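Decoding the allocated-context bitmap in field (6) of Table 2 can be sketched as follows (a hypothetical helper, not part of the patent's hardware):

```python
def bound_contexts(allocated_bits, max_contexts):
    """Context numbers whose bit is 1, i.e. already bound to a VF."""
    return [i for i in range(max_contexts) if allocated_bits >> i & 1]

def free_contexts(allocated_bits, max_contexts):
    """Context numbers whose bit is 0, i.e. still available for allocation."""
    return [i for i in range(max_contexts) if not allocated_bits >> i & 1]
```

Applied to the example value 10101010b with 8 contexts, contexts 1, 3, 5, 7 come back as bound and 0, 2, 4, 6 as free, matching the description above.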
Here, a binding procedure of the virtual function VF and the acceleration function unit is described.
After receiving a user's request for an accelerator device — for example, a request that the accelerator device include function A — the data center may traverse the two tables contained in each accelerator device, the VF state information table and the acceleration unit state information table, to determine whether that accelerator device meets the request. The specific determination flow is as follows:
First, the acceleration unit state information table is consulted to determine whether the device contains the desired function A and whether it has any context available for allocation. If not, the accelerator device does not meet the requirement and the next accelerator device is queried; otherwise, the next step is performed.
Next, the virtual function state information table is queried to determine whether the state of a virtual function matches the acceleration functional unit, namely: (1) the virtual function VF is not yet bound to an acceleration functional unit; (2) the register access space contained in the VF is not smaller than the context configuration register space of the acceleration functional unit; and (3) the number of MSI-X interrupts supported by the VF is not smaller than the interrupt number of the acceleration functional unit. If all conditions are met, the accelerator device meets the requirement; otherwise it does not, and the next accelerator device is queried.
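The two-step matching check above can be sketched as follows. The dictionary field names are illustrative stand-ins for the table columns, not the patent's register layout:

```python
def afu_matches(afu, wanted_uuid):
    """Step 1: function UUID matches and at least one context is allocatable."""
    return afu["uuid"] == wanted_uuid and afu["allocatable_contexts"] > 0

def vf_matches(vf, afu):
    """Step 2: VF is unbound, with enough register space and MSI-X interrupts."""
    return (vf["bound"] == 0
            and vf["reg_space"] >= afu["ctx_reg_space"]
            and vf["msix_count"] >= afu["irq_count"])
```

An accelerator device satisfies the user request only when some AFU row passes `afu_matches` and some VF row passes `vf_matches` against that AFU.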
According to one embodiment of the invention, the static area further comprises a virtual function device and acceleration function unit mapping table.
Wherein the virtual function device and acceleration function unit mapping table is configured to store a mapping relationship between the target acceleration function unit and the target virtual function device.
Specifically, after the matching and binding decisions of the target AFU and the target VF are completed according to the user requirements, a specific mapping entry may be written into the virtual function device and acceleration function unit mapping table. In the subsequent operation process, the mapping table of the virtual function device and the acceleration function unit is used as a core basis for redirecting the hardware access path. For example, when a certain VF initiates a register access request, the register access routing module will first query the mapping table, find out the AFU identifier and the context number bound to the VF according to the VF identifier of the initiated request, and then redirect the original access request to the configuration register set of the specified context inside the AFU. Similarly, when the AFU finishes executing the task and needs to send the interrupt, the interrupt interface also determines the target VF which should receive the interrupt by inquiring the mapping table, so as to ensure that the interrupt signal is accurately sent. The contents of the table are dynamically updated along with the unbinding or rebinding of the VF, and the table is a hub for realizing the dynamic scheduling of AFU resources and the sharing of multiple tenants.
In actual execution, after an accelerator device meeting the user requirement is found, the accelerator management program allocates an assignable context sequence number of an acceleration unit through the configuration interface provided by the accelerator's PF, realizing the binding of the VF and the acceleration unit. The specific operation is to configure the VF and acceleration unit mapping table.
The VF and acceleration unit mapping table is shown in Table 3; it stores each VF together with the information of the acceleration functional unit bound to it. Table 3 is the VF and acceleration unit mapping table.
TABLE 3
Each row specifically comprises: (1) a VF identification number, identifying the corresponding VF; (2) an acceleration unit index number, identifying the acceleration functional unit in the dynamic area; (3) the allocated context sequence number, indicating which context configuration register group of the acceleration functional unit the VF corresponds to; and (4) ddr_start and ddr_size, indicating the DDR interval accessible to the VF and the acceleration functional unit.
After the mapping table of the VF and the acceleration unit is configured, the binding of the VF and the acceleration functional unit is realized.
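Configuring one row of the mapping table (Table 3) can be sketched as follows. The dict-based layout is our own model; the field names ddr_start and ddr_size follow the table:

```python
def bind(mapping, vf_id, afu_index, ctx, ddr_start, ddr_size):
    """Write one VF -> AFU mapping entry, completing the binding."""
    mapping[vf_id] = {"afu": afu_index, "ctx": ctx,
                      "ddr_start": ddr_start, "ddr_size": ddr_size}
    return mapping[vf_id]
```

After this entry is written, the routing, interrupt, and memory-access modules described below all consult `mapping` to resolve the VF's bound AFU and context.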
Thus, whereas in the related art the binding relationship between VF and AFU is static and hard-wired, lacking a programmable middle layer to manage it, so that resources cannot be reused, the embodiment of the invention can, through the dynamically configurable virtual function device and acceleration function unit mapping table, flexibly allocate an idle AFU context to any idle VF with a demand according to real-time load conditions. In addition, the virtual function device and acceleration function unit mapping table provides a uniform addressing basis for all cross-function communication, so that each VF's requests and responses can be accurately routed to the corresponding context in a resource-sharing environment, achieving isolation.
According to one embodiment of the invention, the static area further comprises a register access routing module and a register access arbitration module.
The register access routing module is configured to query the virtual function device and acceleration function unit mapping table according to a first access request sent by a virtual function device, determine the corresponding acceleration functional unit number and context register group number, generate a new access request from these together with the target context register address in the first access request, and transmit the new access request to the register access arbitration module.
The register access arbitration module is configured to determine a corresponding acceleration functional unit according to a corresponding acceleration functional unit number in the new access request, and send a context register group number and a target context register address in the new access request to a configuration module of the corresponding acceleration functional unit through a register access interface of the corresponding acceleration functional unit, so as to perform register access.
Specifically, when a VF needs to configure its bound AFU, a register write or read operation is initiated, which enters the register access routing module as a "first access request". The routing module firstly queries a mapping table of virtual function equipment and acceleration function units according to the VF identification number in the request, and obtains a corresponding AFU number and a context register set sequence number. The routing module then combines the destination register address in the original request to generate a new access request containing the destination AFU number, the context register set number, and the destination address. This new request is sent to the register access arbitration module. The arbitration module identifies the target AFU according to the AFU number in the new request, and determines the processing sequence according to a preset rule (such as priority or polling) when a plurality of concurrent requests exist. Finally, the arbitration module transmits the sequence number of the context register group and the target address to the configuration module of the AFU through the register access interface of the target AFU, and completes register access.
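The routing step above can be sketched as a pure lookup-and-rewrite. This is a hypothetical software model of the hardware path, using the same dict-based mapping table as before:

```python
def route_register_access(mapping, vf_id, target_addr):
    """Rewrite a VF's first access request into the new access request:
    (target AFU number, context register group number, register address)."""
    entry = mapping[vf_id]            # mapping-table lookup by VF id
    return (entry["afu"], entry["ctx"], target_addr)
```

For instance, with VF3 bound to AFU 1 / context group 2, an access to register address 0x10 is rewritten into the triple (1, 2, 0x10), which the arbitration module then delivers to AFU 1's configuration module.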
In the actual execution process, after the VF and acceleration unit mapping table is configured, the binding of the VF and the acceleration functional unit is realized. This VF is then assigned to the user's virtual machine for delivery to the user for use.
The following describes in detail how to configure a certain set of context registers of an acceleration functional unit by a VF, trigger the acceleration functional unit to execute, and send an interrupt to the VF when the acceleration functional unit execution ends. The specific flow is as follows:
(1) Configuring a set of context registers of the acceleration functional unit by the VF:
1. Assuming the context register address to be configured is Addr, the user generates a PCIe access request to the accelerator device by accessing the address at offset Addr in the VF device's BAR space.
2. The access request is transmitted to a register access routing module of the static area, and the register access routing module firstly indexes the VF and an acceleration unit mapping table according to the VF identification number of the request, and finds the content corresponding to the VF, namely the acceleration function unit number bound by the VF and the corresponding context register set serial number. Then, a new register access request is formed according to the obtained acceleration functional unit number, the context register group number and the specific context register address Addr.
3. The request is passed to a register access arbitration module. When the request is executed, the acceleration unit is selected according to the number of the acceleration functional unit, and then the context register group serial number and the specific context register address Addr are transmitted to the configuration module of the acceleration functional unit through the register access interface of the acceleration unit, so that the access to the specific register is completed.
(2) Triggering the acceleration functional unit to execute:
1. First, according to the above-described register access flow, the write operation to the start command register included in the corresponding context register group is completed.
2. Subsequently, the configuration module notifies the scheduling module, and sets the corresponding bit of the status register in the scheduling module to 1.
3. If the current AFU is not executing a task, the scheduling unit in the scheduling module executes its scheduling strategy and schedules task execution according to the status register.
Thus, in a conventional architecture without a routing and arbitration mechanism, if multiple VFs share one AFU, their register access paths will collide, resulting in a configuration error or a system crash. According to the embodiment of the invention, by introducing the register access routing module, the accurate redirection of the access request is realized, so that the request of each VF can be correctly mapped to the exclusive context register group in the AFU, configuration isolation is ensured, and illegal access across contexts is prevented. In addition, the introduction of the 'register access arbitration module' solves the problem of resource competition when multiple VFs access the same AFU concurrently. When task configuration requests of a plurality of VFs arrive at the same time, the arbitration module avoids hardware bus conflict and data damage through ordered scheduling, and ensures that the AFU configuration module can stably and reliably receive and process each request.
According to one embodiment of the invention, the static area further comprises an interrupt interface arbitration module and an interrupt routing module.
The interrupt interface arbitration module is configured to add a corresponding acceleration function unit number to the received interrupt request, and transmit the interrupt request with the acceleration function unit number added to the interrupt routing module.
The interrupt routing module is configured to query a mapping table of the virtual function device and the acceleration function unit according to the interrupt request added with the acceleration function unit number, determine the corresponding virtual function device, and send the interrupt request to the corresponding virtual function device.
Specifically, when a certain AFU completes a task and needs to notify its bound VF, an interrupt request is initiated to the interrupt interface arbitration module. The arbitration module firstly arbitrates interrupt requests which are possibly sent by a plurality of AFUs at the same time, ensures that the system bus does not collide, and embeds the AFU number of the request source in the arbitrated interrupt request. The interrupt request carrying the AFU number is then passed to the interrupt routing module. The interrupt routing module queries a mapping table of virtual function equipment and acceleration function units according to the AFU number and in combination with the context sequence number provided by the AFU internal execution module, and locates the original VF initiating the task.
In the actual execution process, the acceleration function unit transmits interrupt to the VF when the execution is finished:
1. After the acceleration functional unit has finished executing, an interrupt request is created, which includes the context sequence number stored in the execution module context register.
2. The interrupt request is transferred to the interrupt interface arbitration module through the interrupt interface of the acceleration functional unit, and the interrupt interface arbitration module adds the acceleration functional unit number in the interrupt request and transfers the request to the interrupt routing module.
3. The interrupt routing module queries the VF and acceleration unit mapping table according to the acceleration functional unit number and the context sequence number, determines the corresponding VF number, and finally delivers the interrupt to that VF.
Thus, in the related art, each VF typically has exclusive access to one AFU, so the interrupt path is fixed and no complex routing is required. In the embodiment of the invention, the tasks of multiple VFs may be executed by the same AFU, which then raises interrupts; without an effective mechanism, the system could not distinguish interrupt sources, leading to wrong notifications or lost interrupts. The scheme solves the bus contention problem of concurrent interrupts from multiple AFUs through the interrupt interface arbitration module, avoiding the loss of interrupt signals or system blocking and improving system stability and real-time response. In addition, the interrupt routing module, combined with the virtual function device and acceleration function unit mapping table, realizes the accurate inverse mapping of interrupts.
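The inverse mapping performed by the interrupt routing module can be sketched as a reverse search over the same mapping table (an illustrative model, not the hardware implementation):

```python
def route_interrupt(mapping, afu_index, ctx):
    """Given the AFU number and context sequence number carried by an
    interrupt request, find the VF that should receive the interrupt."""
    for vf_id, entry in mapping.items():
        if entry["afu"] == afu_index and entry["ctx"] == ctx:
            return vf_id
    return None  # no VF bound to this (AFU, context) pair
```

Because each (AFU number, context number) pair is bound to at most one VF, the lookup is unambiguous even when many VFs share one AFU.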
In addition, after the virtual machine is closed and the VF device is released, the accelerator management program realizes the unbinding of the VF and the acceleration unit through a configuration interface provided by the PF of the accelerator. The specific operation steps are as follows:
1. The VF state information table is queried according to the VF identification number, and the VF's binding state with the acceleration functional unit is set to 0, representing the unbound state.
2. The VF and acceleration unit mapping table is queried according to the VF identification number to obtain the acceleration functional unit number and allocated context sequence number it contains. The allocatable context number (available_contexts) and the allocated context numbers (allocated_contexts_number) of the corresponding acceleration functional unit in the acceleration unit state information table are then updated.
3. Finally, the entry corresponding to the VF in the VF and acceleration unit mapping table is deleted.
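The three unbinding steps can be sketched together as follows, on the same illustrative dict-based tables used earlier (field names are our own):

```python
def unbind(vf_table, afu_table, mapping, vf_id):
    """Unbind a released VF: clear its bound flag, free its context in the
    acceleration unit state table, and delete its mapping-table entry."""
    vf_table[vf_id]["bound"] = 0                    # step 1: mark VF unbound
    entry = mapping[vf_id]
    afu = afu_table[entry["afu"]]
    afu["allocated_bits"] &= ~(1 << entry["ctx"])   # step 2: free the context bit
    afu["allocatable_contexts"] += 1                #         and count it as free
    del mapping[vf_id]                              # step 3: drop the mapping row
```

After unbinding, the freed context immediately becomes visible to the matching flow described earlier and can be allocated to another VF.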
According to one embodiment of the invention, the static area further comprises a memory access arbitration module and a memory access module.
The memory access arbitration module is configured to add a corresponding acceleration functional unit number to the received second access request, and transmit the second access request with the acceleration functional unit number to the memory access module.
The memory access module is configured to query the virtual function device and the acceleration function unit mapping table according to the second access request after adding the acceleration function unit number, determine a memory space start address and a memory space address length of the target memory storage space, and access the target memory storage space according to the memory space start address and the memory space address length.
Specifically, when a certain AFU needs to read input data from an external DDR or write back the result during execution of a task, a second access request is initiated. The request first enters a memory access arbitration module that arbitrates priority or polling of concurrent requests from multiple AFUs, avoids bus collisions, and embeds the number of its source AFU for requests that pass arbitration. The request carrying the AFU number is then sent to the memory access module. The module queries a mapping table of virtual function equipment and acceleration function unit according to the AFU number, and determines the VF and binding context of the VF currently served by the AFU. Based on the identity of the VF, the memory access module searches the pre-allocated memory area configuration of the VF, and obtains the starting address of the corresponding target DDR memory space and the address length of the allowed access. Finally, the memory access module generates a read-write command conforming to the DDR protocol, and the DDR controller accesses the appointed physical address range to complete data transmission.
Thus, in an architecture without unified management, multiple AFUs accessing DDR directly could cause bus contention, request loss, or data overwriting. The embodiment of the invention resolves concurrent access conflicts at the physical layer through the memory access arbitration module, ensuring the ordered processing of requests on the DDR bus and improving the reliability and bandwidth utilization of memory access. In addition, the memory access module, combined with the virtual function device and acceleration function unit mapping table, implements logical-layer access control and address mapping, ensuring that an AFU can only access the memory range authorized for its bound VF and thereby isolating and protecting storage resources.
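The access-control check implied above can be sketched as a simple bounds test against the DDR window recorded in the mapping entry (ddr_start/ddr_size from Table 3; the function itself is a hypothetical model):

```python
def ddr_access_ok(entry, addr, length):
    """True iff [addr, addr + length) lies entirely inside the DDR window
    [ddr_start, ddr_start + ddr_size) authorized for the bound VF."""
    start, size = entry["ddr_start"], entry["ddr_size"]
    return start <= addr and addr + length <= start + size
```

Any AFU read or write that fails this check falls outside its bound VF's authorized range and must be rejected, which is precisely the storage isolation described above.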
In order for those skilled in the art to further understand the accelerator apparatus of the embodiments of the present application, a detailed description will be given below with reference to specific embodiments.
Assume that the VF state information table and the acceleration unit state information table of the static area of one accelerator are as shown in Tables 4 and 5; Table 4 is the acceleration unit state information table, and Table 5 is the VF state information table.
TABLE 4
TABLE 5
Assume that two users each need a device whose acceleration function is compression, and that the UUID (Universally Unique Identifier) of the compression function is 4C3A8ECC-0940-6B0A-C2E3-1570822F090C.
From the accelerator unit state information table, the current accelerator contains a compression function, namely the accelerator functional unit numbered 1, and there are also 4 allocable contexts. From the VF state information table, VF1 and VF3 are not yet bound to the acceleration functional unit, so VF1 and VF3 can be arranged to be bound to the acceleration functional unit numbered 1.
From the allocated context number entry (10101010b) of acceleration functional unit No. 1, context sequence numbers 1, 3, 5 and 7 are already bound to VFs, while sequence numbers 0, 2, 4 and 6 are still to be allocated. Assume number 0 is allocated to VF1 and number 2 to VF3. After binding is completed, the allocated context number content of acceleration functional unit No. 1 is updated to 10101111b.
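The bitmap update in this example can be verified bitwise:

```python
allocated = 0b10101010        # contexts 1, 3, 5, 7 already bound
allocated |= 1 << 0           # bind context 0 to VF1
allocated |= 1 << 2           # bind context 2 to VF3
assert allocated == 0b10101111
```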
The VF and acceleration unit mapping table is configured as shown in Table 6, which shows that VF1 and VF3 are both bound with the acceleration functional unit No. 1, the assigned context sequence numbers are 0 and 2, respectively, and Table 6 is the VF and acceleration unit mapping table.
TABLE 6
Taking the example of a user writing data 0x1000 to a register with address 0x10 through VF3, the register access flow is described:
(1) First, the user initiates a write request by accessing the address at offset 0x10 in the BAR space of VF3.
(2) The write request is transferred to the register access routing module of the static area, which queries the VF and acceleration unit mapping table according to VF identification number 3, determines that this register access belongs to acceleration functional unit 1 with context sequence number 2, and thereby constructs a new register access request.
(3) The new register access request is transferred to the register access arbitration module, when the request is executed, the acceleration unit 1 is selected, and then the context register group serial number 2 and the specific context register address 0x10 are transferred to the configuration module of the acceleration functional unit through the register access interface of the acceleration unit 1, so that the register writing operation of the context register group 2 with the address 0x10 is completed.
Therefore, by redesigning the accelerator device based on SR-IOV technology, a single acceleration functional unit can be dynamically bound to multiple VFs according to user demand. This breaks the previous one-to-one binding between VFs and acceleration units, allows an acceleration functional unit to serve multiple VFs, avoids leaving acceleration functional units idle, and improves their utilization.
According to the accelerator device provided by the embodiment of the application, the target acceleration functional unit is determined from at least one acceleration functional unit by traversing the acceleration unit state information table according to the received user demand, the target virtual function device matched with the target acceleration functional unit is determined by traversing the virtual function device state information table, and the two are bound. This solves the problem that an acceleration functional unit sits completely idle when it is not being used to accelerate a task, which results in low utilization of the acceleration functional unit.
An embodiment of the present invention provides a control method of an accelerator apparatus, which is applied to the accelerator apparatus as described above.
As shown in fig. 5, the control method of the accelerator apparatus includes the steps of:
in step S101, it is determined whether a user demand is received.
In step S102, if a user demand is received, a target acceleration functional unit is determined from at least one acceleration functional unit based on the user demand by traversing the acceleration unit state information table.
Specifically, while the system is running, it continuously monitors and judges whether an acceleration service request from a user has arrived. If not, the flow pauses and waits; if so, the user demand is received and the process goes to step S102. In this step, the system parses the user demand and extracts the specified acceleration function type (e.g., by UUID matching). Then the system traverses all entries in the acceleration unit state information table, comparing one by one whether each AFU's function UUID matches the demand, and checking whether its number of allocated contexts is less than its total number of contexts to confirm whether a free context exists. When the first AFU satisfying both the function match and resource availability is found, it is determined to be the target acceleration functional unit, ready for subsequent binding with a virtual function device (VF).
Therefore, whereas the binding relationship between AFU and VF in the related art is preconfigured and fixed and cannot be adjusted according to actual function requirements and real-time resource state — so that even when a function-matched, idle AFU exists, the system cannot use it because of a mismatched binding relationship, wasting resources — the embodiment of the invention introduces a dynamic decision mechanism by judging the user demand, traversing the state table, and then matching function and resources. Because the system actively queries the real-time state of all AFUs before each allocation, it can accurately locate a target AFU with matched function and available resources, ensuring accurate and efficient resource allocation.
According to one embodiment of the invention, determining the target acceleration functional unit from at least one acceleration functional unit by traversing the acceleration unit state information table based on the user demand comprises: determining a current acceleration functional unit from the at least one acceleration functional unit based on the acceleration unit state information table; judging whether the current acceleration functional unit meets the user demand and whether it has an allocatable context; and, if the current acceleration functional unit meets the user demand and has an allocatable context, taking it as the target acceleration functional unit.
Specifically, the embodiment of the invention selects an entry from the acceleration unit state information table as the current acceleration functional unit to check. The system compares the function UUID of the current acceleration functional unit with the function UUID specified in the user demand to judge whether the function meets the demand. At the same time, the system checks whether the number of allocated contexts of the AFU is less than the total number of contexts; if so, at least one allocatable context exists. When the current acceleration functional unit meets the user demand and has an allocatable context, it is determined to be the target acceleration functional unit and the traversal terminates. If either condition is not met, the system continues to the next entry in the table until a qualifying target AFU is found or the whole table has been traversed.
Therefore, the embodiment of the invention ensures that only an AFU with the required function and idle resources can be selected, fundamentally avoiding erroneous allocation. In addition, the sequential traversal stops at the first AFU meeting the conditions, which reduces scheduling overhead, avoids waiting or retries caused by resource mismatch, and shortens service response time.
According to one embodiment of the invention, after judging whether the current acceleration functional unit meets the user demand and whether it has an allocatable context, the method further comprises: if the current acceleration functional unit does not meet the user demand or does not have an allocatable context, determining a new acceleration functional unit from the at least one acceleration functional unit based on the acceleration unit state information table, taking the new acceleration functional unit as the current acceleration functional unit, and re-executing the step of judging whether the current acceleration functional unit meets the user demand and whether it has an allocatable context, until the target acceleration functional unit is obtained.
Specifically, after the current acceleration functional unit undergoes the function-match and resource-availability judgments, if either condition is not satisfied, the system does not immediately terminate the allocation flow; based on the acceleration unit state information table, it selects the next unchecked AFU from the table and designates it as the new current acceleration functional unit. The system then re-executes the judgments of whether the function matches and whether an allocatable context exists. The process iterates, sequentially traversing all AFU entries in the table, until the first AFU satisfying both the function requirement and the resource availability condition is found, at which point it is determined to be the target acceleration functional unit and the loop exits. If no qualifying AFU is found after traversing the whole table, the resources are judged insufficient and an allocation failure is returned.
Therefore, the embodiment of the invention introduces a cyclic traversal mechanism that inspects each AFU until an available target is found. This avoids overlooking discoverable resources because of inspection order, achieves load balancing across multiple AFU instances, prevents the imbalance in which local resources are overloaded while global resources sit idle, and provides reliable resource guarantees for highly concurrent and dynamic user loads.
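The AFU selection flow described above (UUID match, free-context check, and cyclic traversal with first-hit termination) can be sketched as follows. This is a minimal illustrative model, not the patent's implementation; the names `AfuState`, `find_target_afu`, and the field names are assumptions introduced for clarity.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AfuState:
    """One entry of the acceleration unit state information table (illustrative)."""
    afu_index: int            # acceleration unit index number
    function_uuid: str        # identifies the acceleration function
    total_contexts: int       # maximum supportable number of contexts
    allocated_contexts: int   # number of already allocated contexts

def find_target_afu(state_table: List[AfuState],
                    requested_uuid: str) -> Optional[AfuState]:
    """Traverse the table and return the first AFU whose function matches
    the user demand and which still has a free context."""
    for afu in state_table:
        matches = afu.function_uuid == requested_uuid
        has_free_context = afu.allocated_contexts < afu.total_contexts
        if matches and has_free_context:
            return afu        # first hit wins; traversal stops here
    return None               # table exhausted: allocation fails
```

For example, with a table where AFU 0 matches the function but is fully allocated and AFU 1 matches with spare contexts, the traversal skips AFU 0 and returns AFU 1, illustrating how load spills over to an idle instance.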
In step S103, the virtual function device state information table is traversed to determine a target virtual function device that matches the target acceleration function unit, and the target acceleration function unit and the target virtual function device are bound.
According to one embodiment of the invention, traversing the virtual function device state information table to determine the target virtual function device matched with the target acceleration functional unit comprises: determining a current virtual function device from at least one virtual function device based on the virtual function device state information table; judging whether the current virtual function device is not bound to an acceleration functional unit, whether its register access space capacity is greater than or equal to the occupied space capacity of the context configuration registers of the target acceleration functional unit, and whether its supportable interrupt number is greater than or equal to the interrupt number of the target acceleration functional unit; and, if all three conditions hold, taking the current virtual function device as the target virtual function device.
Specifically, after determining the target AFU, the system traverses the virtual function device state information table for a matching VF. The system first selects a VF as the current virtual function device for evaluation. The evaluation comprises three conditions: first, checking that the VF is not bound to any acceleration functional unit, to ensure it can be allocated; second, comparing whether the VF's register access space capacity is greater than or equal to the occupied space capacity of the target AFU's context configuration registers, to ensure the VF has sufficient address space to map the AFU's configuration registers; and third, checking whether the VF's supportable interrupt number is greater than or equal to the target AFU's required interrupt number, to ensure the interrupt communication capability matches. When all three conditions are met simultaneously, the system determines the current virtual function device to be the target virtual function device. If any condition fails, the system continues to check the next VF in the table until a VF meeting all conditions is found.
The hardware resource configurations of different VFs may differ; if only the binding state were checked and the register space and interrupt capability ignored, a bound VF might fail to work normally. For example, if the BAR space of the VF is insufficient, the AFU's configuration registers cannot be fully mapped and configuration fails; if the VF's supported interrupt count is insufficient, the AFU's task-completion interrupts cannot be delivered and tasks block. By introducing the two additional conditions of register space capacity and interrupt number, the embodiment of the invention avoids runtime errors caused by resource mismatch, which not only improves system stability but also avoids faults caused by misconfiguration and improves VF utilization.
According to one embodiment of the invention, after judging whether the current virtual function device is not bound to an acceleration functional unit, whether its register access space capacity is greater than or equal to the occupied space capacity of the context configuration registers of the target acceleration functional unit, and whether its supportable interrupt number is greater than or equal to the interrupt number of the target acceleration functional unit, the method further comprises: if the current virtual function device is already bound to an acceleration functional unit, or its register access space capacity is smaller than the occupied space capacity of the context configuration registers of the target acceleration functional unit, or its supportable interrupt number is smaller than the interrupt number of the target acceleration functional unit, determining a new virtual function device from the at least one virtual function device based on the virtual function device state information table, taking the new virtual function device as the current virtual function device, and re-executing the above judging step, until the target virtual function device is obtained.
Specifically, in the case that the current virtual function device is already bound to an acceleration functional unit, or its register access space capacity is smaller than the occupied space capacity of the target AFU's context configuration registers, or its supportable interrupt number is smaller than the target AFU's interrupt number, the system selects the next unchecked VF from the virtual function device state information table and designates it as the new current virtual function device. The system then re-executes the judgment flow; the process iterates, sequentially traversing all VF entries in the table, until a VF satisfying all judgment conditions is found, at which point it is determined to be the target virtual function device and the loop exits. If no qualifying VF is found after traversing the whole table, it is judged that no bindable resource exists and an allocation failure is returned.
Therefore, by introducing a cyclic traversal mechanism that checks each VF until an available target is found, the embodiment of the invention avoids omissions caused by check order, prevents global service interruption caused by local resource mismatch, and provides a reliable guarantee for resource allocation.
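The three-condition VF match described above can be sketched in the same illustrative style. The names `VfState` and `find_target_vf` and the field names are assumptions, not the patent's identifiers; the point is that all three checks (unbound, sufficient register space, sufficient interrupts) must hold simultaneously.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VfState:
    """One entry of the virtual function device state information table (illustrative)."""
    vf_index: int         # virtual function device identification number
    bound: bool           # already bound to an AFU?
    bar_capacity: int     # register access space capacity (bytes)
    max_interrupts: int   # supportable interrupt number

def find_target_vf(vf_table: List[VfState],
                   afu_reg_space: int,
                   afu_interrupts: int) -> Optional[VfState]:
    """Return the first unbound VF whose register space and interrupt
    budget both cover the target AFU's requirements."""
    for vf in vf_table:
        if (not vf.bound
                and vf.bar_capacity >= afu_reg_space
                and vf.max_interrupts >= afu_interrupts):
            return vf     # all three conditions hold
    return None           # no bindable VF: allocation fails
```

For instance, a VF that is unbound but has too small a BAR space is skipped, matching the failure mode the text warns about (configuration registers that cannot be fully mapped).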
According to one embodiment of the invention, binding a target acceleration functional unit and a target virtual functional device includes storing a mapping relationship between the target acceleration functional unit and the target virtual functional device to a virtual functional device and acceleration functional unit mapping table.
Specifically, after the target AFU and the target VF are determined, the binding operation is executed and the association information is written into the virtual function device and acceleration functional unit mapping table. The system generates a new mapping entry containing the identification number of the target VF (vf_index), the identification number of the target AFU (afu_index), and the context number (context_number) allocated inside the target AFU for this VF. The entry is stored at the corresponding location of the mapping table; updating the table enables subsequent hardware access paths (such as register access, interrupt delivery, and memory access) to be routed precisely according to the mapping relationship, realizing the VF's exclusive and isolated access to the AFU.
In the static binding architecture of the related art, by contrast, the connection between VF and AFU is hardwired or preset by firmware, which lacks flexibility and makes resource sharing difficult to support. By storing the binding relationship in a mapping table, the embodiment of the invention converts the binding from a static configuration into a dynamically modifiable state, improving resource utilization and enhancing the safety, stability, and maintainability of the system.
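The binding step above amounts to inserting a (vf_index, afu_index, context_number) record into a table that later routing logic consults. A minimal sketch, with all names (`bind`, `route_register_access`, the dictionary layout) introduced here for illustration only:

```python
# Mapping table keyed by vf_index, so each routing lookup is a single
# dictionary access rather than a table scan.
mapping_table = {}

def bind(vf_index: int, afu_index: int, context_number: int) -> None:
    """Record the VF-to-AFU binding, including the context allocated
    inside the target AFU for this VF."""
    mapping_table[vf_index] = {
        "afu_index": afu_index,
        "context_number": context_number,
    }

def route_register_access(vf_index: int):
    """Resolve which AFU and which context register group a register
    access from the given VF should be routed to."""
    entry = mapping_table[vf_index]
    return entry["afu_index"], entry["context_number"]
```

Because the table is ordinary mutable state rather than a hardwired connection, a binding can be replaced or removed at runtime, which is the dynamic-modification property the text attributes to the mapping table.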
According to the control method of the accelerator device provided by the embodiment of the application, it is judged whether a user demand is received; if so, the target acceleration functional unit is determined from the at least one acceleration functional unit by traversing the acceleration unit state information table based on the user demand, the target virtual function device matched with the target acceleration functional unit is determined by traversing the virtual function device state information table, and the target acceleration functional unit and the target virtual function device are bound. This solves the problem that an acceleration functional unit sits completely idle when it is not being used to accelerate a task, which results in low utilization of acceleration functional units.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, or of course by hardware alone, although in many cases the former is the preferred embodiment.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both; to clearly illustrate this interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The control method of the accelerator apparatus provided by the present invention is described above in detail. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (15)

1. An accelerator device, characterized in that the accelerator device is divided into a static area and a dynamic area, the static area comprising a virtual function device state information table and an acceleration unit state information table, and the dynamic area comprising at least one acceleration functional unit, wherein,
The acceleration unit state information table is configured to store state information of the at least one acceleration functional unit within the dynamic area, and the virtual function device state information table is configured to store state information of at least one virtual function device, wherein,
The accelerator device is configured to traverse the acceleration unit state information table according to a received user demand to determine a target acceleration functional unit from the at least one acceleration functional unit, traverse the virtual function device state information table to determine a target virtual function device matched with the target acceleration functional unit, and bind the target acceleration functional unit and the target virtual function device.
2. The accelerator apparatus according to claim 1, characterized in that the acceleration functional unit comprises:
the configuration module comprises at least one group of context configuration registers, wherein at least one group of context configuration registers comprises a starting command register, and the starting command register is configured to start a corresponding acceleration task according to a received setting signal;
The scheduling module comprises a state register and a scheduling unit, wherein the state register is configured to represent whether the context configuration register is configured or not, and the scheduling unit is configured to execute corresponding scheduling actions based on a preset scheduling strategy;
the execution module comprises a context register and an execution unit, wherein the context register is configured to record the context sequence number currently scheduled for execution, and the execution unit is configured to read, from the configuration module, the context configuration register information corresponding to the context sequence number and execute the functional task corresponding to the acceleration functional unit.
3. The accelerator apparatus according to claim 2, further comprising:
The register access interface is configured to receive a first access request sent by the virtual function device and map the first access request to a configuration module of a corresponding acceleration function unit;
An interrupt interface, configured to receive an interrupt request sent by the acceleration functional unit, map the interrupt request to a corresponding virtual functional device, and send the interrupt request;
A memory access interface configured to receive a second access request of the acceleration functional unit and map to a target memory storage space.
4. The accelerator device of claim 2, wherein the maximum number of groups of context configuration registers is determined based on the maximum number of virtual function devices that the corresponding acceleration functional unit can support binding.
5. The accelerator apparatus of claim 1, wherein the static region further comprises:
A virtual function device and acceleration function unit mapping table configured to store a mapping relationship between the target acceleration function unit and the target virtual function device.
6. The accelerator device of claim 5, wherein the static region further comprises a register access routing module and a register access arbitration module, wherein,
The register access routing module is configured to query a mapping table of the virtual function device and the acceleration function unit according to a first access request sent by the virtual function device, determine a corresponding acceleration function unit number and a context register group serial number, generate a new access request according to the corresponding acceleration function unit number and the context register group serial number and a target context register address in the first access request, and transmit the new access request to the register access arbitration module;
The register access arbitration module is configured to determine a corresponding acceleration functional unit according to a corresponding acceleration functional unit number in the new access request, and send the context register group serial number and the target context register address in the new access request to the configuration module of the corresponding acceleration functional unit through the register access interface of the corresponding acceleration functional unit, so as to perform register access.
7. The accelerator device of claim 6, wherein the static region further comprises an interrupt interface arbitration module and an interrupt routing module, wherein,
The interrupt interface arbitration module is configured to add a corresponding acceleration function unit number to the received interrupt request, and transmit the interrupt request added with the acceleration function unit number to the interrupt routing module;
the interrupt routing module is configured to query the mapping table of the virtual function device and the acceleration function unit according to the interrupt request added with the acceleration function unit number, determine the corresponding virtual function device, and send the interrupt request to the corresponding virtual function device.
8. The accelerator device of claim 7, wherein the static region further comprises a memory access arbitration module and a memory access module, wherein,
The memory access arbitration module is configured to add a corresponding acceleration function unit number to the received second access request, and transmit the second access request added with the acceleration function unit number to the memory access module;
The memory access module is configured to query the virtual function device and the acceleration function unit mapping table according to the second access request added with the acceleration function unit number, determine a memory space starting address and a memory space address length of a target memory storage space, and access the target memory storage space according to the memory space starting address and the memory space address length.
9. The accelerator device of claim 1, wherein the state information of the virtual function device comprises at least one of a virtual function device identification number, a number of supportable interrupts, an access space capacity of registers contained by the virtual function device, a priority weight, and a binding state with the acceleration function unit.
10. The accelerator device of claim 9, wherein the status information of the acceleration functional unit comprises at least one of an acceleration unit index number, an acceleration unit function identification number, a space usage capacity of at least one set of context configuration registers, a maximum supportable number of contexts, an allocatable number of contexts, an allocated number of contexts, and an interrupt number.
11. A control method of an accelerator apparatus, characterized in that the method is applied to an accelerator apparatus according to any one of claims 1 to 10, wherein the method comprises the steps of:
judging whether a user demand is received or not;
If the user demand is received, traversing the acceleration unit state information table to determine a target acceleration functional unit from the at least one acceleration functional unit based on the user demand;
And traversing the virtual function device state information table to determine target virtual function devices matched with the target acceleration function units, and binding the target acceleration function units and the target virtual function devices.
12. The method of claim 11, wherein traversing the acceleration unit state information table to determine a target acceleration functional unit from the at least one acceleration functional unit based on the user demand comprises:
Determining a current acceleration function unit from the at least one acceleration function unit based on the acceleration unit state information table;
judging whether the current acceleration functional unit meets the user requirement or not, and judging whether the current acceleration functional unit has an allocatable context or not;
And if the current acceleration functional unit meets the user requirement and the current acceleration functional unit has the allocatable context, taking the current acceleration functional unit as the target acceleration functional unit.
13. The method of claim 12, further comprising, after determining whether the current acceleration functional unit meets the user demand and whether the current acceleration functional unit has an allocatable context:
If the current acceleration functional unit does not meet the user requirement or the current acceleration functional unit does not have an allocatable context, determining a new acceleration functional unit from the at least one acceleration functional unit based on the acceleration unit state information table again;
And taking the new acceleration functional unit as the current acceleration functional unit, and re-executing the step of judging whether the current acceleration functional unit meets the user requirement and whether the current acceleration functional unit has an allocable context or not until the target acceleration functional unit is obtained.
14. The method of claim 11, wherein traversing the virtual function device state information table determines a target virtual function device that matches the target acceleration function unit comprises:
Determining a current virtual function device from at least one virtual function device based on the virtual function device state information table;
Judging whether the current virtual function device is not bound with the function of the acceleration unit, whether the register access space capacity of the current virtual function device is larger than or equal to the occupied space capacity of the context configuration register of the target acceleration function unit, and whether the supportable interrupt number of the current virtual function device is larger than or equal to the interrupt number of the target acceleration function unit;
And if the current virtual function device is not bound with the function of the accelerating unit, the register access space capacity of the current virtual function device is larger than or equal to the occupied space capacity of the context configuration register of the target accelerating function unit, and the supportable interrupt number of the current virtual function device is larger than or equal to the interrupt number of the target accelerating function unit, the current virtual function device is used as the target virtual function device.
15. The method of claim 14, wherein after determining whether the current virtual function device is unbound to an acceleration unit function and whether a register access space capacity of the current virtual function device is greater than or equal to a footprint capacity of a context configuration register of the target acceleration function unit and whether a number of supportable interrupts of the current virtual function device is greater than or equal to a number of interrupts of the target acceleration function unit, further comprising:
If the current virtual function device has bound an acceleration unit function, or the register access space capacity of the current virtual function device is smaller than the occupied space capacity of the context configuration register of the target acceleration function unit, or the supportable interrupt number of the current virtual function device is smaller than the interrupt number of the target acceleration function unit, determining a new virtual function device from the at least one virtual function device based on the virtual function device state information table again;
And taking the new virtual function device as the current virtual function device, and re-executing the step of judging whether the current virtual function device is not bound with the function of the accelerating unit, whether the register access space capacity of the current virtual function device is larger than or equal to the occupied space capacity of the context configuration register of the target accelerating function unit, and whether the supportable interrupt number of the current virtual function device is larger than or equal to the interrupt number of the target accelerating function unit until the target virtual function device is obtained.
CN202511556607.2A 2025-10-29 2025-10-29 Accelerator equipment and control methods for accelerator equipment Pending CN121029323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511556607.2A CN121029323A (en) 2025-10-29 2025-10-29 Accelerator equipment and control methods for accelerator equipment


Publications (1)

Publication Number Publication Date
CN121029323A true CN121029323A (en) 2025-11-28

Family

ID=97758251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511556607.2A Pending CN121029323A (en) 2025-10-29 2025-10-29 Accelerator equipment and control methods for accelerator equipment

Country Status (1)

Country Link
CN (1) CN121029323A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150378892A1 (en) * 2013-09-26 2015-12-31 Hitachi, Ltd. Computer system and memory allocation adjustment method for computer system
CN116737618A (en) * 2023-08-14 2023-09-12 浪潮电子信息产业股份有限公司 FPGA architecture, devices, data processing methods, systems and storage media
CN118897818A (en) * 2024-09-27 2024-11-05 苏州元脑智能科技有限公司 Accelerator device and implementation method thereof, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination