
CN118963718B - Interface calling device, method, equipment and medium of neural network processing unit - Google Patents


Info

Publication number
CN118963718B
CN118963718B (Application CN202411422690.XA)
Authority
CN
China
Prior art keywords
processing unit
neural network
network processing
application program
network
Prior art date
Legal status
Active
Application number
CN202411422690.XA
Other languages
Chinese (zh)
Other versions
CN118963718A
Inventor
龚国辉
寻迎亚
隋强
刘往
夏一民
Current Assignee
Hunan Greatwall Galaxy Technology Co ltd
Original Assignee
Hunan Greatwall Galaxy Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Greatwall Galaxy Technology Co ltd filed Critical Hunan Greatwall Galaxy Technology Co ltd
Priority to CN202411422690.XA priority Critical patent/CN118963718B/en
Publication of CN118963718A publication Critical patent/CN118963718A/en
Application granted granted Critical
Publication of CN118963718B publication Critical patent/CN118963718B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/20 Software design
    • G06F 8/24 Object-oriented
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/70 Software maintenance or management
    • G06F 8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/4401 Bootstrapping
    • G06F 9/4411 Configuring for operating with peripheral devices; Loading of device drivers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5022 Mechanisms to release resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract


The present invention relates to an interface calling device, method, equipment and medium of a neural network processing unit. Through standardized function calling interfaces and encapsulated driver details, users only need to use simple APIs to operate NPUs without having to understand the complex underlying driver and hardware details, which greatly reduces the difficulty of development. The encapsulated API uses a standard programming language, so that users do not need to understand custom structures and data types. The interface design is intuitive and easy to use, which facilitates users to quickly deploy and evaluate performance. A complete application processing flow from device initialization, network loading, pre-processing data, inference execution to resource release is provided, ensuring that the system can run efficiently and orderly at all stages. The encapsulated API includes the functions of initializing and destroying network contexts, ensuring that resources are effectively managed during allocation and release, avoiding resource leakage and insufficient memory problems, efficiently calling the NPU and improving the reliability of the system.

Description

Interface calling device, method, equipment and medium of neural network processing unit
Technical Field
The invention belongs to the technical field of artificial intelligence and embedded systems, and relates to an interface calling device, method, equipment, and medium of a neural network processing unit.
Background
In the field of modern artificial intelligence and embedded systems, a neural network processing unit (NPU), as a dedicated hardware accelerator, can remarkably improve the efficiency and performance of neural network reasoning. NPUs typically work in conjunction with other processors (e.g., digital signal processors, DSPs) to perform complex computational tasks through high-speed data transmission and processing.
In existing NPU calling technologies for practical applications, some embedded systems use specific APIs (Application Programming Interfaces) to communicate with the NPU. These APIs often require developers to know specific custom structures, data types, parameter settings, and partial driver function call sequences, making them complicated to use. Therefore, in order to solve the technical problems of complex NPU calling and the lack of a complete processing flow, an efficient and compact solution needs to be provided.
Disclosure of Invention
Aiming at the problems in the traditional method, the invention provides an interface calling device of a neural network processing unit, an interface calling method of the neural network processing unit, computer equipment and a computer readable storage medium, which can efficiently call NPU and provide a complete processing flow.
In order to achieve the above object, the embodiment of the present invention adopts the following technical scheme:
in one aspect, an interface calling device of a neural network processing unit is provided, including a network file context list, an opening device application program interface, a closing device application program interface, an initializing network application program interface, a destroying network application program interface and a network reasoning application program interface;
the network file context list is used for packaging configuration information and parameters required by invoking the neural network processing unit, wherein the configuration information and the parameters comprise the configuration parameters of the neural network processing unit, memory allocation information, model parameters and an input/output data buffer address;
the open-device application program interface encapsulates the interaction logic with the bottom-layer driver of the neural network processing unit device; this interaction logic is used to trigger the bottom-layer driver to perform driver initialization, ensuring that the neural network processing unit is in an available state;
the initialize-network application program interface is used to create a network object according to the network reasoning data structure input by the user, allocate computing resources, and initialize the network context;
the network reasoning application program interface is used to copy all input data required by the network reasoning data structure to the memory space planned by the neural network processing unit device; connect all input buffers and output buffers of the network to the effective buffers of the neural network processing unit respectively; after refreshing the input data buffers, notify the neural network processing unit to run the network and wait for the reasoning operation to complete; and then refresh the output data buffers and copy the reasoning result data to the memory space planned by the user;
the destroy-network application program interface is used to destroy the network object according to the network file context list and release the allocated computing resources;
and the close-device application program interface is used to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
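Purely as an illustration, the device's five-API surface and its shared context could be sketched in C as follows. Every name, type, and return code here is hypothetical, since the patent does not publish concrete signatures:

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical status codes; the patent does not define an error model. */
typedef enum { NPU_OK = 0, NPU_ERR_STATE = -1, NPU_ERR_PARAM = -2 } npu_status_t;

/* One entry of the "network file context list": configuration and buffers
 * that every API call shares, so callers pass one object, not many params. */
typedef struct {
    unsigned clock_hz;     /* NPU configuration parameter                */
    void    *model_params; /* model parameters                           */
    void    *input_buf;    /* input data buffer address                  */
    void    *output_buf;   /* output data buffer address                 */
    int      net_ready;    /* set once the network object is created     */
} npu_net_context_t;

static int g_device_open = 0; /* device state tracked by open/close APIs */

npu_status_t npu_open_device(void)  { g_device_open = 1; return NPU_OK; }
npu_status_t npu_close_device(void) { g_device_open = 0; return NPU_OK; }

npu_status_t npu_init_network(npu_net_context_t *ctx)
{
    if (!g_device_open || ctx == NULL) return NPU_ERR_STATE;
    ctx->net_ready = 1;  /* create network object, allocate resources */
    return NPU_OK;
}

npu_status_t npu_run_inference(npu_net_context_t *ctx)
{
    if (!g_device_open || ctx == NULL || !ctx->net_ready) return NPU_ERR_STATE;
    /* copy inputs in, run the network, copy results back (elided) */
    return NPU_OK;
}

npu_status_t npu_destroy_network(npu_net_context_t *ctx)
{
    if (ctx == NULL) return NPU_ERR_PARAM;
    ctx->net_ready = 0;  /* release allocated resources */
    return NPU_OK;
}
```

Note how the single context struct mirrors the network file context list: each API receives the same object, which is what lets the device avoid per-call parameter lists.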
On the other hand, the invention also provides an interface calling method of the neural network processing unit, which is applied to the interface calling device of the neural network processing unit, and the interface calling method of the neural network processing unit comprises the following steps:
calling the open-device application program interface to initialize the driver, configure the software environment of the neural network processing unit, and power on the neural network processing unit device;
invoking the initialize-network application program interface to create a network object according to the network reasoning data structure input by the user, allocate computing resources, and initialize the network context;
executing a loop section, comprising: obtaining input data, preprocessing the data, calling the network reasoning application program interface, post-processing the data, and judging whether to exit the loop;
when loop exit is determined, calling the destroy-network application program interface to destroy the network object and release the allocated computing resources;
and calling the close-device application program interface to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
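The method steps above reduce to a fixed stage order around a per-frame loop. The following sketch records that order; `npu_method_trace` and the stage names are invented for this illustration only:

```c
#include <string.h>
#include <assert.h>

/* Hypothetical step recorder: returns the order in which the method's
 * stages execute for n_frames trips through the loop section. */
size_t npu_method_trace(int n_frames, const char **trace, size_t cap)
{
    size_t k = 0;
    if (k < cap) trace[k++] = "open_device";     /* init driver, power on  */
    if (k < cap) trace[k++] = "init_network";    /* create network object  */
    for (int i = 0; i < n_frames; ++i) {         /* loop section           */
        if (k < cap) trace[k++] = "get_input";
        if (k < cap) trace[k++] = "preprocess";
        if (k < cap) trace[k++] = "inference";   /* network reasoning API  */
        if (k < cap) trace[k++] = "postprocess";
    }                                            /* exit once input ends   */
    if (k < cap) trace[k++] = "destroy_network"; /* release resources      */
    if (k < cap) trace[k++] = "close_device";    /* power off              */
    return k;
}
```

For two frames this yields twelve stages, with open/init strictly before the loop and destroy/close strictly after it, matching the claimed ordering.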
In yet another aspect, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the interface calling method of the neural network processing unit when the computer program is executed.
In yet another aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the interface invoking method of the neural network processing unit described above.
One of the above technical solutions has the following advantages and beneficial effects:
According to the interface calling device, the method, the equipment and the medium of the neural network processing unit, through standardized function calling interfaces and encapsulation driving details, a user can operate the NPU only by using a simple API (application program interface), and the user does not need to know the complicated driving and hardware details of the bottom layer, so that the development difficulty is greatly reduced. The packaged API uses a standard programming language, so that a user does not need to understand a custom structure body and data types, the interface design is visual, the use is easy, and the user can conveniently and rapidly deploy and evaluate the performance. The complete application processing flow from equipment initialization, network loading, preprocessing data and reasoning execution to resource release is provided, and the system can be ensured to run efficiently and orderly in all stages. The encapsulated API comprises the functions of initializing and destroying the network context, so that the resources are effectively managed during allocation and release, the problems of resource leakage and insufficient memory are avoided, and the reliability of the system is improved.
Compared with the prior art, the user can conveniently and efficiently call the NPU, and when the DSP and the NPU work cooperatively, the efficient data processing and resource management are realized. In addition, the NPU application development process is simplified, and the usability and the reusability of the interface are improved, so that the user development experience is improved, and the development complexity is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments or the conventional techniques of the present invention, the drawings required for the descriptions of the embodiments or the conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the drawings without inventive effort for those skilled in the art.
FIG. 1 is a schematic block diagram of an interface calling device of a neural network processing unit according to an embodiment;
FIG. 2 is a flow diagram that may be implemented by the open device API in one embodiment;
FIG. 3 is a flow diagram that may be implemented by the close device API in one embodiment;
FIG. 4 is a flow diagram that is implemented by the network reasoning API in one embodiment;
FIG. 5 is a schematic diagram of a neural network reasoning process in one embodiment;
fig. 6 is a flow chart of an interface calling method of the neural network processing unit in one embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
It is noted that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings of the embodiments of the present invention.
In one embodiment, as shown in fig. 1, an interface calling apparatus 100 of a neural network processing unit is provided, which may include a network file context list 11, an open device application program interface 13, a close device application program interface 15, an initialize network application program interface 17, a destroy network application program interface 19, and a network inference application program interface 21. The network file context list 11 is used for packaging configuration information and parameters required for invoking the neural network processing unit, wherein the configuration information and parameters comprise the neural network processing unit configuration parameters, memory allocation information, model parameters and input/output data buffer addresses. The open device application program interface 13 encapsulates interaction logic with the bottom driver of the neural network processing unit device, and is used for triggering the bottom driver of the neural network processing unit device to perform initialization driving, so as to ensure that the neural network processing unit is in a usable state.
The initializing web application interface 17 is used to create a web object from the web inference data structure entered by the user, allocate computing resources and initialize the web context. The network reasoning application program interface 21 is used for copying all input data required by the network reasoning data structure to the memory space planned by the neural network processing unit device, connecting all input buffer areas and output buffer areas of the network to the effective buffer areas of the neural network processing unit respectively, after refreshing the input data buffer areas, notifying the neural network processing unit to operate the network and waiting for the reasoning operation to be completed, refreshing the output data buffer areas and copying the reasoning result data to the memory space planned by the user, and the network reasoning application program interface 21 supports multiple calls. The destruction network application interface 19 is configured to destroy the network object and release the allocated computing resource according to the network file context list 11. The shutdown device application program interface 15 is used to release all resources occupied by the neural network processing unit and to control the neural network processing unit to power down.
It will be understood that, as shown in fig. 1, the interface calling apparatus 100 of the neural network processing unit includes a network file context list 11 and five Application Program Interfaces (APIs), which are respectively an open device application program interface 13 (hereinafter referred to as an open device API), a close device application program interface 15 (hereinafter referred to as a close device API), an initialize network application program interface 17 (hereinafter referred to as an initialize network API), a destroy network application program interface 19 (hereinafter referred to as a destroy network API), and a network reasoning application program interface 21 (hereinafter referred to as a network reasoning API).
The interface calling device 100 of the neural network processing unit can operate the NPU by standardizing the function calling interface and packaging the driving details, so that a user can operate the NPU by using a simple API (application program interface) without knowing the complicated driving and hardware details of the bottom layer, and the development difficulty is greatly reduced. The packaged API uses a standard programming language, so that a user does not need to understand a custom structure body and data types, the interface design is visual, the use is easy, and the user can conveniently and rapidly deploy and evaluate the performance. The complete application processing flow from equipment initialization, network loading, preprocessing data and reasoning execution to resource release is provided, and the system can be ensured to run efficiently and orderly in all stages. The encapsulated API comprises the functions of initializing and destroying the network context, so that the resources are effectively managed during allocation and release, the problems of resource leakage and insufficient memory are avoided, and the reliability of the system is improved.
Compared with the prior art, the user can conveniently and efficiently call the NPU, and when the DSP and the NPU work cooperatively, the efficient data processing and resource management are realized. In addition, the NPU application development process is simplified, and the usability and the reusability of the interface are improved, so that the user development experience is improved, and the development complexity is reduced.
The network file context list 11 contains all configuration information and parameters required for invoking the NPU, and these configuration information and parameters may include NPU configuration parameters, memory allocation information, model parameters, and input/output data buffer addresses, for example.
By focusing these configuration information and parameters into one list, parameter management when the NPU is called is simplified, so that the user only needs to process one list when calling the NPU related API, and does not need to provide each parameter one by one. Moreover, because the network file context list 11 encapsulates all necessary configuration information and parameters, different APIs can multiplex the same network file context list 11, thereby avoiding the occurrence of repeated codes, improving the reusability and maintainability of the codes and reducing the redundancy of the codes. Moreover, by defining and using the unified network file context list 11, the consistency of all NPU related API calls is ensured, all APIs follow the same parameter transmission mode, and system crashes or errors caused by parameter transmission errors are reduced. This consistent design improves the reliability and stability of the system. In addition, the network file context list 11 not only contains parameters required by initialization and running, but also includes information of resource allocation and release, so that a user can conveniently initialize and destroy the network context, and efficient management and utilization of resources are ensured.
The open-device API encapsulates the interaction logic with the bottom-layer driver of the NPU device; by calling it, a user triggers the bottom-layer driver to perform the necessary initialization operations. Through this encapsulation, the user does not need to know the implementation details of the NPU device's bottom-layer driver and can complete the device-opening operation simply by calling a standardized API.
In one embodiment, when the open-device API queries the device state, if the open-device API has already been called once, the query indicates that the neural network processing unit device is already open, and the flow proceeds directly to the reset-device operation, which resets the state registers inside the neural network processing unit.
It will be understood that, as shown in fig. 2, the main operations implemented by calling the open-device API are the device state query and the device reset. During the device state query, if the open-device API has been called once, this indicates that the NPU device is already open, and the flow proceeds directly to the next step, resetting the device; the reset operation resets the state registers in the NPU so that it quickly enters the latest ready state. If the query indicates that the device is not turned on, a setup operation is performed, including powering up the NPU, setting the NPU clock frequency, and initializing the NPU software environment.
Specifically, in general, when the NPU device is in the energy-saving mode, the NPU is in a power-off state, and in order to make the NPU work normally, the NPU device needs to be powered on, for example, a power control register of the NPU is read and written, so that the NPU device is powered on, and is made to enter a working state. And setting the clock frequency of the NPU, namely reading and writing a clock control register of the NPU, and configuring the clock frequency of the NPU.
In one embodiment, when the open-device API configures the clock frequency of the neural network processing unit by reading and writing its clock control register, if a user-defined clock frequency exists, the clock frequency value of the neural network processing unit is configured with the user-defined value; otherwise, the default clock frequency is used.
It will be appreciated that if the user has set a custom clock frequency, a user-defined clock frequency value is employed, and if the user has not set a clock frequency, a default clock frequency value is used. And initializing the NPU software environment, opening the equipment API comprises the step of setting the NPU software environment, such as the steps of distributing and initializing a plurality of sections of memory areas required by the NPU before the NPU begins to work, wherein the configuration operations are realized by the inside of the equipment API, so that the NPU is ensured to operate in a correct environment, and the stability and the reliability of the system are improved.
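The custom-versus-default clock choice is a simple selection rule. The sketch below is only an illustration: the default value and the convention of 0 meaning "not set" are both assumptions, not values from the patent:

```c
/* Hypothetical clock selection for the open-device API: a user frequency
 * of 0 means "not set", in which case a default is written to the clock
 * control register. The default value here is invented for the sketch. */
#define NPU_DEFAULT_CLOCK_HZ 800000000u

unsigned npu_select_clock_hz(unsigned user_hz)
{
    return user_hz != 0 ? user_hz : NPU_DEFAULT_CLOCK_HZ;
}
```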
In one embodiment, when the device state query is performed by the shutdown device API, if the neural network processing unit device has entered a low power consumption state, no operation needs to be performed.
The close-device API: as shown in fig. 3, a device status query is made first; if the device has already entered a low-power state (i.e., power-saving mode), no action needs to be performed. If the device has not entered the low-power state, two operations are performed: releasing NPU resources and controlling the NPU to power off. Before the NPU device is closed, the close-device API is responsible for releasing all resources occupied by the NPU, including memory, cache, and other hardware resources, ensuring that resources are properly managed and released when the device is closed and preventing resource leakage and system instability. These release operations are implemented inside the close-device API, ensuring the integrity and validity of resource management. To control the NPU power-off, the close-device API calls the bottom-layer driver of the NPU device to read and write the NPU's power control register, powering the device off so that it enters a low-power or dormant state. This power-off operation reduces the overall power consumption of the system and prolongs the service life of the equipment.
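The close-device flow of fig. 3 amounts to a small state machine. The following C sketch is illustrative only; the state enum and struct are hypothetical:

```c
/* Hypothetical device model for the close-device API. */
typedef enum { NPU_ACTIVE, NPU_LOW_POWER } npu_power_state_t;

typedef struct {
    npu_power_state_t state;
    int resources_held; /* memory, cache and other hardware resources */
} npu_dev_t;

void npu_close(npu_dev_t *dev)
{
    if (dev->state == NPU_LOW_POWER)
        return;                 /* already power-saving: nothing to do      */
    dev->resources_held = 0;    /* release memory, cache, hardware resources */
    dev->state = NPU_LOW_POWER; /* write power control register: power off  */
}
```

Calling `npu_close` on an already-idle device is a no-op, which is exactly the short-circuit the status query provides.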
The initialize-network API is used for creating a network object from the network reasoning data structure (NBG) entered by the user. The initialize-network API first receives the user-entered NBG, creates a network object from the data, and associates the network object with the network file context list 11. The network object contains the structure and parameter information of the network model and is the basis for carrying out network reasoning. The initialize-network API then begins to allocate resources and initialize network stack space. According to the network file context list 11, the initialize-network API needs to allocate the necessary computing resources, including memory, cache, and other hardware resources, after creating the network object. Specifically, the initialize-network API initializes the network stack space and deploys all operating resources into the internal memory pool, thereby ensuring that the resources required during network reasoning are sufficient and managed in an orderly fashion.
Then, the network API is initialized to apply for memory space for parameters in the network file context list 11 that require memory space. In the initialization process, the initialization network API can check a parameter list in the network, determine which parameters need to be allocated with memory space, and apply for the appropriate memory space for the parameters through an internal memory management mechanism, so as to ensure that the parameters can be correctly accessed and used in the network reasoning process. Then, initializing a network API to allocate internal memory resources for NBG, deploying the resources of all operations to an internal memory pool, and configuring/generating a command buffer area for NBG and a repair command buffer area of the resources in internal memory allocation.
In one embodiment, when the network API is initialized and the network reasoning data structure input by the user is verified, if the network data of the network reasoning data structure is detected to be inconsistent with the current network model or the network reasoning data structure does not accord with the established requirement of the system, error information is returned, and the network data of the network reasoning data structure comprises a data format and a version.
It can be understood that, finally, the initialize-network API verifies the user-input NBG. To ensure the correctness and consistency of network reasoning, the initialize-network API verifies the network reasoning data structure input by the user, which may specifically include checking whether the network reasoning data structure meets the established requirements of the system and whether its data format and version are consistent with the current network model. If an inconsistency or a violation of the established requirements is found, the initialize-network API returns the corresponding error information to prompt the user to correct it.
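The NBG verification step might look like the following hedged sketch; the header layout and error codes are invented for illustration, as the real NBG format is not described here:

```c
#include <string.h>

/* Hypothetical NBG header check: reject data whose format tag or version
 * does not match the current network model. Fields are illustrative. */
typedef struct {
    char format[8]; /* data format tag  */
    int  version;   /* model version    */
} nbg_header_t;

int nbg_validate(const nbg_header_t *h, const char *want_fmt, int want_ver)
{
    if (h == NULL)
        return -1;                                 /* missing input         */
    if (strncmp(h->format, want_fmt, sizeof h->format) != 0)
        return -2;                                 /* wrong data format     */
    if (h->version != want_ver)
        return -3;                                 /* version mismatch      */
    return 0;                                      /* consistent with model */
}
```

Returning distinct error codes lets the caller surface a precise correction prompt, which is the behavior the embodiment describes.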
The destroy-network API is used to release the network object. It first destroys the network object created during initialization according to the network file context list 11, ensuring that every data structure related to the network model is released correctly; destroying the network object allows the memory and other resources associated with the model to be reclaimed efficiently. After the network object is destroyed, the destroy-network API releases the memory pool and the network stack space allocated during initialization: specifically, it calls the memory-management module according to the network file context list 11 to free all memory used during inference, ensuring that memory resources are reclaimed correctly. Finally, the destroy-network API releases the parameter memory: the memory allocated for network parameters during initialization must be freed during destruction, so the API traverses the network file context list 11 and frees the memory allocated for each parameter one by one, ensuring there are no memory leaks.
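The initialize/destroy pair described above can be sketched as a matched allocate/free sequence. The type and function names (`npu_net_t`, `npu_init_network`, `npu_destroy_network`) and all buffer sizes are hypothetical stand-ins for the real driver interface, which is not disclosed in the text.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical network object holding the resources the text describes:
 * the NBG copy, the parameter memory pool, and the command/patch buffers. */
typedef struct {
    unsigned char *nbg;        /* copy of the user-supplied NBG        */
    size_t         nbg_size;
    unsigned char *param_pool; /* internal memory pool for parameters  */
    unsigned char *cmd_buf;    /* command buffer generated for the NBG */
    unsigned char *patch_buf;  /* patch buffer for in-memory resources */
} npu_net_t;

void npu_destroy_network(npu_net_t *net);

/* Initialize-network sketch: copy the NBG and allocate every resource;
 * on any failure, roll back whatever was already allocated. */
int npu_init_network(npu_net_t *net, const unsigned char *nbg, size_t size)
{
    if (net == NULL || nbg == NULL || size == 0)
        return -1;
    memset(net, 0, sizeof(*net));
    net->nbg        = malloc(size);
    net->param_pool = calloc(1, 4096);  /* placeholder pool size    */
    net->cmd_buf    = calloc(1, 256);   /* placeholder buffer sizes */
    net->patch_buf  = calloc(1, 256);
    if (!net->nbg || !net->param_pool || !net->cmd_buf || !net->patch_buf) {
        npu_destroy_network(net);
        return -1;
    }
    memcpy(net->nbg, nbg, size);
    net->nbg_size = size;
    return 0;
}

/* Destroy-network sketch: release everything the initializer allocated
 * and zero the object so that a double destroy is harmless. */
void npu_destroy_network(npu_net_t *net)
{
    if (net == NULL)
        return;
    free(net->nbg);
    free(net->param_pool);
    free(net->cmd_buf);
    free(net->patch_buf);
    memset(net, 0, sizeof(*net));
}
```

Pairing every successful `npu_init_network` with exactly one `npu_destroy_network` is what guarantees the "no memory leak" property the text claims.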
The network inference API assists the NPU in performing network inference and manages the inference results. Specifically, as shown in fig. 4, the main steps implemented by the network inference API may include:
Copying all input data required by the NBG into the memory space planned for the NPU device;
Connecting each input and output of the network to a valid NPU buffer; during connection the driver patches the network command buffer to fill in the buffer addresses;
Operating the NPU to flush the input data cache, ensuring that the network uses the latest input data;
Notifying the NPU to execute the network inference task and start the inference operation according to the preset network model and parameters;
Operating the NPU to flush the output data cache, ensuring that the output inference result data is ready for the user to read;
Copying the inference result data from the NPU device memory to the memory area designated by the user.
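The six steps above can be sketched as one C function. This is a mock: plain host memory stands in for NPU device memory, the cache flushes and command-buffer patching are noted as comments because they are hardware/driver operations, and the "network" is a trivial doubling op. None of the names here belong to the real driver.

```c
#include <string.h>

/* Stand-in for NPU device memory (a real driver would use mapped
 * physical buffers planned for the device). */
typedef struct {
    float in[4];   /* device input memory  */
    float out[4];  /* device output memory */
} npu_dev_mem_t;

/* Mock of the six-step inference sequence; returns 0 on success. */
int npu_infer(npu_dev_mem_t *dev, const float *input, float *result, int n)
{
    int i;
    if (!dev || !input || !result || n <= 0 || n > 4)
        return -1;
    memcpy(dev->in, input, n * sizeof(float));   /* 1. copy inputs to device  */
    /* 2. connect buffers: a real driver patches the command buffer with
     *    the buffer addresses here; nothing to do in this mock.            */
    /* 3. flush the input data cache (hardware op, omitted in the mock).    */
    for (i = 0; i < n; i++)                      /* 4. run the network; a    */
        dev->out[i] = dev->in[i] * 2.0f;         /*    doubling op stands in */
    /* 5. flush the output data cache (hardware op, omitted in the mock).   */
    memcpy(result, dev->out, n * sizeof(float)); /* 6. copy results to user  */
    return 0;
}
```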
It will be appreciated that the network inference API may be called multiple times; on each call it uses the currently attached input and output buffers for inference. This design allows the user to perform multiple inference operations within the same network context, offering high flexibility and adapting to different inference requirements.
It will be appreciated that the interface calling device 100 of the neural network processing unit provides a complete NPU call flow for managing NPU initialization, execution, and resource release. As shown in fig. 5, the flow includes opening and closing the device, initializing and destroying the network, and preprocessing, inference, and post-processing of the data; the network inference data structure (NBG) is represented by an NBG file.
Specifically, the open-device API is called first to initialize the driver, configure the NPU software environment, and power on the NPU device; this step ensures the NPU device is in a usable state and ready for subsequent operations. Then, according to the binary network inference data structure (NBG) supplied by the user, the initialize-network API is called to create the network object, allocate computing resources, and initialize the network context.
Next, the flow enters a loop portion consisting of obtaining input data, preprocessing the data, calling the network inference API, post-processing the data, and deciding whether to exit. The input data that needs to be inferred is obtained from an external source (such as a sensor or a file). Preprocessing the input data, including normalization and quantization, makes it meet the input requirements of the network model. Calling the network inference API passes the preprocessed input data to the NPU for inference and stores the inference result in the output buffer. Post-processing the inference result, including dequantization, classification, and recognition, produces the final output inference result data. Whether to exit is decided according to a specific condition (for example, all input data has been processed or an exit signal has been received). If the loop is to continue, the flow returns to the step of obtaining input data. If the loop is to exit, the destroy-network API is called after the loop exits to destroy the network object and release the resources allocated during initialization, ensuring that all resources are released correctly and preventing resource leaks.
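The quantization and dequantization mentioned in the pre- and post-processing steps can be sketched as follows. The int8 scale/zero-point scheme, the function names, and the constants are illustrative assumptions; in a real deployment those values would come from the model's NBG.

```c
/* Pre-processing sketch: quantize a normalized float to int8 as
 * q = round(x / scale) + zero_point, clamped to [-128, 127]. */
signed char npu_quantize(float x, float scale, int zero_point)
{
    float r = x / scale;
    /* round-half-away-from-zero without pulling in <math.h> */
    long q = (long)(r >= 0.0f ? r + 0.5f : r - 0.5f) + zero_point;
    if (q < -128) q = -128;  /* clamp to the int8 range */
    if (q >  127) q =  127;
    return (signed char)q;
}

/* Post-processing sketch: dequantize back to float as
 * x = (q - zero_point) * scale. */
float npu_dequantize(signed char q, float scale, int zero_point)
{
    return (float)((int)q - zero_point) * scale;
}
```

In the loop, `npu_quantize` would run on every input element before the inference call and `npu_dequantize` on every output element before classification or recognition.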
Finally, the close-device API is called to release NPU resources, close the driver, and power off the NPU; this step ensures the NPU device is in a closed state and system resources are properly managed.
Compared with conventional technology, this design standardizes the function-call interface: it provides a user-friendly interface that simplifies the user's calls to the NPU, with no need to understand the underlying driver and data structures. The function-call interface is decoupled from the NPU device's low-level driver; the user does not need to know custom structures, data types, parameter settings, or the driver-function call order, and can call it directly through a standard C-language interface, which makes it convenient to use. The design also provides a complete set of network inference processing flows: it defines the network file context list 11, simplifies calls to the NPU-related APIs, guarantees uniform and convenient management of configuration, memory, and parameters, and covers the complete application flow of opening the device, initializing the network, preprocessing data, executing inference, post-processing data, and releasing resources.
Therefore, through standardized function-call interfaces and encapsulated driver details, the user can operate the NPU with only a simple API (application program interface), without having to understand the complicated low-level driver and hardware details, which greatly reduces development difficulty. The encapsulated API uses standard C, so the user does not need to understand custom structures and data types; the interface design is intuitive and easy to use, enabling quick deployment and performance evaluation. A complete application flow is provided, from device initialization, network loading, data preprocessing, and inference execution to resource release, ensuring the system runs efficiently and in an orderly manner at every stage. The encapsulated API includes functions for initializing and destroying the network context, so resources are managed effectively during allocation and release, avoiding resource leaks and out-of-memory problems and improving system reliability.
The modules in the interface calling device 100 of the neural network processing unit may be implemented wholly or partly in software, hardware, or a combination of the two. The above modules may be embedded in hardware, may be independent of a device with a data-processing function, or may be stored in software in the memory of such a device so that the processor can invoke and execute the operations corresponding to the modules; the device may be, but is not limited to, any of the various computer devices known in the art.
In one embodiment, as shown in fig. 6, an interface calling method of a neural network processing unit is provided, which is applied to the interface calling device 100 of the neural network processing unit. The interface calling method of the neural network processing unit may include the following processing steps S10 to S18:
S10, calling the open-device application program interface to initialize the driver, configure the software environment of the neural network processing unit, and power on the neural network processing unit device;
S12, calling the initialize-network application program interface to create a network object, allocate computing resources, and initialize a network context according to the network inference data structure supplied by the user;
S14, entering a loop portion to obtain inference result data, the loop portion including obtaining input data, preprocessing the data, calling the network inference application program interface, post-processing the data, and deciding whether to exit the loop;
S16, when it is decided to exit the loop portion, calling the destroy-network application program interface to destroy the network object and release the allocated computing resources;
S18, calling the close-device application program interface to release all resources occupied by the neural network processing unit and controlling the neural network processing unit to power off.
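The steps S10 to S18 above can be sketched as a single C call sequence. Every function here is a hypothetical stand-in for the corresponding API, returning 0 for success; in the mock, the loop processes three input batches and then exits.

```c
/* Hypothetical stand-ins for the APIs used in steps S10 to S18. */
static int open_device(void)    { return 0; }      /* S10: init driver, power on   */
static int init_network(void)   { return 0; }      /* S12: create network object   */
static int get_input(int i)     { return i < 3; }  /* loop: three batches, then 0  */
static int run_inference(void)  { return 0; }      /* S14: preprocess, infer, post */
static int destroy_network(void){ return 0; }      /* S16: free network resources  */
static int close_device(void)   { return 0; }      /* S18: release and power off   */

/* Full application flow; returns the number of processed batches,
 * or -1 if the device or network could not be brought up. */
int npu_app_flow(void)
{
    int batches = 0, i = 0;
    if (open_device() != 0)
        return -1;
    if (init_network() != 0) {
        close_device();            /* roll back S10 if S12 fails */
        return -1;
    }
    while (get_input(i)) {         /* S14 loop: until input is exhausted */
        if (run_inference() == 0)
            batches++;
        i++;
    }
    destroy_network();             /* S16 */
    close_device();                /* S18 */
    return batches;
}
```

Note the rollback path: if network initialization fails, the device is still closed, matching the resource-release guarantees described above.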
According to this interface calling method of the neural network processing unit, through the standardized function-call interface and encapsulated driver details, the user can operate the NPU with only a simple API (application program interface), without having to understand the NPU's complicated low-level driver and hardware details, which greatly reduces development difficulty. The encapsulated API uses a standard programming language, so the user does not need to understand custom structures and data types; the interface design is intuitive and easy to use, enabling quick deployment and performance evaluation. A complete application processing flow is provided, from device initialization, network loading, data preprocessing, and inference execution to resource release, ensuring the system runs efficiently and in an orderly manner at every stage. The encapsulated API includes functions for initializing and destroying the network context, so resources are managed effectively during allocation and release, avoiding resource leaks and out-of-memory problems and improving system reliability.
Compared with the prior art, the user can call the NPU conveniently and efficiently, and efficient data processing and resource management are achieved when the DSP and the NPU work cooperatively. In addition, the NPU application development process is simplified, and the usability and reusability of the interface are improved, which improves the user's development experience and reduces development complexity.
In one embodiment, the interface calling method of the neural network processing unit may further include the following step:
when it is decided to continue the loop portion, returning to the step of obtaining input data.
For specific definitions and explanations of the interface invoking method of the neural network processing unit, reference may be made to the corresponding definitions and explanations of the interface invoking device 100 of the neural network processing unit, which are not repeated herein.
It should be understood that, although the steps in fig. 6 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in fig. 6 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following processing steps: calling the open-device application program interface to initialize the driver, configure the software environment of the neural network processing unit, and power on the neural network processing unit device; calling the initialize-network application program interface to create a network object, allocate computing resources, and initialize a network context according to the network inference data structure supplied by the user; entering a loop portion to obtain inference result data, the loop portion including obtaining input data, preprocessing the data, calling the network inference application program interface, post-processing the data, and deciding whether to exit the loop; when it is decided to exit the loop portion, calling the destroy-network application program interface to destroy the network object and release the allocated computing resources; and calling the close-device application program interface to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
In one embodiment, the processor may further implement the steps or sub-steps added in the embodiments of the method for invoking an interface of a neural network processing unit when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following processing steps: calling the open-device application program interface to initialize the driver, configure the software environment of the neural network processing unit, and power on the neural network processing unit device; calling the initialize-network application program interface to create a network object, allocate computing resources, and initialize a network context according to the network inference data structure supplied by the user; entering a loop portion to obtain inference result data, the loop portion including obtaining input data, preprocessing the data, calling the network inference application program interface, post-processing the data, and deciding whether to exit the loop; when it is decided to exit the loop portion, calling the destroy-network application program interface to destroy the network object and release the allocated computing resources; and calling the close-device application program interface to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
In one embodiment, the computer program may further implement the steps or sub-steps added in the embodiments of the method for invoking interfaces of a neural network processing unit, when the computer program is executed by a processor.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program, which may be stored on a non-volatile computer-readable storage medium and which, when executed, may include the flows of the method embodiments above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and direct Rambus DRAM (DRDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the appended claims.

Claims (9)

1. An interface calling device of a neural network processing unit, characterized by comprising a network file context list, an open-device application program interface, a close-device application program interface, an initialize-network application program interface, a destroy-network application program interface, and a network inference application program interface;
the network file context list is used to encapsulate the configuration information and parameters required to invoke the neural network processing unit, the configuration information and parameters including neural network processing unit configuration parameters, memory allocation information, model parameters, and input/output data buffer addresses;
the open-device application program interface encapsulates the interaction logic with the low-level driver of the neural network processing unit device and is used to trigger the low-level driver of the neural network processing unit device to perform initialization, ensuring that the neural network processing unit is in a usable state;
the initialize-network application program interface is used to create a network object, allocate computing resources, and initialize a network context according to a network inference data structure supplied by a user;
the network inference application program interface is used to copy all input data required by the network inference data structure into the memory space planned for the neural network processing unit device, connect each input buffer and output buffer of the network to a valid buffer of the neural network processing unit, notify the neural network processing unit to run the network after flushing the input data buffers and wait for the inference operation to complete, flush the output data buffers, and copy the inference result data into the memory space planned by the user;
the destroy-network application program interface is used to destroy the network object and release the allocated computing resources according to the network file context list;
and the close-device application program interface is used to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
2. The interface calling device of the neural network processing unit of claim 1, wherein, when a device status query shows that the open-device application program interface has already been called once, the open-device application program interface indicates that the neural network processing unit device has been opened and proceeds to a reset-device operation, the reset-device operation being used to reset the status registers inside the neural network processing unit.
3. The interface calling device of the neural network processing unit of claim 2, wherein, when the open-device application program interface configures the clock frequency of the neural network processing unit by reading and writing the clock control register of the neural network processing unit, if a user-defined clock frequency exists, the user-defined clock frequency is used to configure the clock frequency value of the neural network processing unit; otherwise, the default clock frequency is used to configure the clock frequency value of the neural network processing unit.
4. The interface calling device of the neural network processing unit of any one of claims 1 to 3, wherein the close-device application program interface is configured to perform no operation if, when a device status query is performed, the neural network processing unit device has already entered a low-power state.
5. The interface calling device of the neural network processing unit of claim 4, wherein, when verifying the network inference data structure supplied by the user, the initialize-network application program interface returns an error message if it finds that the network data of the network inference data structure is inconsistent with the current network model or that the network inference data structure does not meet the established requirements of the system, the network data of the network inference data structure including a data format and a version.
6. An interface calling method of a neural network processing unit, characterized by being applied to the interface calling device of the neural network processing unit of any one of claims 1 to 5, the interface calling method of the neural network processing unit comprising the following steps:
calling the open-device application program interface to initialize the driver, configure the software environment of the neural network processing unit, and power on the neural network processing unit device, ensuring that the neural network processing unit is in a usable state;
calling the initialize-network application program interface to create a network object, allocate computing resources, and initialize a network context according to a network inference data structure supplied by a user;
entering a loop portion to obtain inference result data, the loop portion including obtaining input data, preprocessing the data, calling the network inference application program interface, post-processing the data, and deciding whether to exit the loop;
when it is decided to exit the loop portion, calling the destroy-network application program interface to destroy the network object and release the allocated computing resources;
and calling the close-device application program interface to release all resources occupied by the neural network processing unit and control the neural network processing unit to power off.
7. The interface calling method of the neural network processing unit of claim 6, characterized by further comprising the following step:
when it is decided to continue the loop portion, returning to the step of obtaining input data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the interface calling method of the neural network processing unit of claim 6 or 7.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the interface calling method of the neural network processing unit of claim 6 or 7.
CN202411422690.XA 2024-10-12 2024-10-12 Interface calling device, method, equipment and medium of neural network processing unit Active CN118963718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411422690.XA CN118963718B (en) 2024-10-12 2024-10-12 Interface calling device, method, equipment and medium of neural network processing unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411422690.XA CN118963718B (en) 2024-10-12 2024-10-12 Interface calling device, method, equipment and medium of neural network processing unit

Publications (2)

Publication Number Publication Date
CN118963718A CN118963718A (en) 2024-11-15
CN118963718B true CN118963718B (en) 2024-12-31

Family

ID=93401372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411422690.XA Active CN118963718B (en) 2024-10-12 2024-10-12 Interface calling device, method, equipment and medium of neural network processing unit

Country Status (1)

Country Link
CN (1) CN118963718B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186678A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Hardware adaptation device and method based on deep learning
CN118394327A (en) * 2023-02-22 2024-07-26 比亚迪股份有限公司 Intelligent driving end side model deployment method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9720703B2 (en) * 2012-11-26 2017-08-01 International Business Machines Corporation Data driven hardware chips initialization via hardware procedure framework
WO2023044869A1 (en) * 2021-09-26 2023-03-30 华为技术有限公司 Control method for artificial intelligence chip, and related device
CN114398040A (en) * 2021-12-24 2022-04-26 上海商汤科技开发有限公司 Neural network reasoning method, device, computer equipment and storage medium
CN115712517B (en) * 2022-09-30 2024-04-16 北京地平线机器人技术研发有限公司 Fault processing method and device for neural network processor
CN115982110B (en) * 2023-03-21 2023-08-29 北京探境科技有限公司 File running method, file running device, computer equipment and readable storage medium
CN116841706A (en) * 2023-05-23 2023-10-03 中国电信股份有限公司广东研究院 Inference task scheduling method and device and computer equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186678A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Hardware adaptation device and method based on deep learning
CN118394327A (en) * 2023-02-22 2024-07-26 比亚迪股份有限公司 Intelligent driving end side model deployment method and device

Also Published As

Publication number Publication date
CN118963718A (en) 2024-11-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant