CN116720566A - A model performance optimization method and related equipment - Google Patents
A model performance optimization method and related equipment
- Publication number
- CN116720566A (application CN202210189438.3A / CN202210189438A)
- Authority
- CN
- China
- Prior art keywords
- information
- operator
- neural network
- network model
- performance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
本申请公开了一种模型性能优化方法,该方法应用于终端,该方法中,终端向服务器发送第一请求;在第一时间内,接收服务器发送的针对第一请求的反馈信息,该反馈信息能够用于确定神经网络模型的多个算子中的每个算子的第一运算方式。通过该方法,可以基于端云交互,获得终端的神经网络模型的多个算子中,每个算子的较优的运算方式,使得终端以该较优的运算方式进行关于神经网络模型的运算操作时,能够获得较好的运行性能,而无需开发者针对神经网络模型可能要部署的各种不同类型和型号的设备,进行复杂的性能优化操作,提升了神经网络模型在终端的优化效率。
This application discloses a model performance optimization method applied to a terminal. In the method, the terminal sends a first request to a server and, within a first time, receives feedback information for the first request sent by the server; the feedback information can be used to determine a first operation mode of each of multiple operators of a neural network model. Through this method, a better operation mode for each of the multiple operators of the terminal's neural network model can be obtained based on terminal-cloud interaction, so that the terminal achieves better running performance when performing operations of the neural network model in that better operation mode, without developers having to perform complex performance optimization for the various types and models of devices on which the neural network model may be deployed, which improves the optimization efficiency of the neural network model on the terminal.
Description
技术领域Technical field
本申请涉及人工智能技术领域,具体涉及一种模型性能优化方法以及相关设备。This application relates to the field of artificial intelligence technology, specifically to a model performance optimization method and related equipment.
背景技术Background technique
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能领域的研究包括机器人,自然语言处理,计算机视觉,决策与推理,人机交互,推荐与搜索,AI基础理论等。Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
随着AI的快速发展,神经网络模型(例如,深度神经网络模型)被部署于各种设备中,以在图像、视频以及语音等多种信息的处理与分析中发挥作用。然而,不同设备中的计算单元的组成不同,而不同的计算单元所擅长的计算场景不同,导致同一神经网络模型运行于不同的设备或者计算单元时的性能差异较大。此外,设备中的硬件和软件版本常常会不断地更新迭代,也会影响神经网络模型在设备中的运行性能。With the rapid development of AI, neural network models (for example, deep neural network models) are deployed in various devices to play a role in the processing and analysis of various information such as images, videos, and voices. However, the composition of computing units in different devices is different, and different computing units are good at different computing scenarios, resulting in large performance differences when the same neural network model is run on different devices or computing units. In addition, the hardware and software versions in the device are often constantly updated and iterated, which also affects the running performance of the neural network model in the device.
目前，开发者为了在多种设备上获得神经网络模型的较佳性能，需要将训练后得到的神经网络模型分别部署于多种设备并分别进行针对性地测试优化，优化效率较低，工作量较大。Currently, in order to obtain better performance of neural network models on multiple devices, developers need to deploy the trained neural network model on each of these devices and perform targeted testing and optimization on each one separately; the optimization efficiency is low and the workload is large.
发明内容Contents of the invention
本申请提供一种模型性能优化方法，以解决目前的开发者为了在多种设备上获得神经网络模型的较佳性能，需要将训练后得到的神经网络模型分别部署于多种设备并分别进行针对性地测试优化，优化效率较低的问题。本申请还提供了相应的装置、设备、计算机可读存储介质和计算机程序产品等。This application provides a model performance optimization method to solve the problem that, in order to obtain better performance of neural network models on multiple devices, developers currently need to deploy the trained neural network model on each of these devices and perform targeted testing and optimization separately, which makes optimization inefficient. This application also provides corresponding apparatuses, devices, computer-readable storage media, computer program products, and the like.
本申请第一方面提供一种模型性能优化方法,应用于终端,该方法包括:向服务器发送第一请求,第一请求携带有神经网络模型的算子信息和神经网络模型的运行环境信息;在第一时间内,接收服务器发送的针对第一请求的反馈信息,反馈信息能够用于确定神经网络模型的多个算子中的每个算子的第一运算方式。The first aspect of this application provides a method for optimizing model performance, which is applied to a terminal. The method includes: sending a first request to a server, where the first request carries operator information of the neural network model and operating environment information of the neural network model; Within the first time, feedback information sent by the server in response to the first request is received, and the feedback information can be used to determine the first operation mode of each operator among multiple operators of the neural network model.
在第一方面中,第一运算方式用于指示算子在终端进行运算时的运行参数的设置方式。示例性地,该运行参数可以包括算子对应的计算单元的运行频率。In the first aspect, the first operation mode is used to indicate the setting mode of the operating parameters when the operator performs operations on the terminal. For example, the operating parameters may include the operating frequency of the computing unit corresponding to the operator.
反馈信息能够用于确定神经网络模型的多个算子中的每个算子的第一运算方式具体可以指：反馈信息能够用于确定神经网络模型中的每个算子对应的第一计算单元，以及神经网络模型中的每个算子在对应的第一计算单元的第一运算方式。此时，该反馈信息可以包括以下两种情况中的任意一种：(1)该反馈信息中可以包括神经网络模型对应的模型运算策略的信息，该模型运算策略包括神经网络模型的多个算子中的每个算子对应的第一运算方式。(2)该反馈信息包括神经网络模型对应的算子性能信息，并且，反馈信息中包括的算子性能信息满足指定的优化条件。在这一情况中，反馈信息中包括的算子性能信息满足指定的优化条件可以理解为反馈信息中包括足够的算子性能信息。在一种示例中，确定反馈信息中包括的算子性能信息满足指定的优化条件可以是反馈信息中包含的算子性能信息与终端的每个计算单元相关，从而能够支持终端从终端的计算单元中，确定对相应算子的运行性能较优的计算单元为相应算子的第一计算单元，以及确定对相应算子的运行性能较优的运算方式为相应算子的第一运算方式。That the feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model may specifically mean that the feedback information can be used to determine the first computing unit corresponding to each operator in the neural network model, and the first operation mode of each operator on its corresponding first computing unit. In this case, the feedback information may fall into either of the following two situations: (1) the feedback information may include information about a model operation strategy corresponding to the neural network model, where the model operation strategy includes the first operation mode corresponding to each of the multiple operators of the neural network model; (2) the feedback information includes operator performance information corresponding to the neural network model, and the operator performance information included in the feedback information meets specified optimization conditions. In this situation, the operator performance information meeting the specified optimization conditions can be understood as the feedback information including sufficient operator performance information. In one example, determining that the operator performance information included in the feedback information meets the specified optimization conditions may mean that the operator performance information in the feedback relates to each computing unit of the terminal, which enables the terminal to determine, among its computing units, the computing unit with better running performance for the corresponding operator as that operator's first computing unit, and the operation mode with better running performance for the corresponding operator as that operator's first operation mode.
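The two feedback situations above can be sketched as follows. This is a minimal illustration with hypothetical names and message layouts (the patent does not specify concrete formats); performance values are assumed to be latencies, where lower is better.

```python
def first_mode_for(feedback, op_name):
    """Resolve one operator's first operation mode from the server feedback.

    Case (1): the feedback carries a ready-made model operation strategy,
    mapping operator name -> (compute unit, operation mode).
    Case (2): the feedback carries raw per-operator performance info, and the
    terminal derives the best (compute unit, operation mode) pair itself.
    """
    strategy = feedback.get("strategy")
    if strategy is not None:
        return strategy.get(op_name)
    perf_info = feedback.get("perf_info", {})
    candidates = perf_info.get(op_name, {})
    if candidates:
        # pick the (unit, mode) pair with the lowest measured latency
        return min(candidates, key=candidates.get)
    return None
```

In case (2) the terminal performs the selection locally, which is why the feedback must cover every computing unit of the terminal for the choice to be well-founded.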
由上述可知，该第一方面中，针对终端的神经网络模型，可以基于服务器的相关反馈信息，获得该神经网络模型的多个算子中的每个算子的较优的运算方式(即第一运算方式)，使得终端以相应的第一运算方式进行关于神经网络模型的运算操作时，能够获得较好的运行性能，而无需开发者针对神经网络模型可能要部署的各种不同类型和型号的设备，进行复杂的性能优化操作，提升了神经网络模型在终端的优化效率。As can be seen from the above, in the first aspect, for the neural network model on the terminal, a better operation mode (i.e., the first operation mode) for each of the multiple operators of the neural network model can be obtained based on the relevant feedback information from the server, so that the terminal achieves better running performance when performing operations of the neural network model in the corresponding first operation mode, without developers having to perform complex performance optimization for the various types and models of devices on which the neural network model may be deployed, which improves the optimization efficiency of the neural network model on the terminal.
在第一方面的一种可能的实现方式中,方法还包括:基于反馈信息,获得待处理数据的第一运算结果。In a possible implementation of the first aspect, the method further includes: obtaining a first operation result of the data to be processed based on the feedback information.
该种可能的实现方式中,待处理数据可以来自神经网络模型对应的应用程序。示例性地,该应用程序可以实现人脸识别功能,此时,应用程序中的待处理数据为待处理图像。或者,该应用程序可以实现文本翻译功能,此时,应用程序中的待处理数据为文字信息。In this possible implementation, the data to be processed can come from an application program corresponding to the neural network model. For example, the application can implement the face recognition function. At this time, the data to be processed in the application is the image to be processed. Alternatively, the application can implement a text translation function, in which case the data to be processed in the application is text information.
在第一方面的一种可能的实现方式中，多个算子中的每个算子的第一运算方式包括相应算子在对应的第一计算单元的运算方式，第一计算单元为终端的计算单元中的一个或多个。In a possible implementation of the first aspect, the first operation mode of each of the multiple operators includes the operation mode of the corresponding operator on its corresponding first computing unit, and the first computing unit is one or more of the terminal's computing units.
该种可能的实现方式中，计算单元的类型可以包括中央处理器(central processing unit，CPU)，也可以包括图形处理器(graphics processing unit，GPU)、神经网络处理器(neural-network processing unit，NPU)、张量处理器(tensor processing unit，TPU)中的一种或多种。第一计算单元为终端的计算单元中的一个或多个，当第一计算单元为终端的计算单元中的多个时，各个第一计算单元的类型可以相同，也可以不同，并且各个第一计算单元的硬件配置可以相同，也可以不同。In this possible implementation, the type of computing unit may include a central processing unit (CPU), and may also include one or more of a graphics processing unit (GPU), a neural-network processing unit (NPU), and a tensor processing unit (TPU). The first computing unit is one or more of the terminal's computing units; when there are multiple first computing units, their types may be the same or different, and their hardware configurations may likewise be the same or different.
在第一方面的一种可能的实现方式中，反馈信息还携带有第一参考信息；该方法还包括：在获得第一运算结果之后，当第一性能信息和第一参考信息之间的差异符合预设条件的情况下，向服务器发送第一性能信息，其中，第一性能信息用于指示至少一个算子关于相应第一运算方式的运行性能。In a possible implementation of the first aspect, the feedback information also carries first reference information, and the method further includes: after obtaining the first operation result, when the difference between the first performance information and the first reference information meets a preset condition, sending the first performance information to the server, where the first performance information is used to indicate the running performance of at least one operator with respect to the corresponding first operation mode.
该种可能的实现方式中，第一性能信息和第一参考信息之间的差异符合预设条件可以指示第一性能信息和第一参考信息之间的差异较大。终端在实际运行时对应的算子性能信息与关系列表中对应的算子性能信息差异较大时，服务器可以基于终端实际运行时的算子性能信息对服务器中存储的相应算子性能信息进行更新。In this possible implementation, the difference between the first performance information and the first reference information meeting the preset condition may indicate that the difference between the two is relatively large. When the operator performance information measured by the terminal at actual run time differs significantly from the corresponding operator performance information in the relationship list, the server can update the corresponding operator performance information it stores based on the operator performance information from the terminal's actual run.
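The "difference meets a preset condition" check described above could look like the following sketch. The relative-latency metric and the 20% threshold are assumptions for illustration; the patent does not fix a concrete condition.

```python
def should_report(measured_ms, reference_ms, rel_threshold=0.2):
    """Report back to the server only when the measured latency deviates from
    the server-provided first reference by more than the threshold."""
    if reference_ms <= 0:
        return True  # no usable reference: treat as a notable difference
    return abs(measured_ms - reference_ms) / reference_ms > rel_threshold
```

Reporting only large deviations keeps terminal-to-server traffic low while still letting the server refresh stale performance records.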
在第一方面的一种可能的实现方式中，方法还包括：在第一时间内，未接收到反馈信息，或者，反馈信息不能用于确定神经网络模型的多个算子中的每个算子的第一运算方式，则基于预设的神经网络模型的多个算子中的每个算子的第二运算方式，获得待处理数据的第二运算结果。In a possible implementation of the first aspect, the method further includes: if no feedback information is received within the first time, or the feedback information cannot be used to determine the first operation mode of each of the multiple operators of the neural network model, obtaining a second operation result of the data to be processed based on a preset second operation mode of each of the multiple operators of the neural network model.
该种可能的实现方式中，服务器发送的反馈信息不能用于确定神经网络模型的多个算子中的每个算子的第一运算方式可以是该反馈信息不能支持终端确定出神经网络模型中的全部算子对应的第一运算方式。In this possible implementation, that the feedback information sent by the server cannot be used to determine the first operation mode of each of the multiple operators of the neural network model may mean that the feedback information cannot support the terminal in determining the first operation modes corresponding to all operators in the neural network model.
针对神经网络模型中的某一算子，该反馈信息不能用于确定该算子对应的第一运算方式可以是该反馈信息中不包含该算子的信息，或者，可以是该反馈信息中包含的该算子的算子性能信息不完整，例如，缺少该算子关于终端中的某些计算单元和/或某些运算方式的性能信息等。For a certain operator in the neural network model, that the feedback information cannot be used to determine the operator's corresponding first operation mode may mean that the feedback information does not contain information about the operator, or that the operator performance information it contains for the operator is incomplete, for example, missing the operator's performance information for certain computing units and/or certain operation modes on the terminal.
该种可能的实现方式中，终端可以在未能从服务器获得针对神经网络模型的优化运行策略时，基于终端中已有的模型运行策略来确定神经网络模型的多个算子中的每个算子对应的第二运算方式，从而优化神经网络模型的运行性能。也即是说，该种可能的实现方式中，神经网络模型的多个算子中的每个算子对应的第二运算方式可以是由终端基于神经网络模型的算子信息和运行环境信息，以及终端中本地存储的模型运行策略而确定的。In this possible implementation, when the terminal fails to obtain an optimized running strategy for the neural network model from the server, it can determine the second operation mode corresponding to each of the multiple operators of the neural network model based on the model running strategy already present on the terminal, thereby optimizing the running performance of the neural network model. That is to say, in this possible implementation, the second operation mode corresponding to each operator of the neural network model may be determined by the terminal based on the operator information and running environment information of the neural network model together with the model running strategy stored locally on the terminal.
该种可能的实现方式中,在下次载入神经网络模型时,可以重新执行向服务器发送第一请求的操作以及后续操作,以再次尝试对该神经网络模型进行性能优化。In this possible implementation, when the neural network model is loaded next time, the operation of sending the first request to the server and subsequent operations can be re-executed to try to optimize the performance of the neural network model again.
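The fallback logic described in this implementation can be summarized in a short sketch. The function and strategy names are hypothetical; "modes" here stand for whatever (compute unit, operation mode) encoding the terminal uses.

```python
def covers_all(feedback, operators):
    """Feedback is usable only if it determines a mode for every operator."""
    return all(op in feedback for op in operators)

def choose_modes(operators, feedback, local_strategy):
    """Prefer the server feedback (first operation modes); fall back to the
    locally stored model running strategy (second operation modes) when the
    feedback is absent or incomplete."""
    if feedback is None or not covers_all(feedback, operators):
        return {op: local_strategy[op] for op in operators}
    return {op: feedback[op] for op in operators}
```

Note that a partially usable feedback still triggers the full fallback here, matching the requirement that the feedback must determine the first operation mode of every operator.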
在第一方面的一种可能的实现方式中，该方法还包括：在获得第二运算结果之后，向服务器发送第二性能信息，第二性能信息用于指示至少一个算子关于相应第二运算方式的运行性能。In a possible implementation of the first aspect, the method further includes: after obtaining the second operation result, sending second performance information to the server, where the second performance information is used to indicate the running performance of at least one operator with respect to the corresponding second operation mode.
该种可能的实现方式中，在一些示例中，终端未能从服务器获得能够用于确定神经网络模型的多个算子中的每个算子对应的第一运算方式的反馈信息，可能是服务器中缺少该神经网络模型对应的算子性能信息，或者服务器中该神经网络模型的算子性能不完整。此时，终端向服务器发送第二性能信息，可以在服务器缺失相关算子性能信息时，根据第二性能信息进行补充。当然，在另一些示例中，也可能是终端与服务器之间的通信连接出现故障，导致终端不能接收到反馈信息。此时，终端可以不往服务器发送第二性能信息，也可以是终端向服务器发送第二性能信息，但服务器不根据第二性能信息更新所存储的算子性能信息。In this possible implementation, in some examples, the terminal's failure to obtain feedback information that can be used to determine the first operation mode corresponding to each of the multiple operators of the neural network model may be because the server lacks operator performance information corresponding to the neural network model, or because the operator performance information for the model on the server is incomplete. In this case, the terminal sends the second performance information to the server, which can be used to supplement the server's missing operator performance information. Of course, in other examples, the communication connection between the terminal and the server may fail, so that the terminal cannot receive the feedback information; in that case, the terminal may refrain from sending the second performance information to the server, or the terminal may send it but the server does not update its stored operator performance information based on it.
在第一方面的一种可能的实现方式中，终端中安装有应用程序和目标软件开发工具包SDK，应用程序与神经网络模型对应；上述步骤：向服务器发送第一请求，包括：通过应用程序向目标SDK发送第二请求，第二请求包含神经网络模型的信息；当目标SDK中包含目标配置信息的情况下，基于目标配置信息和第二请求生成第一请求，且目标SDK向服务器发送第一请求，目标配置信息用于指示终端能够向服务器发送的关于神经网络模型的信息内容。In a possible implementation of the first aspect, an application program and a target software development kit (SDK) are installed on the terminal, and the application program corresponds to the neural network model; the step of sending the first request to the server includes: sending, by the application program, a second request to the target SDK, where the second request contains information about the neural network model; and, when the target SDK contains target configuration information, generating the first request based on the target configuration information and the second request, and sending, by the target SDK, the first request to the server, where the target configuration information is used to indicate which information about the neural network model the terminal is allowed to send to the server.
该种可能的实现方式中，目标配置信息用于指示已开启通过端云交互来实现模型性能优化的功能。这样，终端中的各个应用程序可以根据各自的需求，通过目标SDK，开启通过端云交互来实现模型性能优化的功能，从而通过目标SDK，为终端中的各个不同的应用程序提供与服务器进行交互以获得优化后的神经网络模型的功能，而不需要在开发阶段在各个不同应用程序中设置与服务器进行交互的相关功能，提升了应用程序的开发效率，并且便于终端执行与服务器进行交互的相关操作。In this possible implementation, the target configuration information indicates that the function of optimizing model performance through device-cloud interaction has been enabled. In this way, each application on the terminal can, according to its own needs, enable this function through the target SDK, and the target SDK then provides each application on the terminal with the ability to interact with the server to obtain an optimized neural network model. This removes the need to build server-interaction functionality into each application during development, improves application development efficiency, and makes it easier for the terminal to perform the operations involved in interacting with the server.
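The application-to-SDK-to-server flow above can be sketched as follows. The class name, the `allowed_fields` layout, and the request dictionaries are hypothetical; the patent only specifies that the target configuration information governs what model information may be sent.

```python
class TargetSDK:
    def __init__(self, target_config=None):
        # The presence of target_config means device-cloud optimization is
        # enabled; it lists which model information may be sent to the server.
        self.target_config = target_config

    def handle_second_request(self, second_request):
        """Turn an application's second request into the first request, or
        return None when the feature is not enabled."""
        if self.target_config is None:
            return None  # no target configuration: no first request is generated
        allowed = self.target_config["allowed_fields"]
        # keep only the model information the terminal is allowed to send
        model_info = {k: v for k, v in second_request.items() if k in allowed}
        return {"model_info": model_info}  # would then be sent to the server
```

Centralizing this in one SDK is what lets multiple applications share the server-interaction logic instead of each implementing it at development time.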
在第一方面的一种可能的实现方式中，反馈信息中包含针对神经网络模型的算子性能信息和/或模型运算策略的信息，算子性能信息包括相应算子关于指定参数的运行性能的信息，模型运算策略包括神经网络模型的多个算子中的每个算子的第一运算方式。In a possible implementation of the first aspect, the feedback information contains operator performance information and/or model operation strategy information for the neural network model, where the operator performance information includes information about the running performance of the corresponding operator with respect to specified parameters, and the model operation strategy includes the first operation mode of each of the multiple operators of the neural network model.
该种可能的实现方式中，指定参数可以认为是算子的运行性能的影响因素。其中，不同的算子对应的指定参数可以不同。示例性地，指定参数可以为以下参数中的一种或多种：算子参数(例如：输入数据的维度、输出数据的维度、卷积核的大小)、AI推理框架的版本、计算单元的类型、计算单元的运行频率。In this possible implementation, the specified parameters can be regarded as factors that affect an operator's running performance, and the specified parameters corresponding to different operators may differ. For example, the specified parameters may be one or more of the following: operator parameters (for example, the dimensions of the input data, the dimensions of the output data, and the size of the convolution kernel), the version of the AI inference framework, the type of computing unit, and the operating frequency of the computing unit.
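Since the specified parameters jointly determine an operator's running performance, they naturally act as a lookup key for performance records. A hedged sketch, with an assumed key layout:

```python
def perf_key(op_type, op_params, framework_version, unit_type, unit_freq_mhz):
    """Build a hashable key over the specified parameters of one operator.

    Different operators may key on different parameter sets; this sketch fixes
    one layout for illustration. Sorting op_params makes the key independent
    of the order in which parameters were supplied.
    """
    return (op_type, tuple(sorted(op_params.items())), framework_version,
            unit_type, unit_freq_mhz)
```

Any change in one of these factors (for example, a new AI inference framework version after a device update) yields a different key, so stale records are never silently reused.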
在第一方面的一种可能的实现方式中,神经网络模型未与终端的硬件关联。In a possible implementation of the first aspect, the neural network model is not associated with the hardware of the terminal.
该种可能的实现方式中,神经网络模型未与终端的硬件关联指该神经网络模型未针对终端的芯片等硬件进行适配。这样,可以保证神经网络模型在各种设备部署时的兼容性和普适性。此外,开发者无需在开发阶段分别针对不同类型和型号的终端,对神经网络模型在终端的运行性能进行针对性地优化,降低了开发者在开发阶段的工作量,提升了开发效率。In this possible implementation, the neural network model is not associated with the hardware of the terminal, which means that the neural network model is not adapted to the hardware of the terminal such as the chip. In this way, the compatibility and universality of the neural network model can be ensured when deployed on various devices. In addition, developers do not need to specifically optimize the operating performance of the neural network model on terminals for different types and models of terminals during the development stage, which reduces the developer's workload during the development stage and improves development efficiency.
本申请第二方面提供一种模型性能优化方法,应用于服务器,该方法包括:接收终端发送的第一请求,第一请求携带有神经网络模型的算子信息和神经网络模型的运行环境信息;向终端发送反馈信息,反馈信息基于算子信息和运行环境信息而得到,并且,反馈信息能够用于确定神经网络模型的多个算子中的每个算子的第一运算方式。A second aspect of this application provides a model performance optimization method, applied to a server. The method includes: receiving a first request sent by a terminal, where the first request carries operator information of the neural network model and operating environment information of the neural network model; Feedback information is sent to the terminal, and the feedback information is obtained based on the operator information and the running environment information, and the feedback information can be used to determine the first operation mode of each operator among the multiple operators of the neural network model.
在第二方面中，服务器在接收到终端发送的第一请求之后，可以根据第一请求中携带的算子信息和运行环境信息，查询神经网络模型对应的算子性能信息，若查询到神经网络模型对应的算子性能信息，则可以根据神经网络模型对应的算子性能信息，获得反馈信息。In the second aspect, after receiving the first request sent by the terminal, the server can query the operator performance information corresponding to the neural network model based on the operator information and running environment information carried in the first request; if such operator performance information is found, the feedback information can be obtained based on it.
在第二方面的一种可能的实现方式中，该方法还包括：接收终端发送的第一性能信息，第一性能信息和第一参考信息之间的差异符合预设条件，反馈信息中携带有第一参考信息；基于第一性能信息，更新服务器中的相应算子性能信息。In a possible implementation of the second aspect, the method further includes: receiving first performance information sent by the terminal, where the difference between the first performance information and the first reference information meets a preset condition and the feedback information carries the first reference information; and updating the corresponding operator performance information on the server based on the first performance information.
该种可能的实现方式中，终端在实际运行时的算子性能信息与服务器中对应的算子性能信息差异较大时，服务器可以基于终端实际运行时的算子性能信息对服务器中存储的相应算子性能信息进行更新。In this possible implementation, when the operator performance information measured by the terminal at actual run time differs significantly from the corresponding operator performance information on the server, the server can update the corresponding stored operator performance information based on the operator performance information from the terminal's actual run.
在第二方面的一种可能的实现方式中,该方法还包括:若服务器未获得反馈信息,则存储神经网络模型的算子信息和运行环境信息。In a possible implementation of the second aspect, the method further includes: if the server does not obtain feedback information, storing operator information and operating environment information of the neural network model.
该种可能的实现方式中,服务器不能获得反馈信息时,存储神经网络模型的算子信息和运行环境信息,从而指示服务器缺失相关的算子性能信息。服务器存储神经网络模型的算子信息和运行环境信息,可以便于服务器对于所缺失的相关算子性能信息进行维护和更新。In this possible implementation, when the server cannot obtain feedback information, it stores the operator information and operating environment information of the neural network model, thereby indicating that the server is missing relevant operator performance information. The server stores the operator information and operating environment information of the neural network model, which can facilitate the server to maintain and update the missing relevant operator performance information.
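The server-side behaviour of the second aspect, including recording the operator information and running environment information on a miss, can be sketched as follows. The class name and storage layout are assumptions made for illustration.

```python
class PerfServer:
    def __init__(self):
        self.perf_db = {}   # (operator, env signature) -> performance record
        self.missing = []   # requests the server could not answer yet

    def handle_first_request(self, op_infos, env):
        """Look up performance records for the request; on a complete miss,
        store the operator and environment info for later maintenance."""
        env_sig = tuple(sorted(env.items()))
        feedback = {}
        for op in op_infos:
            rec = self.perf_db.get((op, env_sig))
            if rec is not None:
                feedback[op] = rec
        if not feedback:
            # miss: remember what was asked so the missing operator
            # performance information can be supplemented later
            self.missing.append((op_infos, env))
            return None
        return feedback
```

Recorded misses are exactly what the terminal's second performance information (sent after running with the preset second operation modes) can later fill in.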
在第二方面的一种可能的实现方式中，该方法还包括：接收终端发送的第二性能信息，第二性能信息用于指示至少一个算子关于相应第二运算方式的运行性能；根据第二性能信息、神经网络模型的算子信息和运行环境信息，更新服务器。In a possible implementation of the second aspect, the method further includes: receiving second performance information sent by the terminal, where the second performance information is used to indicate the running performance of at least one operator with respect to the corresponding second operation mode; and updating the server according to the second performance information and the operator information and running environment information of the neural network model.
该种可能的实现方式中,服务器可以接收到来自终端的第二性能信息,以对服务器中所缺少的算子性能信息进行补充,从而使得服务器中存储的信息更为完备。In this possible implementation, the server can receive the second performance information from the terminal to supplement the operator performance information missing in the server, thereby making the information stored in the server more complete.
本申请第三方面提供一种模型性能优化装置,应用于终端,该装置具有实现上述第一方面或第一方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块,例如:发送模块、接收模块以及处理模块。The third aspect of the present application provides a model performance optimization device, which is applied to a terminal. The device has the function of implementing the method of the above-mentioned first aspect or any of the possible implementation methods of the first aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, such as: sending module, receiving module and processing module.
本申请第四方面提供一种终端，该终端包括至少一个处理器、存储器以及存储在存储器中并可在处理器上运行的计算机执行指令，当计算机执行指令被处理器执行时，处理器执行如上述第一方面或第一方面任意一种可能的实现方式的方法。A fourth aspect of this application provides a terminal, including at least one processor, a memory, and computer-executable instructions stored in the memory and runnable on the processor; when the computer-executable instructions are executed by the processor, the processor performs the method of the first aspect or any possible implementation of the first aspect.
本申请第五方面提供一种存储一个或多个计算机执行指令的计算机可读存储介质，当计算机执行指令被处理器执行时，处理器执行如上述第一方面或第一方面任意一种可能的实现方式的方法。A fifth aspect of this application provides a computer-readable storage medium storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor performs the method of the first aspect or any possible implementation of the first aspect.
本申请第六方面提供一种存储一个或多个计算机执行指令的计算机程序产品，当计算机执行指令被处理器执行时，处理器执行如上述第一方面或第一方面任意一种可能的实现方式的方法。A sixth aspect of this application provides a computer program product storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor performs the method of the first aspect or any possible implementation of the first aspect.
本申请第七方面提供了一种芯片系统,该芯片系统包括处理器,用于支持终端实现上述第一方面或第一方面任意一种可能的实现方式中所涉及的功能。在一种可能的设计中,芯片系统还可以包括存储器,存储器用于保存计算机设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。A seventh aspect of the present application provides a chip system. The chip system includes a processor and is used to support a terminal to implement the functions involved in the above-mentioned first aspect or any possible implementation manner of the first aspect. In a possible design, the chip system may also include a memory, which is used to store necessary program instructions and data for the computer device. The chip system may be composed of chips, or may include chips and other discrete devices.
本申请第八方面提供一种模型性能优化装置,该装置可以应用于服务器,该装置具有实现上述第二方面或第二方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块,例如:接收模块以及发送模块。The eighth aspect of this application provides a model performance optimization device, which can be applied to a server, and has the function of implementing the above second aspect or any of the possible implementation methods of the second aspect. This function can be implemented by hardware, or it can be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, such as a receiving module and a sending module.
本申请第九方面提供一种服务器，该服务器包括至少一个处理器、存储器以及存储在存储器中并可在处理器上运行的计算机执行指令，当计算机执行指令被处理器执行时，处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。A ninth aspect of this application provides a server, including at least one processor, a memory, and computer-executable instructions stored in the memory and runnable on the processor; when the computer-executable instructions are executed by the processor, the processor performs the method of the second aspect or any possible implementation of the second aspect.
本申请第十方面提供一种存储一个或多个计算机执行指令的计算机可读存储介质，当计算机执行指令被处理器执行时，处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。A tenth aspect of this application provides a computer-readable storage medium storing one or more computer-executable instructions; when the computer-executable instructions are executed by a processor, the processor performs the method of the second aspect or any possible implementation of the second aspect.
本申请第十一方面提供一种存储一个或多个计算机执行指令的计算机程序产品,当计算机执行指令被处理器执行时,处理器执行如上述第二方面或第二方面任意一种可能的实现方式的方法。An eleventh aspect of the present application provides a computer program product that stores one or more computer-executable instructions. When the computer-executable instructions are executed by a processor, the processor performs the method of the above second aspect or any possible implementation manner of the second aspect.
本申请第十二方面提供了一种芯片系统,该芯片系统包括处理器,用于支持服务器实现上述第二方面或第二方面任意一种可能的实现方式中所涉及的功能。在一种可能的设计中,芯片系统还可以包括存储器,存储器用于保存计算机设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。A twelfth aspect of the present application provides a chip system. The chip system includes a processor and is used to support the server in implementing the functions involved in the above-mentioned second aspect or any possible implementation manner of the second aspect. In a possible design, the chip system may also include a memory, which is used to store necessary program instructions and data for the computer device. The chip system may be composed of chips, or may include chips and other discrete devices.
其中,第三方面至第七方面或者其中任一种可能实现方式所带来的技术效果可参见第一方面或第一方面的相关可能实现方式所带来的技术效果,第八方面至第十二方面或者其中任一种可能实现方式所带来的技术效果可参见第二方面或第二方面的相关可能实现方式所带来的技术效果,此处不再赘述。Among them, for the technical effects brought by the third to seventh aspects or any possible implementation manner thereof, refer to the technical effects brought by the first aspect or its related possible implementation manners; for the technical effects brought by the eighth to twelfth aspects or any possible implementation manner thereof, refer to the technical effects brought by the second aspect or its related possible implementation manners. Details are not repeated here.
附图说明Description of the drawings
图1是本发明实施例提供的一种人工智能主体框架示意图;Figure 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present invention;
图2是本发明实施例提供的服务器和终端的一种示例性示意图;Figure 2 is an exemplary schematic diagram of a server and a terminal provided by an embodiment of the present invention;
图3是本发明实施例提供的服务器和终端之间的信息交互流程的一种示例性示意图;Figure 3 is an exemplary schematic diagram of the information interaction process between the server and the terminal provided by the embodiment of the present invention;
图4是本发明实施例提供的服务器与多个终端进行信息交互的一种示例性示意图;Figure 4 is an exemplary schematic diagram of information interaction between a server and multiple terminals provided by an embodiment of the present invention;
图5是本发明实施例提供的服务器和终端之间的信息交互流程的另一种示例性示意图;Figure 5 is another exemplary schematic diagram of the information interaction process between the server and the terminal provided by the embodiment of the present invention;
图6是本发明实施例提供的服务器和终端之间的信息交互流程的又一种示例性示意图;Figure 6 is another exemplary schematic diagram of the information interaction process between the server and the terminal provided by the embodiment of the present invention;
图7是本发明实施例提供的应用程序、目标SDK以及服务器之间的信息交互流程的一种示例性示意图;Figure 7 is an exemplary schematic diagram of the information interaction process between the application program, the target SDK and the server provided by the embodiment of the present invention;
图8是本申请实施例提供的模型性能优化装置的一实施例示意图;Figure 8 is a schematic diagram of an embodiment of the model performance optimization device provided by the embodiment of the present application;
图9是本申请实施例提供的模型性能优化装置的另一实施例示意图;Figure 9 is a schematic diagram of another embodiment of the model performance optimization device provided by the embodiment of the present application;
图10是本申请实施例提供的终端的一结构示意图;Figure 10 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
图11是本申请实施例提供的服务器的一结构示意图。Figure 11 is a schematic structural diagram of a server provided by an embodiment of the present application.
具体实施方式Detailed description of the embodiments
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。The embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention. The terms used in the embodiments of the present invention are only used to explain specific embodiments of the present invention and are not intended to limit the present invention.
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。The embodiments of the present application are described below with reference to the accompanying drawings. Persons of ordinary skill in the art know that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.
本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项”或其类似表达,是指的这些项中的任意组合,包括单项或复数项的任意组合。本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。In this application, "at least one" refers to one or more, and "plurality" refers to two or more. "And/or" describes the relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. The terms "first", "second", etc. in the description and claims of this application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, and are merely a way of distinguishing objects with the same attributes when describing the embodiments of the present application. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product or apparatus comprising a series of elements is not necessarily limited to those elements, but may include other elements not explicitly listed or inherent to such process, method, product or apparatus.
由于本申请实施例涉及人工智能领域,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。Since the embodiments of the present application relate to the field of artificial intelligence, in order to facilitate understanding, the relevant terms and related concepts such as neural networks involved in the embodiments of the present application are first introduced below.
请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。Please refer to Figure 1, which shows a schematic structural diagram of the main framework of artificial intelligence. The above artificial intelligence framework is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). Among them, the "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through a condensation process of "data, information, knowledge, wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementations) to the industrial ecological process of the system.
(1)基础设施(1)Infrastructure
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(中央处理器(central processing unit,CPU)、图形处理器(graphics processing unit,GPU)、神经网络处理器(neural-network processing unit,NPU)、张量处理器(tensor processing unit,TPU)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程逻辑门阵列(field programmable gate array,FPGA)等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。The infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and provides support through the basic platform. Communication with the outside is performed through sensors; computing power is provided by smart chips, that is, hardware acceleration chips such as central processing units (CPU), graphics processing units (GPU), neural-network processing units (NPU), tensor processing units (TPU), application specific integrated circuits (ASIC), and field programmable gate arrays (FPGA); the basic platform includes platform guarantees and support related to the distributed computing framework and the network, and may include cloud storage and computing, interconnection networks, etc. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
(2)数据(2)Data
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence. The data involves graphics, images, voice, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
(3)数据处理(3)Data processing
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。Among them, machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
(4)通用能力(4) General ability
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。After the data goes through the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5)智能产品及行业应用(5) Intelligent products and industry applications
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、智慧城市等。Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, etc.
(6)神经网络(6)Neural network
神经网络可以是由神经单元组成的,神经单元可以是指以xs(即输入数据)和截距1为输入的运算单元,该运算单元的输出可以为:The neural network can be composed of neural units. A neural unit can refer to an operation unit that takes xs (i.e., input data) and an intercept of 1 as input. The output of the operation unit can be:

h = f( ∑_{s=1}^{n} Ws·xs + b )
其中,s=1、2、……n,n为大于1的自然数,Ws为xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。Among them, s=1, 2,...n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function. A neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field. The local receptive field can be an area composed of several neural units.
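As a minimal illustrative sketch (not part of the patent text), the neural-unit computation described above, with the sigmoid chosen as the activation function f, can be written as:

```python
import math

def neural_unit(xs, ws, b):
    """Output of one neural unit: the activation f applied to the weighted
    sum of the inputs xs (weights ws) plus the bias b; f is the sigmoid."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid: f(z) = 1 / (1 + e^(-z))

# With weights 0.5, 0.5 and bias -1.5, inputs [1.0, 2.0] give z = 0,
# so the output is sigmoid(0) = 0.5.
out = neural_unit([1.0, 2.0], [0.5, 0.5], b=-1.5)
```

The output of such a unit can then serve as an input to units in the next layer, as the text describes.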
(7)算子(7)Operator
算子是指实现某种特定功能的函数。例如,以reshape算子为例,该算子用于对张量数据的形状进行重新诠释。又例如,以transpose算子为例,该算子用于调整张量数据的维度顺序。在本申请中,用于构建深度学习模型算法的常用的函数统称为算子,对任何函数进行某一项操作都可以认为是一个算子。比如卷积是一种积分变换的数学方法,比如是通过两个函数f1和f2生成的函数f3,则可以将f1、f2以及f3分别作为一个算子。An operator refers to a function that implements a specific function. For example, the reshape operator is used to reinterpret the shape of tensor data. As another example, the transpose operator is used to adjust the dimension order of tensor data. In this application, the functions commonly used to build deep learning model algorithms are collectively called operators, and performing a certain operation on any function can also be regarded as an operator. For example, convolution is a mathematical method of integral transformation; if a function f3 is generated from two functions f1 and f2, then f1, f2 and f3 can each be regarded as an operator.
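As an illustration (not part of the patent text), the behavior of the reshape and transpose operators mentioned above can be demonstrated with NumPy:

```python
import numpy as np

t = np.arange(6)          # tensor data with 6 elements, shape (6,)
r = t.reshape(2, 3)       # reshape operator: reinterpret the shape as (2, 3)
p = r.transpose(1, 0)     # transpose operator: swap the dimension order to (3, 2)

# reshape only reinterprets the layout; transpose reorders the dimensions,
# so element (i, j) of r appears at (j, i) of p.
assert r.shape == (2, 3) and p.shape == (3, 2)
assert p[2, 1] == r[1, 2]
```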
随着AI的快速发展,神经网络模型已经广泛应用于诸如手机、电视、台式电脑、车载设备等各种设备中,以实现语音识别、图像处理等功能,极大地提升了设备性能,改善了用户的使用体验。With the rapid development of AI, neural network models have been widely used in various devices such as mobile phones, TVs, desktop computers, and vehicle-mounted equipment to implement functions such as speech recognition and image processing, which greatly improves device performance and the user experience.
然而,在实际部署过程中,对终端中的神经网络模型的运行性能时,常常存在以下问题:However, in actual deployment, the running performance of the neural network model in the terminal often suffers from the following problems:
(1)目前,各种不同应用场景下应用的神经网络模型的结构差异较大,例如,语音处理场景中,常常采用transformer模型,而图像处理场景中,常常采用深度神经网络(deep neural network,DNN)、循环神经网络(recurrent neural network,RNN)模型或者卷积神经网络(convolutional neural networks,CNN)模型,导致优化难度较大。(1) At present, the structures of neural network models used in different application scenarios differ considerably. For example, speech processing scenarios often use the transformer model, while image processing scenarios often use deep neural network (DNN), recurrent neural network (RNN) or convolutional neural network (CNN) models, which makes optimization difficult.
(2)不同设备所包含的计算单元不同,而不同的计算单元(例如,CPU、GPU、NPU、TPU)所擅长的计算场景不同。例如,NPU对矩阵计算的计算性能较好,而CPU更擅长于逻辑运算、标量计算等,如矩阵计算占比较多的神经网络模型运行于NPU,相比于运行于CPU可获得几倍~几十倍的性能提升,导致同一神经网络模型运行于不同设备或计算单元时性能差异较大。(2) Different devices contain different computing units, and different computing units (for example, CPU, GPU, NPU, TPU) are good at different computing scenarios. For example, an NPU has better computing performance for matrix calculations, while a CPU is better at logical operations, scalar calculations, and so on. A neural network model dominated by matrix calculations can obtain a several-fold to tens-of-fold performance improvement when running on an NPU compared with a CPU, so the same neural network model shows large performance differences when running on different devices or computing units.
(3)设备中的硬件和软件版本常常会不断地更新迭代,而神经网络模型在设备中的运行性能会受到硬件和软件版本的影响,因此,为了保证神经网络模型在设备中的运行性能,开发者需要随着硬件和软件的更新迭代而不断进行神经网络模型的优化和适配工作,否则很可能会导致设备中的神经网络模型的运行性能受到较大影响,从而影响用户的使用体验。(3) The hardware and software versions in a device are constantly updated and iterated, and the running performance of the neural network model in the device is affected by these versions. Therefore, to guarantee the running performance of the neural network model in the device, developers need to continuously optimize and adapt the neural network model as the hardware and software are updated; otherwise, the running performance of the neural network model in the device is likely to be significantly degraded, which affects the user experience.
基于以上问题,在实际开发过程中,为了满足应用在各种不同设备中的部署需要,并在多种设备上获得神经网络模型的较佳性能,开发者需要将神经网络模型分别部署于多种设备并分别进行针对性地性能优化。Based on the above problems, in the actual development process, in order to meet the deployment needs of applications on various devices and obtain better performance of the neural network model on multiple devices, developers need to deploy the neural network model on each of these devices and perform targeted performance optimization on each of them separately.
下面针对目前的一种常用的模型性能优化方式进行简单介绍。The following is a brief introduction to a currently commonly used model performance optimization method.
目前,针对某一设备,该设备在载入神经网络模型之后,可以对神经网络模型进行解析,此外,可能会对神经网络模型的结构进行简单的优化,例如对神经网络模型中的算子进行融合或者剪枝,并在简单的优化之后获得神经网络模型中的算子信息。在获得神经网络模型中的算子信息之后,可以将神经网络模型的算子信息下发至设备中的不同计算单元对应的计算库,从而获取计算库返回的计算单元支持性校验结果。该计算单元支持性校验结果可以包含计算单元对神经网络模型中的算子的运行速度、功耗等运行性能信息。设备在获得计算单元支持性校验结果之后,可以根据计算单元支持性校验结果进行决策,以确定神经网络模型中的算子如何分配至设备的计算单元中来进行运算。这样,设备可以根据该决策的结果,进行模型性能优化。At present, for a given device, after loading the neural network model, the device can parse it and may perform simple optimization of its structure, such as fusing or pruning the operators in the neural network model, and obtain the operator information of the model after this simple optimization. After obtaining the operator information, the device can send it to the computing libraries corresponding to its different computing units, thereby obtaining the computing-unit support verification results returned by the libraries. These results may include running performance information such as the running speed and power consumption of the operators in the neural network model on each computing unit. After obtaining the support verification results, the device can make decisions based on them to determine how the operators in the neural network model are allocated to the device's computing units for execution. In this way, the device can optimize model performance based on the result of this decision.
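The per-device flow described above, querying each computing unit's library for a support check and then deciding where each operator runs, can be sketched as follows. All class, function and operator names here are hypothetical illustrations, not APIs from the patent:

```python
from dataclasses import dataclass

@dataclass
class SupportResult:
    supported: bool
    score: float  # e.g., a performance score derived from speed/power checks

class ComputeUnit:
    """A computing unit (CPU/GPU/NPU/...) with a per-operator-type score table."""
    def __init__(self, name, affinity):
        self.name = name
        self.affinity = affinity

    def check_support(self, op_type):
        # Stand-in for querying this unit's computing library for a
        # support verification result for the given operator type.
        score = self.affinity.get(op_type, 0.0)
        return SupportResult(supported=score > 0.0, score=score)

def place_operators(op_types, units):
    """Assign each operator to the computing unit whose support check scores highest."""
    return {op: max(units, key=lambda u: u.check_support(op).score).name
            for op in op_types}

npu = ComputeUnit("NPU", {"conv2d": 10.0, "matmul": 9.0})
cpu = ComputeUnit("CPU", {"conv2d": 1.0, "matmul": 1.0, "logic": 5.0})
placement = place_operators(["conv2d", "logic"], [npu, cpu])
# The matrix-heavy conv2d operator is placed on the NPU,
# while the logic operator is placed on the CPU.
```

This mirrors the affinity described earlier: matrix-heavy operators land on the NPU, logic-heavy ones on the CPU.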
然而,这一针对某一设备的常用的模型性能优化方式会导致设备载入神经网络模型到应用神经网络模型进行运算的过程耗时较长,且设备中的模型性能优化策略通常较为简单,计算单元的支持型校验结果通常也较少,导致模型性能优化结果也较为不理想。However, this commonly used per-device model performance optimization approach makes the process from loading the neural network model to running it for computation time-consuming. Moreover, the model performance optimization strategy in the device is usually relatively simple, and the support verification results from the computing units are usually limited, so the model performance optimization results are also less than ideal.
此外,针对特定设备进行优化后的神经网络模型的模型参数会与该设备深度耦合,导致该优化后的神经网络模型与其他设备的兼容性较差,因此,不同设备对应的优化结果通常不能通用,使得针对每个设备进行神经网络模型的性能优化操作通常都较为繁琐,无法简化。In addition, the model parameters of a neural network model optimized for a specific device are deeply coupled with that device, resulting in poor compatibility of the optimized model with other devices. Therefore, the optimization results for different devices are usually not interchangeable, which makes the performance optimization of the neural network model for each device cumbersome and hard to simplify.
可见,目前,对终端中的神经网络模型的运行性能的优化效率较低,工作量较大。It can be seen that at present, the optimization efficiency of the operating performance of the neural network model in the terminal is low and the workload is large.
基于此,本申请实施例提供一种模型性能优化方法,以解决目前的开发者为了在多种设备上获得神经网络模型的较佳性能,需要将训练后得到的神经网络模型分别部署于多种设备并分别进行针对性地测试优化,优化效率较低的问题。Based on this, embodiments of the present application provide a model performance optimization method to solve the problem that, in order to obtain better performance of a neural network model on multiple devices, developers currently need to deploy the trained neural network model on each of these devices and perform targeted testing and optimization separately, which is inefficient.
下面针对本申请实施例的模型性能优化方法进行具体介绍。The following is a detailed introduction to the model performance optimization method in the embodiment of this application.
本申请实施例的模型性能优化方法可以应用于端云协同系统中,该端云协同系统包括终端和服务器。The model performance optimization method of the embodiment of the present application can be applied to a device-cloud collaboration system, which includes a terminal and a server.
示例性地,该终端可以是手机(mobile phone)、平板电脑(pad)、带无线收发功能的电脑、虚拟现实(virtual reality,VR)终端、增强现实(augmented reality,AR)终端、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程医疗(remote medical)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、以物联网(internet of things,IoT)中的无线终端等。For example, the terminal may be a mobile phone, a tablet computer (pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a wireless terminal in the internet of things (IoT), etc.
本申请实施例中的服务器可以是一个服务器或者是服务器集群。The server in the embodiment of this application may be a server or a server cluster.
终端和服务器之间可以基于指定的无线通信方式或者有线通信方式进行数据传输。终端和服务器之间的通信方式可以有多种,本申请实施例在此不做限定。Data transmission can be carried out between the terminal and the server based on the specified wireless communication method or wired communication method. There may be multiple communication methods between the terminal and the server, which are not limited in the embodiments of this application.
如图2所示,为服务器和终端的一种示例性示意图。As shown in Figure 2, it is an exemplary schematic diagram of a server and a terminal.
图2中,服务器201可以与多个终端(例如图2所示的终端211、终端212和终端213)进行信息交互。其中,终端211、终端212和终端213的设备类型以及设备型号可以相同,也可以不同。服务器201还可以与数据存储系统连接。In Figure 2, the server 201 can interact with multiple terminals (such as the terminal 211, the terminal 212 and the terminal 213 shown in Figure 2). Among them, the equipment types and equipment models of terminal 211, terminal 212 and terminal 213 may be the same or different. Server 201 may also be connected to a data storage system.
终端中安装有应用程序,该应用程序中可以包含神经网络模型。该应用程序可以是终端从应用市场或者其他途径下载并安装的,也可以是终端出厂时自带的。An application is installed in the terminal, and the application can contain a neural network model. The application can be downloaded and installed by the terminal from the application market or other channels, or it can be included in the terminal when it leaves the factory.
在一些实施例中,神经网络模型未与终端的硬件关联。In some embodiments, the neural network model is not associated with the terminal's hardware.
也即是说,神经网络模型未针对终端的芯片等硬件进行适配。这样,可以保证神经网络模型在各种设备部署时的兼容性和普适性。此外,开发者无需在开发阶段分别针对不同类型和型号的终端,对神经网络模型在终端的运行性能进行针对性地优化,降低了开发者在开发阶段的工作量,提升了开发效率。In other words, the neural network model is not adapted to the terminal's hardware such as chips. In this way, the compatibility and universality of the neural network model can be ensured when deployed on various devices. In addition, developers do not need to specifically optimize the operating performance of the neural network model on terminals for different types and models of terminals during the development stage, which reduces the developer's workload during the development stage and improves development efficiency.
本申请实施例并不限定该神经网络模型的类型。示例性地,该神经网络模型可以是回归模型、DNN、CNN或者RNN。The embodiments of this application do not limit the type of the neural network model. For example, the neural network model may be a regression model, DNN, CNN or RNN.
应用程序对应的神经网络模型可以包含于应用程序的安装包中,也可以是在终端安装该应用程序之后,再基于应用程序中的指定配置信息将神经网络模型下载至终端中。该神经网络模型可以用于对应用程序中的待处理数据进行处理。The neural network model corresponding to the application program can be included in the installation package of the application program, or after the terminal installs the application program, the neural network model can be downloaded to the terminal based on the specified configuration information in the application program. This neural network model can be used to process data to be processed in applications.
该模型性能优化方法可以应用于人工智能领域的自然语言处理领域、图像处理领域以及音视频处理领域等多种信息处理领域中。该应用程序的具体功能以及神经网络模型的功能在此不做限定。This model performance optimization method can be applied to various information processing fields such as natural language processing, image processing, audio and video processing in the field of artificial intelligence. The specific functions of the application and the functions of the neural network model are not limited here.
举例来说,该应用程序可以实现人脸识别功能,此时,该应用程序中可以包括用于人脸识别的神经网络模型,则通过该神经网络模型可以对应用程序获取到的待处理图像进行处理,获得相应的人脸识别结果,此时,应用程序中的待处理数据为待处理图像。或者,该应用程序可以实现文本翻译功能,则通过该神经网络模型可以对应用程序获取到的文字信息进行翻译,获得翻译结果,此时,应用程序中的待处理数据为文字信息。For example, the application may implement a face recognition function. In this case, the application may include a neural network model for face recognition, and the image to be processed obtained by the application can be processed through this neural network model to obtain the corresponding face recognition result; here, the data to be processed in the application is the image to be processed. Alternatively, the application may implement a text translation function; the text information obtained by the application can then be translated through the neural network model to obtain the translation result, and in this case the data to be processed in the application is the text information.
参考图3,具体地,本申请实施例的模型性能优化方法可以包括步骤301-305。Referring to Figure 3, specifically, the model performance optimization method in the embodiment of the present application may include steps 301-305.
步骤301,终端向服务器发送第一请求。Step 301: The terminal sends a first request to the server.
第一请求携带有神经网络模型的算子信息和神经网络模型的运行环境信息。The first request carries operator information of the neural network model and operating environment information of the neural network model.
本申请实施例中,服务器可以提供指定的应用程序接口(applicationprogramming interface,API),终端调用该API并基于与服务器的通信连接,向服务器发送第一请求。In this embodiment of the present application, the server can provide a designated application programming interface (API), and the terminal calls the API and sends the first request to the server based on the communication connection with the server.
通常来说,算子信息可以包括神经网络模型中的每个算子的信息。Generally speaking, operator information can include information about each operator in the neural network model.
其中,神经网络模型中的算子的数量和类型可以基于实际应用场景来确定,在此不做限制。例如,神经网络模型中的算子可以包含卷积算子、池化算子、激活算子、全连接算子中的一种或多种,并且,每一种算子可以包括一个或多个,相同类型的算子之间的算子参数可以相同,也可以不同。举例来说,神经网络模型中可以包括两个卷积算子、两个池化算子和一个全连接算子。神经网络模型中的算子可以对应神经网络中的层(layer)。The number and types of operators in the neural network model can be determined based on the actual application scenario and are not limited here. For example, the operators in the neural network model may include one or more of convolution operators, pooling operators, activation operators, and fully connected operators, and there may be one or more operators of each type; the operator parameters of operators of the same type may be the same or different. For example, the neural network model may include two convolution operators, two pooling operators and one fully connected operator. The operators in the neural network model can correspond to the layers of the neural network.
算子信息可以包括相应算子的以下信息中的一种或多种:算子类型、算子的权重信息、一个或多个算子参数信息。该算子参数信息与算子的权重不同,示例性地,该算子参数信息可以包括算子的输入数据和/或输出数据的维度、算子中的元素的相关参数(例如卷积算子中卷积核的大小)的信息。The operator information may include one or more of the following information of the corresponding operator: the operator type, the weight information of the operator, and one or more pieces of operator parameter information. The operator parameter information is different from the operator's weights; for example, it may include the dimensions of the operator's input data and/or output data and the relevant parameters of the elements in the operator (such as the size of the convolution kernel in a convolution operator).
此外,在一些示例中,第一请求中还可以包括神经网络模型的算子之间的关联关系信息。In addition, in some examples, the first request may also include association relationship information between operators of the neural network model.
算子之间的关联关系信息可以包括神经网络模型中的各个算子之间的连接关系以及各个算子之间的数据流向等信息。算子之间的关联关系信息由神经网络模型的图结构来表示。The association information between operators may include information such as the connection relationship between operators in the neural network model and the data flow direction between operators. The correlation information between operators is represented by the graph structure of the neural network model.
第一请求中的算子信息以及算子之间的关联关系信息的具体形式在此不做限定。The specific forms of the operator information and the association information between operators in the first request are not limited here.
示例性地,第一请求中可以以列表的形式,记录神经网络模型中的每个算子的算子信息。或者,第一请求中可以包含完整的神经网络模型,此时,该完整的神经网络模型不仅包含每个算子的算子信息,还包含算子之间的关联关系信息。For example, the first request may record the operator information of each operator in the neural network model in the form of a list. Alternatively, the first request may include a complete neural network model. In this case, the complete neural network model not only includes operator information of each operator, but also includes association relationship information between operators.
神经网络模型对应的运行环境信息可以包括神经网络模型在终端进行运行时,所关联的硬件信息和/或软件信息。The running environment information corresponding to the neural network model may include hardware information and/or software information associated with the neural network model when it is run on the terminal.
示例性地,软件信息可以包括神经网络模型对应的应用程序的信息和/或AI推理框架的信息。其中,该AI推理框架用于控制神经网络模型在终端的运行过程,例如,该AI推理框架可以载入并解析神经网络模型,获得神经网络模型中的算子的信息,然后获得神经网络模型的算子对应的二进制可执行文件,再分配至终端的计算单元进行运算。应用程序的信息可以包括应用程序的名称、标识、版本号等信息中的一种或多种,AI推理框架的信息可以包括AI推理框架的名称、标识、版本号等信息中的一种或多种。For example, the software information may include information about the application corresponding to the neural network model and/or information about the AI inference framework. The AI inference framework is used to control the running process of the neural network model on the terminal. For example, the AI inference framework can load and parse the neural network model, obtain the information of the operators in the model, obtain the binary executable files corresponding to the operators, and then dispatch them to the terminal's computing units for execution. The information about the application may include one or more of the application's name, identifier, and version number; the information about the AI inference framework may include one or more of the framework's name, identifier, and version number.
硬件信息可以包括终端中神经网络模型对应的芯片的信息,该芯片的信息可以包括芯片的供应商、芯片型号、芯片所包含的计算单元的信息中的一种或多种。The hardware information may include information about the chip corresponding to the neural network model in the terminal. The chip information may include one or more of the chip supplier, chip model, and information about the computing unit included in the chip.
The computing units in the chip may include a CPU, and may also include one or more of a GPU, an NPU, and a TPU. The information about the computing units contained in the chip may include one or more of the computing unit's type, model, performance parameters (for example, its operating frequency), and its support for particular operators.
In addition, the first request may also carry identification information such as the terminal's name and its International Mobile Equipment Identity (IMEI), so that the server can identify the sender of the first request.
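The contents of the first request described above can be sketched as a simple payload. This is a minimal illustration only; all field names (`terminal_id`, `operators`, `environment`, and so on) are assumptions for the sake of the example, not names defined by this application.

```python
# Hypothetical sketch of a first-request payload: per-operator info in
# list form, plus the terminal's running environment information.
def build_first_request(operators, hw_info, sw_info, terminal_id):
    """Assemble a first-request payload as a plain dictionary."""
    return {
        "terminal_id": terminal_id,  # e.g. terminal name or IMEI
        "operators": operators,      # operator information, list form
        "environment": {
            "hardware": hw_info,     # chip vendor, model, computing units
            "software": sw_info,     # application + AI framework info
        },
    }

request = build_first_request(
    operators=[{"type": "Conv2D", "kernel": [3, 3], "stride": 1}],
    hw_info={"chip_vendor": "VendorX", "units": ["CPU", "GPU", "NPU"]},
    sw_info={"app": {"name": "demo", "version": "1.0"},
             "framework": {"name": "infer-rt", "version": "2.3"}},
    terminal_id="device-001",
)
```

The request could equally well carry the complete model instead of an operator list, as the passage notes; the dictionary form simply makes the two categories of information (operator info and running environment info) explicit.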
In this embodiment of the present application, there are many possible conditions that can trigger the terminal to send the first request to the server, and no limitation is placed on them here.
For example, where the neural network model corresponds to an application program, the terminal may send the first request to the server every time the application is started; or only the first time the application is started; or when the application is started and the neural network model is used within it for the first time; or when the application is started and the terminal detects that it has not stored information about a neural network model whose running performance has been optimized; or the terminal may periodically trigger the sending of the first request while the application is running; or the first request may be sent after update information for the neural network model is detected.
Step 302: The server receives the first request sent by the terminal.
After receiving the first request sent by the terminal, the server may query the operator performance information corresponding to the neural network model according to the operator information and running environment information carried in the first request. If the operator performance information corresponding to the neural network model is found, the server can obtain the feedback information based on it.
For example, the server may hold operator performance information together with index information corresponding to each piece of operator performance information. The operator information and running environment information carried in the first request can then be matched against each piece of index information; if they match a particular piece of index information successfully, the operator performance information corresponding to that index information is taken as the operator performance information for the neural network model, from which the feedback information is obtained.
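The server-side matching just described can be sketched as a keyed lookup. This is an illustrative assumption about how the index information might be structured; the application itself does not fix the key fields, so the ones used here (`type`, `dims`, `chip_model`, `unit`) are hypothetical.

```python
# Sketch of the server-side lookup: each piece of operator performance
# information is keyed by index information, and the operator info plus
# environment info from the first request is matched against those keys.
def lookup_performance(perf_table, op_info, env_info):
    """Return the performance record whose index matches, else None."""
    key = (op_info["type"], tuple(op_info["dims"]),
           env_info["chip_model"], env_info["unit"])
    return perf_table.get(key)

perf_table = {
    ("Conv2D", (1, 224, 224, 3), "chipA", "NPU"): {"latency_ms": 1.2},
}
hit = lookup_performance(
    perf_table,
    {"type": "Conv2D", "dims": [1, 224, 224, 3]},
    {"chip_model": "chipA", "unit": "NPU"},
)
```

A failed match (no index information agrees with the request) simply yields no record, which corresponds to the server being unable to produce feedback information from its stored data.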
Here, the operator performance information includes information about the running performance of the corresponding operator with respect to specified parameters.
The specified parameters can be regarded as factors that affect the operator's running performance. Since the computation associated with an operator is performed by a computing unit in the terminal, the specified parameters usually include parameters related to the computing unit. Different operators may correspond to different specified parameters. For example, a specified parameter may be one or more of the following: the operator's data dimensions (for example, the dimensions of its input data and output data), the operator parameters, the version of the AI inference framework, the type of the computing unit, and the operating frequency of the computing unit.
Operator parameters are related to the type of the operator, and different types of operators may have different operator parameters. Taking the convolution operator as an example, its operator parameters may include the size of the convolution kernel and the stride with which the kernel moves during the convolution. Generally speaking, although current neural network models vary widely in type and structure, the types of operators involved are relatively limited, the ranges from which the operator parameters and data dimensions of a given operator type are chosen are also relatively limited, and in many cases these values are identical. Therefore, the operator performance information for any given operator type can usually be reused across the many neural network models that contain that operator.
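The reuse argument above can be made concrete: because a convolution operator is characterized by a small set of values (input dimensions, kernel size, stride, framework version, computing unit), performance records keyed by those values are shared by every model that contains the same configuration. A minimal sketch, with hypothetical field choices:

```python
# Build a reusable key for a convolution operator's performance profile.
# The key fields are illustrative, not mandated by the application.
def conv_perf_key(in_dims, kernel_size, stride, framework_version, unit):
    """Key a Conv2D performance record by its specified parameters."""
    return ("Conv2D", tuple(in_dims), tuple(kernel_size), stride,
            framework_version, unit)

# Two different models containing the same conv configuration produce
# the same key, so they hit the same stored performance record.
key_a = conv_perf_key([1, 56, 56, 64], [3, 3], 1, "2.3", "GPU")
key_b = conv_perf_key([1, 56, 56, 64], [3, 3], 1, "2.3", "GPU")
```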
Step 303: If the server obtains feedback information for the first request based on the operator information and running environment information, it sends the feedback information to the terminal.
The feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model.
The first operation mode indicates how the operating parameters are set when the operator performs computation on the terminal. The operating parameters may include parameters that affect the computational performance of the operator; for example, they may include parameters related to the computing unit.
In some embodiments, the first operation mode of each of the multiple operators includes the operation mode of the corresponding operator on its corresponding first computing unit, where the first computing unit is one or more of the terminal's computing units.
Since the computation corresponding to an operator is usually performed by a computing unit in the terminal, the first operation mode of each of the multiple operators of the neural network model can be understood specifically as: the first operation mode, on the corresponding first computing unit, of each of the multiple operators of the neural network model.
In this case, the first operation mode indicates how the operating parameters are set when the operator performs computation on its corresponding first computing unit. For example, the operating parameters may include the operating frequency of the computing unit.
The first computing unit is one or more of the terminal's computing units. When there are multiple first computing units, their types may be the same or different, and their hardware configurations may likewise be the same or different.
That the feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model may specifically mean: the feedback information can be used to determine the first computing unit corresponding to each of the multiple operators of the neural network model, as well as the first operation mode of each of those operators on its corresponding first computing unit.
In this case, the feedback information may take either of the following two forms:
(1) The feedback information may include information about the model operation strategy corresponding to the neural network model, where the model operation strategy includes the first computing unit corresponding to each of the multiple operators of the neural network model, and the first operation mode of each of those operators on its corresponding first computing unit.
(2) The feedback information includes the operator performance information corresponding to the neural network model, and that operator performance information satisfies a specified optimization condition.
In this case, the operator performance information satisfying the specified optimization condition can be understood as the feedback information containing sufficient operator performance information. In one example, the operator performance information included in the feedback information is considered to satisfy the specified optimization condition when it covers each of the terminal's computing units, so that the terminal can determine, from among its computing units, the computing unit with the better running performance for a given operator as that operator's first computing unit, and the operation mode with the better running performance for that operator as its first operation mode.
In some embodiments, the feedback information contains operator performance information for the neural network model and/or information about a model operation strategy. The operator performance information includes information about the running performance of the corresponding operator with respect to specified parameters, and the model operation strategy includes the first operation mode of each of the multiple operators of the neural network model.
In one example, the feedback information may include the operator performance information of the neural network model. The operator performance information corresponding to the neural network model can indicate the running performance of the operators in the model when they compute in at least one operation mode.
In another example, the feedback information may include information about the first operation mode corresponding to each of the multiple operators of the neural network model.
Step 304: The terminal receives the feedback information sent by the server within a first time.
The interval between the first time and the time at which the first request was sent may be determined according to the actual application scenario. For example, this interval may be pre-configured on the terminal by the developer; alternatively, it may be pre-determined by the server based on information such as how long the server needs to obtain the feedback information, sent to the terminal, and then configured by the terminal.
If the terminal does not receive the feedback information for the first request within the first time, or the feedback information received by the terminal cannot be used to determine the first operation mode of each of the multiple operators of the neural network model, refer to the description of step 503 in FIG. 5 and the related embodiments of the subsequent steps, which is not repeated here.
That the feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model may cover either of the following two situations:
(1) The feedback information may include information about the model operation strategy corresponding to the neural network model.
The model operation strategy includes the first operation mode corresponding to each of the multiple operators of the neural network model.
In this case, the terminal can directly obtain, from the feedback information, the information about the first operation mode corresponding to each of the multiple operators of the neural network model.
(2) The feedback information includes the operator performance information corresponding to the neural network model, and that operator performance information satisfies a specified optimization condition.
In this case, the operator performance information satisfying the specified optimization condition can be understood as the feedback information containing sufficient operator performance information.
From the terminal's perspective, there are many ways to determine whether the operator performance information included in the feedback information satisfies the specified optimization condition.
In one example, the feedback information may include the operator performance information of each operator of the neural network model, with that information covering each of the terminal's computing units, so that the terminal can determine, from among its computing units, the computing unit with the better running performance for a given operator as that operator's first computing unit, and the operation mode with the better running performance for that operator as its first operation mode.
For example, suppose the neural network model contains a convolution operator, and the terminal's chip contains a CPU, an NPU, and a GPU.
If the feedback information contains the performance information of the convolution operator on the CPU and on the NPU, but not on the GPU, then the operator performance information about the convolution operator in the feedback information can be regarded as incomplete, that is, it does not contain sufficient operator performance information, and the operator performance information in the feedback information is therefore considered not to satisfy the specified optimization condition.
If instead the feedback information contains the performance information of the convolution operator on the CPU, on the NPU, and on the GPU, then the operator performance information about the convolution operator can be regarded as relatively complete and as satisfying the specified optimization condition; the feedback information therefore contains sufficient operator performance information for the convolution operator.
In this case, the terminal can compare the performance information of the convolution operator on the CPU, on the NPU, and on the GPU, and select the computing unit with the best running performance as the first computing unit for the convolution operator.
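The comparison in the convolution example above amounts to picking the computing unit with the best reported performance. A minimal sketch; the latency figures and record structure are made up for illustration, and "best" is taken here to mean lowest latency:

```python
# Compare one operator's performance records across the terminal's
# computing units and pick the fastest as its first computing unit.
def pick_first_unit(perf_by_unit):
    """Return the computing unit with the lowest reported latency."""
    return min(perf_by_unit, key=lambda u: perf_by_unit[u]["latency_ms"])

conv_perf = {
    "CPU": {"latency_ms": 8.4},
    "NPU": {"latency_ms": 1.1},
    "GPU": {"latency_ms": 2.7},
}
best = pick_first_unit(conv_perf)
```

The same comparison, run per operator, yields each operator's first computing unit; selecting the best-performing operation mode on that unit follows the same pattern over the modes recorded in the performance information.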
In one example, the determination may be performed operator by operator: for each of the multiple operators of the neural network model, determine the first computing unit corresponding to that operator, and the first operation mode of that operator on its corresponding first computing unit.
In another example, the operators of the neural network model may be grouped to obtain N first operator sets, and the first computing unit corresponding to each first operator set, together with the first operation mode of each first operator set on its corresponding first computing unit, is then determined, where N is a positive integer and each first operator set contains at least one operator. There are many possible strategies for grouping the operators, and no limitation is placed on them here. For example, the operators may be grouped according to their types, their number, and the relationships between them, yielding N first operator sets. For instance, if the neural network model contains multiple adjacent convolution operators whose operator parameters, such as their dimensions, are identical or close, those adjacent convolution operators can be grouped into one first operator set, and the convolution operators in that set can be run by the same first computing unit.
Generally speaking, the running performance of a neural network model on the terminal is largely determined by the running performance of each of its multiple operators. The effect of the operators being executed by different computing units, that is, of the model being executed across computing units, is usually small and is generally not a decisive factor in the model's running performance. In other words, if each operator in the neural network model is assigned to the computing unit on which it runs best, the overall running performance of the neural network model on the terminal is usually the optimal running performance, or close to it.
Therefore, in this embodiment of the present application, for the neural network model on the terminal, a better operation mode (namely the first operation mode) for each of the model's multiple operators can be obtained based on the relevant feedback information from the server. When the terminal performs the neural network computation in the first operation mode, it can achieve better running performance, without requiring the developer to carry out complex performance-optimization work for each of the many types and models of device on which the neural network model may be deployed, which improves the efficiency of optimizing the neural network model on the terminal.
In some embodiments, the method further includes step 305.
Step 305: Based on the feedback information, the terminal obtains a first computation result for the data to be processed.
The data to be processed may come from the application program corresponding to the neural network model, and its type is not limited here. For example, the application may implement a face recognition function, in which case the data to be processed is an image to be processed; alternatively, the application may implement a text translation function, in which case the data to be processed is text.
In this embodiment of the present application, each operator can be compiled based on its corresponding first computing unit and first operation mode, producing a binary file that can be loaded and run efficiently by the corresponding first computing unit. Then, when the data to be processed is obtained, the running order of the first computing units can be determined according to the structure of the neural network model, and, following that order, each first computing unit loads and runs the binary file of its operator so as to obtain the first computation result for the data to be processed.
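The compile-then-execute flow above can be sketched as two steps: bind each operator to its chosen first computing unit, then run the compiled artifacts in the model's structural order. "Compilation" here is a stand-in (a Python closure with a dummy computation), not a real toolchain; the whole example is an assumption made for illustration.

```python
# Pretend-compile an operator: bind it to its first computing unit and
# return a callable standing in for the compiled binary file.
def compile_operator(op, unit):
    def binary(x):
        # Dummy computation: each "operator" just adds 1 to its input.
        return {"op": op, "unit": unit, "out": x + 1}
    return binary

# Run the compiled binaries following the model's operator order.
def run_model(op_plan, data):
    for op, unit in op_plan:
        data = compile_operator(op, unit)(data)["out"]
    return data

# Each entry pairs an operator with its first computing unit.
result = run_model([("conv1", "NPU"), ("relu1", "CPU")], data=0)
```

In a real system the plan would be derived from the model graph, and each binary would be loaded by the computing unit it was compiled for; the sketch only shows the ordering and dispatch logic.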
In some examples, after obtaining, for each operator, the binary file that can be loaded and run efficiently by the corresponding first computing unit, the terminal can store that binary file, so that when it subsequently receives processing instructions for successive batches of data to be processed, the first computing unit can load and run it efficiently. Moreover, once stored on the terminal, the binary file can be loaded and run repeatedly, as needed, not only during the current run of the application corresponding to the neural network model, but also during the next and any subsequent runs of that application.
In other examples, after the binary file for each operator is obtained, the terminal may keep it only temporarily, for example in memory, during the current run of the application, and delete it from the terminal once that run ends, because the file is large or the terminal's storage resources are limited. In this case, the terminal can trigger the operation of sending the first request to the server, together with the subsequent operations, each time the application is started, so as to re-determine, on each start, the first computing unit and corresponding first operation mode for each operator of the neural network model, and to obtain the corresponding binary files for the first computing units to load and run.
In this embodiment of the present application, the server can exchange information with multiple terminals, thereby optimizing the performance of the neural network models of each of those terminals.
For example, FIG. 4 is an exemplary schematic diagram of the information exchange between a server and multiple terminals.
The server 40 can exchange information with terminal 41, terminal 42, and terminal 43 respectively, providing each of them with a model performance optimization function. The operators in the neural network model 10 of application 1 on terminal 41 can be computed by the CPU 411, the GPU 412, and the NPU 413 respectively; the operators in the neural network model 20 of application 2 on terminal 42 can be computed by the CPU 421 and the GPU 422 respectively; and the operators in the neural network model 30 of application 3 on terminal 43 can be computed by the CPU 431 and the NPU 432 respectively.
It can be seen that, through the embodiments of this application, the server can efficiently and quickly provide a better running mode for the neural network model of each of multiple terminals, giving high optimization efficiency.
In some embodiments, as shown in FIG. 5, after the above step 302, the model performance optimization method further includes:
Step 503: If the terminal does not receive the feedback information within the first time, or the feedback information cannot be used to determine the first operation mode of each of the multiple operators of the neural network model, the terminal obtains a second computation result for the data to be processed based on a preset second operation mode for each of the multiple operators of the neural network model.
Here, the second operation mode of each of the multiple operators of the neural network model may specifically refer to: the operation mode of each of those operators on its corresponding second computing unit.
The second computing unit is one or more of the terminal's computing units. When there are multiple second computing units, their types may be the same or different, and their hardware configurations may likewise be the same or different.
It should be understood that both the second computing unit and the first computing unit in this embodiment belong to the terminal's computing units, and the strategies for determining them are usually independent of each other. Therefore, the first computing unit and the second computing unit may be the same or different, and the first operation mode corresponding to the first computing unit and the second operation mode corresponding to the second computing unit may likewise be the same or different; no limitation is placed on this here.
There may be many reasons why the terminal does not receive the feedback information: for example, the communication connection between the terminal and the server may have failed, or the server may not have sent the feedback information.
If the feedback information received by the terminal within the first time cannot be used to determine the first operation mode corresponding to each of the multiple operators of the neural network model, this may be because the communication connection between the terminal and the server is slow, so that the terminal fails to receive the server's feedback information in full, or because the feedback information sent by the server itself cannot be used to determine the first operation mode corresponding to each of those operators.
The feedback information sent by the server being unable to determine the first operation mode of each of the multiple operators of the neural network model may mean that the feedback information cannot support the terminal in determining the first operation mode corresponding to all of the operators in the neural network model.
For a given operator in the neural network model, the feedback information being unable to determine that operator's first operation mode may mean that the feedback information contains no information about that operator, or that the operator performance information it contains for that operator is incomplete, for example, missing the operator's performance information about certain computing units and/or certain operation modes in the terminal. In such cases, it may be that the server did not find the complete operator performance information corresponding to the model's multiple operators, or that the terminal did not receive the complete information sent by the server within the first time.
In the embodiments of this application, the information on the second operation mode corresponding to each of the multiple operators of the neural network model may be pre-stored in the terminal, or may be determined by the terminal, based on a preset operation strategy, from the operator information of the neural network model and the running environment information of the neural network model. The preset operation strategy is not limited here.
Through the embodiments of this application, when the terminal fails to obtain a running strategy for the neural network model from the server, it can still determine the second operation mode corresponding to each of the multiple operators based on a model running strategy already present in the terminal, thereby optimizing the running performance of the neural network model.
The next time the neural network model is loaded, the operation of sending the first request to the server and the subsequent operations can be executed again, to retry performance optimization of the neural network model.
In some embodiments, the server stores information on the correspondence between index information and operator performance information. The index information is used for matching against the operator information and the running environment information; the operator performance information is used to obtain the feedback information when its corresponding index information matches the operator information and running environment information successfully.
In one example, the correspondence between index information and operator performance information can be stored in the form of a relationship list. Of course, the correspondence can also take other storage forms, which are not limited here.
The following uses the relationship list as an example to introduce the information stored in the server.
In this example, the relationship list indicates the correspondence between index information and operator performance information. Generally, each piece of index information in the relationship list corresponds one-to-one with a piece of operator performance information. One piece of index information and its corresponding operator performance information form one group of correspondence, and the relationship list can include one or more such groups.
For example, the index information may include reference operator information and reference running environment information of the corresponding operator, which can be matched against the operator information and running environment information carried in the first request.
The reference operator information includes one or more of the following:
the operator type, the operator's weight information, one or more pieces of operator parameter information (for example, the dimensions of the operator's input data, the dimensions of its output data, or the size of the convolution kernel of a convolution operator), and information on the association relationships between operators.
The reference running environment information includes reference hardware running environment information and reference software running environment information.
The reference hardware running environment information includes one or more of the following:
the chip vendor, the chip model, the types of computing units contained in the chip, the models of the computing units, performance parameters (for example, the operating frequency of a computing unit), and the operators supported by each computing unit.
The reference software running environment information includes one or more of the following:
information about the application program corresponding to the neural network model (for example, the application's name, identifier and/or version number), and information about the AI inference framework (for example, the framework's name and version number).
When a piece of index information matches the operator information and running environment information in the first request successfully, the operator performance information corresponding to that index information can be used to obtain the feedback information. Here, a successful match between a piece of index information and the operator information and running environment information in the first request may mean that the operator information and running environment information in the first request find matching information within that piece of index information.
When all index information in the relationship list fails to match the operator information and running environment information in the first request, the server can be considered unable to obtain the feedback information.
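A minimal sketch of the lookup over the relationship list described above. The tuple-of-fields index key and all identifiers are illustrative assumptions, not a format mandated by the embodiment:

```python
# Each group of correspondence pairs one piece of index information with
# one piece of operator performance information (times in milliseconds).
relationship_list = [
    # ((operator type, chip model, framework version), performance info)
    (("Conv2D", "chipA", "fw-1.0"), {"cpu": 3.0, "gpu": 1.2}),
    (("Relu",   "chipA", "fw-1.0"), {"cpu": 0.2, "gpu": 0.1}),
]

def query(op_info, env_info):
    """Match the request against each index entry; return the operator
    performance information on success, or None when every entry fails
    (the server then cannot obtain the feedback information)."""
    key = (op_info["type"], env_info["chip"], env_info["framework"])
    for index, perf in relationship_list:
        if index == key:
            return perf
    return None

print(query({"type": "Conv2D"}, {"chip": "chipA", "framework": "fw-1.0"}))
# {'cpu': 3.0, 'gpu': 1.2}
print(query({"type": "Conv2D"}, {"chip": "chipB", "framework": "fw-1.0"}))
# None  (no index entry matched: no feedback can be built)
```

A production server would likely hash the key for constant-time lookup; the linear scan here only mirrors the "match each index entry" wording of the text.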
In the embodiments of this application, through the relationship list, the server can quickly and efficiently query the operator performance information of the operators in the neural network model associated with the first request, and obtain the feedback information when that operator performance information is found.
Moreover, the server can maintain and update the operator performance information it stores and the index information corresponding to the operator performance information.
The following introduces the specific ways of maintaining and updating the relationship list in several scenarios.
1. In one scenario, when the server does not find, in response to the first request, operator performance information corresponding to the neural network model of the first request, it updates the corresponding information in the server according to second performance information subsequently uploaded by the terminal.
Specifically, as shown in Figure 5, in some embodiments the method further includes steps 504-507:
Step 504: if the server does not obtain the feedback information, it stores the operator information and running environment information of the neural network model.
By storing the operator information and running environment information of the neural network model, the server records that it is missing the related operator performance information.
Storing the operator information and running environment information of the neural network model also makes it easier for the server to maintain and update the missing operator performance information: for example, it can later be supplemented based on the second performance information sent by the terminal and the related index information, or this part of the operator performance information and the corresponding index information can be supplemented manually.
Step 505: after obtaining the second operation result, the terminal sends second performance information to the server.
The second performance information indicates the running performance of at least one operator with respect to its corresponding second operation mode.
Step 506: the server receives the second performance information sent by the terminal.
Step 507: the server updates itself according to the second performance information, the operator information of the neural network model, and the running environment information.
In the embodiments of this application, when the terminal fails to obtain from the server feedback information that can be used to determine the first operation mode corresponding to each of the multiple operators of the neural network model, the server may lack the operator performance information corresponding to the neural network model, or the operator performance information it holds for the model may be incomplete. Therefore, after obtaining the second operation result, the terminal can send second performance information to the server; this second performance information may indicate the running performance of all operators in the neural network model with respect to their corresponding second operation modes, or only that of some of the operators. The second performance information can be the operator performance information not covered by the feedback information, that is, the operator performance information the server may lack.
In this way, the server can receive the second performance information from the terminal and use it to supplement the operator performance information it lacks, making the corresponding information in the server more complete. Specifically, the server can obtain index information from the operator information and running environment information of the neural network model, obtain the operator performance information corresponding to that index information from the second performance information, and then add the index information and the corresponding operator performance information to the relationship list as one group of correspondence.
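The supplement step described above can be sketched as follows, again with hypothetical field names: the server derives an index key from the operator and environment information of the first request, pairs it with the uploaded second performance information, and appends the pair as a new group of correspondence:

```python
# Groups of (index information, operator performance information).
relationship_list = []

def supplement(op_info, env_info, second_perf):
    """Add a missing entry after the terminal uploads second performance
    information; returns False if an entry for this key already exists."""
    index = (op_info["type"], env_info["chip"], env_info["framework"])
    # Skip duplicates so repeated uploads do not create conflicting groups.
    if any(idx == index for idx, _ in relationship_list):
        return False
    relationship_list.append((index, second_perf))
    return True

added = supplement({"type": "Conv2D"},
                   {"chip": "chipA", "framework": "fw-1.0"},
                   {"cpu": 2.8, "gpu": 1.1})
print(added, len(relationship_list))  # True 1
```

Subsequent first requests carrying the same operator and environment information would then match this new group and receive feedback.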
Of course, in some cases the communication connection between the terminal and the server may fail, so that the terminal cannot receive the feedback information. In that case, the terminal may refrain from sending the second performance information to the server; alternatively, the terminal may still send it, but the server does not update the stored operator performance information based on it.
2. In another scenario, when the operator performance information observed by the terminal at actual run time differs significantly from the corresponding operator performance information in the relationship list, the server can update the corresponding operator performance information in the server based on the operator performance information from the terminal's actual run.
Specifically, in some embodiments, the feedback information also carries first reference information.
As shown in Figure 6, after step 305 above, the model performance optimization method further includes steps 601-603:
Step 601: after obtaining the first operation result, and when the difference between the first performance information and the first reference information meets a preset condition, the terminal sends the first performance information to the server.
The first performance information indicates the running performance of at least one operator with respect to its corresponding first operation mode.
Step 602: the server receives the first performance information sent by the terminal.
Step 603: the server updates the corresponding operator performance information in the server based on the first performance information.
In the embodiments of this application, the first reference information may indicate the expected running performance of the corresponding operator, that is, the running performance expected when the operator's operation is executed in the first operation mode.
The difference between the first performance information and the first reference information meeting the preset condition may indicate that the difference between the two is relatively large.
The preset condition can be determined according to the way the performance is described.
For example, if in the first performance information the running performance of an operator is described by its running time, and in the first reference information the expected running performance of the operator is also described by a running time, then the difference between the first performance information and the first reference information can be expressed as the difference or error rate of the running time in the first performance information relative to the running time in the first reference information. If that difference or error rate is greater than a preset deviation threshold, the difference between the first performance information and the first reference information can be considered to meet the preset condition.
In practical scenarios, when the operator performance information observed at the terminal's actual run time differs significantly from the corresponding operator performance information in the relationship list, the related operator performance information stored in the server may carry a large error. By sending the first performance information to the server, the terminal provides the server with a data reference, allowing it to promptly correct operator performance information with large errors.
For example, suppose the first reference information states that the reference running time of a convolution operator executed on the GPU at a first frequency is 3 milliseconds, while on the terminal, when the first operation result is obtained, the actual running time of the convolution operator on the terminal's GPU at the first frequency is 2 milliseconds. The error rate of the actual running time relative to the reference running time is then about 33%, greater than the preset deviation threshold of 20%, so the difference between the GPU's running performance for this convolution operator at the first frequency and the first reference information can be considered to meet the preset condition. The terminal can therefore take the GPU's running performance for the convolution operator as the first performance information and send it to the server, and the server then changes the corresponding stored operator performance information to 2 milliseconds.
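The threshold check in this example reduces to a one-line comparison. Here the error rate is taken as the deviation of the actual running time from the reference time, relative to the reference time (one common convention; the embodiment does not fix the exact formula):

```python
def exceeds_deviation(actual_ms, reference_ms, threshold=0.20):
    """True when |actual - reference| / reference exceeds the preset
    deviation threshold, i.e. the first performance information should
    be reported to the server."""
    error_rate = abs(actual_ms - reference_ms) / reference_ms
    return error_rate > threshold

# Convolution operator on the GPU: reference 3 ms, measured 2 ms.
print(exceeds_deviation(2.0, 3.0))  # True: error rate of about 33% > 20%
```

A small deviation, say a measurement of 2.9 ms against the 3 ms reference, stays under the threshold and triggers no report.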
In some examples, after receiving the first performance information, the server may not immediately update the corresponding information based on it. Instead, it may wait until it has received actual performance information for the same operator from multiple terminals before updating the corresponding operator performance information. In this way, the related operator performance information is confirmed multiple times by multiple terminals, and the server updates the corresponding operator performance information only when it is established that the stored information indeed carries a large error.
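The multi-terminal confirmation described above can be sketched as a simple counter: the server buffers deviating reports and only rewrites a stored value once enough independent terminals have reported for the same operator. The report count of 3 and the use of the mean are illustrative assumptions:

```python
from collections import defaultdict

reports = defaultdict(list)          # operator key -> reported actual times
stored = {"conv_gpu_f1": 3.0}        # stored operator performance info (ms)

def report_first_perf(key, actual_ms, min_reports=3):
    """Buffer deviating reports; update only after multiple terminals agree."""
    reports[key].append(actual_ms)
    if len(reports[key]) >= min_reports:
        # e.g. take the mean of the confirmed measurements
        stored[key] = sum(reports[key]) / len(reports[key])
        reports[key].clear()

for measured in (2.0, 2.5, 1.5):     # three terminals report about 2 ms
    report_first_perf("conv_gpu_f1", measured)
print(stored["conv_gpu_f1"])  # 2.0
```

Until the third report arrives, the stored 3 ms value is left untouched, which matches the cautious update policy of this example.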
3. The operator performance information and the corresponding index information in the server can be updated manually.
When an updated chip version and/or software version is released for the terminal, when an updated version of the neural network model in an application is released, when new optimization methods become available for an operator's operation modes, and in similar situations, the operator performance data in the server may no longer meet the terminal's needs. In such cases, maintenance personnel can update the operator performance data and the corresponding index information stored in the server. Depending on what needs updating, test cases can be constructed and operator performance tests run to collect new operator performance information, and the related information in the server can then be updated based on the collected operator performance information.
Based on any of the above embodiments of the model performance optimization method, the following introduces in detail the internal processing flow of the terminal involved in the method.
In some embodiments, an application program and a target software development kit (SDK) are installed in the terminal, the application program corresponds to the neural network model, and sending the first request to the server includes:
sending a second request to the target SDK through the application program, where the second request contains information about the neural network model; and
when the target SDK contains target configuration information, generating the first request based on the target configuration information and the second request, and sending the first request to the server through the target SDK, where the target configuration information indicates which information about the neural network model the terminal is able to send to the server.
Figure 7 is an exemplary schematic diagram of the information interaction flow among the application program in the terminal, the target software development kit (SDK), and the server.
In the embodiments of this application, the target configuration information indicates that the function of optimizing model performance through device-cloud interaction has been enabled.
The application program can enable, through the target SDK, the model performance optimization function based on device-cloud interaction described in any of the above embodiments; once this is done, the target SDK holds the target configuration information.
After the model performance optimization function based on device-cloud interaction is enabled, the application program can send the second request, which includes the information of the neural network model, to the target SDK through the API provided by the target SDK.
The target SDK generates the first request based on the target configuration information and the information of the neural network model, and the first request can be sent to the server through the target SDK. The feedback information sent by the server can then be received through the target SDK, so that the terminal performs the subsequent operations according to the feedback information.
In the embodiments of this application, the target SDK can provide each of the different applications in the terminal with the function of interacting with the server to obtain an optimized neural network model, without the interaction-related functions having to be built into each application at the development stage. This improves the development efficiency of applications and makes it convenient for the terminal to perform the operations of interacting with the server.
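The application-to-SDK-to-server flow above can be sketched as follows. All class and method names are illustrative; the embodiment defines the flow, not a concrete API:

```python
class TargetSDK:
    """Mediates between applications and the server for model optimization."""

    def __init__(self, config=None):
        # Target configuration information, present once the device-cloud
        # optimization function has been enabled; also limits which model
        # information the terminal is able to send to the server.
        self.config = config

    def handle_second_request(self, model_info, send_to_server):
        # Only when target configuration information is present does the
        # SDK build and send the first request on the application's behalf.
        if self.config is None:
            return None
        allowed = self.config["allowed_fields"]
        first_request = {"model_info": {k: v for k, v in model_info.items()
                                        if k in allowed}}
        return send_to_server(first_request)  # feedback info from server

sdk = TargetSDK(config={"allowed_fields": {"operators", "environment"}})
feedback = sdk.handle_second_request(
    # "weights" is filtered out: not permitted by the configuration.
    {"operators": ["Conv2D"], "environment": "chipA", "weights": b"..."},
    send_to_server=lambda req: {"ops_covered": req["model_info"]["operators"]})
print(feedback)  # {'ops_covered': ['Conv2D']}
```

Because the SDK owns the server interaction, each application only ever issues the second request through the SDK's API, matching the development-efficiency point above.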
The embodiments of this application have introduced the model performance optimization method from multiple aspects above. The following describes, with reference to the accompanying drawings, the model performance optimization apparatus of this application applied to the terminal and the model performance optimization apparatus applied to the server.
As shown in Figure 8, an embodiment of this application provides a model performance optimization apparatus 80, which can be applied to the terminal in the above embodiments.
An embodiment of the apparatus 80 includes:
a sending module 801, configured to send a first request to the server, where the first request carries operator information of the neural network model and running environment information of the neural network model; and
a receiving module 802, configured to receive, within a first time period, feedback information for the first request sent by the server, where the feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model.
Optionally, the feedback information also carries first reference information;
the sending module 801 is further configured to:
after the first operation result is obtained, and when the difference between the first performance information and the first reference information meets a preset condition, send the first performance information to the server, where the first performance information indicates the running performance of at least one operator with respect to its corresponding first operation mode.
Optionally, the apparatus 80 further includes a processing module 803;
the processing module 803 is configured to:
when no feedback information is received within the first time period, or the feedback information cannot be used to determine the first operation mode of each of the multiple operators of the neural network model, obtain a second operation result of the data to be processed based on a preset second operation mode of each of the multiple operators of the neural network model.
Optionally, the sending module 801 is further configured to:
after the second operation result is obtained, send second performance information to the server, where the second performance information indicates the running performance of at least one operator with respect to its corresponding second operation mode.
Optionally, an application program and a target software development kit (SDK) are installed in the terminal, and the application program corresponds to the neural network model;
the sending module 801 is configured to:
send a second request to the target SDK through the application program, where the second request contains information about the neural network model; and
when the target SDK contains target configuration information, generate the first request based on the target configuration information and the second request, and send the first request to the server through the target SDK, where the target configuration information indicates which information about the neural network model the terminal is able to send to the server.
As shown in Figure 9, an embodiment of this application provides a model performance optimization apparatus 90, which can be applied to the server in the above embodiments.
An embodiment of the apparatus 90 includes:
a receiving module 901, configured to receive a first request sent by the terminal, where the first request carries operator information of the neural network model and running environment information of the neural network model; and
a sending module 902, configured to send feedback information to the terminal, where the feedback information is obtained based on the operator information and the running environment information, and the feedback information can be used to determine the first operation mode of each of the multiple operators of the neural network model.
Optionally, the apparatus 90 further includes an update module 903;
the receiving module 901 is further configured to receive first performance information sent by the terminal, where the difference between the first performance information and the first reference information meets a preset condition, and the feedback information carries the first reference information;
the update module 903 is configured to update the corresponding operator performance information in the server based on the first performance information.
Optionally, the apparatus 90 further includes a storage module 904;
the storage module 904 is configured to store the operator information and running environment information of the neural network model if the server does not obtain the feedback information.
Optionally, the receiving module 901 is further configured to receive second performance information sent by the terminal, where the second performance information indicates the running performance of at least one operator with respect to its corresponding second operation mode;
the update module 903 is configured to update the server according to the second performance information, the operator information of the neural network model, and the running environment information.
图10所示,是本申请实施例提供的终端100的一种可能的逻辑结构示意图。该终端100用于实现上述任一实施例中所涉及的终端的功能。该终端100包括:存储器1001、处理器1002、通信接口1003以及总线1004。其中,存储器1001、处理器1002、通信接口1003通过总线1004实现彼此之间的通信连接。Figure 10 is a schematic diagram of a possible logical structure of the terminal 100 provided by the embodiment of the present application. The terminal 100 is used to implement the functions of the terminal involved in any of the above embodiments. The terminal 100 includes: a memory 1001, a processor 1002, a communication interface 1003 and a bus 1004. Among them, the memory 1001, the processor 1002, and the communication interface 1003 implement communication connections between each other through the bus 1004.
存储器1001可以是只读存储器(read only memory,ROM)、静态存储设备、动态存储设备或者随机存取存储器(random access memory,RAM)。存储器1001可以存储程序,当存储器1001中存储的程序被处理器1002执行时,处理器1002和通信接口1003用于执行上述的模型性能优化方法实施例的步骤301、304-305、503、505、601等。The memory 1001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1001 can store programs. When the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 and the communication interface 1003 are used to perform steps 301, 304-305, 503, 505, 601 etc.
The processor 1002 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any combination thereof. It is configured to execute the relevant programs to implement the functions to be performed by the sending module, the receiving module, the processing module, and the like in the model performance optimization apparatus applied to the terminal in the foregoing embodiments, or to perform steps 301, 304-305, 503, 505, 601, and so on of the model performance optimization method embodiments of this application. The steps of the methods disclosed in the embodiments of this application may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1001; the processor 1002 reads the information in the memory 1001 and, in combination with its hardware, performs steps 301, 304-305, 503, 505, 601, and so on of the foregoing model performance optimization method embodiments.
The communication interface 1003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the terminal 100 and other devices or communication networks. For example, information may be exchanged with the server through the communication interface 1003.
The bus 1004 provides a path for transferring information between the components of the terminal 100 (for example, the memory 1001, the processor 1002, and the communication interface 1003). The bus 1004 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 10, but this does not mean that there is only one bus or only one type of bus.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the steps performed by the processor in Figure 10 above.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions, the device performs the steps performed by the processor in Figure 10 above.
In another embodiment of this application, a chip system is further provided. The chip system includes a processor configured to implement the steps performed by the processor in Figure 10 above. In a possible design, the chip system may further include a memory configured to store the program instructions and data necessary for the apparatus. The chip system may consist of chips, or may include chips and other discrete devices.
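As a purely illustrative sketch (not part of the patent text), the logical structure described above for the terminal 100 — a memory that stores a program, a processor that reads the memory and executes the method steps, a communication interface, and a bus connecting them — could be modeled as follows. All class and method names here are hypothetical and chosen only for illustration:

```python
# Illustrative sketch of the terminal's logical structure described above.
# All names (Memory, Processor, Terminal, ...) are hypothetical.

class Memory:
    """Stores a program as an ordered list of step callables (e.g. ROM/RAM)."""
    def __init__(self):
        self.program = []

    def store(self, program):
        self.program = list(program)


class Processor:
    """Reads the program from memory and executes its steps in order."""
    def __init__(self, memory):
        self.memory = memory

    def run(self):
        return [step() for step in self.memory.program]


class CommunicationInterface:
    """Exchanges information with other devices (e.g. a server)."""
    def exchange(self, info):
        return info


class Terminal:
    """Memory, processor, and communication interface joined by a 'bus'
    (represented here simply by object references)."""
    def __init__(self):
        self.memory = Memory()
        self.processor = Processor(self.memory)
        self.comm = CommunicationInterface()


# Usage: store steps 301, 304, 305 of the method and let the processor run them.
terminal = Terminal()
terminal.memory.store([lambda: "step 301", lambda: "step 304", lambda: "step 305"])
results = terminal.processor.run()
```

The sketch only mirrors the recited relationship — the processor performs the stored method steps in combination with the memory — and makes no claim about the actual implementation.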
Figure 11 is a schematic diagram of a possible logical structure of the server 110 provided by an embodiment of this application. The server 110 is used to implement the functions of the server involved in any of the foregoing embodiments. The server 110 includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104, where the memory 1101, the processor 1102, and the communication interface 1103 are communicatively connected to one another through the bus 1104.
The memory 1101 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1101 may store a program; when the program stored in the memory 1101 is executed by the processor 1102, the processor 1102 and the communication interface 1103 are configured to perform steps 302-303, 504, 506-507, 602-603, and so on of the foregoing model performance optimization method embodiments.
The processor 1102 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any combination thereof. It is configured to execute the relevant programs to implement the functions to be performed by the receiving module, the sending module, the storage module, the update module, and the like in the model performance optimization apparatus applied to the server in the foregoing embodiments, or to perform steps 302-303, 504, 506-507, 602-603, and so on of the model performance optimization method embodiments of this application. The steps of the methods disclosed in the embodiments of this application may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1101; the processor 1102 reads the information in the memory 1101 and, in combination with its hardware, performs steps 302-303, 504, 506-507, 602-603, and so on of the foregoing model performance optimization method embodiments.
The communication interface 1103 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the server 110 and other devices or communication networks. For example, information may be exchanged with the terminal involved in any of the foregoing embodiments.
The bus 1104 provides a path for transferring information between the components of the server 110 (for example, the memory 1101, the processor 1102, and the communication interface 1103). The bus 1104 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Figure 11, but this does not mean that there is only one bus or only one type of bus.
In another embodiment of this application, a computer-readable storage medium is further provided. The computer-readable storage medium stores computer-executable instructions; when a processor of a device executes the computer-executable instructions, the device performs the steps performed by the processor in Figure 11 above.
In another embodiment of this application, a computer program product is further provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium; when a processor of a device executes the computer-executable instructions, the device performs the steps performed by the processor in Figure 11 above.
In another embodiment of this application, a chip system is further provided. The chip system includes a processor configured to implement the steps performed by the processor in Figure 11 above. In a possible design, the chip system may further include a memory configured to store the program instructions and data necessary for the apparatus. The chip system may consist of chips, or may include chips and other discrete devices.
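The embodiments above recite a division of the method steps between the two devices: the terminal side performs steps such as 301 and 304-305, the server side performs steps such as 302-303, and the two exchange information through their communication interfaces. A minimal, purely illustrative sketch of that split follows; the step numbers come from the method embodiments, while every class, method, and field name is hypothetical:

```python
# Illustrative sketch of the terminal/server step split described above.
# Step numbers (301, 304-305 on the terminal; 302-303 on the server) follow
# the method embodiments; all other names are hypothetical.

class Server:
    """Stands in for server 110; reached via the communication interface."""
    def perform_302_303(self, info_from_terminal):
        # Server-side processing of the information sent by the terminal.
        return {"echo": info_from_terminal, "steps": [302, 303]}


class TerminalDevice:
    """Stands in for terminal 100."""
    def __init__(self, server):
        self.server = server

    def run_method(self):
        trace = [301]                      # step 301 runs on the terminal
        reply = self.server.perform_302_303({"operator": "example_op"})
        trace.extend(reply["steps"])       # steps 302-303 run on the server
        trace.extend([304, 305])           # steps 304-305 run on the terminal
        return trace


terminal = TerminalDevice(Server())
executed = terminal.run_method()
```

The sketch shows only the ordering and the hand-off between devices that the text recites, not the content of the individual steps.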
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto.
Claims (26)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210189438.3A CN116720566A (en) | 2022-02-28 | 2022-02-28 | A model performance optimization method and related equipment |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116720566A true CN116720566A (en) | 2023-09-08 |
Family
ID=87873912