CN115033434B - Method and device for calculating kernel performance theoretical value and storage medium

Info

Publication number: CN115033434B
Application number: CN202210643762.8A
Authority: CN
Inventors: 李力昭; 倪怡芳
Original assignee: Hygon Information Technology Co Ltd
Current assignee: Hygon Information Technology Co Ltd
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2023-05-26
Anticipated expiration: 2042-06-07
Also published as: CN115033434A

Abstract

An embodiment of the invention discloses a method, a device and a storage medium for calculating a theoretical value of kernel performance. The method comprises: obtaining a test case set, wherein the test case set is used to simulate different application scenarios when the theoretical value of kernel performance is calculated; parsing the content of each test case in the test case set to obtain groups of metadata, wherein different groups of metadata are the hardware operations and related data executed on the kernel in the application scenarios simulated by different test cases; and, for each group of metadata, performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and that group of metadata. The method is suited to theoretical analysis of kernel performance during processor chip verification, and can make the theoretical performance calculation accurate, efficient and flexible.

Description

Method, device and storage medium for calculating a theoretical value of kernel performance

Technical Field

The invention relates to the field of system performance verification, and in particular to a method, a device and a storage medium for calculating a theoretical value of kernel performance.

Background

Compared with a CPU (central processing unit), a GPGPU (general-purpose computing on graphics processing units), as a very-large-scale programmable chip, has its main advantages in extremely strong parallel computing capability and data throughput. This requires the internal pipeline of the GPGPU to be carefully architected from a performance standpoint, and the effectiveness and feasibility of that architecture need to be checked by performance verification in the front-end development stage of the chip.

The biggest difference between performance verification and functional verification is that performance verification requires a complete set of performance expectations and measurement methods to judge whether the verification is effective; the quality of this set of methods is central to performance verification.

At present, theoretical values in performance verification are filled into spreadsheets by hand, one by one, by different testers according to the requirements of different test vectors; a script then reads the spreadsheet, extracts the expected values and provides them to the system for subsequent processing. This way of calculating theoretical values has the following disadvantages: it is based on single test vectors, so it is scattered and unsystematic and involves a large amount of repeated, wasted calculation; developers' skill levels vary and the spreadsheets are too long to check easily, which leads to many omissions and errors; and the degree of automation is extremely low and unstable, because a spreadsheet read cannot be traced back to a specific sheet and position, which makes errors hard to resolve.

Summary of the Invention

In view of this, embodiments of the present invention provide a method, a device and a storage medium for calculating a theoretical value of kernel performance, so as to achieve accurate, efficient and flexible theoretical performance calculation in processor chip kernel performance verification.

In a first aspect, an embodiment of the present invention provides a method for calculating a theoretical value of kernel performance, including:

obtaining a test case set, where the test case set is used to simulate different application scenarios when calculating the theoretical value of kernel performance;

parsing the content of each test case in the test case set to obtain groups of metadata, where different groups of metadata are the hardware operations and related data performed on the kernel in the application scenarios simulated by different test cases;

for each group of metadata: performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata.

Further, before the step of, for each group of metadata, performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata, the method further includes:

according to the project to which this theoretical kernel performance calculation corresponds, reading the corresponding project description to obtain the kernel configuration requirements.

Further, performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata includes:

for the sub-function of each kernel module: passing this group of metadata and the kernel configuration requirements to the sub-function; calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and this group of metadata;

after all sub-function calls are completed, aggregating the theoretical performance values calculated for each kernel module under this group of metadata, and predicting the performance bottleneck distribution and/or bandwidth occupancy in combination with preset related requirements.

Further, before the step of, for the sub-function of each kernel module, passing this group of metadata and the kernel configuration requirements to the sub-function, the method further includes: looking up the sub-function of at least one kernel module related to this group of metadata;

the step of, for the sub-function of each kernel module, passing this group of metadata and the kernel configuration requirements to the sub-function and calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and this group of metadata, includes:

for the sub-function of each kernel module that was found: passing this group of metadata and the kernel configuration requirements to the sub-function; calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and this group of metadata.

Further, looking up the sub-function of at least one kernel module related to this group of metadata includes:

establishing in advance a mapping between the keywords of multiple kernel modules and multiple sub-functions;

matching this group of metadata against the keyword of each kernel module one by one; when the match with the keyword of a kernel module succeeds, determining that the metadata is a stimulus for that kernel module, and that the sub-function corresponding to that kernel module's keyword is a sub-function related to the metadata.

In a second aspect, an embodiment of the present invention provides a device for calculating a theoretical value of kernel performance, including:

a test case set acquisition unit, configured to obtain a test case set, where the test case set is used to simulate different application scenarios when calculating the theoretical value of kernel performance;

a metadata generation unit, configured to parse the content of each test case in the test case set to obtain groups of metadata, where different groups of metadata are the hardware operations and related data performed on the kernel in the application scenarios simulated by different test cases;

a theoretical performance calculation unit, configured to, for each group of metadata, perform a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata.

Further, the device also includes a kernel configuration requirement acquisition unit, configured to:

before the theoretical performance calculation unit performs, for each group of metadata, the theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata, read the corresponding project description, according to the project to which this theoretical kernel performance calculation corresponds, to obtain the kernel configuration requirements.

Further, the theoretical performance calculation unit being configured to perform a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata includes:

Further, the theoretical performance calculation unit is also configured to: before passing this group of metadata and the kernel configuration requirements to the sub-function of each kernel module, look up the sub-function of at least one kernel module related to this group of metadata;

the theoretical performance calculation unit being configured to, for the sub-function of each kernel module, pass this group of metadata and the kernel configuration requirements to the sub-function and call the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and this group of metadata, includes:

Further, the theoretical performance calculation unit being configured to look up the sub-function of at least one kernel module related to this group of metadata includes:

In a third aspect, an embodiment of the present invention also provides a computer-readable storage medium storing one or more programs, where the one or more programs can be executed by one or more central processing units to implement the method for calculating a theoretical value of kernel performance described in the first aspect.

In the technical solution provided by the embodiments of the present invention, intelligent automated processing replaces the old manual-plus-script processing, which greatly improves stability and reduces maintenance requirements. In addition, test case set parsing and kernel performance theoretical value calculation are handled in separate layers, which improves the accuracy, convenience and general applicability of the calculation.

Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of a method for calculating a theoretical value of kernel performance provided by an embodiment of the present invention;

FIG. 2 is a diagram of an architecture to which a method for calculating a theoretical value of kernel performance provided by an embodiment of the present invention is applicable;

FIG. 3 is a schematic structural diagram of a device for calculating a theoretical value of kernel performance provided by an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a method for calculating a theoretical value of kernel performance. The method can be executed by a corresponding device for calculating a theoretical value of kernel performance, and the device can be integrated in a smart device in which a processor kernel is deployed. Referring to FIG. 1, the method specifically includes the following steps 101-103.

Step 101. Obtain a test case set, where the test case set is used to simulate different application scenarios when calculating the theoretical value of kernel performance.

In this step, the kernel may be the core chip of a central processing unit or a GPGPU, and is internally composed of multiple modules (called kernel modules). A test case set is a group of test cases; the different test cases in the group simulate different application scenarios when the theoretical value of one and the same kernel performance metric is calculated. To calculate the theoretical value of a given kernel performance metric, a group of test cases is formed by combining various test configurations and kernel loads. For example, when calculating the theoretical value of the data throughput of a module in the kernel, various data read instructions are used in combination with different configuration parameters of that module.
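As a minimal sketch of how such a test case set could be formed by combining read instructions with module configurations, the following Python fragment enumerates every combination. The instruction names, the `cache_enabled` parameter and the dictionary layout are hypothetical placeholders chosen for illustration; they do not come from the patent.

```python
from itertools import product

# Hypothetical data read instructions and module configurations (illustrative only).
read_instructions = ["load_dword", "load_dwordx2", "load_dwordx4"]
module_configs = [{"cache_enabled": True}, {"cache_enabled": False}]


def build_test_case_set(instructions, configs):
    """Return one test case per (instruction, configuration) combination."""
    return [
        {"name": f"case_{index}", "instruction": instruction, "config": config}
        for index, (instruction, config) in enumerate(product(instructions, configs))
    ]


test_case_set = build_test_case_set(read_instructions, module_configs)
```

Each entry stands for one simulated application scenario for the same performance metric; a real test case would additionally carry source code and an execution instruction set.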

Step 102. Parse the content of each test case in the test case set to obtain groups of metadata, where different groups of metadata are the hardware operations and related data performed on the kernel in the application scenarios simulated by different test cases.

In this step, parsing the content of one test case in the test case set yields one group of metadata, which is the hardware operations and related data performed on the kernel in the application scenario simulated by that test case. The parsing can be implemented with existing text parsing techniques and is not described further here.

Step 103. For each group of metadata: perform a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata.

In this step, one group of metadata corresponds to one application scenario and carries its own kernel load information. The kernel is configured according to the kernel load information in the metadata; once configuration is complete, the group of metadata is fed into the relevant kernel modules in turn to obtain the performance output produced in each of those modules. Preferably, the performance outputs that the group of metadata produces in the different modules can be further aggregated and combined with preset related requirements to predict the performance bottleneck distribution and bandwidth occupancy.

Exemplarily, performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata includes:

Preferably, before the step of, for the sub-function of each kernel module, passing this group of metadata and the kernel configuration requirements to the sub-function, the method further includes: looking up the sub-function of at least one kernel module related to this group of metadata;

In a specific implementation, looking up the sub-function of at least one kernel module related to this group of metadata may further include:

In the above implementation, a mapping between the keywords of multiple kernel modules and multiple sub-functions is established in advance, for example: the keyword of kernel module 1 corresponds to sub-function 1; the keyword of kernel module 2 corresponds to sub-function 2; the keyword of kernel module 3 corresponds to sub-function 3; and so on. Each kernel module has its own keyword, which describes the characteristics of that module. Once a group of metadata is obtained, it is matched against the keyword of each kernel module one by one. When the match with the keyword of a kernel module succeeds, the group of metadata is determined to be a stimulus for that kernel module, and the sub-function corresponding to that module's keyword is a sub-function related to the group of metadata. The sub-functions corresponding to the keywords of unmatched kernel modules do not need to be called. The theoretical value calculation result in this implementation is accurate and concise: it avoids calling every sub-function on each theoretical value calculation, which would consume a large amount of resources, and it avoids producing so much output that key information is obscured.
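The keyword lookup described above can be sketched in Python as a dictionary from module keyword to sub-function. This is only an illustration under stated assumptions: the keywords (`dispatch`, `alu`, `mem`), the sub-function names, and the idea that a group of metadata exposes a `modules` list to match against are all hypothetical, since the patent does not specify how a match is decided.

```python
from typing import Callable, Dict

Metadata = Dict[str, object]
Config = Dict[str, float]
SubFunction = Callable[[Metadata, Config], float]


# Placeholder sub-functions; real ones would model each kernel module.
def dispatcher_rate(metadata: Metadata, config: Config) -> float:
    return config["dispatch_per_cycle"]


def alu_rate(metadata: Metadata, config: Config) -> float:
    return config["alu_ops_per_cycle"]


def memory_throughput(metadata: Metadata, config: Config) -> float:
    return config["bytes_per_cycle"]


# Mapping established in advance: kernel module keyword -> sub-function.
KEYWORD_TO_SUBFUNCTION: Dict[str, SubFunction] = {
    "dispatch": dispatcher_rate,
    "alu": alu_rate,
    "mem": memory_throughput,
}


def find_subfunctions(metadata: Metadata) -> Dict[str, SubFunction]:
    """Return only the sub-functions whose module keyword matches the metadata."""
    touched = metadata.get("modules", [])
    return {
        keyword: function
        for keyword, function in KEYWORD_TO_SUBFUNCTION.items()
        if keyword in touched
    }


matched = find_subfunctions({"modules": ["mem", "alu"]})
```

Only the matched sub-functions are returned, so unmatched modules are never called, which is the resource saving the paragraph above describes.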

On the basis of the above solution, the method provided by the embodiments of the present invention further includes, before the step of, for each group of metadata, performing a theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this group of metadata:

Step 100. According to the project to which this theoretical kernel performance calculation corresponds, read the corresponding project description to obtain the kernel configuration requirements.

Different projects have their own specific kernel configuration requirements. These requirements can be written by hand by engineers in advance and stored in the form of a project description. In this step, the project description corresponding to the current project is read to obtain the kernel configuration requirements. This improves the general applicability of the embodiments of the present invention, making them suitable for all kinds of projects.
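One way to picture step 100 is a lookup from project name to a stored project description that is then decoded into configuration values. The sketch below assumes, purely for illustration, that descriptions are JSON strings held in a dictionary; the project names, field names and storage format are hypothetical and are not specified by the patent.

```python
import json

# Hypothetical pre-written project descriptions, stored as JSON text.
PROJECT_DESCRIPTIONS = {
    "project_a": '{"num_compute_units": 64, "port_width_bits": 512, "peak_alu_ops_per_cycle": 128}',
    "project_b": '{"num_compute_units": 32, "port_width_bits": 256, "peak_alu_ops_per_cycle": 64}',
}


def load_kernel_config(project: str) -> dict:
    """Read the description of the given project and return its kernel configuration."""
    try:
        description = PROJECT_DESCRIPTIONS[project]
    except KeyError:
        raise ValueError(f"no project description stored for {project!r}") from None
    return json.loads(description)


config_a = load_kernel_config("project_a")
```

Because the calculation code only sees the decoded configuration, switching projects means switching descriptions rather than editing the model.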

Building on the foregoing embodiments, a preferred embodiment is provided below, taking a GPGPU as the example processor type.

To solve the various problems of the current technique and improve the efficiency and reliability of performance verification, this solution makes the following improvements:

a description written purely in a programming language (Python) replaces the original approach of entering data by hand into Excel spreadsheets and reading and processing them with Perl scripts;

two-layer modeling is used: 1) the test case set model focuses on extracting the characteristic values and data of the test cases themselves, and on processing the output values produced by calling the core model (the kernel theoretical model) for use by subsequent stages; 2) the core model covers the modeling of each sub-module and provides a unified API for the test case set model to call.

Specifically, referring to the architecture shown in FIG. 2, a method for calculating a theoretical value of kernel performance includes the following steps 200-203.

Step 200. Store in advance multiple test case sets used for calculating theoretical values of GPGPU kernel performance.

In this step, multiple test case sets can be created in advance. Different test case sets simulate application scenarios for calculating the theoretical values of different GPGPU kernel performance metrics. For example, test case set a is used for application scenario simulation when calculating the theoretical value of GPGPU kernel performance 1, test case set b for GPGPU kernel performance 2, test case set c for GPGPU kernel performance 3, and so on. GPGPU kernel performance includes: the receive and issue rates of the front-end controller responsible for dispatching threads, the execution rate of the math execution module for various operations, the throughput of the data throughput module, and so on. Application scenario simulation may include giving the GPGPU kernel a specific load and test configuration; loads and test configurations usually differ between application scenarios.

In a specific implementation, the content of each test case set includes: test case configuration, test case source code and an execution instruction set. The test case source code is part of the implementation code that simulates one application scenario in the calculation of one theoretical performance value of the GPGPU kernel. This part of the implementation code can be written in a high-level programming language and typically covers: initialization of the theoretical value calculation process; GPGPU kernel load configuration; the storage location, within the execution instruction set file, of the low-level language implementation code of the GPGPU kernel execution instructions used in the theoretical value calculation; the storage location and register contents of the data operated on by the instructions; reads and writes of the configuration registers that the GPGPU needs in order to perform the specified operations; data collection during the theoretical value calculation; and so on.

The execution instruction set file collects the low-level language implementation code of all execution instructions for the GPGPU kernel. The execution instructions may include read instructions, write instructions, shift instructions, exclusive-OR instructions, and the like.

Typically, the test case set source code for HLSL (High Level Shading Language) is written in C++ as cpp files, and the execution instruction set file is in SP3 format (a rendering instruction file format).

Step 201. The test case set theoretical model obtains, from the multiple pre-stored test case sets, the target test case set needed for this GPGPU kernel performance theoretical value calculation, and calls the theoretical model toolset to parse the configuration, source code and execution instruction set of each test case in the target test case set, obtaining the groups of metadata required by the kernel theoretical model.

In step 201, after the test case set theoretical model has found, among the multiple pre-stored test case sets, the target test case set needed for this calculation of the theoretical value of one GPGPU kernel performance metric, it does the following for each test case in the target test case set: it reads the test case's configuration, source code and execution instruction set file, and parses the text it has read to extract the hardware operations and related data performed on the GPGPU kernel in the application scenario simulated by the test case, as one group of metadata required by the kernel theoretical model. For any test case in the target test case set, the application scenario it simulates has one corresponding group of metadata, which includes:

the number of instructions represented by the low-level language implementation code of the GPGPU kernel execution instructions, and the operation type, operand type and operand size of each instruction, etc.;

the number of each kind of storage unit in the GPGPU kernel, the modules and interfaces through which instruction execution passes, the total volume and type of data requests, etc.
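The metadata fields listed above can be pictured as a small record type filled in by a text parser. The sketch below is an assumption-laden illustration: the `Instruction` and `Metadata` classes and their field names are invented for this example, and the one-instruction-per-line listing format (`op operand_type operand_bits`, with `#` comments) is a made-up stand-in, not the SP3 format mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instruction:
    op_type: str        # e.g. read, write, shift, xor
    operand_type: str   # e.g. u32, f64
    operand_bits: int   # operand size in bits


@dataclass
class Metadata:
    instructions: List[Instruction] = field(default_factory=list)
    modules: List[str] = field(default_factory=list)  # modules the instructions pass through
    total_request_bytes: int = 0

    @property
    def instruction_count(self) -> int:
        return len(self.instructions)


def parse_instruction_listing(text: str) -> List[Instruction]:
    """Parse a hypothetical listing with one 'op operand_type bits' entry per line."""
    instructions = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        op_type, operand_type, bits = line.split()
        instructions.append(Instruction(op_type, operand_type, int(bits)))
    return instructions


listing = "read u32 32\nxor u32 32  # combine\n\nwrite f64 64\n"
metadata = Metadata(instructions=parse_instruction_listing(listing), modules=["alu", "mem"])
```

Keeping the parsed result in one typed record gives the kernel theoretical model a single, checkable input per test case.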

Step 202. The kernel theoretical model obtains the GPGPU kernel configuration requirements needed for this GPGPU kernel performance theoretical value calculation.

Exemplarily, the GPGPU kernel configuration requirements may specifically include: parameters, the data width of each internal port, the number and configuration of the different sub-modules, the ideal peak capability of each sub-module, and so on.

Step 203. For each group of metadata input by the test case set theoretical model this time, the kernel theoretical model calls the theoretical model toolset to perform theoretical performance calculation and analysis according to this group of metadata and the obtained GPGPU kernel configuration requirements, obtaining the GPGPU kernel performance theoretical value calculation result under this group of metadata.

The theoretical value calculation result may include: the theoretical value, the possible performance bottleneck distribution, and the bandwidth occupancy. Specifically, calling the theoretical model toolset to perform theoretical performance calculation and analysis and obtain the GPGPU kernel performance theoretical value calculation result under this group of metadata includes:

Sub-step 1. The kernel theoretical model looks up, in the theoretical model toolset, the sub-function of at least one GPGPU kernel module related to this group of metadata;

Sub-step 2. For the sub-function of each GPGPU kernel module that was found:

① pass this group of metadata and the GPGPU kernel configuration requirements to the sub-function;

② call the sub-function to calculate the theoretical performance value of the corresponding GPGPU kernel module based on the received group of metadata and GPGPU kernel configuration requirements;

The calculation result indicates what performance output this stimulus (the group of metadata input by the test case set theoretical model, together with the obtained GPGPU kernel configuration requirements) will produce in the module itself;

Sub-step 3. After all sub-function calls are completed, a top-level function aggregates the theoretical performance values of the GPGPU kernel modules obtained in sub-step 2 under this group of metadata, predicts the performance bottleneck distribution and bandwidth occupancy in combination with preset related requirements, and, after converting units and format, returns the result to the kernel theoretical model, which passes it on to the test case set theoretical model.
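Sub-steps 1 to 3 can be sketched end to end in Python as follows. The patent does not give the formulas used by the sub-functions or by the bottleneck prediction, so everything numeric here is an assumption made for illustration: each sub-function returns an ideal cycle count (work divided by peak rate), the slowest module is reported as the bottleneck, and occupancy is each module's share of that longest time. All function, key and parameter names are hypothetical.

```python
from typing import Callable, Dict


def alu_cycles(metadata: dict, config: dict) -> float:
    """Ideal cycles for the math module: operations / peak operations per cycle."""
    return metadata["alu_ops"] / config["peak_alu_ops_per_cycle"]


def memory_cycles(metadata: dict, config: dict) -> float:
    """Ideal cycles for the data path: requested bytes / port bytes per cycle."""
    bytes_per_cycle = config["port_width_bits"] / 8
    return metadata["request_bytes"] / bytes_per_cycle


SUBFUNCTIONS: Dict[str, Callable[[dict, dict], float]] = {
    "alu": alu_cycles,
    "mem": memory_cycles,
}


def compute_theoretical(metadata: dict, config: dict) -> dict:
    # Sub-step 1: keep only the sub-functions of modules this metadata exercises.
    selected = {k: f for k, f in SUBFUNCTIONS.items() if k in metadata["modules"]}
    if not selected:
        raise ValueError("metadata matches no modeled kernel module")
    # Sub-step 2: pass metadata and configuration to each selected sub-function.
    per_module = {k: f(metadata, config) for k, f in selected.items()}
    # Sub-step 3: aggregate; the module needing the most cycles bounds the whole run.
    bottleneck = max(per_module, key=per_module.get)
    total_cycles = per_module[bottleneck]
    occupancy = {k: cycles / total_cycles for k, cycles in per_module.items()}
    return {
        "per_module_cycles": per_module,
        "theoretical_cycles": total_cycles,
        "bottleneck": bottleneck,
        "occupancy": occupancy,
    }


result = compute_theoretical(
    {"modules": ["alu", "mem"], "alu_ops": 1024, "request_bytes": 8192},
    {"peak_alu_ops_per_cycle": 128, "port_width_bits": 512},
)
```

With these example inputs the math module needs 8 ideal cycles and the data path 128, so the data path is reported as the bottleneck. A fuller model would also apply the unit and format conversion mentioned in sub-step 3 before returning.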

Optionally, after obtaining the analysis result from the kernel theoretical model, the test case set theoretical model calls a pre-customized graphical user interface (GUI) to display it. The analysis result may include the theoretical performance values calculated for each GPGPU kernel module under the different sets of metadata, as well as the performance bottleneck distribution and bandwidth occupancy.

The improved scheme given in the above example resolves the instability and low efficiency of existing methods and thereby frees up manpower. More importantly, the new scheme models the chip kernel from the perspective of theoretical value calculation, which brings higher accuracy, reduces repeated calculation, and opens up broader application prospects.

Correspondingly, an embodiment of the present invention provides a kernel performance theoretical value calculation device, which can be used to execute the kernel performance theoretical value calculation method described in the embodiments of the present invention and can be integrated in a smart device on which a processor kernel is deployed. Referring to Figure 3, the device specifically includes the following units:

a test case set acquisition unit 301, configured to acquire a test case set, wherein the test case set is used to simulate different application scenarios when the kernel performance theoretical value is calculated;

a metadata generation unit 302, configured to parse the content of each test case in the test case set to obtain sets of metadata, wherein different sets of metadata are the hardware operations performed on the kernel, and the related data, under the application scenarios simulated by different test cases;

a theoretical performance calculation unit 303, configured to, for each set of metadata: perform theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this set of metadata.
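As a rough illustration of how units 301 to 303 could cooperate, the sketch below chains them as three Python functions. The test case format (one `key=value` pair per line, with `#` comments), the configuration field `alu_ops_per_cycle`, and the single-module model `alu_model` are assumptions invented for this example; the patent does not specify them.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, float]
Config = Dict[str, float]


def acquire_test_cases(raw_cases: List[str]) -> List[str]:
    """Unit 301 (sketch): keep every non-empty test case."""
    return [case for case in raw_cases if case.strip()]


def generate_metadata(test_case: str) -> Metadata:
    """Unit 302 (sketch): parse hypothetical 'key=value' lines into metadata.

    Raises ValueError if a line has no numeric value after '='.
    """
    meta: Metadata = {}
    for line in test_case.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        meta[key.strip()] = float(value)
    return meta


def alu_model(meta: Metadata, cfg: Config) -> float:
    """Hypothetical model: cycles needed to issue the requested operations."""
    return meta["alu_ops"] / cfg["alu_ops_per_cycle"]


def run_device(
    raw_cases: List[str],
    cfg: Config,
    model: Callable[[Metadata, Config], float],
) -> List[float]:
    """Unit 303 (sketch): one theoretical value per set of metadata."""
    results = []
    for case in acquire_test_cases(raw_cases):
        meta = generate_metadata(case)
        results.append(model(meta, cfg))
    return results
```

Passing the model in as a parameter keeps the parsing units independent of any particular kernel module, which mirrors the separation between metadata generation and theoretical performance calculation described above.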

Further, the kernel performance theoretical value calculation device provided by the embodiment of the present invention also includes a kernel configuration requirement acquisition unit 300, configured to:

Further, the theoretical performance calculation unit 303 being configured to perform theoretical performance calculation on the kernel performance according to the kernel configuration requirements and this set of metadata includes:

Further, the theoretical performance calculation unit 303 is also configured to: before, for the sub-function of each kernel module, passing this set of metadata and the kernel configuration requirements to the sub-function, search for the sub-functions of at least one kernel module related to this set of metadata;

Further, the theoretical performance calculation unit 303 being configured to search for the sub-functions of at least one kernel module related to this set of metadata includes:

The kernel performance theoretical value calculation device provided by this embodiment belongs to the same inventive concept as the foregoing method embodiments. For technical details not described in this embodiment, refer to the relevant descriptions in the foregoing method embodiments, which are not repeated here.

In addition, an embodiment of the present invention also provides a computer-readable storage medium storing one or more programs, and the one or more programs can be executed by one or more central processing units to implement the kernel performance theoretical value calculation method described in the foregoing embodiments.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising that element.

The term "and/or" in the embodiments of the present invention describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The embodiments in this specification are described in a related manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.

In particular, because the device embodiment is essentially similar to the method embodiment, it is described relatively briefly; for related details, refer to the corresponding description of the method embodiment.

For convenience of description, the above device is described in terms of units/modules divided by function. Of course, when implementing the present invention, the functions of the units/modules may be implemented in one or more pieces of software and/or hardware.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims

1. A method for calculating a theoretical value of performance of a kernel, the method comprising:

acquiring a test case set, wherein the test case set is used to simulate different application scenarios when the kernel performance theoretical value is calculated;

parsing the content of each test case in the test case set to obtain sets of metadata, wherein different sets of metadata are the hardware operations performed on the kernel, and the related data, under the application scenarios simulated by different test cases;

for each set of metadata: performing theoretical performance calculation on the kernel performance according to kernel configuration requirements and the set of metadata, to obtain a kernel performance theoretical value calculation result for the set of metadata, so as to complete the calculation of theoretical values in the performance verification process;

wherein the kernel is a core chip in a central processing unit or a general-purpose graphics processing unit, and the theoretical value calculation result comprises a theoretical value.

2. The method of claim 1, wherein, before, for each set of metadata, performing theoretical performance calculation on the kernel performance according to the kernel configuration requirements and the set of metadata, the method further comprises:

reading, according to the project to which the kernel performance theoretical value calculation corresponds, the corresponding project description to obtain the kernel configuration requirements.

3. The method of claim 1, wherein performing theoretical performance calculation on the kernel performance according to the kernel configuration requirements and the set of metadata comprises:

for the sub-function of each module of the kernel: passing the set of metadata and the kernel configuration requirements to the sub-function; and calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and the set of metadata;

after all the sub-functions have been called, aggregating the theoretical performance value calculation results of the modules of the kernel for the set of metadata, and predicting the performance bottleneck distribution and/or bandwidth occupancy in light of preset related requirements.

4. The method of claim 3, wherein, before, for the sub-function of each module of the kernel, passing the set of metadata and the kernel configuration requirements to the sub-function, the method further comprises: searching for the sub-functions of at least one module of the kernel related to the set of metadata;

and wherein, for the sub-function of each module of the kernel, passing the set of metadata and the kernel configuration requirements to the sub-function and calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and the set of metadata comprises:

for the sub-function of each module of the kernel that was found: passing the set of metadata and the kernel configuration requirements to the sub-function; and calling the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and the set of metadata.

5. The method of claim 4, wherein searching for the sub-functions of at least one module of the kernel related to the set of metadata comprises:

establishing in advance a mapping between the keywords of a plurality of kernel modules and a plurality of sub-functions;

matching the set of metadata against the keywords of each kernel module one by one, and, when the metadata successfully matches the keyword of a kernel module, determining that the metadata is a stimulus for that kernel module, wherein the sub-function corresponding to the keyword of that kernel module is a sub-function related to the metadata.

6. A kernel performance theoretical value calculation apparatus, characterized in that the apparatus comprises:

a test case set acquisition unit, configured to acquire a test case set, wherein the test case set is used to simulate different application scenarios when the kernel performance theoretical value is calculated;

a metadata generation unit, configured to parse the content of each test case in the test case set to obtain sets of metadata, wherein different sets of metadata are the hardware operations performed on the kernel, and the related data, under the application scenarios simulated by different test cases;

a theoretical performance calculation unit, configured to, for each set of metadata: perform theoretical performance calculation on the kernel performance according to kernel configuration requirements and the set of metadata, to obtain a kernel performance theoretical value calculation result for the set of metadata, so as to complete the calculation of theoretical values in the performance verification process;

7. The apparatus of claim 6, further comprising a kernel configuration requirement acquisition unit configured to:

before the theoretical performance calculation unit, for each set of metadata, performs theoretical performance calculation on the kernel performance according to the kernel configuration requirements and the set of metadata, read, according to the project to which the kernel performance theoretical value calculation corresponds, the corresponding project description to obtain the kernel configuration requirements.

8. The apparatus of claim 6, wherein the theoretical performance calculation unit being configured to perform theoretical performance calculation on the kernel performance according to the kernel configuration requirements and the set of metadata comprises:

9. The apparatus of claim 8, wherein the theoretical performance calculation unit is further configured to: before, for the sub-function of each module of the kernel, passing the set of metadata and the kernel configuration requirements to the sub-function, search for the sub-functions of at least one module of the kernel related to the set of metadata;

and wherein the theoretical performance calculation unit being configured to, for the sub-function of each module of the kernel, pass the set of metadata and the kernel configuration requirements to the sub-function and call the sub-function to calculate the theoretical performance value of the corresponding kernel module based on the kernel configuration requirements and the set of metadata comprises:

10. The apparatus of claim 9, wherein the theoretical performance calculation unit being configured to search for the sub-functions of at least one module of the kernel related to the set of metadata comprises:

11. A computer readable storage medium storing one or more programs executable by one or more central processing units to implement the method of any of claims 1-5.