CN111275197B

CN111275197B - Operation method, device, computer equipment and storage medium

Info

Publication number: CN111275197B
Application number: CN201910625443.2A
Authority: CN
Inventors: Name withheld at the inventor's request
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2018-12-05
Filing date: 2019-07-11
Publication date: 2023-11-10
Anticipated expiration: 2039-07-11
Also published as: CN111275197A

Abstract

The present disclosure relates to an operation method, device, computer equipment and storage medium. The combined processing device includes a machine learning computing device, a universal interconnection interface and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete computing operations specified by the user. The combined processing device further includes a storage device, which is connected to the machine learning computing device and to the other processing devices and stores data of both. The operation method, device, computer equipment and storage medium provided by the embodiments of the present disclosure are widely applicable and achieve high processing efficiency and speed.

Description

Operation method, device, computer equipment and storage medium

Technical field

The present disclosure relates to the field of computer technology, and in particular to a scalar type conversion instruction operation method, device, computer equipment and storage medium.

Background

With the continuous development of science and technology, machine learning, and neural network algorithms in particular, are used ever more widely and have been applied successfully in image recognition, speech recognition, natural language processing and other fields. As neural network algorithms grow more complex, the types and volume of data operations they involve keep increasing. In the related art, scalar type conversion operations on scalar data are inefficient and slow.

Summary of the invention

In view of this, the present disclosure proposes an operation method, device, computer equipment and storage medium that improve the efficiency and speed of scalar type conversion operations on scalar data.

According to a first aspect of the present disclosure, a scalar type conversion instruction processing device is provided, the device including:

a control module, configured to parse an obtained scalar type conversion instruction to obtain its operation code and operation field, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required to execute the instruction, and to determine the target data type and the initial data type of the scalar to be operated on;

an operation module, configured to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which has the initial data type, to obtain an operation result, and to store the operation result at the target address, the data type of the operation result being the target data type;

where the operation code indicates that the operation performed by the scalar type conversion instruction on the data is a scalar type conversion operation, and the operation field includes the address of the scalar to be operated on and the target address.

According to a second aspect of the present disclosure, a machine learning computing device is provided, the device including:

one or more scalar type conversion instruction processing devices according to the first aspect, configured to obtain the scalars to be operated on and control information from other processing devices, perform specified machine learning operations, and pass the execution results to the other processing devices through an I/O interface;

when the machine learning computing device includes multiple scalar type conversion instruction processing devices, these devices may be connected, and may transmit data, through a specific structure;

where the multiple scalar type conversion instruction processing devices are interconnected and transmit data through a PCIe (Peripheral Component Interconnect Express) bus to support larger-scale machine learning operations; they share one control system or each have their own control system; they share memory or each have their own memory; and they may be interconnected in any topology.

According to a third aspect of the present disclosure, a combined processing device is provided, the device including:

the machine learning computing device according to the second aspect, a universal interconnection interface, and other processing devices;

where the machine learning computing device interacts with the other processing devices to jointly complete computing operations specified by the user.

According to a fourth aspect of the present disclosure, a machine learning chip is provided, which includes the machine learning computing device according to the second aspect or the combined processing device according to the third aspect.

According to a fifth aspect of the present disclosure, a machine learning chip package structure is provided, which includes the machine learning chip according to the fourth aspect.

According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip package structure according to the fifth aspect.

According to a seventh aspect of the present disclosure, an electronic device is provided, which includes the machine learning chip according to the fourth aspect or the board card according to the sixth aspect.

According to an eighth aspect of the present disclosure, a scalar type conversion instruction processing method is provided. The method is applied to a scalar type conversion instruction processing device and includes:

parsing an obtained scalar type conversion instruction to obtain its operation code and operation field, obtaining, according to the operation code and the operation field, the scalar to be operated on and the target address required to execute the instruction, and determining the target data type and the initial data type of the scalar to be operated on; and

performing, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which has the initial data type, to obtain an operation result, and storing the operation result at the target address, the data type of the operation result being the target data type.

According to a ninth aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, storing computer program instructions that, when executed by a processor, implement the above scalar type conversion instruction processing method.

In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.

In some embodiments, the vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric light, a gas stove and/or a range hood; and the medical device includes a magnetic resonance imaging (MRI) scanner, a B-mode ultrasound scanner and/or an electrocardiograph.

The embodiments of the present disclosure provide a scalar type conversion instruction processing method, device, computer equipment and storage medium. The device includes a control module and an operation module. The control module is configured to parse an obtained scalar type conversion instruction to obtain its operation code and operation field, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required to execute the instruction, and to determine the target data type and the initial data type of the scalar to be operated on. The operation module is configured to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which has the initial data type, to obtain an operation result, and to store the operation result at the target address. The method, device, computer equipment and storage medium are widely applicable, and they process scalar type conversion instructions, and perform scalar type conversion, with high efficiency and speed.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

FIG. 1 shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure.

FIGS. 2a to 2f show block diagrams of a scalar type conversion instruction processing device according to an embodiment of the present disclosure.

FIG. 3 shows a schematic diagram of an application scenario of a scalar type conversion instruction processing device according to an embodiment of the present disclosure.

FIGS. 4a and 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.

FIG. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.

FIG. 6 shows a flowchart of a scalar type conversion instruction processing method according to an embodiment of the present disclosure.

Detailed description

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments in this disclosure without creative effort fall within the scope of protection of this disclosure.

It should be understood that the terms "zeroth", "first", "second" and the like in the claims, description and drawings of this disclosure are used to distinguish different objects rather than to describe a specific order. The terms "comprise" and "include" used in the description and claims of this disclosure indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this specification and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should further be understood that the term "and/or" as used in this specification and the claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.

As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

Owing to the widespread use of neural network algorithms and the continuous improvement of computer hardware computing capability, the types and volume of data operations in practical applications keep growing. Programming languages are diverse, and in the related art there is currently no scalar type conversion instruction that applies broadly across programming languages. To implement scalar type conversion in a given language environment, developers therefore have to define multiple instructions specific to that environment, which makes type conversion inefficient and slow. The present disclosure provides a scalar type conversion instruction processing method, device, computer equipment and storage medium that perform scalar type conversion with a single instruction, which can significantly improve the efficiency and speed of scalar type conversion.

FIG. 1 shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. As shown in FIG. 1, the device includes a control module 11 and an operation module 12.

The control module 11 is configured to parse an obtained scalar type conversion instruction to obtain its operation code and operation field, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required to execute the instruction, and to determine the target data type and the initial data type of the scalar to be operated on. The operation code indicates that the operation performed by the instruction on the data is a scalar type conversion operation, and the operation field includes the address of the scalar to be operated on and the target address.

The operation module 12 is configured to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which has the initial data type, to obtain an operation result, and to store the operation result at the target address. The data type of the operation result is the target data type.
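The conversion step performed by the operation module can be illustrated in software. The following is a minimal Python model, not the disclosed hardware. Memory is simulated as a `bytearray`. The type names and byte layouts in `TYPES` are assumptions. The float-to-integer policy (truncate toward zero, then saturate to the target range) is an assumed rounding mode that the text does not specify. Non-finite inputs (NaN, infinity) and float overflow are not modeled.

```python
import math
import struct

# Hypothetical type table: type name -> little-endian struct format.
TYPES = {
    "int8": "<b",
    "int16": "<h",
    "int32": "<i",
    "float16": "<e",
    "float32": "<f",
}
INTEGER_TYPES = {"int8", "int16", "int32"}


def _int_range(fmt):
    """Return the (min, max) of a signed integer struct format."""
    bits = struct.calcsize(fmt) * 8
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def convert_scalar(memory, src_addr, dst_addr, initial_type, target_type):
    """Read one scalar of initial_type at src_addr, convert it, store it at dst_addr."""
    (value,) = struct.unpack_from(TYPES[initial_type], memory, src_addr)
    dst_fmt = TYPES[target_type]
    if target_type in INTEGER_TYPES:
        # Assumed policy: truncate toward zero, then saturate to the target range.
        lo, hi = _int_range(dst_fmt)
        value = max(lo, min(hi, math.trunc(value)))
    else:
        value = float(value)
    struct.pack_into(dst_fmt, memory, dst_addr, value)
    return value
```

For example, under this assumed policy, converting a float32 value of -3.75 to int32 stores -3 at the target address. Converting 300.5 to int8 saturates to 127.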

In this embodiment, the control module may read the scalar to be operated on from its address. The control module may obtain the scalar type conversion instruction and the scalar to be operated on through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.

In this embodiment, the operation code may be the part of an instruction, or a field (usually represented as a code), that specifies the operation to be performed in the computer program. It acts as the instruction number, telling the device that executes the instruction which instruction to execute. The operation field may be the source of all the data required to execute the corresponding instruction, including parameter data, the scalar to be operated on, the corresponding operation method and so on. A scalar type conversion instruction must include an operation code and an operation field, and the operation field includes at least the address of the scalar to be operated on and the target address.
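The Python sketch below makes the split between operation code and operation field concrete by decoding a hypothetical 64-bit instruction word. The disclosure leaves the instruction format to the implementer, so everything about this layout is an illustrative assumption:

- an 8-bit operation code in bits 63 to 56;
- a 24-bit scalar address in bits 55 to 32;
- a 24-bit target address in bits 31 to 8;
- an unused low byte.

The opcode value `SCALAR_TYPE_CONVERT = 0x2A` is also an assumption.

```python
from dataclasses import dataclass

SCALAR_TYPE_CONVERT = 0x2A  # hypothetical operation code value
ADDR_MASK = 0xFFFFFF        # 24-bit address fields in this assumed layout


@dataclass(frozen=True)
class DecodedInstruction:
    opcode: int
    src_addr: int  # address of the scalar to be operated on
    dst_addr: int  # target address for the result


def encode(src_addr: int, dst_addr: int) -> int:
    """Pack a scalar type conversion instruction into the assumed 64-bit layout."""
    for addr in (src_addr, dst_addr):
        if not 0 <= addr <= ADDR_MASK:
            raise ValueError(f"address {addr:#x} does not fit in 24 bits")
    return (SCALAR_TYPE_CONVERT << 56) | (src_addr << 32) | (dst_addr << 8)


def decode(word: int) -> DecodedInstruction:
    """Split an instruction word into its operation code and operation field."""
    opcode = (word >> 56) & 0xFF
    if opcode != SCALAR_TYPE_CONVERT:
        raise ValueError(f"not a scalar type conversion instruction: {opcode:#x}")
    return DecodedInstruction(
        opcode=opcode,
        src_addr=(word >> 32) & ADDR_MASK,
        dst_addr=(word >> 8) & ADDR_MASK,
    )
```

Checking the operation code first mirrors the role described above: the code tells the device which instruction it is handling before the operation field is interpreted.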

It should be understood that those skilled in the art may set the instruction format of the scalar type conversion instruction, and the operation code and operation field it contains, as needed; the present disclosure does not limit this.

In this embodiment, the device may include one or more control modules and one or more operation modules, whose numbers may be set according to actual needs; the present disclosure does not limit this. When the device includes one control module, that control module may receive the scalar type conversion instruction and control one or more operation modules to perform the scalar type conversion operation. When the device includes multiple control modules, each control module may receive scalar type conversion instructions and control one or more corresponding operation modules to perform the scalar type conversion operation.

The scalar type conversion instruction processing device provided by the embodiments of the present disclosure includes a control module and an operation module. The control module is configured to parse an obtained scalar type conversion instruction to obtain its operation code and operation field, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required to execute the instruction, and to determine the target data type and the initial data type of the scalar to be operated on. The operation module is configured to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which has the initial data type, to obtain an operation result, and to store the operation result at the target address. The device is widely applicable, and it processes scalar type conversion instructions, and performs scalar type conversion, with high efficiency and speed.

FIG. 2a shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 2a, the operation module 12 may include multiple scalar operators 120 configured to perform the scalar type conversion operation.

In this implementation, the operation module may instead include a single scalar operator. The number of scalar operators may be set according to the amount of data to be converted and the required processing speed and efficiency of the scalar type conversion operation; the present disclosure does not limit this.
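The sketch below illustrates how independent conversions might be spread over several scalar operators. It assigns scalars round-robin to `num_operators` worker threads, each standing in for one scalar operator 120. Three details are assumptions chosen for the example, not details taken from the disclosure:

- the round-robin assignment;
- the thread-based parallelism;
- the float-to-int32 policy (truncate toward zero, then saturate).

```python
import math
from concurrent.futures import ThreadPoolExecutor

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def to_int32(x: float) -> int:
    # Assumed conversion policy: truncate toward zero, saturate to int32.
    return max(INT32_MIN, min(INT32_MAX, math.trunc(x)))


def convert_batch(scalars, num_operators=4):
    """Spread independent scalar conversions over num_operators workers (round-robin)."""
    # Lane i handles indices i, i + num_operators, i + 2 * num_operators, ...
    lanes = [list(range(i, len(scalars), num_operators)) for i in range(num_operators)]
    results = [None] * len(scalars)

    def run_lane(indices):
        for idx in indices:
            results[idx] = to_int32(scalars[idx])  # each index is written by one lane only

    with ThreadPoolExecutor(max_workers=num_operators) as pool:
        list(pool.map(run_lane, lanes))  # consume the iterator so worker errors propagate
    return results
```

Each scalar is converted independently, so the number of operators affects only throughput, not the result. This fits the statement that the operator count is a design choice.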

FIG. 2b shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 2b, the operation module 12 may include a master operation sub-module 121 and multiple slave operation sub-modules 122. The master operation sub-module 121 may include multiple scalar operators 120 (not shown in the figure).

The master operation sub-module 121 is configured to perform the scalar type conversion operation using the multiple scalar operators 120, obtain the operation result, and store the operation result at the target address.

In a possible implementation, the control module 11 is further configured to parse an obtained computing instruction to obtain its operation field and operation code, and to obtain, according to them, the data to be operated on that is required to execute the computing instruction. The operation module 12 is further configured to operate on that data according to the computing instruction to obtain the computation result of the computing instruction. The operation module may include multiple operators configured to perform operations corresponding to the operation type of the computing instruction.

In this implementation, a computing instruction may be any other instruction that performs arithmetic, logical or other operations on data such as scalars, vectors, matrices and tensors. Those skilled in the art may set the computing instructions according to actual needs; the present disclosure does not limit this.

In this implementation, the operators may include adders, dividers, multipliers, comparators and other operators capable of performing arithmetic, logical and other operations on data. The types and number of operators may be set according to the amount of data to be processed, the operation types, and the required processing speed and efficiency; the present disclosure does not limit this.

In a possible implementation, the control module 11 is further configured to parse the computing instruction into multiple operation instructions, and to send the data to be operated on and the multiple operation instructions to the master operation sub-module 121.

The master operation sub-module 121 is configured to perform pre-processing on the data to be operated on, and to exchange data and operation instructions with the multiple slave operation sub-modules 122.

The slave operation sub-modules 122 are configured to perform intermediate operations in parallel according to the data and operation instructions transmitted from the master operation sub-module 121 to obtain multiple intermediate results, and to transmit the intermediate results to the master operation sub-module 121.

The master operation sub-module 121 is further configured to perform subsequent processing on the multiple intermediate results to obtain the computation result of the computing instruction, and to store the computation result at the corresponding address.
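The pre-processing, parallel intermediate operation and subsequent processing steps can be sketched with a dot product as the computing instruction. This Python model is an assumed example, not the disclosed design:

- the master splits the operands into one chunk per slave;
- each slave returns a partial sum as its intermediate result;
- the master adds the partial sums.

Here the slave calls run one after another; the hardware described above would run them in parallel.

```python
def master_preprocess(a, b, num_slaves):
    """Pre-processing: split the operands into at most num_slaves chunks."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    size = max(1, -(-len(a) // num_slaves))  # ceiling division, at least 1
    return [(a[i:i + size], b[i:i + size]) for i in range(0, len(a), size)]


def slave_compute(chunk):
    """Intermediate operation on one slave: a partial dot product."""
    xs, ys = chunk
    return sum(x * y for x, y in zip(xs, ys))


def master_postprocess(partials):
    """Subsequent processing: combine the intermediate results."""
    return sum(partials)


def dot(a, b, num_slaves=4):
    chunks = master_preprocess(a, b, num_slaves)
    partials = [slave_compute(c) for c in chunks]  # parallel on real hardware
    return master_postprocess(partials)
```

For example, `dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], num_slaves=2)` splits the work into chunks of three and two elements, whose partial sums 6 and 9 the master adds to 15.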

In this implementation, when the computing instruction operates on scalar or vector data, the device may control the master operation sub-module to perform the corresponding operation with its operators. When the computing instruction operates on data with two or more dimensions, such as matrices and tensors, the device may control the slave operation sub-modules to perform the corresponding operation with their operators.

It should be noted that those skilled in the art may set the connections between the master operation sub-module and the multiple slave operation sub-modules according to actual needs to configure the architecture of the operation module. For example, the architecture may be an "H"-type, array-type or tree-type architecture; the present disclosure does not limit this.

FIG. 2c shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 2c, the operation module 12 may further include one or more branch operation sub-modules 123, which forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. The master operation sub-module 121 is connected to the one or more branch operation sub-modules 123. In this way, the master, branch and slave operation sub-modules in the operation module are connected in an "H"-type architecture. Forwarding data and/or operation instructions through the branch operation sub-modules reduces the resources occupied on the master operation sub-module, thereby increasing instruction processing speed.

FIG. 2d shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 2d, the multiple slave operation sub-modules 122 are distributed in an array.

Each slave operation sub-module 122 is connected to its adjacent slave operation sub-modules 122. The master operation sub-module 121 is connected to k of the slave operation sub-modules 122 in the array of m rows and n columns, namely the n slave operation sub-modules 122 in row 1, the n slave operation sub-modules 122 in row m, and the m slave operation sub-modules 122 in column 1.

As shown in FIG. 2d, the k slave operation sub-modules include only the n slave operation sub-modules in row 1, the n in row m and the m in column 1. That is, they are the slave operation sub-modules directly connected to the master operation sub-module. These k slave operation sub-modules forward data and instructions between the master operation sub-module and the other slave operation sub-modules. Distributing the slave operation sub-modules in an array in this way can speed up the transmission of data and/or operation instructions from the master operation sub-module to the slave operation sub-modules, thereby increasing instruction processing speed.
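The set of k directly connected slave operation sub-modules can be enumerated programmatically. The sketch assumes the array has m rows and n columns and indexes positions as (row, column) from 1. The modules in column 1 at rows 1 and m also belong to those rows, so k = 2n + m - 2 whenever m ≥ 2. The breadth-first count of forwarding hops over the adjacent-neighbour links is an illustrative addition, not something the disclosure specifies.

```python
from collections import deque


def directly_connected(m: int, n: int) -> set[tuple[int, int]]:
    """Positions (row, col), 1-indexed, of slaves wired directly to the master."""
    first_row = {(1, c) for c in range(1, n + 1)}
    last_row = {(m, c) for c in range(1, n + 1)}
    first_col = {(r, 1) for r in range(1, m + 1)}
    return first_row | last_row | first_col


def hops_from_master(m: int, n: int) -> dict[tuple[int, int], int]:
    """Fewest slave-to-slave forwards needed to reach each slave (multi-source BFS)."""
    entry = directly_connected(m, n)
    dist = {pos: 0 for pos in entry}
    queue = deque(entry)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if 1 <= nxt[0] <= m and 1 <= nxt[1] <= n and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist
```

In a 4 x 3 array, this gives k = 8 directly connected modules. Every remaining module is one forward away from one of them.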

FIG. 2e shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 2e, the operation module may further include a tree sub-module 124, which includes a root port 401 and multiple branch ports 402. The root port 401 is connected to the master operation sub-module 121, and the branch ports 402 are connected to the respective slave operation sub-modules 122. The tree sub-module 124 can both send and receive, and it forwards data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. In this way, the tree sub-module connects the operation module in a tree architecture. Its forwarding function can speed up the transmission of data and/or operation instructions from the master operation sub-module to the slave operation sub-modules, thereby increasing instruction processing speed.

In a possible implementation, the tree sub-module 124 is an optional structure of the device and may include at least one layer of nodes. A node is a wiring structure with a forwarding function and has no computing function of its own. The bottom-layer nodes are connected to the slave operation sub-modules and forward data and/or operation instructions between the main operation sub-module 121 and the slave operation sub-modules 122. In the special case where the tree sub-module would have zero layers of nodes, the device does not need the tree sub-module.

In a possible implementation, the tree sub-module 124 may include a plurality of nodes arranged as an n-ary tree, and these nodes may be arranged in multiple layers.

For example, Figure 2f shows a block diagram of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. As shown in Figure 2f, the n-ary tree may be a binary tree, and the tree sub-module includes two layers of nodes 01. The bottom-layer nodes 01 are connected to the slave operation sub-modules 122 and forward data and/or operation instructions between the main operation sub-module 121 and the slave operation sub-modules 122.

In this implementation, the n-ary tree may also be a ternary tree or another n-ary tree, where n is a positive integer greater than or equal to 2. Those skilled in the art can set n and the number of node layers in the n-ary tree as needed. The present disclosure does not limit this.
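The Python model below illustrates the forwarding role of the tree sub-module. In it, each node only copies what it receives to its n children. The layout is an assumption made for illustration: the root port feeds a single top-layer node, and each bottom-layer node attaches directly to n slave operation sub-modules. Under that assumption, a tree with L layers of nodes reaches n**L slaves. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Forwarding-only node: it has no compute function, it just relays."""
    children: list = field(default_factory=list)  # Node objects or slave ids


def build_tree(n: int, layers: int) -> Node:
    """Build an n-ary tree with `layers` layers of nodes. Bottom-layer
    nodes receive integer slave ids as children (assumed layout)."""
    if n < 2 or layers < 1:
        raise ValueError("need n >= 2 and at least one layer of nodes")
    next_slave = 0

    def make(depth: int) -> Node:
        nonlocal next_slave
        node = Node()
        for _ in range(n):
            if depth == layers:  # bottom layer: connect slave sub-modules
                node.children.append(next_slave)
                next_slave += 1
            else:
                node.children.append(make(depth + 1))
        return node

    return make(1)


def forward(node: Node, payload: str) -> dict[int, str]:
    """Relay `payload` from the root port down to every slave sub-module."""
    delivered: dict[int, str] = {}
    for child in node.children:
        if isinstance(child, Node):
            delivered.update(forward(child, payload))
        else:
            delivered[child] = payload
    return delivered


if __name__ == "__main__":
    # Binary tree with two layers of nodes (Fig. 2f case, assumed layout).
    print(forward(build_tree(2, 2), "instr"))
```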

In a possible implementation, the operation field may further include the initial data type and the target data type. The control module 11 is further configured to determine, from the operation field, the target data type and the initial data type of the scalar to be operated on.

In a possible implementation, the opcode may also indicate the initial data type and the target data type. The control module 11 is further configured to determine, from the opcode, the target data type and the initial data type of the scalar to be operated on.

In a possible implementation, the initial data type and/or the target data type may not be determinable from either the opcode or the operation field. In that case, they can be determined from a preset default initial data type and a preset default target data type. The preset default initial data type is used as the initial data type of the current scalar type conversion instruction, and the preset default target data type is used as its target data type. Those skilled in the art can configure how the target data type and the initial data type are determined according to actual needs. The present disclosure does not limit this.
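The fallback rule above can be sketched as a small Python function. The lookup order (operation field first, then opcode, then preset default) is an assumption for illustration, as are the default values s32 and cvtf32. The disclosure states only that preset defaults apply when neither the opcode nor the operation field determines a type. The function resolves each type independently, matching the "and/or" in the text.

```python
from typing import Optional, Tuple

# Hypothetical preset defaults; the disclosure does not name them.
DEFAULT_INITIAL = "s32"
DEFAULT_TARGET = "cvtf32"

# (initial, target); None where that source does not specify a type.
TypePair = Tuple[Optional[str], Optional[str]]


def resolve_types(from_field: TypePair, from_opcode: TypePair) -> Tuple[str, str]:
    """Return (initial, target) for one scalar type conversion instruction."""
    initial = from_field[0] or from_opcode[0] or DEFAULT_INITIAL
    target = from_field[1] or from_opcode[1] or DEFAULT_TARGET
    return initial, target


if __name__ == "__main__":
    print(resolve_types(("u32", "cvtf16"), (None, None)))
    print(resolve_types((None, None), (None, "cvti16")))
```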

In a possible implementation, the target data type may be any one of a 16-bit floating-point number, a 32-bit floating-point number, a 48-bit floating-point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer. The initial data type may be any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.

In this implementation, the target data type and the initial data type may also be other data types, such as 64-bit integers. Those skilled in the art can set the target data type and the initial data type according to actual needs, provided that the target data type differs from the initial data type. The present disclosure does not limit this.

In this implementation, identifiers (or codes), such as numbers or names, can be assigned to the target data types and initial data types above. The target data type and the initial data type indicated by a scalar type conversion instruction can then be determined from the identifiers (or codes) in the instruction. For example, the identifiers can be cvtf16 for a 16-bit floating-point number, cvtf32 for a 32-bit floating-point number, cvtf48 for a 48-bit floating-point number, cvti16 for a 16-bit integer, cvti32 for a 32-bit integer, and cvti48 for a 48-bit integer. Likewise, the identifiers can be s16, s32, and s48 for 16-, 32-, and 48-bit signed numbers; u16, u32, and u48 for 16-, 32-, and 48-bit unsigned numbers; and ptr for the pointer data type. Those skilled in the art can set the identifiers of the target data type and the initial data type according to actual needs. The present disclosure does not limit this.
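The Python sketch below uses the example identifiers listed above to decode a combined "target.initial" token such as cvtf16.u32, the form used in the application example of this document. The (kind, bit width) tuples are an assumed internal representation. The pointer width is left as None because the disclosure does not give one.

```python
# Example identifiers from the text; (kind, bit width) is an assumed encoding.
TARGET_TYPES = {
    "cvtf16": ("float", 16), "cvtf32": ("float", 32), "cvtf48": ("float", 48),
    "cvti16": ("int", 16), "cvti32": ("int", 32), "cvti48": ("int", 48),
}
INITIAL_TYPES = {
    "s16": ("signed", 16), "s32": ("signed", 32), "s48": ("signed", 48),
    "u16": ("unsigned", 16), "u32": ("unsigned", 32), "u48": ("unsigned", 48),
    "ptr": ("pointer", None),  # width not specified in the disclosure
}


def decode_type_pair(token: str):
    """Split 'target.initial' (e.g. 'cvtf16.u32') and look up both parts."""
    target_id, _, initial_id = token.partition(".")
    if target_id not in TARGET_TYPES or initial_id not in INITIAL_TYPES:
        raise ValueError(f"unknown type identifier in {token!r}")
    return TARGET_TYPES[target_id], INITIAL_TYPES[initial_id]


if __name__ == "__main__":
    print(decode_type_pair("cvtf16.u32"))
```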

In a possible implementation, as shown in Figures 2a to 2f, the device may further include a storage module 13, which stores the scalar to be operated on.

In this implementation, the storage module may include a cache, a register, or both. The cache may include a scratchpad cache and may also include at least one NRAM (Neuron Random Access Memory). The cache can store the data to be operated on, and the register can store the scalar to be operated on.

In a possible implementation, the cache may include a neuron cache. The neuron cache is the neuron random access memory mentioned above. It can store the neuron data contained in the data to be operated on, and the neuron data may include neuron vector data. The data to be operated on includes data related to the scalar type conversion and/or data related to the operations of other computing instructions.

In a possible implementation, the device may further include a direct memory access (DMA) module for reading data from, and writing data to, the storage module.

In a possible implementation, as shown in Figures 2a to 2f, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.

The instruction storage sub-module 111 stores scalar type conversion instructions.

The instruction processing sub-module 112 parses a scalar type conversion instruction to obtain its opcode and operation field.

The queue storage sub-module 113 stores an instruction queue. The instruction queue contains a plurality of instructions to be executed, arranged in execution order, and these may include scalar type conversion instructions.

In this implementation, the instructions to be executed may also include computing instructions that are related or unrelated to the scalar type conversion instruction. Those skilled in the art can set this according to actual needs, and the present disclosure does not limit it. The instruction queue is obtained by ordering the instructions to be executed according to their reception time, priority level, and similar criteria, so that they can be executed one after another in queue order.
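One way to implement the ordering described above is a priority queue keyed on (priority, reception time). The Python sketch below makes two assumptions: a smaller number means higher priority, and earlier reception breaks ties. The disclosure leaves the exact ordering policy open, and the class and function names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Pending:
    priority: int                     # smaller value = higher priority (assumed)
    received_at: int                  # reception tick; earlier wins ties
    text: str = field(compare=False)  # the instruction itself, not compared


def build_queue(pending: list) -> list:
    """Return instruction texts in the order they would be executed."""
    heap = list(pending)
    heapq.heapify(heap)
    return [heapq.heappop(heap).text for _ in range(len(heap))]


if __name__ == "__main__":
    print(build_queue([
        Pending(1, 0, "instr_a"),
        Pending(0, 2, "scalar 500 100 cvtf16.u32"),
        Pending(1, 1, "instr_b"),
    ]))
```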

In a possible implementation, as shown in Figures 2a to 2f, the control module 11 may include a dependency processing sub-module 114.

The dependency processing sub-module 114 handles the case where a first instruction to be executed is associated with a zeroth instruction to be executed that precedes it. When it determines such an association, it caches the first instruction in the instruction storage sub-module 111. After the zeroth instruction has finished executing, it fetches the first instruction from the instruction storage sub-module 111 and sends it to the operation module 12. The first and zeroth instructions to be executed are both among the plurality of instructions to be executed.

The first storage address interval holds the data required by the first instruction to be executed, and the zeroth storage address interval holds the data required by the zeroth instruction to be executed. An association between the first instruction and the preceding zeroth instruction includes the case where these two intervals overlap. Conversely, the absence of an association between the two instructions may mean that the first and zeroth storage address intervals do not overlap.

In this way, the dependencies between instructions to be executed ensure that a later instruction runs only after the earlier instruction it depends on has finished, which keeps the operation results correct.
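The association test above reduces to an interval-overlap check. The Python sketch below treats each storage address interval as half-open [start, end). It holds the first instruction back if any interval it uses overlaps any interval used by the zeroth instruction. The half-open convention and the names are assumptions, not taken from the disclosure.

```python
from typing import List, NamedTuple


class AddrRange(NamedTuple):
    start: int  # first address in the interval
    end: int    # one past the last address (half-open interval)


def overlaps(a: AddrRange, b: AddrRange) -> bool:
    """True when two half-open intervals share at least one address."""
    return a.start < b.end and b.start < a.end


def must_wait(first: List[AddrRange], zeroth: List[AddrRange]) -> bool:
    """True if the first (later) instruction depends on the zeroth one,
    i.e. it must stay cached until the zeroth finishes executing."""
    return any(overlaps(x, y) for x in first for y in zeroth)


if __name__ == "__main__":
    zeroth = [AddrRange(100, 104), AddrRange(500, 502)]
    print(must_wait([AddrRange(500, 502)], zeroth))  # True: both touch 500-501
    print(must_wait([AddrRange(104, 108)], zeroth))  # False: ranges only abut
```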

In a possible implementation, the instruction format of the scalar type conversion instruction may be:

scalar dst src0 opcode.type

Here, scalar is the opcode of the scalar type conversion instruction, and dst, src0, and opcode.type are its operation fields. dst is the target address, and src0 is the address of the scalar to be operated on. In opcode.type, opcode is the target data type and type is the initial data type of the scalar to be operated on.
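A minimal Python parser for this textual format is sketched below. It assumes whitespace-separated fields and decimal addresses. That matches the application example in this document (scalar 500 100 cvtf16.u32), but it is not a stated encoding rule, and the binary encoding of the instruction is not described here.

```python
from typing import NamedTuple


class ScalarCvt(NamedTuple):
    opcode: str
    dst: int           # target address
    src0: int          # address of the scalar to be operated on
    target_type: str   # 'opcode' part of opcode.type
    initial_type: str  # 'type' part of opcode.type


def parse_format1(text: str) -> ScalarCvt:
    """Parse 'scalar dst src0 opcode.type'."""
    opcode, dst, src0, types = text.split()
    if opcode != "scalar":
        raise ValueError(f"not a scalar type conversion instruction: {text!r}")
    target_type, initial_type = types.split(".")
    return ScalarCvt(opcode, int(dst), int(src0), target_type, initial_type)


if __name__ == "__main__":
    print(parse_format1("scalar 500 100 cvtf16.u32"))
```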

In a possible implementation, the instruction format of the scalar type conversion instruction may also be:

opcode.scalar.type dst src0

Here, opcode.scalar.type is the opcode of the scalar type conversion instruction, and dst and src0 are its operation fields. Within opcode.scalar.type, opcode indicates the target data type, type indicates the initial data type of the scalar to be operated on, and scalar indicates that the instruction is a scalar type conversion instruction. dst is the target address, and src0 is the address of the scalar to be operated on.
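The Python sketch below parses the second format, in which the data types are folded into the opcode. The sample text cvtf16.scalar.u32 500 100 is a hypothetical rendering of this document's application example in this format. Decimal addresses are again an assumption.

```python
def parse_format2(text: str) -> dict:
    """Parse 'opcode.scalar.type dst src0', e.g. 'cvtf16.scalar.u32 500 100'."""
    mnemonic, dst, src0 = text.split()
    target_type, marker, initial_type = mnemonic.split(".")
    if marker != "scalar":  # middle part marks a scalar type conversion
        raise ValueError(f"not a scalar type conversion opcode: {mnemonic!r}")
    return {
        "target_type": target_type,    # 'opcode' part
        "initial_type": initial_type,  # 'type' part
        "dst": int(dst),
        "src0": int(src0),
    }


if __name__ == "__main__":
    print(parse_format2("cvtf16.scalar.u32 500 100"))
```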

It should be understood that those skilled in the art can set the opcode of the scalar type conversion instruction, and the positions of the opcode and operation fields in the instruction format, as needed. The present disclosure does not limit this.

In a possible implementation, the device may be provided in one or more of a graphics processing unit (GPU), a central processing unit (CPU), and an embedded neural-network processing unit (NPU).

It should be noted that the scalar type conversion instruction processing device has been described above using these embodiments as examples, but those skilled in the art will understand that the present disclosure is not limited to them. In practice, users can configure each module flexibly according to personal preference and/or the actual application scenario, as long as the configuration conforms to the technical solution of the present disclosure.

Application example

The following application example takes "performing a scalar type conversion operation with the scalar type conversion instruction processing device" as an exemplary scenario. It is given according to an embodiment of the present disclosure to help explain the workflow of the device. Those skilled in the art should understand that this application example only serves to make the embodiments of the present disclosure easier to understand and should not be regarded as limiting them.

Figure 3 shows a schematic diagram of an application scenario of a scalar type conversion instruction processing device according to an embodiment of the present disclosure. As shown in Figure 3, the device processes a scalar type conversion instruction as follows:

The control module 11 parses the received scalar type conversion instruction 1 (for example, scalar 500 100 cvtf16.u32) to obtain its opcode and operation fields. The opcode of instruction 1 is scalar, the target address is 500, and the address of the scalar to be operated on is 100. The target data type is cvtf16 (a 16-bit floating-point number), and the initial data type of the scalar to be operated on is u32 (a 32-bit unsigned number). The control module 11 then fetches the scalar to be operated on from address 100.

The operation module 12 performs a scalar type conversion on the scalar of the initial data type according to the target data type, converting the 32-bit unsigned scalar into a 16-bit floating-point number. It then stores the operation result at target address 500.
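The data path of this example can be mimicked in Python with the standard struct module. The sketch assumes that cvtf16 means IEEE 754 binary16 and that memory is little-endian. The dictionary-based memory, the stored value 1000, and the function name are illustrative assumptions. struct's 'e' format rounds to the nearest representable value (ties to even) and raises OverflowError for magnitudes too large for half precision. The disclosure does not specify the hardware's rounding or overflow behaviour.

```python
import struct

# Toy memory mapping address -> raw little-endian bytes (illustrative only).
memory = {100: struct.pack("<I", 1000)}  # a u32 scalar stored at address 100


def scalar_cvtf16_u32(mem: dict, dst: int, src0: int) -> float:
    """Read a u32 from src0, convert it to half precision, store it at dst."""
    (value,) = struct.unpack("<I", mem[src0])   # initial type: u32
    mem[dst] = struct.pack("<e", float(value))  # target type: 16-bit float
    (result,) = struct.unpack("<e", mem[dst])   # read back what was stored
    return result


if __name__ == "__main__":
    print(scalar_cvtf16_u32(memory, 500, 100))  # 1000.0
```

For a u32 input of 2049, this sketch stores 2048.0. Half precision has only 11 significant bits, so 2049 is a tie between 2048 and 2050 and rounds to the even neighbour.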

The working process of each module above is as described earlier.

In this way, the scalar type conversion instruction processing device can process scalar type conversion instructions efficiently and quickly.

The present disclosure provides a machine learning computing device. It may include one or more of the scalar type conversion instruction processing devices described above, which obtain the scalar to be operated on and control information from other processing devices and perform specified machine learning operations. The machine learning computing device can obtain scalar type conversion instructions from other machine learning computing devices or from non-machine-learning computing devices. It passes execution results through an I/O interface to peripheral devices (also called other processing devices), such as cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one scalar type conversion instruction processing device is included, the devices can be linked through a specific structure to transfer data, for example interconnected over a PCIe bus, to support larger-scale neural network operations. In that case, they may share one control system or have independent control systems, and they may share memory or each accelerator may have its own memory. Any interconnection topology may be used.

The machine learning computing device is highly compatible and can be connected to various types of servers through a PCIe interface.

Figure 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in Figure 4a, the combined processing device includes the machine learning computing device described above, a universal interconnection interface, and other processing devices. The machine learning computing device interacts with the other processing devices to jointly complete user-specified operations.

The other processing devices include one or more types of general-purpose or special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), and a neural network processor. The number of processors they contain is not limited. The other processing devices act as the interface between the machine learning computing device and external data and control. This role includes data transfer and basic control of the machine learning computing device, such as starting and stopping it. They can also cooperate with the machine learning computing device to complete computing tasks.

The universal interconnection interface transfers data and control instructions between the machine learning computing device and the other processing devices. The machine learning computing device obtains the required input data from the other processing devices and writes it to its on-chip storage. It can obtain control instructions from the other processing devices and write them to its on-chip control cache. It can also read data from its storage module and transmit it to the other processing devices.

Figure 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in Figure 4b, the combined processing device may further include a storage device connected to both the machine learning computing device and the other processing devices. The storage device stores data of the machine learning computing device and the other processing devices. It is particularly suitable for data to be operated on that cannot be held entirely in the internal storage of the machine learning computing device or the other processing devices.

The combined processing device can serve as the system on chip (SoC) of devices such as mobile phones, robots, drones, and video surveillance equipment. This effectively reduces the core area of the control portion, increases processing speed, and reduces overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as cameras, displays, mice, keyboards, network cards, and Wi-Fi interfaces.

The present disclosure provides a machine learning chip, which includes the machine learning computing device or the combined processing device described above.

The present disclosure provides a machine learning chip package structure, which includes the machine learning chip described above.

The present disclosure provides a board card. Figure 5 shows a schematic structural diagram of the board card according to an embodiment of the present disclosure. As shown in Figure 5, the board card includes the machine learning chip package structure or the machine learning chip described above. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391, and a control device 392.

The storage device 390 is connected through a bus to the machine learning chip 389 (or to the machine learning chip within the package structure) and stores data. The storage device 390 may include multiple groups of storage units 393, each connected to the machine learning chip 389 through a bus. Each group of storage units 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the data rate of SDRAM without raising the clock frequency. It transfers data on both the rising and falling edges of the clock, so at the same clock frequency its data rate is twice that of standard SDRAM.

In one embodiment, the storage device 390 may include four groups of storage units 393, and each group may include multiple DDR4 chips. In one embodiment, the machine learning chip 389 may contain four 72-bit DDR4 controllers. In each controller, 64 bits are used for data transfer and 8 bits for ECC (error-correcting code) checking. When DDR4-3200 chips are used in each group of storage units 393, the theoretical data transfer bandwidth can reach 25600 MB/s.
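The 25600 MB/s figure follows from the DDR4-3200 transfer rate and the 64 data bits of each controller. The short Python calculation below makes the arithmetic explicit. It assumes the figure refers to one 64-bit channel (one controller with its group of storage units) and uses 1 MB = 10^6 bytes. The function name is illustrative.

```python
def ddr_peak_mb_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical peak of one channel in MB/s (1 MB = 10**6 bytes)."""
    return mega_transfers_per_s * data_bits / 8


io_clock_mhz = 1600               # DDR4-3200 I/O clock frequency
mt_per_s = io_clock_mhz * 2       # data on rising and falling edges
data_bits = 72 - 8                # 72-bit controller: 64 data bits + 8 ECC bits

print(mt_per_s, ddr_peak_mb_s(mt_per_s, data_bits))  # 3200 25600.0
```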

In one embodiment, each group of storage units 393 includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice per clock cycle. A DDR controller in the machine learning chip 389 controls data transfer and data storage for each storage unit 393.

The interface device 391 is electrically connected to the machine learning chip 389 (or to the machine learning chip within the package structure). It carries data between the machine learning chip 389 and an external device, such as a server or computer. For example, in one embodiment the interface device 391 may be a standard PCIe interface, through which the server transfers the data to be processed to the machine learning chip 389. Preferably, when a PCIe 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may be another interface. The present disclosure does not limit its specific form, as long as it can perform the transfer function. The computation results of the machine learning chip are likewise sent back to the external device (such as the server) through the interface device.
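The 16000 MB/s quoted for PCIe 3.0 x16 is a rounded figure. PCIe 3.0 signals at 8 GT/s per lane with 128b/130b line encoding, so the per-direction rate before packet and protocol overhead is slightly lower. The Python calculation below shows this. The function name and parameterisation are illustrative.

```python
def pcie_raw_mb_s(gt_per_s: float, lanes: int, payload_bits: int, coded_bits: int) -> float:
    """Per-direction bandwidth in MB/s after line encoding, before protocol overhead."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / coded_bits
    return bits_per_s / 8 / 1e6


# PCIe 3.0 x16: 8 GT/s per lane, 128b/130b encoding.
print(round(pcie_raw_mb_s(8, 16, 128, 130)))  # 15754
```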

The control device 392 is electrically connected to the machine learning chip 389 and monitors its state. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and it may drive multiple loads. It can therefore be in different working states, such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, processing cores, and/or processing circuits in the machine learning chip.

The present disclosure provides an electronic device, which includes the machine learning chip or board card described above.

The electronic device may include a data processing device, computer equipment, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a dashcam, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a means of transportation, a household appliance, and/or medical equipment.

Means of transportation may include aircraft, ships, and/or vehicles. Household appliances may include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods. Medical equipment may include MRI scanners, B-mode ultrasound scanners, and/or electrocardiographs.

Figure 6 shows a flowchart of a scalar type conversion instruction processing method according to an embodiment of the present disclosure. The method can be applied to equipment such as a computer that includes a memory and a processor. The memory stores the data used while executing the method, and the processor performs the related processing and computation steps, such as steps S51 and S52 below. As shown in Figure 6, the method is applied to the scalar type conversion instruction processing device described above and includes steps S51 and S52.

In step S51, the control module parses the received scalar type conversion instruction to obtain its opcode and operation field. Based on the opcode and operation field, it obtains the scalar to be operated on and the target address required to execute the instruction, and it determines the target data type and the initial data type of the scalar to be operated on. The opcode indicates that the operation the instruction performs on the data is a scalar type conversion. The operation field includes the address of the scalar to be operated on and the target address.

In step S52, the operation module performs a scalar type conversion on the scalar of the initial data type according to the target data type and stores the operation result at the target address. The data type of the operation result is the target data type.

In a possible implementation, performing the scalar type conversion on the scalar of the initial data type according to the target data type may include:

performing the scalar type conversion operation by using multiple scalar operators in the operation module.

在一种可能的实现方式中，运算模块包括主运算子模块和多个从运算子模块，主运算子模块包括多个标量运算器。其中，步骤S52可以包括：In a possible implementation manner, the operation module includes a main operation sub-module and multiple slave operation sub-modules, and the main operation sub-module includes multiple scalar operators. Among them, step S52 may include:

利用主运算子模块中的多个标量运算器执行标量类型转换运算，得到运算结果，并将运算结果存入目标地址中。Use multiple scalar operators in the main operation submodule to perform scalar type conversion operations, obtain the operation results, and store the operation results in the target address.

In one possible implementation, the operation domain may further include the initial data type and the target data type, and step S51 may include: determining, according to the operation domain, the target data type and the initial data type of the scalar to be operated on.

In one possible implementation, the operation code further indicates the initial data type and the target data type, and step S51 may include: determining, according to the operation code, the target data type and the initial data type of the scalar to be operated on.
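The two implementations above differ only in where the data types are encoded. As a hedged illustration, the following Python sketch decodes a hypothetical 32-bit instruction word in which bits 31-24 hold the opcode, bits 15-8 the address of the scalar to be operated on, and bits 7-0 the target address. With the generic opcode `OP_CONVERT_GENERIC`, bits 23-20 and 19-16 carry the initial and target type codes in the operation domain; with the per-pair opcodes in `OPCODE_TYPE_TABLE`, the opcode itself implies both types. All field widths, opcode values and type codes are invented for illustration.

```python
# Hypothetical 4-bit type codes used in the operation domain.
TYPE_CODES = {0: "int16", 1: "uint16", 2: "int32", 3: "uint32", 4: "fp16", 5: "fp32"}

# Variant 1: one generic opcode; the types travel in the operation domain.
OP_CONVERT_GENERIC = 0x2A

# Variant 2: one opcode per (initial type, target type) pair.
OPCODE_TYPE_TABLE = {
    0x30: ("int16", "fp32"),
    0x31: ("fp32", "int32"),
    0x32: ("uint16", "fp16"),
}


def decode(word):
    """Split a 32-bit word into (opcode, initial type, target type, src, dst)."""
    opcode = (word >> 24) & 0xFF
    if opcode == OP_CONVERT_GENERIC:
        # Types read from the operation domain.
        src_type = TYPE_CODES[(word >> 20) & 0xF]
        dst_type = TYPE_CODES[(word >> 16) & 0xF]
    elif opcode in OPCODE_TYPE_TABLE:
        # Types implied by the opcode; bits 23-16 are unused here.
        src_type, dst_type = OPCODE_TYPE_TABLE[opcode]
    else:
        raise ValueError(f"unknown opcode {opcode:#x}")
    src_addr = (word >> 8) & 0xFF
    dst_addr = word & 0xFF
    return opcode, src_type, dst_type, src_addr, dst_addr
```

Carrying the types in the operation domain keeps the number of opcodes small, while per-pair opcodes free operation-domain bits for other operands; the disclosure does not state which trade-off a given hardware implementation makes.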

In one possible implementation, the method may further include: storing the scalar to be operated on in a storage module of the device,

where the storage module includes at least one of a register and a cache;

the cache is used to store the data to be operated on and includes at least one neuron cache (NRAM);

the register is used to store the scalar to be operated on;

the neuron cache is used to store the neuron data among the data to be operated on, the neuron data including neuron vector data.

In one possible implementation, step S51 may include:

storing the scalar type conversion instruction;

parsing the scalar type conversion instruction to obtain its operation code and operation domain; and

storing an instruction queue, the instruction queue including multiple instructions to be executed arranged in execution order, which may include the scalar type conversion instruction.
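The three sub-steps of S51 (store, parse, enqueue in execution order) can be modelled with a FIFO queue. In this Python sketch, raw instructions are assumed to arrive as tuples of the form `(mnemonic, *operation_domain)`; the `ControlModule` and `Decoded` names and the mnemonic strings are hypothetical and are not taken from the disclosure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    opcode: str      # hypothetical mnemonic standing in for the operation code
    operands: tuple  # operation domain (addresses, optionally data types)


class ControlModule:
    def __init__(self):
        self.instruction_store = []  # raw instructions as received
        self.queue = deque()         # decoded instructions in execution order

    def receive(self, raw):
        self.instruction_store.append(raw)                  # store the instruction
        opcode, *fields = raw                               # parse it
        self.queue.append(Decoded(opcode, tuple(fields)))   # enqueue in order

    def next_instruction(self):
        """Pop the oldest pending instruction, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

Because the queue is a `collections.deque` popped from the left, instructions leave it in the order they were received, which is the execution order the queue storage step calls for.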

In one possible implementation, the method may further include:

when it is determined that a first instruction to be executed among the multiple instructions to be executed is associated with a zeroth instruction to be executed that precedes it, caching the first instruction to be executed and, after determining that execution of the zeroth instruction to be executed has completed, controlling execution of the first instruction to be executed,

where the first instruction to be executed being associated with the preceding zeroth instruction to be executed includes:

the first storage address interval, which stores the data required by the first instruction to be executed, overlapping the zeroth storage address interval, which stores the data required by the zeroth instruction to be executed.
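The association test above reduces to an interval-overlap check. The Python sketch below assumes half-open address intervals `[start, end)` and that each instruction lists the intervals holding the data it needs; `Interval`, `PendingInstr`, `has_dependency` and `can_issue` are hypothetical names. An instruction that overlaps any earlier, still-unfinished instruction is held back (cached) and issued only after those instructions complete.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interval:
    start: int  # first address of the region
    end: int    # one past the last address (half-open interval)


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals share at least one address.
    return a.start < b.end and b.start < a.end


@dataclass
class PendingInstr:
    name: str
    ranges: list = field(default_factory=list)  # intervals holding its data


def has_dependency(first: PendingInstr, zeroth: PendingInstr) -> bool:
    """True if any data interval of `first` overlaps one of `zeroth`."""
    return any(overlaps(a, b) for a in first.ranges for b in zeroth.ranges)


def can_issue(first: PendingInstr, unfinished_earlier: list) -> bool:
    """Issue only if no earlier, unfinished instruction is associated with it."""
    return not any(has_dependency(first, z) for z in unfinished_earlier)
```

This rule is conservative: any overlap counts as a dependency, whether the instructions read or write the shared region, which matches the disclosure's wording in terms of overlapping storage address intervals.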

It should be noted that although the scalar type conversion instruction processing method has been described above using the foregoing embodiment as an example, those skilled in the art will understand that the present disclosure is not limited thereto. In fact, users can flexibly set each step according to personal preference and/or the actual application scenario, as long as the technical solution of the present disclosure is followed.

The scalar type conversion instruction processing method provided by the embodiments of the present disclosure has a wide range of application, and it processes scalar type conversion instructions, and performs the scalar type conversions themselves, with high efficiency and speed.

The present disclosure further provides a non-volatile computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the above scalar type conversion instruction processing method.

It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions. However, those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in other orders or concurrently. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should further be noted that although the steps in the flowchart of FIG. 6 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

It should be understood that the above device embodiments are merely illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division into units/modules in the above embodiments is only a division by logical function, and other divisions are possible in an actual implementation: multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not performed.

In addition, unless otherwise specified, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist physically on its own, or two or more units/modules may be integrated together. The integrated units/modules may be implemented either in hardware or as software program modules.

If the integrated unit/module is implemented in hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and so on. Unless otherwise specified, the above storage module may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.

If the integrated unit/module is implemented as a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the relevant descriptions of other embodiments. The technical features of the above embodiments may be combined in any way. For brevity, not every possible combination of these technical features has been described; however, as long as a combination of technical features involves no contradiction, it should be regarded as within the scope of this specification.

The foregoing may be better understood in light of the following clauses:

Clause A1. A scalar type conversion instruction processing device, the device comprising:

Clause A2. The device according to Clause A1, wherein the operation module includes:

a plurality of scalar operators configured to perform the scalar type conversion operation.

Clause A3. The device according to Clause A2, wherein the operation module includes a main operation submodule and a plurality of slave operation submodules, and the main operation submodule includes the plurality of scalar operators,

the main operation submodule being configured to use the plurality of scalar operators to perform the scalar type conversion operation, obtain an operation result, and store the operation result at the target address.

Clause A4. The device according to Clause A1, wherein the operation domain further includes an initial data type and a target data type,

and the control module is further configured to determine, according to the operation domain, the target data type and the initial data type of the scalar to be operated on.

Clause A5. The device according to Clause A1, wherein the operation code further indicates an initial data type and a target data type,

and the control module is further configured to determine, according to the operation code, the target data type and the initial data type of the scalar to be operated on.

Clause A6. The device according to Clause A1, wherein the target data type includes any one of a 16-bit floating-point number, a 32-bit floating-point number, a 48-bit floating-point number, a 16-bit integer, a 32-bit integer and a 48-bit integer, and the initial data type includes any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number and a pointer data type.
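The 48-bit types in this list have no native counterpart in most programming languages. The Python sketch below shows one way to model them under assumptions the disclosure does not state: fields are little-endian, integers are narrowed by wrapping modulo 2^bits, and a pointer operand would be read as unsigned. `read_int`, `to_int_bits` and `to_fp32` are hypothetical helpers. No 48-bit floating-point layout is modelled, because its bit format is not specified.

```python
import struct


def read_int(raw: bytes, signed: bool) -> int:
    """Interpret a 2-, 4- or 6-byte little-endian field as a 16/32/48-bit integer."""
    if len(raw) not in (2, 4, 6):
        raise ValueError("only 16-, 32- and 48-bit fields are modelled")
    return int.from_bytes(raw, "little", signed=signed)


def to_int_bits(value: int, bits: int) -> bytes:
    """Store value in `bits` bits, wrapping modulo 2**bits (assumed overflow rule)."""
    mask = (1 << bits) - 1
    return (value & mask).to_bytes(bits // 8, "little")


def to_fp32(value: int) -> bytes:
    """Convert an integer to IEEE 754 single precision (round to nearest)."""
    return struct.pack("<f", float(value))
```

For instance, the 48-bit integer 2^24 + 1 is not exactly representable as a 32-bit float and is stored as 16777216.0, so converting wide integers to narrower floating-point targets can lose precision.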

Clause A7. The device according to Clause A1, the device further comprising:

a storage module configured to store the scalar to be operated on,

wherein the storage module includes at least one of a register and a cache;

the cache is configured to store the data to be operated on and includes at least one neuron cache (NRAM);

the register is configured to store the scalar to be operated on;

the neuron cache is configured to store the neuron data among the data to be operated on, the neuron data including neuron vector data.

Clause A8. The device according to Clause A1, wherein the control module includes:

an instruction storage submodule configured to store the scalar type conversion instruction;

an instruction processing submodule configured to parse the scalar type conversion instruction to obtain its operation code and operation domain;

a queue storage submodule configured to store an instruction queue, the instruction queue including a plurality of instructions to be executed arranged in execution order, the plurality of instructions to be executed including the scalar type conversion instruction.

Clause A9. The device according to Clause A8, wherein the control module further includes:

a dependency processing submodule configured to, when it is determined that a first instruction to be executed among the plurality of instructions to be executed is associated with a zeroth instruction to be executed that precedes it, cache the first instruction to be executed in the instruction storage submodule and, after execution of the zeroth instruction to be executed is completed, fetch the first instruction to be executed from the instruction storage submodule and send it to the operation module,

wherein the first instruction to be executed being associated with the preceding zeroth instruction to be executed includes:

the first storage address interval, which stores the data required by the first instruction to be executed, overlapping the zeroth storage address interval, which stores the data required by the zeroth instruction to be executed.

Clause A10. A machine learning computing device, the device comprising:

one or more scalar type conversion instruction processing devices according to any one of Clauses A1 to A9, configured to obtain the scalar to be operated on and control information from other processing devices, perform specified machine learning operations, and pass the execution results to the other processing devices through an I/O interface;

Clause A11. A combined processing device, the combined processing device comprising:

the machine learning computing device according to Clause A10, a universal interconnection interface, and other processing devices;

the machine learning computing device interacting with the other processing devices to jointly complete computing operations specified by a user,

wherein the combined processing device further includes a storage device connected to the machine learning computing device and to the other processing devices respectively, for saving data of the machine learning computing device and the other processing devices.

Clause A12. A machine learning chip, the machine learning chip comprising:

the machine learning computing device according to Clause A10 or the combined processing device according to Clause A11.

Clause A13. An electronic device, the electronic device comprising:

the machine learning chip according to Clause A12.

Clause A14. A board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip according to Clause A12;

wherein the machine learning chip is connected to the storage device, the control device and the interface device respectively;

the storage device is configured to store data;

the interface device is configured to implement data transmission between the machine learning chip and external equipment;

the control device is configured to monitor the state of the machine learning chip.

Clause A15. A scalar type conversion instruction processing method, applied to a scalar type conversion instruction processing device that includes a control module and an operation module, the method comprising:

using the control module to parse the obtained scalar type conversion instruction to obtain its operation code and operation domain, obtain, according to the operation code and the operation domain, the scalar to be operated on and the target address required to execute the instruction, and determine the target data type and the initial data type of the scalar to be operated on;

using the operation module to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on of the initial data type to obtain an operation result, and store the operation result at the target address, the data type of the operation result being the target data type,

Clause A16. The method according to Clause A15, wherein performing the scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type includes:

using a plurality of scalar operators in the operation module to perform the scalar type conversion operation.

Clause A17. The method according to Clause A16, wherein the operation module includes a main operation submodule and a plurality of slave operation submodules, and the main operation submodule includes the plurality of scalar operators,

and wherein performing the scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type, obtaining the operation result, and storing the operation result at the target address includes:

using the plurality of scalar operators in the main operation submodule to perform the scalar type conversion operation, obtain the operation result, and store the operation result at the target address.

Clause A18. The method according to Clause A15, wherein the operation domain further includes an initial data type and a target data type,

and wherein determining the target data type and the initial data type of the scalar to be operated on includes:

determining, according to the operation domain, the target data type and the initial data type of the scalar to be operated on.

Clause A19. The method according to Clause A15, wherein the operation code further indicates an initial data type and a target data type,

the target data type and the initial data type of the scalar to be operated on being determined according to the operation code.

Clause A20. The method according to Clause A15, wherein the target data type includes any one of a 16-bit floating-point number, a 32-bit floating-point number, a 48-bit floating-point number, a 16-bit integer, a 32-bit integer and a 48-bit integer, and the initial data type includes any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number and a pointer data type.

Clause A21. The method according to Clause A16, the method further comprising:

using a storage module of the device to store the scalar to be operated on,

Clause A22. The method according to Clause A15, wherein parsing the scalar type conversion instruction to obtain its operation code and operation domain includes:

storing the scalar type conversion instruction;

parsing the scalar type conversion instruction to obtain its operation code and operation domain;

storing an instruction queue, the instruction queue including a plurality of instructions to be executed arranged in execution order, the plurality of instructions to be executed including the scalar type conversion instruction.

Clause A23. The method according to Clause A22, the method further comprising:

when it is determined that a first instruction to be executed among the plurality of instructions to be executed is associated with a zeroth instruction to be executed that precedes it, caching the first instruction to be executed and, after determining that execution of the zeroth instruction to be executed has completed, controlling execution of the first instruction to be executed,

Clause A24. A non-volatile computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the method according to any one of Clauses A15 to A23.

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The above description of the embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims

1. A scalar type conversion instruction processing device, characterized in that the device includes:

A control module, configured to parse the obtained scalar type conversion instruction to obtain the operation code and operation domain of the scalar type conversion instruction, obtain, according to the operation code and the operation domain, the scalar to be operated on and the target address required to execute the scalar type conversion instruction, and determine the target data type and the initial data type of the scalar to be operated on;

An operation module, configured to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on of the initial data type to obtain an operation result, and store the operation result at the target address, the data type of the operation result being the target data type,

Wherein, the operation code indicates that the operation performed on the data by the scalar type conversion instruction is a scalar type conversion operation, and the operation domain includes the address of the scalar to be operated on and the target address;

Wherein, the control module includes:

Instruction storage submodule, used to store the scalar type conversion instructions;

The instruction processing submodule is used to parse the scalar type conversion instruction and obtain the operation code and operation domain of the scalar type conversion instruction;

The queue storage submodule is used to store an instruction queue, where the instruction queue includes a plurality of instructions to be executed that are arranged in sequence according to execution order, and the plurality of instructions to be executed include the scalar type conversion instruction.

2. The device according to claim 1, characterized in that the operation module includes:

A plurality of scalar operators used to perform the scalar type conversion operation.

3. The device according to claim 2, wherein the operation module includes a main operation sub-module and a plurality of slave operation sub-modules, and the main operation sub-module includes the plurality of scalar operators,

The main operation submodule is configured to use the plurality of scalar operators to perform the scalar type conversion operation, obtain an operation result, and store the operation result in the target address.

4. The device according to claim 1, wherein the operation domain further includes an initial data type and a target data type,

Wherein, the control module is further configured to determine the target data type and the initial data type of the scalar to be operated according to the operation domain.

5. The device according to claim 1, wherein the operation code is also used to indicate an initial data type and a target data type,

Wherein, the control module is further configured to determine the target data type and the initial data type of the scalar to be operated according to the operation code.

6. The device according to claim 1, wherein the target data type includes any one of a 16-bit floating-point number, a 32-bit floating-point number, a 48-bit floating-point number, a 16-bit integer, a 32-bit integer and a 48-bit integer, and the initial data type includes any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number and a pointer data type.

7. The device of claim 1, further comprising:

Storage module, used to store the scalar to be operated on,

Wherein, the storage module includes at least one of a register and a cache,

The cache is used to store data to be operated on, and the cache includes at least one neuron cache NRAM;

The register is used to store the scalar to be operated;

The neuron cache is used to store neuron data in the data to be operated, where the neuron data includes neuron vector data.

8. The device according to claim 1, wherein the control module further includes:

Dependency processing submodule, configured to: when it is determined that a first instruction to be executed among the plurality of instructions to be executed is associated with a zeroth instruction to be executed that precedes it, cache the first instruction to be executed in the instruction storage submodule and, after execution of the zeroth instruction to be executed is completed, fetch the first instruction to be executed from the instruction storage submodule and send it to the operation module,

Wherein, the first instruction to be executed being associated with the zeroth instruction to be executed that precedes it includes:

The first storage address interval that stores the data required by the first instruction to be executed has an overlapping area with the zeroth storage address interval that stores the data required by the zeroth instruction to be executed.

9. A machine learning computing device, characterized in that the device includes:

One or more scalar type conversion instruction processing devices as claimed in any one of claims 1 to 8, configured to obtain the scalar to be operated on and control information from other processing devices, perform specified machine learning operations, and pass the execution results to the other processing devices through an I/O interface;

When the machine learning computing device includes multiple scalar type conversion instruction processing devices, the multiple scalar type conversion instruction processing devices can be connected and transmit data through a specific structure;

Wherein, the multiple scalar type conversion instruction processing devices are interconnected and transmit data through a PCIE bus to support larger-scale machine learning operations; the multiple scalar type conversion instruction processing devices may share one control system or each have its own control system; they may share memory or each have its own memory; and they may be interconnected in any interconnection topology.

10. A combined processing device, characterized in that the combined processing device includes:

The machine learning computing device, universal interconnection interface and other processing devices as claimed in claim 9;

The machine learning computing device interacts with the other processing devices to jointly complete the calculation operations specified by the user,

Wherein, the combined processing device further includes: a storage device, which is connected to the machine learning computing device and the other processing devices respectively, and is used to save data of the machine learning computing device and the other processing devices.

11. A machine learning chip, characterized in that the machine learning chip includes:

The machine learning computing device according to claim 9 or the combined processing device according to claim 10.

12. An electronic device, characterized in that the electronic device includes:

The machine learning chip according to claim 11.

13. A board card, characterized in that the board card includes: a storage device, an interface device and a control device, and the machine learning chip according to claim 11;

Wherein, the machine learning chip is connected to the storage device, the control device and the interface device respectively;

The storage device is used to store data;

The interface device is used to realize data transmission between the machine learning chip and external equipment;

The control device is used to monitor the status of the machine learning chip.

14. A scalar type conversion instruction processing method, characterized in that the method is applied to a scalar type conversion instruction processing device, the device includes a control module and an operation module, and the method includes:

Using the control module to parse the obtained scalar type conversion instruction to obtain the operation code and operation domain of the scalar type conversion instruction, obtaining, according to the operation code and the operation domain, the scalar to be operated on and the target address required to execute the scalar type conversion instruction, and determining the target data type and the initial data type of the scalar to be operated on;

Using the operation module to perform, according to the target data type, a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, to obtain an operation result, and storing the operation result at the target address, the data type of the operation result being the target data type;

Wherein, parsing the scalar type conversion instruction to obtain the operation code and operation domain of the scalar type conversion instruction includes:

Storing the scalar type conversion instruction;

Parsing the scalar type conversion instruction to obtain the operation code and operation domain of the scalar type conversion instruction; and

Storing an instruction queue, the instruction queue including a plurality of instructions to be executed that are arranged sequentially in execution order, the plurality of instructions to be executed including the scalar type conversion instruction.
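The split between the control module and the operation module in claim 14 can be illustrated with a minimal Python sketch. It assumes a hypothetical opcode `SCVT`, an operation domain laid out as (source address, target address, initial type, target type), a dictionary standing in for memory, and only two coarse types (`int` and `float`). None of these encodings come from the claims; a real device would decode binary instruction words in hardware.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical mapping from type names to Python types, used only to check the source scalar.
PY_TYPES = {"int": int, "float": float}


@dataclass
class ScalarCvtInstruction:
    opcode: str    # hypothetical opcode, e.g. "SCVT"
    domain: tuple  # (source address, target address, initial type, target type)


class ControlModule:
    def __init__(self):
        # Instruction queue: instructions kept in execution order.
        self.instruction_queue = deque()

    def store(self, instr):
        self.instruction_queue.append(instr)

    def parse_next(self, memory):
        # Split the next instruction into opcode and operation domain, then fetch the scalar.
        instr = self.instruction_queue.popleft()
        src_addr, dst_addr, initial_type, target_type = instr.domain
        return instr.opcode, memory[src_addr], dst_addr, initial_type, target_type


class OperationModule:
    def convert(self, scalar, initial_type, target_type):
        if not isinstance(scalar, PY_TYPES[initial_type]):
            raise TypeError(f"scalar {scalar!r} is not of initial type {initial_type}")
        if target_type == "float":
            return float(scalar)
        if target_type == "int":
            return int(scalar)  # truncates toward zero; the rounding mode is an assumption
        raise ValueError(f"unsupported target type: {target_type}")


def execute(memory, instructions):
    control, operation = ControlModule(), OperationModule()
    for instr in instructions:
        control.store(instr)
    while control.instruction_queue:
        opcode, scalar, dst_addr, initial_type, target_type = control.parse_next(memory)
        if opcode != "SCVT":
            raise ValueError(f"not a scalar type conversion instruction: {opcode}")
        # Store the operation result, now of the target data type, at the target address.
        memory[dst_addr] = operation.convert(scalar, initial_type, target_type)
    return memory


if __name__ == "__main__":
    mem = execute({0: 7, 1: -2.75}, [
        ScalarCvtInstruction("SCVT", (0, 2, "int", "float")),
        ScalarCvtInstruction("SCVT", (1, 3, "float", "int")),
    ])
    print(mem[2], mem[3])  # 7.0 -2
```

Keeping the queue in the control module and the arithmetic in the operation module mirrors the two-module structure of the claim; the Python classes are only stand-ins for those hardware units.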

15. The method according to claim 14, characterized in that performing, according to the target data type, the scalar type conversion operation on the scalar to be operated on, which is of the initial data type, includes:

Performing the scalar type conversion operation by using a plurality of scalar operators in the operation module.

16. The method according to claim 15, wherein the operation module includes a master operation sub-module and a plurality of slave operation sub-modules, and the master operation sub-module includes the plurality of scalar operators,

Wherein, performing, according to the target data type, the scalar type conversion operation on the scalar to be operated on, which is of the initial data type, to obtain the operation result, and storing the operation result at the target address, includes:

Performing the scalar type conversion operation by using the plurality of scalar operators in the master operation sub-module to obtain the operation result, and storing the operation result at the target address.
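Claims 15 and 16 place several scalar operators inside the master operation sub-module. The sketch below models this with a hypothetical round-robin dispatch of independent conversion jobs across the operators. The dispatch policy, class names and job tuple layout are assumptions, and the loop runs sequentially where hardware operators would run concurrently. The slave operation sub-modules are not modeled, since the claim assigns the conversion to the master.

```python
class ScalarOperator:
    """One hypothetical scalar operator that converts a single scalar."""

    def __init__(self, op_id):
        self.op_id = op_id
        self.handled = 0  # number of conversions this operator performed

    def convert(self, value, target_type):
        self.handled += 1
        if target_type == "float":
            return float(value)
        if target_type == "int":
            return int(value)
        raise ValueError(f"unsupported target type: {target_type}")


class MasterOperationSubModule:
    def __init__(self, num_operators):
        self.operators = [ScalarOperator(i) for i in range(num_operators)]

    def run(self, jobs, memory):
        """jobs: list of (source address, target address, target type) tuples."""
        for i, (src, dst, target_type) in enumerate(jobs):
            operator = self.operators[i % len(self.operators)]  # round-robin dispatch
            memory[dst] = operator.convert(memory[src], target_type)
        return memory
```

With two operators and three jobs, the first operator handles jobs 0 and 2 and the second handles job 1.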

17. The method according to claim 14, characterized in that the operation domain further includes the initial data type and the target data type,

Wherein, determining the target data type and the initial data type of the scalar to be operated on includes:

Determining the target data type and the initial data type of the scalar to be operated on according to the operation domain.

18. The method according to claim 14, characterized in that the operation code is further used to indicate the initial data type and the target data type,

Wherein, the target data type and the initial data type of the scalar to be operated on are determined according to the operation code.
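Claims 17 and 18 describe two places where the type information can reside: in the operation domain or in the operation code itself. A minimal sketch of both decoding paths follows. The opcode names (`CVT`, `CVT_S16_F32`, `CVT_U32_I32`), the type-name strings and the dictionary-shaped operation domain are all hypothetical.

```python
# Hypothetical opcodes for the claim-18 variant: each opcode fixes both data types.
OPCODE_TYPES = {
    "CVT_S16_F32": ("signed16", "float32"),
    "CVT_U32_I32": ("unsigned32", "int32"),
}


def determine_types(opcode, domain):
    """Return (initial data type, target data type) for a scalar type conversion."""
    if opcode in OPCODE_TYPES:
        # Claim 18: the operation code indicates both types.
        return OPCODE_TYPES[opcode]
    if opcode == "CVT":
        # Claim 17: a generic opcode; the operation domain carries both types.
        return domain["initial_type"], domain["target_type"]
    raise ValueError(f"not a scalar type conversion opcode: {opcode}")
```

Encoding the types in the opcode frees operand fields at the cost of more distinct opcodes, while carrying them in the operation domain keeps a single opcode. The claims do not say which trade-off a given device makes.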

19. The method according to claim 14, wherein the target data type includes any one of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer and a 48-bit integer, and the initial data type includes any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number and a pointer data type.
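The initial data types in claim 19 are raw signed or unsigned integers of 16, 32 or 48 bits. The sketch below interprets such a bit pattern and converts it to a 16- or 32-bit float or to a 16-, 32- or 48-bit integer. It assumes two's-complement signed encoding, wrap-around (not saturation) for integer targets, and IEEE 754 round-to-nearest for float targets, using Python's `struct` half-precision (`e`) and single-precision (`f`) formats. The 48-bit floating point type and the pointer data type have no standard encoding to assume here, so they are left out; the function names are hypothetical.

```python
import struct


def decode_initial(raw, bits, signed):
    """Interpret the low `bits` bits of `raw` as a two's-complement signed or an unsigned integer."""
    raw &= (1 << bits) - 1
    if signed and raw >> (bits - 1):
        raw -= 1 << bits
    return raw


def to_target(value, target):
    """Convert an integer value to one of the target types named in claim 19 (48-bit float excluded)."""
    if target in ("int16", "int32", "int48"):
        bits = int(target[3:])
        # Assumption: out-of-range values wrap modulo 2**bits rather than saturate.
        return decode_initial(value, bits, signed=True)
    if target == "float16":
        # Round-trip through IEEE 754 half precision; Python raises OverflowError for
        # values too large for half precision, whereas a device might saturate instead.
        return struct.unpack("<e", struct.pack("<e", float(value)))[0]
    if target == "float32":
        return struct.unpack("<f", struct.pack("<f", float(value)))[0]
    raise ValueError(f"unsupported target type: {target}")


# The 16-bit pattern 0xFFFF read as signed is -1, read as unsigned is 65535.
print(to_target(decode_initial(0xFFFF, 16, signed=True), "float32"))   # -1.0
print(to_target(decode_initial(0xFFFF, 16, signed=False), "int32"))    # 65535
```

Precision loss is visible for large integers: 2**24 + 1 converted to a 32-bit float becomes 16777216.0, because single precision has a 24-bit significand.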

20. The method of claim 15, further comprising:

Storing the scalar to be operated on by using a storage module of the device,

Wherein, the storage module includes at least one of a register and a cache,

The register is used to store the scalar to be operated on;

21. The method of claim 14, further comprising:

When it is determined that a first instruction to be executed among the plurality of instructions to be executed is associated with a zeroth instruction to be executed that precedes the first instruction to be executed, caching the first instruction to be executed, and, after it is determined that execution of the zeroth instruction to be executed is complete, controlling execution of the first instruction to be executed,
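Claim 21 holds back a later instruction that is associated with an earlier, unfinished one. The sketch below models this with a hypothetical dispatcher. In line with the overlap condition stated at the end of claim 8, it assumes that two instructions are associated when the half-open address intervals of their data overlap. It also keeps issue strictly in queue order, so once one instruction is cached every later one waits behind it. The class, field and log names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    start: int  # first address of the data this instruction needs
    end: int    # one past the last address (half-open interval)


def associated(a, b):
    # Assumed association test: the two storage address intervals overlap.
    return a.start < b.end and b.start < a.end


class Dispatcher:
    def __init__(self):
        self.in_flight = []  # issued but not yet completed
        self.cache = []      # instructions held back, in queue order
        self.log = []

    def issue(self, instr):
        # Cache the instruction if it conflicts with an unfinished one, or if
        # an earlier instruction is already waiting (keeps issue in order).
        if self.cache or any(associated(instr, prev) for prev in self.in_flight):
            self.cache.append(instr)
            self.log.append(f"cache {instr.name}")
        else:
            self.in_flight.append(instr)
            self.log.append(f"issue {instr.name}")

    def complete(self, instr):
        self.in_flight.remove(instr)
        self.log.append(f"done {instr.name}")
        # Release cached instructions, oldest first, while they no longer conflict.
        while self.cache and not any(associated(self.cache[0], p) for p in self.in_flight):
            nxt = self.cache.pop(0)
            self.in_flight.append(nxt)
            self.log.append(f"issue {nxt.name}")


if __name__ == "__main__":
    d = Dispatcher()
    i0, i1, i2 = Instr("i0", 0, 64), Instr("i1", 32, 96), Instr("i2", 128, 160)
    for ins in (i0, i1, i2):
        d.issue(ins)
    d.complete(i0)
    print(d.log)
```

Here `i1` overlaps `i0` (addresses 32 to 63) and is cached; `i2` does not overlap but still waits because the queue is in-order. Both are released when `i0` completes.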

22. A non-volatile computer-readable storage medium with computer program instructions stored thereon, characterized in that when the computer program instructions are executed by a processor, the method of any one of claims 14 to 21 is implemented.