CN1152300C

CN1152300C - Single instruction multiple data processing method and device in multimedia signal processor

Info

Publication number: CN1152300C
Application number: CNB971174059A
Authority: CN
Inventors: 莫塔兹·A·穆罕默德; 朴宪哲; 利·T·恩格延; 罗尼·S·D·旺
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 1996-08-19
Filing date: 1997-08-19
Publication date: 2004-06-02
Anticipated expiration: 2017-08-19
Also published as: FR2752629B1; KR100267089B1; JPH10143494A; CN1188275A; TW346595B; DE19735349B4; FR2752629A1; KR19980018065A; DE19735349A1

Abstract

A vector processor includes a scalar register for a scalar value and a vector register for a vector having a plurality of data elements. Operations performed by the vector processor process two or more vector operands to determine a vector quantity, combine scalar operands and vector operands to determine a vector quantity, and combine two or more scalar operands to determine a scalar quantity. Scalar registers are also used to operate on individual data elements in the vector registers.

Description

Single instruction multiple data processing method and device in multimedia signal processor

Field of the Invention

The present invention relates to digital signal processing, and in particular to a method and device that process multiple data elements per instruction in parallel for multimedia functions such as video and audio encoding and decoding.

Background Art

This patent document is related to, and refers to, the following concurrently filed patent applications:

U.S. Patent Application Serial No. UNKNOWN1, Attorney Docket No. M-4354, entitled "Multiprocessor Operation in a Multimedia Signal Processor";

U.S. Patent Application Serial No. UNKNOWN2, Attorney Docket No. M-4355, entitled "Single-Instruction-Multiple-Data Processing in a Multimedia Signal Processor";

U.S. Patent Application Serial No. UNKNOWN3, Attorney Docket No. M-4365, entitled "Efficient Context Saving and Restoring in Multiprocessors";

U.S. Patent Application Serial No. UNKNOWN4, Attorney Docket No. M-4366, entitled "System and Method for Handling Software Interrupts with Argument Passing";

U.S. Patent Application Serial No. UNKNOWN5, Attorney Docket No. M-4367, entitled "System and Method for Handling Interrupts and Exception Events in an Asymmetric Multiprocessor Architecture";

U.S. Patent Application Serial No. UNKNOWN6, Attorney Docket No. M-4368, entitled "Methods and Apparatus for Processing Video Data";

U.S. Patent Application Serial No. UNKNOWN7, Attorney Docket No. M-4369, entitled "Single-Instruction-Multiple-Data Processing Using Multiple Banks of Vector Registers"; and

Programmable digital signal processors (DSPs) for multimedia applications such as real-time video encoding and decoding require considerable processing power to handle large amounts of data within a limited time. Several DSP architectures are well known. A general-purpose architecture of the kind used by most microprocessors typically requires a high operating frequency to provide a DSP with enough computing power for real-time video encoding or decoding, which makes such a DSP expensive.

A very long instruction word (VLIW) processor is a type of DSP that has many functional units, most of which perform different, relatively simple tasks. A single instruction for a VLIW DSP can be 128 bytes or longer and has multiple independent parts that are executed in parallel by separate functional units. VLIW DSPs have high computing power because many functional units can operate in parallel. VLIW DSPs also have relatively low cost because each functional unit is relatively small and simple. One problem with VLIW DSPs is inefficiency in handling input/output control, communication with a host computer, and other functions that do not lend themselves to parallel execution on the multiple functional units of a VLIW DSP. In addition, VLIW software differs from conventional software and is difficult to develop because programming tools, and programmers familiar with VLIW software architectures, are scarce. Multimedia applications therefore need a DSP that provides reasonable cost, high computing power, and a familiar programming environment.

Summary of the Invention

An object of the present invention is to provide a single-instruction-multiple-data processing method and device.

According to one aspect of the present invention, there is provided a processor comprising: a scalar register adapted to store a single scalar value; a vector register adapted to store a plurality of data elements; and processing circuitry coupled to the scalar register and the vector register, wherein the processing circuitry performs multiple operations in parallel in response to a single instruction, each operation combining one data element in the vector register with the scalar value in the scalar register.

According to another aspect of the present invention, there is provided a method of operating a processing circuit to execute an instruction, comprising: reading from a register the data elements that constitute the components of a vector value; and performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.

According to yet another aspect of the present invention, there is provided a method of operating a processor, comprising: providing scalar registers and vector registers in the processor, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements that constitute the components of a vector; assigning to each scalar register a register number that differs from the register numbers assigned to the other scalar registers; assigning to each vector register a register number that differs from the register numbers assigned to the other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as register numbers assigned to the scalar registers; forming an instruction that includes a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and executing the instruction to transfer data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.

A multimedia digital signal processor (DSP) according to one aspect of the present invention includes a vector processor that operates on vector data (that is, multiple data elements per operand) to provide high processing power. The processor uses a single-instruction-multiple-data architecture with a RISC-type instruction set. Programmers can adapt easily to the programming environment of the vector processor because it resembles the programming environment of the general-purpose processors most programmers already know.

The DSP includes a set of general-purpose vector registers. Each vector register has a fixed size but is partitioned into separate data elements of a user-selectable size. Accordingly, the number of data elements stored in a vector register depends on the size selected for the elements. For example, a 32-byte register can be divided into thirty-two 8-bit data elements, sixteen 16-bit data elements, or eight 32-bit data elements. The data size and type are selected by the instruction that processes the data in the vector register, and the execution data path for the instruction performs a number of parallel operations that depends on the data size the instruction indicates.
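The relationship between element size and element count in the 32-byte example can be written out as a short calculation. This is only an illustrative sketch; the function name and the divisibility check are assumptions made here, not part of the patent.

```python
VECTOR_REGISTER_BYTES = 32  # register size used in the example in the text


def elements_per_register(element_bits: int,
                          register_bytes: int = VECTOR_REGISTER_BYTES) -> int:
    """Return how many elements of the given width fit in one register."""
    register_bits = register_bytes * 8
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register size")
    return register_bits // element_bits


# Element counts for the three widths named in the text.
counts = {width: elements_per_register(width) for width in (8, 16, 32)}
```

The count is also the number of parallel operations a single instruction would perform on such a register.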

Instructions for the vector processor can have vector registers or scalar registers as operands and operate on multiple data elements of multiple vector registers in parallel to increase computing power. An exemplary instruction set for the vector processor of the present invention includes: coprocessor interface operations; flow control operations; load/store operations; and logic/arithmetic operations. The logic/arithmetic operations include operations that combine the data elements of one vector register with the corresponding data elements of one or more other vector registers to produce the data elements of a resulting data vector. Other logic/arithmetic operations mix the various data elements of one or more vector registers, or combine the data elements of a vector register with a scalar.

An architectural extension of the vector processor adds scalar registers, each of which holds one scalar data element. Combining scalar and vector registers makes it easy to extend the instruction set of the vector processor with operations that combine each data element of a vector with a scalar value in parallel. For example, one instruction multiplies the data elements of a vector by a scalar value. Scalar registers also provide a place to hold a single data element that is to be extracted from, or stored into, a vector register. Scalar registers are likewise convenient for passing information between the vector processor and a coprocessor whose architecture provides only scalar registers, and for computing effective addresses for load/store operations.
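The vector-by-scalar multiply and the single-element transfers mentioned above can be sketched in a few lines. This is a minimal software model under stated assumptions: the function names are hypothetical, the loop stands in for what the hardware would do in parallel, and overflow and saturation are ignored.

```python
def vector_scalar_multiply(vector, scalar):
    """Multiply every data element by the same scalar value."""
    return [element * scalar for element in vector]


def extract_element(vector, index):
    """Copy one data element of a vector into a scalar value."""
    return vector[index]


def insert_element(vector, index, scalar):
    """Return a copy of the vector with one data element replaced by a scalar."""
    updated = list(vector)
    updated[index] = scalar
    return updated


vr = [1, -2, 3, 4, 5, -6, 7, 8]   # eight data elements of a vector register
sr = 3                            # contents of a scalar register
product = vector_scalar_multiply(vr, sr)
```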

According to another aspect of the invention, the vector registers of the vector processor are organized into banks. Each bank can be selected as the "current" bank while the other bank is the "alternate" bank. A "current bank" bit in a control register of the vector processor indicates the current bank. To reduce the number of bits needed to identify a vector register, some instructions provide only a register number that identifies a vector register in the current bank. Load/store instructions have an additional bit so that they can identify a vector register in either bank. Accordingly, load/store operations can fetch data into the alternate bank while data in the current bank is being processed. This facilitates software pipelining for image processing and graphics procedures and reduces processor delays when fetching data, because logic/arithmetic operations can execute out of order with respect to load/store operations that access the alternate register bank. In other instructions, the alternate bank allows the use of double-length vector registers, each of which includes a vector register from the current bank and the corresponding vector register from the alternate bank. Such double-length registers can be identified from the instruction syntax. A control bit in the vector processor can be set so that the default vector length is either one or two vector registers. The alternate bank also allows fewer explicitly identified operands in the syntax of complex instructions such as shuffle, unshuffle, saturate, and conditional moves that have two source and two destination registers.
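The current/alternate bank scheme can be illustrated with a small register-file model. This is a sketch under assumptions: the class and method names are hypothetical, register contents are plain Python lists, and the order in which a double-length register pairs its two halves (current bank first) is chosen here for illustration only.

```python
from dataclasses import dataclass, field

NUM_BANKS = 2
REGS_PER_BANK = 32  # the embodiment described later has 32 registers per bank


def _empty_banks():
    return [[None] * REGS_PER_BANK for _ in range(NUM_BANKS)]


@dataclass
class VectorRegisterFile:
    current_bank: int = 0                      # models the "current bank" bit
    banks: list = field(default_factory=_empty_banks)

    def alternate_bank(self) -> int:
        return 1 - self.current_bank

    # Logic/arithmetic instructions carry only a register number,
    # so they always resolve to the current bank.
    def read_current(self, reg):
        return self.banks[self.current_bank][reg]

    def write_current(self, reg, value):
        self.banks[self.current_bank][reg] = value

    # Load/store instructions carry an extra bit and may name either bank.
    def load(self, bank, reg, value):
        self.banks[bank][reg] = value

    # Double-length register: one register from each bank.
    def read_double(self, reg):
        return (self.banks[self.current_bank][reg],
                self.banks[self.alternate_bank()][reg])
```

In a software-pipelined loop, a program could load the next block into the alternate bank while computing on the current one, then flip `current_bank` to swap the roles of the two banks.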

The vector processor also implements novel instructions such as average quad, shuffle, unshuffle, pair-wise maximum and exchange, and saturate. These instructions perform operations that are common in multimedia functions such as video encoding and decoding, and each replaces two or more instructions that other instruction sets would need to implement the same function. The vector processor instruction set thus improves the efficiency and speed of programs in multimedia applications.
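The text names these instructions without defining them, so the following sketch only conveys the general flavor of two of them. The semantics are assumptions made here: saturate is modeled as clamping to a signed range, shuffle as interleaving two equal-length vectors, and unshuffle as the inverse of that interleave. The actual instruction definitions may differ.

```python
def saturate(values, bits):
    """Clamp each value to the signed two's complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [max(lo, min(hi, v)) for v in values]


def shuffle(a, b):
    """Interleave two vectors (assumed semantics): a0, b0, a1, b1, ..."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def unshuffle(v):
    """Split an interleaved vector back into its even and odd elements."""
    return v[0::2], v[1::2]
```

Each function stands for one instruction that would otherwise take several scalar or element-wise steps, which is the efficiency argument the paragraph makes.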

Brief Description of the Drawings

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a multimedia processor according to an embodiment of the present invention.

FIG. 2 is a block diagram of the vector processor of the multimedia processor of FIG. 1.

FIG. 3 is a block diagram of an instruction fetch unit of the vector processor of FIG. 2.

FIG. 4 is a block diagram of an instruction fetch unit of the vector processor of FIG. 2.

FIGS. 5A, 5B, and 5C show the stages of the execution pipelines used for register-to-register instructions, load instructions, and store instructions of the vector processor of FIG. 2.

FIG. 6A is a block diagram of the execution data path of the vector processor of FIG. 2.

FIG. 6B is a block diagram of the register file of the execution data path of FIG. 6A.

FIG. 6C is a block diagram of the parallel processing logic units of the execution data path of FIG. 6A.

FIG. 7 is a block diagram of the load/store unit of the vector processor of FIG. 2.

FIG. 8 shows the formats of a vector processor instruction set according to an embodiment of the present invention.

Detailed Description

The same reference symbols used in different figures indicate similar or identical items.

FIG. 1 shows a block diagram of a multimedia signal processor (MSP) 100 according to an embodiment of the present invention. Multimedia processor 100 includes a processing core 105 made up of a general-purpose processor 110 and a vector processor 120. Processing core 105 is connected to the rest of multimedia processor 100 through a cache subsystem 130, which includes SRAMs 160 and 190, a ROM 170, and a cache controller 180. Cache controller 180 can configure SRAM 160 as an instruction cache 162 and a data cache 164 for processor 110, and SRAM 190 as an instruction cache 192 and a data cache 194 for vector processor 120.

On-chip ROM 170 contains data and instructions for processors 110 and 120 and can be configured as a cache. In this embodiment, ROM 170 contains: reset and initialization procedures; self-test diagnostic procedures; interrupt and exception handlers; Sound Blaster emulation subroutines; V.34 modem signal processing subroutines; general telephony functions; 2-D and 3-D graphics subroutines; and subroutines for audio and video standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.

Cache subsystem 130 connects processors 110 and 120 to two system buses 140 and 150 and acts as both a cache and a switching station for processors 110 and 120 and the devices coupled to buses 140 and 150. System bus 150 operates at a higher clock frequency than bus 140 and is connected to a memory controller 158, a local bus interface 156, a DMA controller 154, and a device interface 152, which respectively provide interfaces to an external local memory, the local bus of a host computer, direct memory access, and various analog-to-digital and digital-to-analog converters. A system timer 142, a UART (universal asynchronous receiver-transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to bus 140. The above-mentioned patent applications entitled "Multiprocessor Operation in a Multimedia Signal Processor" and "Methods and Apparatus for Processing Video Data" describe more fully the operation of cache subsystem 130 and of exemplary devices that processors 110 and 120 access through cache subsystem 130 and buses 140 and 150.

Processors 110 and 120 execute separate program threads and are structurally different so that each performs the particular tasks assigned to it more efficiently. Processor 110 is primarily for control functions such as executing a real-time operating system and similar functions that do not require large numbers of repetitive calculations. Processor 110 therefore does not need high computing power and can be implemented with a conventional general-purpose processor architecture. Vector processor 120 mostly performs number crunching, that is, repetitive operations on the blocks of data that are common in multimedia processing. To obtain high computing power with relatively simple programming, vector processor 120 has a SIMD (single instruction multiple data) architecture; in this embodiment, most data paths in vector processor 120 are 288 bits or 576 bits wide to support vector data operations. In addition, the instruction set of vector processor 120 includes instructions that are particularly suited to multimedia problems.

In this embodiment, processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the ARM7 processor architecture, including the register set defined by the ARM7 standard. The architecture and instruction set of the ARM7 RISC processor are described in the "ARM7DM Data Sheet", Document Number ARM DDI0010G, which is available from Advanced RISC Machines Ltd. The ARM7DM Data Sheet is incorporated herein by reference in its entirety. Appendix A describes the extensions to the ARM7 instruction set in this embodiment.

Vector processor 120 operates on both vectors and scalars. In this embodiment, vector processor 120 includes a pipelined RISC engine that operates at 80 MHz. The registers of vector processor 120 include 32-bit scalar registers, 32-bit special-purpose registers, two banks of 288-bit vector registers, and two double-length (that is, 576-bit) vector accumulator registers. Appendix C describes the register set of vector processor 120 in this embodiment. In this embodiment, processor 120 includes 32 scalar registers, which are identified in instructions by 5-bit register numbers ranging from 0 to 31. There are also 64 288-bit vector registers, organized into two banks of 32 vector registers each. Each vector register can be identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number ranging from 0 to 31. Most instructions access only vector registers in the current bank, as indicated by a default bank bit CBANK stored in control register VCSR of vector processor 120. A second control bit, VEC64, indicates whether register numbers by default identify a double-length vector register made up of one register from each bank. The syntax of the instructions distinguishes register numbers that identify vector registers from register numbers that identify scalar registers.

Each vector register can be divided into data elements of programmable size. Table 1 shows the data types supported for data elements within a 288-bit vector register.

Table 1: Data types

Data type | Data size | Interpretation
int8 | 8 bits (byte) | 8-bit two's complement, between -128 and 127
int9 | 9 bits (byte9) | 9-bit two's complement, between -256 and 255
int16 | 16 bits (halfword) | 16-bit two's complement, between -32,768 and 32,767
int32 | 32 bits (word) | 32-bit two's complement, between -2,147,483,648 and 2,147,483,647
float | 32 bits (word) | 32-bit IEEE 754 single-precision format
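The integer ranges in Table 1 follow directly from the two's complement rule that an n-bit value spans -2^(n-1) to 2^(n-1) - 1. The short check below recomputes them; the dictionary and function names are illustrative only.

```python
def twos_complement_range(bits: int):
    """Return (minimum, maximum) for a signed two's complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


# Widths of the integer data types listed in Table 1.
INTEGER_TYPES = {"int8": 8, "int9": 9, "int16": 16, "int32": 32}

ranges = {name: twos_complement_range(bits)
          for name, bits in INTEGER_TYPES.items()}
```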

Appendix D provides further description of the data sizes and types supported in this embodiment of the invention.

For the int9 data type, 9-bit bytes are packed consecutively in a 288-bit vector register; for the other data types, every ninth bit of the 288-bit vector register is unused. A 288-bit vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, or eight 32-bit integer or floating-point elements. In addition, two vector registers can be combined to pack data elements into a double-length vector. In this embodiment of the invention, setting control bit VEC64 in the control and status register VCSR places vector processor 120 in mode VEC64, in which double length (576 bits) is the default size of vector registers.
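The 288-bit layout can be pictured as thirty-two 9-bit slots. The sketch below packs int8 or int9 elements into a Python integer standing in for one register. Two details are assumptions made for illustration and are not stated in the text: element 0 is placed in the least significant slot, and for int8 the unused bit is the top bit of each slot.

```python
REGISTER_BITS = 288
SLOT_BITS = 9
SLOTS = REGISTER_BITS // SLOT_BITS  # 32 slots of 9 bits each


def pack(values, element_bits):
    """Pack 32 signed elements (8 or 9 bits wide) into a 288-bit integer."""
    if element_bits not in (8, 9):
        raise ValueError("this sketch handles only int8 and int9")
    if len(values) != SLOTS:
        raise ValueError("expected exactly 32 elements")
    lo, hi = -(1 << (element_bits - 1)), (1 << (element_bits - 1)) - 1
    mask = (1 << element_bits) - 1
    register = 0
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            raise ValueError("value out of range for the element type")
        register |= (v & mask) << (i * SLOT_BITS)  # int8 leaves slot bit 8 clear
    return register


def unpack(register, element_bits):
    """Recover the 32 signed elements from a packed 288-bit integer."""
    mask = (1 << element_bits) - 1
    sign = 1 << (element_bits - 1)
    out = []
    for i in range(SLOTS):
        raw = (register >> (i * SLOT_BITS)) & mask
        out.append(raw - (1 << element_bits) if raw & sign else raw)
    return out
```

The same slot arithmetic explains the element counts in the text: 288 / 9 = 32 byte-sized elements, with 16-bit and 32-bit elements spanning two and four slots respectively.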

Multimedia processor 100 also includes a set of 32-bit extended registers 115 that both processors 110 and 120 can access. Appendix B describes the extended register set and its functions in this embodiment of the invention. The extended registers and the scalar and special-purpose registers of vector processor 120 are accessible to processor 110 under some circumstances. Two special "user" extended registers have two read ports, which allows processors 110 and 120 to read them simultaneously. The other extended registers cannot be accessed simultaneously.

Vector processor 120 has two alternative states, VP_RUN and VP_IDLE, which indicate whether vector processor 120 is running or idle. When vector processor 120 is in state VP_IDLE, processor 110 can read or write the scalar and special-purpose registers of vector processor 120. The result of processor 110 reading or writing a register of vector processor 120 while vector processor 120 is in state VP_RUN is undefined.

The extensions to the ARM7 instruction set for processor 110 include instructions that access the extended registers and the scalar or special-purpose registers of vector processor 120. Instructions MFER and MFEP move data from an extended register and from a scalar or special-purpose register of vector processor 120, respectively, to a general-purpose register in processor 110. Instructions MTER and MTEP move data from a general-purpose register in processor 110 to an extended register and to a scalar or special-purpose register of vector processor 120, respectively. The TESTSET instruction reads an extended register and sets bit 30 of that register to 1. By setting bit 30, TESTSET signals to processor 120 that processor 110 has read (or consumed) the result that was produced, which facilitates consumer/producer synchronization. Other instructions of processor 110, such as STARTVP and INTVP, control the operating state of vector processor 120.
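The read-and-set behavior of TESTSET resembles a classic test-and-set primitive. The sketch below models it on a single 32-bit value. The class name is hypothetical, and two points are assumptions rather than statements from the text: that the value returned is the register content before bit 30 is set, and that the read and the set happen as one indivisible step.

```python
BIT_30 = 1 << 30
WORD_MASK = 0xFFFFFFFF


class ExtendedRegister:
    """Toy model of one 32-bit extended register."""

    def __init__(self, value=0):
        self.value = value & WORD_MASK

    def testset(self):
        """Return the current contents, then set bit 30 to 1."""
        old = self.value
        self.value = old | BIT_30
        return old


reg = ExtendedRegister(0x1234)
consumed = reg.testset()  # the consumer reads the result and flags bit 30
```

A producer could then poll bit 30 to learn that the previous result has been consumed before writing the next one.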

Processor 110 acts as the master processor and controls the operation of vector processor 120. This asymmetric division of control between processors 110 and 120 simplifies synchronizing them. While vector processor 120 is in state VP_IDLE, processor 110 initializes vector processor 120 by writing an instruction address into the program counter of vector processor 120. Processor 110 then executes a STARTVP instruction, which changes vector processor 120 to state VP_RUN. In state VP_RUN, vector processor 120 fetches instructions through cache subsystem 130 and executes them in parallel with processor 110, which continues executing its own program. Once started, vector processor 120 continues executing until it encounters an exception, executes a VCJOIN or VCINT instruction whose condition is satisfied, or is interrupted by processor 110. Vector processor 120 can pass the results of program execution to processor 110 by writing them to an extended register, by writing them to the address space shared by processors 110 and 120, or by leaving them in a scalar or special-purpose register that processor 110 accesses when vector processor 120 re-enters state VP_IDLE.

Vector processor 120 does not handle its own exceptions. Upon executing an instruction that causes an exception, vector processor 120 enters state VP_IDLE and sends an interrupt request to processor 110 over a direct line. Vector processor 120 remains in state VP_IDLE until processor 110 executes another STARTVP instruction. Processor 110 is responsible for reading register VISRC of vector processor 120 to determine the nature of the exception, for handling the exception (possibly by reinitializing vector processor 120), and then for directing vector processor 120 to resume execution if desired.

An INTVP instruction executed by processor 110 interrupts vector processor 120 and causes it to enter the idle state VP_IDLE. Instruction INTVP can be used, for example, in a multitasking system to switch the vector processor from one task, such as video decoding, to another task, such as sound card emulation.

Vector processor instructions VCINT and VCJOIN are flow control instructions. If the condition indicated by the instruction is satisfied, they stop execution by vector processor 120, place vector processor 120 in state VP_IDLE, and send an interrupt request to processor 110 unless the request is masked. The program counter (special-purpose register VPC) of vector processor 120 then holds the address of the instruction that follows the VCINT or VCJOIN instruction. Processor 110 can check the interrupt source register VISRC of vector processor 120 to determine whether a VCINT or VCJOIN instruction caused the interrupt request. Because vector processor 120 has wide data buses and is more efficient at saving and restoring its own registers, software executed by vector processor 120 should save and restore the registers during context switching. The above-mentioned patent application entitled "Efficient Context Saving and Restoring in Multiprocessors" describes an exemplary system for context switching.
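The run/idle protocol described in the preceding paragraphs can be summarized as a two-state machine. The model below is a sketch under assumptions, not the hardware design: the class, method, and attribute names are hypothetical, the interrupt source is recorded as a plain string, and register access during VP_RUN (which the text calls undefined) is modeled as an error.

```python
from enum import Enum


class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"


class VectorProcessorModel:
    """Toy model of the VP_IDLE / VP_RUN control protocol."""

    def __init__(self):
        self.state = VPState.VP_IDLE
        self.vpc = 0                    # stands in for program counter VPC
        self.visrc = None               # stands in for interrupt source VISRC
        self.interrupt_request = False  # request line to the control processor

    # Actions taken by the control processor (processor 110).
    def write_program_counter(self, address):
        # Modeling choice: refuse the access that the text leaves undefined.
        if self.state is not VPState.VP_IDLE:
            raise RuntimeError("register access is only defined in VP_IDLE")
        self.vpc = address

    def startvp(self):
        self.state = VPState.VP_RUN

    def intvp(self):
        self.state = VPState.VP_IDLE

    # Events that occur inside the vector processor.
    def _halt(self, source, masked=False):
        self.state = VPState.VP_IDLE
        self.visrc = source
        self.interrupt_request = not masked

    def raise_exception(self):
        self._halt("exception")

    def vcint(self, condition_met, masked=False):
        if condition_met:
            self._halt("VCINT", masked)

    def vcjoin(self, condition_met, masked=False):
        if condition_met:
            self._halt("VCJOIN", masked)
```

A typical sequence is `write_program_counter`, then `startvp`, after which the model stays in VP_RUN until an exception, a satisfied VCINT or VCJOIN, or `intvp` returns it to VP_IDLE.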

FIG. 2 shows the primary functional blocks of an embodiment of vector processor 120. Vector processor 120 includes an instruction fetch unit (IFU) 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. IFU 210 fetches instructions and processes flow control instructions such as branches. Instruction decoder 220 decodes one instruction per cycle, in the order in which instructions arrive from IFU 210, and writes the field values decoded from each instruction to a FIFO in scheduler 230. Scheduler 230 selects the field values that are issued to execution control registers as required for the steps of executing an operation. Issue selection depends on operand dependencies and on the availability of processing resources such as execution data path 240 or load/store unit 250. Execution data path 240 executes logic/arithmetic instructions that operate on vector or scalar data. Load/store unit 250 executes load/store instructions that access the address space of vector processor 120.

FIG. 3 shows a block diagram of an embodiment of IFU 210. IFU 210 includes an instruction buffer that is divided into a main instruction buffer 310 and an auxiliary instruction buffer 312. Main buffer 310 contains eight consecutive instructions, including the instruction corresponding to the current program count. Auxiliary buffer 312 contains the eight instructions that immediately follow those in buffer 310. IFU 210 also includes a branch target buffer 314 that contains eight consecutive instructions, including the target of the next flow control instruction in buffer 310 or 312. In this embodiment, vector processor 120 uses a RISC-type instruction set in which each instruction is 32 bits long; buffers 310, 312, and 314 are each 8×32-bit buffers connected to cache subsystem 130 through a 256-bit instruction bus. IFU 210 can load eight instructions from cache subsystem 130 into any one of buffers 310, 312, or 314 in a single clock cycle. Registers 340, 342, and 344 hold the base addresses of the instructions loaded in buffers 310, 312, and 314, respectively.

Multiplexer 332 selects the current instruction from main instruction buffer 310. If the current instruction is not a flow control instruction and the instruction stored in instruction register 330 is advancing to the decode stage of execution, the current instruction is stored in instruction register 330 and the program count is incremented. Once the incremented program count has selected the last instruction in buffer 310, the next set of eight instructions is loaded into buffer 310. If buffer 312 contains the desired eight instructions, the contents of buffer 312 and register 342 are immediately moved to buffer 310 and register 340, and eight more instructions are prefetched from cache subsystem 130 into auxiliary buffer 312. Adder 350 determines the address of the next set of instructions from the base address in register 342 and an offset selected by multiplexer 352. The resulting address from adder 350 is stored in register 342 as or after the previous address is moved from register 342 to register 340. The calculated address is also sent to cache subsystem 130 with the request for eight instructions. If a previous call to cache subsystem 130 has not yet delivered the next eight instructions to buffer 312 by the time buffer 310 requires them, the previously requested instructions are stored directly in buffer 310 as soon as they are received from cache subsystem 130.

If the current instruction is a flow control instruction, IFU 210 processes it by evaluating the condition of the flow control instruction and updating the program count accordingly. IFU 210 stalls if the condition cannot be determined because an earlier instruction that may change the condition has not completed. If the branch is not taken, the program counter is incremented and the next instruction is selected as described above. If the branch is taken and branch target buffer 314 contains the branch target, the contents of buffer 314 and register 344 are moved to buffer 310 and register 340, so that IFU 210 can continue supplying instructions to decoder 220 without waiting for instructions from cache subsystem 130.

To prefetch instructions for branch target buffer 314, scanner 320 scans buffers 310 and 312 to locate the next flow control instruction following the current program count. If a flow control instruction is found in buffer 310 or 312, scanner 320 determines the offset from the base address of the buffer containing the instruction to the aligned set of eight instructions that includes the target address of the flow control instruction. Multiplexers 352 and 354 supply adder 350 with this offset and with the base address from register 340 or 342, and adder 350 generates a new base address for buffer 314. The new base address is passed to cache subsystem 130, which subsequently supplies the eight instructions for branch target buffer 314.
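The base-address arithmetic for the instruction buffers can be illustrated with a short sketch. It assumes byte addressing, so that a group of eight 32-bit instructions occupies 32 bytes and aligned groups start at multiples of 32; the text does not state the addressing granularity, and the function names are hypothetical.

```python
INSTR_BYTES = 4                              # each instruction is 32 bits
GROUP_INSTRS = 8                             # each buffer holds 8 instructions
GROUP_BYTES = INSTR_BYTES * GROUP_INSTRS     # 32 bytes, one 256-bit bus transfer


def group_base(addr):
    """Base address of the aligned 8-instruction group that contains addr."""
    return addr & ~(GROUP_BYTES - 1)


def next_group_base(base):
    """Base address of the sequentially following group (prefetch candidate)."""
    return base + GROUP_BYTES


def branch_target_offset(buffer_base, target):
    """Offset from a buffer's base address to the aligned group holding target.

    Adding this offset to buffer_base gives the new base for the branch
    target buffer, which is the role played by the adder in the text.
    """
    return group_base(target) - buffer_base
```

For example, under these assumptions a branch at buffer base 0x1000 with target 0x0FF4 yields an offset of -0x20, so the branch target group starts at 0x0FE0.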

In handling flow control instructions such as the "decrement and conditional branch" instructions VD1CBR, VD2CBR, and VD3CBR and the "change control register" instruction VCHGCR, IFU 210 can change the values of registers other than the program count. When IFU 210 encounters an instruction that is not a flow control instruction, the instruction passes to instruction register 330 and from there to decoder 220.

As shown in FIG. 4, decoder 220 decodes an instruction by writing control values to the fields of FIFO buffer 410 in scheduler 230. FIFO buffer 410 contains four rows of flip-flops, each of which can hold five fields of information for controlling the execution of one instruction. Rows 0 to 3 hold the information for the oldest to the newest instructions, respectively, and information in FIFO buffer 410 shifts down to lower rows as older information is removed when instructions complete. Scheduler 230 issues an instruction to an execution stage by selecting the necessary instruction fields to be loaded into control pipe 420, which contains execution registers 421 to 427. Most instructions can be scheduled for out-of-order issue and execution. In particular, the relative order of logic/arithmetic operations and load/store operations is arbitrary unless there is an operand dependency between the load/store operation and the logic/arithmetic operation. Comparisons of the field values in FIFO buffer 410 indicate whether any operand dependencies exist.
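A simplified view of issue selection is sketched below. It is not the scheduler of FIG. 4: the `Op` record, the two resource kinds, and the hazard rules (read-after-write, write-after-write, write-after-read against older rows) are assumptions chosen to illustrate how comparing field values across FIFO rows can permit out-of-order issue.

```python
from collections import namedtuple

# Hypothetical record for one FIFO row: kind is "alu" or "ls" (load/store).
Op = namedtuple("Op", "name kind dst srcs")

FIFO_ROWS = 4


def pick_issuable(fifo, busy_units):
    """Return the index of the oldest operation that can issue, or None.

    fifo is ordered oldest (row 0) to newest; busy_units is the set of
    resource kinds that are unavailable this cycle.
    """
    assert len(fifo) <= FIFO_ROWS
    for i, op in enumerate(fifo):
        if op.kind in busy_units:
            continue
        older = fifo[:i]
        raw = any(o.dst in op.srcs for o in older)   # needs an older result
        waw = any(o.dst == op.dst for o in older)    # would overwrite out of order
        war = any(op.dst in o.srcs for o in older)   # would clobber an older source
        if not (raw or waw or war):
            return i
    return None
```

With an arithmetic operation in row 0 and an independent load in row 1, the load can issue first when the arithmetic resource is busy, which matches the arbitrary relative ordering the text allows in the absence of operand dependencies.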

FIG. 5A illustrates a six-stage execution pipeline for an instruction that performs a register-to-register operation without accessing the address space of vector processor 120. In instruction fetch stage 511, IFU 210 fetches an instruction as described above. The fetch stage takes one clock cycle unless IFU 210 stalls because of a pipeline delay, an unresolved branch condition, or a delay in cache subsystem 130 providing prefetched instructions. In decode stage 512, decoder 220 decodes the instruction from IFU 210 and writes the information for the instruction to scheduler 230. Decode stage 512 also takes one clock cycle unless no row in FIFO 410 is available for a new operation. An operation can be issued to control pipe 420 during its first cycle in FIFO 410, but its issue may be delayed by the issue of older operations.

Execution data path 240 performs register-to-register operations and provides data and addresses for load/store operations. FIG. 6A shows a block diagram of an embodiment of execution data path 240 and is described in conjunction with execution stages 514, 515, and 516. Execution register 421 provides signals identifying two registers in register file 610, which are read in one clock cycle during read stage 514. Register file 610 includes 32 scalar registers and 64 vector registers. FIG. 6B is a block diagram of register file 610. Register file 610 has two read ports and two write ports, providing up to two reads and two writes in each clock cycle. Each port includes a selection circuit 612, 614, 616, or 618 and a 288-bit data bus 613, 615, 617, or 619. Selection circuits such as circuits 612, 614, 616, and 618 are well known in the art. They use address signals WRADDR1, WRADDR2, RDADDR1, and RDADDR2, which decoder 220 derives from a 5-bit register number typically provided in the instruction, a bank bit from the instruction or from control and status register VCSR, and the instruction syntax, which indicates whether the register is a vector register or a scalar register. Data read from the register file can be routed through multiplexer 656 to load/store unit 250, or through multiplexers 622 and 624 to multiplier 620, arithmetic logic unit 630, and accumulator 640. Most operations read two registers, and read stage 514 completes in one cycle. However, some instructions, such as the multiply-and-add instruction VMAD and instructions that operate on double-length vectors, require data from more than two registers, so read stage 514 takes more than one clock cycle.

In execution stage 515, multiplier 620, arithmetic logic unit 630, and accumulator 640 process the data previously read from register file 610. Execution stage 515 can overlap read stage 514 if multiple cycles are needed to read the necessary data. The duration of execution stage 515 depends on the type of data elements processed (integer or floating point) and on the amount of data (the number of read cycles). Signals from execution registers 422, 423, and 425 control the input of data to arithmetic logic unit 630, accumulator 640, and multiplier 620 for the first operation performed during the execution stage. Signals from execution registers 432, 433, and 435 control the second operation performed during execution stage 515.

FIG. 6C shows a block diagram of an embodiment of multiplier 620 and ALU 630. Multiplier 620 is an integer multiplier containing eight independent 36×36-bit multipliers 626. Each multiplier 626 includes four 9×9-bit multipliers linked together by control circuitry. For 8-bit and 9-bit data element widths, control signals from scheduler 230 unlink the four 9×9-bit multipliers so that each multiplier 626 performs four multiplications, and multiplier 620 performs 32 independent multiplications in one cycle. For 16-bit data elements, the control circuitry links pairs of 9×9-bit multipliers to operate together, and multiplier 620 performs 16 parallel multiplications. For the 32-bit integer data element type, the eight multipliers 626 perform eight parallel multiplications per clock cycle. The multiplication produces a 576-bit result for the 9-bit data element width and a 512-bit result for the other data element widths.
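The lane counts and result widths quoted above can be checked with a small calculation. The sketch assumes that each product is twice as wide as its operands, which is consistent with the 576-bit and 512-bit totals in the text but is not stated there; the function names are hypothetical.

```python
MULTIPLIER_BLOCKS = 8     # eight multipliers 626
SUBUNITS_PER_BLOCK = 4    # four 9x9-bit multipliers in each block

# Number of 9x9-bit subunits linked together for one product.
SUBUNITS_PER_PRODUCT = {8: 1, 9: 1, 16: 2, 32: 4}


def parallel_multiplies(element_bits):
    """Number of independent multiplications per cycle for an element width."""
    total_subunits = MULTIPLIER_BLOCKS * SUBUNITS_PER_BLOCK
    return total_subunits // SUBUNITS_PER_PRODUCT[element_bits]


def result_bits(element_bits):
    """Total result width, assuming each product is twice the element width."""
    return parallel_multiplies(element_bits) * 2 * element_bits


for width in (8, 9, 16, 32):
    print(width, parallel_multiplies(width), result_bits(width))
```

Only the 9-bit case produces 576 bits (32 products of 18 bits); the 8-bit, 16-bit, and 32-bit cases all total 512 bits.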

ALU 630 can process the 576-bit or 512-bit result from multiplier 620 in two clock cycles. ALU 630 contains eight independent 36-bit ALUs 636. Each ALU 636 includes a 32×32-bit floating-point unit for floating-point addition and multiplication; additional circuitry implements integer shift, arithmetic, and logic functions. For integer operations, each ALU 636 contains four units capable of independent 8-bit and 9-bit operations, which can be linked together in groups of two or four for 16-bit and 32-bit integer data elements.

Accumulator 640 accumulates results and includes two 576-bit registers to provide higher precision for intermediate results.

In write stage 516, the results from the execution stage are stored in register file 610. Two registers can be written in one clock cycle, and input multiplexers 602 and 605 select the two data values to be written. The duration of write stage 516 for an operation depends on the amount of data to be written as a result of the operation and on contention from LSU 250, which may be completing a load instruction by writing to register file 610. Signals from execution registers 426 and 427 select the registers into which data from arithmetic logic unit 630, accumulator 640, and multiplier 620 are written.

FIG. 5B shows an execution pipeline 520 for a load instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 520 are the same as described for a register-to-register operation. Read stage 514 is also the same as described above, except that execution data path 240 uses the data read from register file 610 to determine the address for a call to cache subsystem 130. In address stage 525, multiplexers 652, 654, and 656 select the address, which is provided to load/store unit 250 for execution stages 526 and 527. Information for the load operation remains in FIFO 410 during stages 526 and 527 while load/store unit 250 processes the operation.

FIG. 7 shows an embodiment of load/store unit 250. During stage 526, a call is made to cache subsystem 130 to request the data at the address determined in stage 525. This embodiment uses transaction-based cache calls, in which multiple devices, including processors 110 and 120, can access the local address space through cache subsystem 130. The requested data may not be available for several cycles after the call to cache subsystem 130, but load/store unit 250 can make further calls to the cache subsystem while other calls are pending. Load/store unit 250 therefore does not stall. The number of clock cycles that cache subsystem 130 needs to provide the requested data depends on whether the access hits or misses in data cache 194.

In drive stage 527, cache subsystem 130 asserts a data signal for load/store unit 250. Cache subsystem 130 can provide 256 bits (32 bytes) of data to load/store unit 250 per cycle. Byte aligner 710 aligns each of the 32 bytes in a corresponding 9-bit storage location to provide a 288-bit value. The 288-bit format is convenient for multimedia applications such as MPEG encoding and decoding, which sometimes use 9-bit data elements. The 288-bit value is written to read data buffer 720. For write stage 528, scheduler 230 transfers field 4 of FIFO buffer 410 to execution register 426 or 427 so that the 288-bit value from data buffer 720 is written to register file 610.
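The 256-bit to 288-bit alignment can be sketched in a few lines. The placement of byte i in slot i with the ninth bit of each slot cleared is an assumption made for illustration; the text only says that each byte is aligned in a corresponding 9-bit storage location. The function name is hypothetical.

```python
SLOT_BITS = 9
BYTES_PER_TRANSFER = 32


def align_bytes_to_9bit(data):
    """Place each of 32 bytes in its own 9-bit slot, giving a 288-bit integer."""
    if len(data) != BYTES_PER_TRANSFER:
        raise ValueError("expected 32 bytes (256 bits)")
    value = 0
    for i, byte in enumerate(data):
        # Assumption: byte i goes to slot i, and the top bit of the slot stays 0.
        value |= byte << (SLOT_BITS * i)
    return value


packed = align_bytes_to_9bit(bytes(range(32)))
slot_5 = (packed >> (SLOT_BITS * 5)) & 0x1FF   # recovers the byte with value 5
```

The result always fits in 32 × 9 = 288 bits, matching the width of the register file data buses.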

FIG. 5C shows an execution pipeline 530 for a store instruction. Instruction fetch stage 511, decode stage 512, and issue stage 513 of execution pipeline 530 are the same as described above. Read stage 514 is also the same as described above, except that it reads both the data to be stored and the data used for the address calculation. The data to be stored is written to write data buffer 730 in load/store unit 250. Multiplexer 740 converts the data from the 9-bit byte format to the conventional 8-bit byte format. The converted data from buffer 730 and the associated address from address calculation stage 525 are sent in parallel to cache subsystem 130 during SRAM stage 536.

In the exemplary embodiment of the vector processor, each instruction is 32 bits long and has one of the nine formats shown in FIG. 8, labeled REAR, REAI, RRRM5, RRRR, RI, CT, RRRM9, RRRM9*, and RRRM9**. Appendix E describes the instruction set of vector processor 120.

Some load, store, and cache operations that use scalar registers to determine an effective address have the REAR format. REAR-format instructions are identified by bits 29-31 being 000b and have three operands identified by three register numbers: register numbers SRb and SRi identify scalar registers, and register number Rn identifies a scalar or vector register depending on bit D. Bank bit B either identifies a bank for register Rn or, if the default vector register size is double length, indicates whether vector register Rn is a double-length register. Opcode field Opc identifies the operation performed on the operands, and field TT indicates the transfer type as load or store. A typical REAR-format instruction is instruction VL, which loads register Rn from an address determined by adding the contents of scalar registers SRb and SRi. If bit A is set, the calculated address is stored in scalar register SRb.
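The REAR addressing behavior can be modeled as follows. The test on bits 29-31 comes directly from the text; the 32-bit wrap-around of the address and the list-based register model are assumptions, and the function names are hypothetical.

```python
def is_rear_format(word):
    """True if bits 29-31 of the 32-bit instruction word are 000b."""
    return (word >> 29) & 0b111 == 0


def rear_effective_address(scalar_regs, srb, sri, a_bit, width=32):
    """Effective address = SRb + SRi; optionally write it back to SRb.

    scalar_regs is a mutable list standing in for the scalar register file.
    The modulo-2**width wrap is an assumption, not something the text states.
    """
    mask = (1 << width) - 1
    addr = (scalar_regs[srb] + scalar_regs[sri]) & mask
    if a_bit:
        # Bit A set: the calculated address is stored back in SRb.
        scalar_regs[srb] = addr
    return addr
```

With bit A set, repeated loads that use the same SRb and SRi step through memory by the stride held in SRi, which is one plausible use of the address update.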

REAI-format instructions are the same as REAR-format instructions, except that an 8-bit immediate value from field IMM is used in place of the contents of scalar register SRi. The REAR and REAI formats have no data element length field.

The RRRM5 format is for instructions that have two source operands and one destination operand. These instructions have either three register operands or two register operands and a 5-bit immediate value. The encoding of fields D, S, and M, shown in Appendix E, determines whether the first source operand Ra is a scalar or vector register; whether the second source operand Rb/IM5 is a scalar register, a vector register, or a 5-bit immediate value; and whether the destination register Rd is a scalar or vector register.

The RRRR format is for instructions that have four register operands. Register numbers Ra and Rb indicate source registers. Register number Rd indicates a destination register, and register number Rc indicates either a source or a destination register, depending on field Opc. All operands are vector registers unless bit S is set, which indicates that register Rb is a scalar register. Field DS indicates the data element length for the vector registers. Field Opc selects the data type for 32-bit data elements.

RI-format instructions load an immediate value into a register. Field IMM contains an immediate value of up to 18 bits. Register number Rd indicates the destination register, which is a vector register in the current bank or a scalar register, depending on bit D. Fields DS and F indicate the length and type of the data elements, respectively. For 32-bit integer data elements, the 18-bit immediate value is sign-extended before being loaded into register Rd. For floating-point data elements, bit 18, bits 17 to 10, and bits 9 to 0 indicate the sign, exponent, and mantissa, respectively, of a 32-bit floating-point value.
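The two interpretations of the RI immediate can be sketched as below. The sign extension follows the standard two's-complement rule, and the field split uses the bit positions given in the text. How the 10-bit mantissa is expanded into a full 32-bit floating-point value is not described here, so the sketch stops at extracting the fields. The function names are hypothetical.

```python
def sign_extend_18(imm):
    """Interpret the low 18 bits of imm as a two's-complement integer."""
    imm &= 0x3FFFF
    return imm - (1 << 18) if imm & (1 << 17) else imm


def split_float_immediate(imm):
    """Split a floating-point immediate into (sign, exponent, mantissa).

    Bit 18 is the sign, bits 17 to 10 the exponent, bits 9 to 0 the mantissa.
    """
    sign = (imm >> 18) & 0x1
    exponent = (imm >> 10) & 0xFF
    mantissa = imm & 0x3FF
    return sign, exponent, mantissa
```

Under this reading, the integer immediate covers the range -131072 to 131071.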

The CT format is for flow control instructions and includes an opcode field Opc, a condition field Cond, and a 23-bit immediate value IMM. A branch is taken when the condition indicated by the condition field is true. The possible condition codes are "always", "less than", "equal", "less than or equal", "greater than", "not equal", "greater than or equal", and "overflow". Bits GT, EQ, LT, and SO in status and control register VCSR are used to evaluate the conditions.
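One way to evaluate the eight condition codes from the four VCSR bits is sketched below. The mapping is an assumption based on the usual meaning of each condition (for instance, "less than or equal" taken as LT or EQ, and "not equal" as the complement of EQ); the numeric encoding of the Cond field is not given in this section, so conditions are identified by name. The function name is hypothetical.

```python
def condition_true(cond, gt, eq, lt, so):
    """Evaluate a named condition against VCSR bits GT, EQ, LT, and SO."""
    table = {
        "always": True,
        "less than": lt,
        "equal": eq,
        "less than or equal": lt or eq,
        "greater than": gt,
        "not equal": not eq,
        "greater than or equal": gt or eq,
        "overflow": so,
    }
    return bool(table[cond])


# A branch is taken only when its condition evaluates to true.
take_branch = condition_true("less than or equal", gt=False, eq=True, lt=False, so=False)
```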

The RRRM9 format provides either three register operands or two register operands and a 9-bit immediate value. The combination of bits D, S, and M indicates which operands are vector registers, scalar registers, or 9-bit immediate values. Field DS indicates the data element length. The RRRM9* and RRRM9** formats are special cases of the RRRM9 format and are distinguished by opcode field Opc. The RRRM9* format replaces source register number Ra with a condition code Cond and an ID field. The RRRM9** format replaces the most significant bits of the immediate value with a condition code Cond and a bit K. Appendix E describes RRRM9* and RRRM9** further in connection with the conditional move instruction VCMOV, the conditional move with element mask instruction CMOVM, and the compare and set mask instruction CMPV.

Although the present invention has been described with reference to particular embodiments, the description is only an example of the invention's application and should not be taken as a limitation. Various adaptations and combinations of features of the disclosed embodiments remain within the scope of the invention as defined by the following claims.

Appendix A

In the exemplary embodiment, processor 110 is a general-purpose processor that complies with the ARM7 processor standard. For a description of the ARM7 registers, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

To interact with vector processor 120, processor 110: starts and stops the vector processor; tests the vector processor state, including for synchronization; transfers data from a scalar/special register in vector processor 120 to a general register in processor 110; and transfers data from a general register to a scalar/special register of the vector processor. There is no direct means of transfer between the general registers and the vector registers of the vector processor; such transfers require memory as an intermediary.

Table A.1 describes the extensions to the ARM7 instruction set for vector processor interaction.

Table A.1: Extended ARM7 instruction set

Instruction: Result
STARTVP: Places the vector processor in the VP_RUN state; has no effect if the vector processor is already in the VP_RUN state. STARTVP is executed as a coprocessor data operation (CDP) class instruction in the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
INTVP: Places the vector processor in the VP_IDLE state; has no effect if the vector processor is already in the VP_IDLE state. INTVP is executed as a coprocessor data operation (CDP) class instruction in the ARM7 architecture. No result is returned to the ARM7, and the ARM7 continues its execution.
TESTSET: Reads a user extended register and sets bit 30 of the register to 1 to provide producer/consumer type synchronization between the vector processor and the ARM7. TESTSET is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 stalls until the instruction is executed (the register is transferred).
MFER: Moves from an extended register to an ARM general register. MFER is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 stalls until the instruction is executed (the register is transferred).
MFVP: Moves from a vector processor scalar/special register to an ARM7 general register. Unlike the other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise, the result is undefined. MFVP is executed as a coprocessor register transfer (MRC) class instruction in the ARM7 architecture. The ARM7 stalls until the instruction is executed (the register is transferred).
MTER: Moves from an ARM7 general register to an extended register. MTER is executed as a coprocessor register transfer (MCR) class instruction in the ARM7 architecture. The ARM7 stalls until the instruction is executed (the register is transferred).
MTVP: Moves from an ARM7 general register to a vector processor scalar/special register. Unlike the other ARM7 instructions, this instruction should be executed only when the vector processor is in the VP_IDLE state; otherwise, the result is undefined. MTVP is executed as a coprocessor register transfer (MCR) class instruction in the ARM7 architecture. The ARM7 stalls until the instruction is executed (the register is transferred).
CACHE: Provides software management of the ARM7 data cache.
PFTCH: Prefetches a cache line into the ARM7 data cache.
WBACK: Writes back a cache line from the ARM7 data cache to memory.
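The TESTSET behavior in Table A.1 resembles a classic test-and-set primitive. The sketch below models only what the table states (read the register, then set bit 30); treating a previously clear bit 30 as "acquired" is an assumed usage pattern for producer/consumer synchronization, and the class and function names are hypothetical.

```python
SYNC_BIT = 1 << 30


class ExtendedRegister:
    """Stand-in for one user extended register."""

    def __init__(self, value=0):
        self.value = value

    def testset(self):
        # Return the old contents and set bit 30, as one indivisible step
        # from the caller's point of view.
        old = self.value
        self.value = old | SYNC_BIT
        return old

    def release(self):
        # Assumed counterpart: the other processor clears bit 30 when done.
        self.value &= ~SYNC_BIT


def try_acquire(reg):
    """True if bit 30 was clear before TESTSET, i.e. the caller got the token."""
    return (reg.testset() & SYNC_BIT) == 0
```

A consumer could poll `try_acquire` until it succeeds, use the shared data, and rely on the producer to clear the bit before the next hand-off.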

Table A.2 lists the ARM7 exceptions, which are detected and reported before the faulting instruction executes. Exception vector addresses are given in hexadecimal notation.

Table A.2: ARM7 exceptions

Exception vector: Description
0x00000000: ARM7 reset
0x00000004: ARM7 undefined instruction exception
0x00000004: Vector processor unavailable exception
0x00000008: ARM7 software interrupt
0x0000000C: ARM7 single-step exception
0x0000000C: ARM7 instruction address breakpoint exception
0x00000010: ARM7 data address breakpoint exception
0x00000010: ARM7 invalid data address exception
0x00000018: ARM7 protection violation exception

The following describes the syntax of the extensions to the ARM7 instruction set. For an explanation of the terminology and of the instruction formats, refer to the ARM architecture document or the ARM7 data sheet (document number ARM DDI 0020C, issued December 1994).

ARM结构为协处理器接口提供3种指令格式：The ARM architecture provides 3 instruction formats for the coprocessor interface:

1.协处理器数据操作(CDP)1. Coprocessor Data Operation (CDP)

2.协处理器数据传送(LDC，STC)2. Coprocessor data transfer (LDC, STC)

3.协处理器寄存器传送(MRC，MCR)3. Coprocessor register transfer (MRC, MCR)

MSP结构的扩展使用全部二种格式。The extension of the MSP structure uses both formats.

The coprocessor data operation format (CDP) is used for operations that do not need to return a result to the ARM7.

CDP format

[Figure: CDP instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

The fields of the CDP format have the following conventions:

Field   Meaning
Cond    Condition field; specifies the condition under which the instruction executes
Opc     Coprocessor opcode
CRn     Coprocessor operand register
CRd     Coprocessor destination register
CP#     Coprocessor number; the following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor, extended registers
CP      Coprocessor information
CRm     Coprocessor operand register
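As an illustration of how the CDP fields combine into an instruction word, the following is a minimal Python sketch. The format figure is not reproduced in this text, so the bit positions are an assumption taken from the standard ARM coprocessor data operation layout; the function name `encode_cdp` and the constant names are hypothetical. The two encoded instructions correspond to the CDP forms given for STARTVP and INTVP in this appendix.

```python
def encode_cdp(cond, opc, crn, crd, cp_num, cp, crm):
    """Pack CDP fields into a 32-bit word.

    Assumed layout (standard ARM CDP, not taken from the patent figure):
    Cond[31:28] 1110[27:24] Opc[23:20] CRn[19:16]
    CRd[15:12] CP#[11:8] CP[7:5] 0[4] CRm[3:0]
    """
    fields = ((cond, 4), (opc, 4), (crn, 4), (crd, 4),
              (cp_num, 4), (cp, 3), (crm, 4))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return ((cond << 28) | (0b1110 << 24) | (opc << 20) | (crn << 16)
            | (crd << 12) | (cp_num << 8) | (cp << 5) | crm)


COND_AL = 0b1110           # "al" (always) condition code
VECTOR_PROCESSOR = 0b0111  # CP# 0111: vector processor, extended registers

# CDP p7, 0, c0, c0, c0 (STARTVP) and CDP p7, 1, c0, c0, c0 (INTVP)
startvp_word = encode_cdp(COND_AL, 0, 0, 0, VECTOR_PROCESSOR, 0, 0)
intvp_word = encode_cdp(COND_AL, 1, 0, 0, VECTOR_PROCESSOR, 0, 0)
print(hex(startvp_word), hex(intvp_word))
```

Under the assumed layout, the fields CRn, CRd, CP and CRm occupy bits 19:12, 7:5 and 3:0, which is consistent with the reserved bits stated for STARTVP.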

The coprocessor data transfer format (LDC, STC) is used to load or store a subset of the vector processor registers directly from or to memory. The ARM7 processor is responsible for supplying the word address, while the vector processor supplies or accepts the data and controls the number of words transferred. Refer to the ARM7 data sheet for more details.

LDC, STC format

[Figure: LDC/STC instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

The format fields have the following conventions:

Field   Meaning
Cond    Condition field; specifies the condition under which the instruction executes
P       Pre/post indexing bit
U       Up/down bit
N       Transfer length. Because the CRd field does not have enough bits, bit N is used as part of the source or destination register identifier.
W       Write-back bit
L       Load/store bit
Rn      Base register
CRn     Coprocessor source/destination register
CP#     Coprocessor number; the following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor, extended registers
Offset  Unsigned 8-bit immediate offset

The coprocessor register transfer format (MRC, MCR) is used to transfer information directly between the ARM7 and the vector processor. This format is used for moves between ARM7 registers and the vector processor scalar or special-purpose registers.

MRC, MCR format

[Figure: MRC/MCR instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

The format fields have the following conventions:

Field     Meaning
Cond      Condition field; specifies the condition under which the instruction executes
Opc       Coprocessor opcode
L         Load/store bit: L = 0 moves to the vector processor; L = 1 moves from the vector processor
CRn:CRm   Coprocessor source/destination register. Only CRn<1:0>:CRm<3:0> are used.
Rd        ARM source/destination register
CP#       Coprocessor number; the following coprocessor numbers are currently used: 1111 = ARM7 data cache; 0111 = vector processor, extended registers
CP        Coprocessor information
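The register transfer format can be sketched the same way. The following Python example packs the MRC/MCR fields into a 32-bit word; the bit positions are an assumption based on the standard ARM coprocessor register transfer layout (the format figure is not available here), and the function name `encode_mrc_mcr` is hypothetical. The example encodes the MRC form that this appendix gives for MFER, with P carried in CRn and ER carried in CRm.

```python
def encode_mrc_mcr(cond, opc, l_bit, crn, rd, cp_num, cp, crm):
    """Pack MRC/MCR fields into a 32-bit word.

    Assumed layout (standard ARM coprocessor register transfer):
    Cond[31:28] 1110[27:24] Opc[23:21] L[20] CRn[19:16]
    Rd[15:12] CP#[11:8] CP[7:5] 1[4] CRm[3:0]
    """
    fields = ((cond, 4), (opc, 3), (l_bit, 1), (crn, 4),
              (rd, 4), (cp_num, 4), (cp, 3), (crm, 4))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
    return ((cond << 28) | (0b1110 << 24) | (opc << 21) | (l_bit << 20)
            | (crn << 16) | (rd << 12) | (cp_num << 8) | (cp << 5)
            | (1 << 4) | crm)


COND_AL = 0b1110           # "al" (always) condition code
VECTOR_PROCESSOR = 0b0111  # CP# 0111: vector processor, extended registers

# MFER r3, UER1 written as MRC p7, 2, r3, c0, c1, 0
# (L = 1: move from the vector processor side; P = 0 selects UERx; ER = 1)
mfer_word = encode_mrc_mcr(COND_AL, 2, 1, 0, 3, VECTOR_PROCESSOR, 0, 1)
print(hex(mfer_word))
```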

Extended ARM instruction descriptions

The extended ARM instructions are described in alphabetical order.

CACHE: Cache operation

Format

[Figure: CACHE instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

STC{cond} p15, cOpc, <Address>

CACHE{cond} Opc, <Address>

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 3}. Note that, because the CRn field of the LDC/STC format is used to specify Opc, the decimal representation of the opcode must be preceded by the letter "c" in the first syntax (that is, use c0 for 0). Refer to the ARM7 data sheet for the addressing mode syntax.

Description

This instruction is executed only when cond is true. Opc<3:0> specifies the following operations:

Opc<3:0>  Meaning
0000      Write back and invalidate the modified cache line specified by EA. If the matching line contains unmodified data, the line is invalidated without being written back. If no cache line containing EA is found, the data cache is left unchanged.
0001      Write back and invalidate the modified cache line at the index specified by EA. If the matching line contains unmodified data, the line is invalidated without being written back.
0010      Used by the PFTCH and WBACK instructions.
0011      Invalidate the cache line specified by EA. The cache line is invalidated (without write-back) even if it has been modified. This is a privileged operation and causes an ARM7 protection violation if attempted in user mode.
others    Reserved.
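The difference between Opc 0000 and Opc 0011 can be illustrated with a small behavioral model. The Python sketch below is not the hardware implementation: the class `DataCacheModel`, the 64-byte line size, and the dictionary-based storage are all assumptions made for illustration, and only the two address-based operations are modeled.

```python
class ProtectionViolation(Exception):
    """Stands in for the ARM7 protection violation exception."""


class DataCacheModel:
    LINE_BYTES = 64  # hypothetical line size; not specified in this section

    def __init__(self):
        self.lines = {}   # line address -> (data, dirty flag)
        self.memory = {}  # line address -> data

    def _line_address(self, ea):
        return ea & ~(self.LINE_BYTES - 1)

    def cache_op(self, opc, ea, privileged=False):
        tag = self._line_address(ea)
        if opc == 0b0000:
            # Write back the line only if it is modified, then invalidate it.
            entry = self.lines.pop(tag, None)
            if entry is not None and entry[1]:
                self.memory[tag] = entry[0]
        elif opc == 0b0011:
            # Invalidate without write-back; privileged operation.
            if not privileged:
                raise ProtectionViolation("Opc 0011 attempted in user mode")
            self.lines.pop(tag, None)
        else:
            raise NotImplementedError(f"Opc {opc:04b} is not modeled here")


cache = DataCacheModel()
cache.memory[0x100] = "old"
cache.lines[0x100] = ("new", True)    # modified line
cache.cache_op(0b0000, 0x104)         # EA falls inside the line at 0x100
print(cache.memory[0x100])            # modified data reached memory
```

With Opc 0011 the same modified line would simply be dropped, so memory would keep the stale value; that loss of data is presumably why the operation is restricted to privileged code.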

Operation

Refer to the ARM7 data sheet for how EA is calculated.

Exceptions

ARM7 protection violation.

INTVP: Interrupt vector processor

Format

[Figure: INTVP instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

CDP{cond} p7, 1, c0, c0, c0

INTVP{cond}

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description

This instruction is executed only when cond is true. It signals the vector processor to halt. The ARM7 does not wait for the vector processor to halt; it continues with the next instruction.

An MFER busy-wait loop should be used to check whether the vector processor has halted after this instruction executes. The instruction has no effect if the vector processor is already in the VP_IDLE state. Bits 19:12, 7:5, and 3:0 are reserved.
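The recommended INTVP-then-poll sequence can be sketched as a simulation. In the Python example below, `VectorProcessorModel` and `halt_and_wait` are hypothetical names, the halt latency is an invented parameter, and the polled value stands for reading the user extended register VPSTATE with MFER, where bit 30 set means the vector processor is still running (per Table B.8).

```python
VPSTATE_RUN_BIT = 1 << 30  # VPSTATE<30>: 1 = VP_RUN, 0 = VP_IDLE


class VectorProcessorModel:
    """Toy model: the processor halts a few polls after INTVP is issued."""

    def __init__(self, halt_latency):
        self.running = True
        self._latency = halt_latency  # hypothetical delay, in polls
        self._halt_requested = False

    def intvp(self):
        # No effect if the vector processor is already idle.
        if self.running:
            self._halt_requested = True

    def mfer_vpstate(self):
        # Models "MFER Rd, VPSTATE": returns the current state word.
        if self._halt_requested:
            if self._latency == 0:
                self.running = False
                self._halt_requested = False
            else:
                self._latency -= 1
        return VPSTATE_RUN_BIT if self.running else 0


def halt_and_wait(vp, max_polls=1000):
    """Issue INTVP, then busy-wait until VPSTATE<30> reads 0."""
    vp.intvp()
    for polls in range(1, max_polls + 1):
        if (vp.mfer_vpstate() & VPSTATE_RUN_BIT) == 0:
            return polls
    raise TimeoutError("vector processor did not reach VP_IDLE")


vp = VectorProcessorModel(halt_latency=3)
polls_needed = halt_and_wait(vp)
print(polls_needed)
```

The loop is needed because INTVP returns immediately; without it, a following MFVP or MTVP could execute while the vector processor is still running, where the result is undefined.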

Exceptions

Vector processor unavailable.

MFER: Move from extended register

Format

[Figure: MFER instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

MRC{cond} p7, 2, Rd, cP, cER, 0

MFER{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, PER0 or CSR).

Description

This instruction is executed only when cond is true. ARM7 register Rd is loaded from the extended register specified by P:ER<3:0>, as shown in the table below. Refer to Section 1.2 for a description of the extended registers.

ER<3:0>   P = 0    P = 1
0000      UER0     PER0
0001      UER1     PER1
0010      UER2     PER2
0011      UER3     PER3
0100      UER4     PER4
0101      UER5     PER5
0110      UER6     PER6
0111      UER7     PER7
1000      UER8     PER8
1001      UER9     PER9
1010      UER10    PER10
1011      UER11    PER11
1100      UER12    PER12
1101      UER13    PER13
1110      UER14    PER14
1111      UER15    PER15
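The P:ER selection and the user-mode restriction on the privileged registers can be summarized in a few lines of Python. This is only a sketch: the function names, the dictionary used as a register file, and the use of `PermissionError` to stand for the protection violation are assumptions for illustration.

```python
def extended_register_name(p, er):
    """Map the P bit and ER<3:0> field to a register name (UERx or PERx)."""
    if p not in (0, 1) or not 0 <= er <= 15:
        raise ValueError("P must be 0 or 1 and ER must be in 0..15")
    return f"{'PER' if p else 'UER'}{er}"


def mfer(p, er, registers, user_mode):
    """Model of MFER: return the value moved into Rd.

    `registers` is a hypothetical dict from register name to 32-bit value.
    """
    name = extended_register_name(p, er)
    if p == 1 and user_mode:
        # Accessing PERx in user mode causes a protection violation.
        raise PermissionError(f"protection violation: {name} in user mode")
    return registers.get(name, 0)


regs = {"UER1": 1 << 30, "PER0": 0x3}
print(extended_register_name(0, 1), hex(mfer(0, 1, regs, user_mode=True)))
```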

Bits 19:17 and 7:5 are reserved.

Exceptions

Protection violation on an attempt to access PERx in user mode.

MFVP: Move from vector processor

Format

[Figure: MFVP instruction format]

Assembler syntax

MRC{cond} p7, 1, Rd, CRn, CRm, 0

MFVP{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME is an architecturally specified register mnemonic (for example, SP0 or VCS).

Description

This instruction is executed only when cond is true. ARM7 register Rd is loaded from the vector processor scalar/special-purpose register specified by CRn<1:0>:CRm<3:0>. Refer to Section 3.2.3 for the vector processor register number assignment used in register transfers.

Bits 7:5 and CRn<3:2> are reserved.

The vector processor register map is shown below. Refer to Table 15 for the vector processor special-purpose registers (SP0-SP15).

CRm<3:0>  CRn<1:0>=00  CRn<1:0>=01  CRn<1:0>=10  CRn<1:0>=11
0000      SR0          SR16         SP0          RASR0
0001      SR1          SR17         SP1          RASR1
0010      SR2          SR18         SP2          RASR2
0011      SR3          SR19         SP3          RASR3
0100      SR4          SR20         SP4          RASR4
0101      SR5          SR21         SP5          RASR5
0110      SR6          SR22         SP6          RASR6
0111      SR7          SR23         SP7          RASR7
1000      SR8          SR24         SP8          RASR8
1001      SR9          SR25         SP9          RASR9
1010      SR10         SR26         SP10         RASR10
1011      SR11         SR27         SP11         RASR11
1100      SR12         SR28         SP12         RASR12
1101      SR13         SR29         SP13         RASR13
1110      SR14         SR30         SP14         RASR14
1111      SR15         SR31         SP15         RASR15
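The register map amounts to a 6-bit register number formed from CRn<1:0>:CRm<3:0>. The Python sketch below decodes that number into a register name. The function name is hypothetical, and the numbering of the special-purpose column as SP0 through SP15 is an assumption based on the reference to special registers SP0-SP15 in the text.

```python
def vp_register_name(crn, crm):
    """Decode CRn<1:0>:CRm<3:0> into a vector processor register name.

    CRn<3:2> is reserved, so only the low two bits of CRn are examined.
    """
    if not 0 <= crn <= 15 or not 0 <= crm <= 15:
        raise ValueError("CRn and CRm are 4-bit fields")
    select = crn & 0b11
    index = crm & 0b1111
    if select == 0b00:
        return f"SR{index}"        # scalar registers SR0..SR15
    if select == 0b01:
        return f"SR{index + 16}"   # scalar registers SR16..SR31
    if select == 0b10:
        return f"SP{index}"        # special-purpose registers (assumed SP0..SP15)
    return f"RASR{index}"          # RASR0..RASR15


print(vp_register_name(0b01, 0b0011), vp_register_name(0b11, 0b1111))
```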

SR0 always reads as 32 bits of zero, and writes to it are ignored.

Exceptions

Vector processor unavailable.

MTER: Move to extended register

Format

[Figure: MTER instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

MCR{cond} p7, 2, Rd, cP, cER, 0

MTER{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, PER0 or CSR).

Description

This instruction is executed only when cond is true. The extended register specified by P:ER<3:0> is loaded from ARM7 register Rd, as shown in the table below.

ER<3:0>   P = 0    P = 1
0000      UER0     PER0
0001      UER1     PER1
0010      UER2     PER2
0011      UER3     PER3
0100      UER4     PER4
0101      UER5     PER5
0110      UER6     PER6
0111      UER7     PER7
1000      UER8     PER8
1001      UER9     PER9
1010      UER10    PER10
1011      UER11    PER11
1100      UER12    PER12
1101      UER13    PER13
1110      UER14    PER14
1111      UER15    PER15

Bits 19:17 and 7:5 are reserved.

Exceptions

Protection violation on an attempt to access PERx in user mode.

MTVP: Move to vector processor

Format

[Figure: MTVP instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

MCR{cond} p7, 1, Rd, CRn, CRm, 0

MTVP{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME is an architecturally specified register mnemonic (for example, SP0 or VCS).

Description

This instruction is executed only when cond is true. The vector processor scalar/special-purpose register specified by CRn<1:0>:CRm<3:0> is loaded from ARM7 register Rd.

Bits 7:5 and CRn<3:2> are reserved.

The vector processor register map is shown below.

CRm<3:0>  CRn<1:0>=00  CRn<1:0>=01  CRn<1:0>=10  CRn<1:0>=11
0000      SR0          SR16         SP0          RASR0
0001      SR1          SR17         SP1          RASR1
0010      SR2          SR18         SP2          RASR2
0011      SR3          SR19         SP3          RASR3
0100      SR4          SR20         SP4          RASR4
0101      SR5          SR21         SP5          RASR5
0110      SR6          SR22         SP6          RASR6
0111      SR7          SR23         SP7          RASR7
1000      SR8          SR24         SP8          RASR8
1001      SR9          SR25         SP9          RASR9
1010      SR10         SR26         SP10         RASR10
1011      SR11         SR27         SP11         RASR11
1100      SR12         SR28         SP12         RASR12
1101      SR13         SR29         SP13         RASR13
1110      SR14         SR30         SP14         RASR14
1111      SR15         SR31         SP15         RASR15

Exceptions

Vector processor unavailable.

PFTCH: Prefetch

Format

[Figure: PFTCH instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

LDC{cond} p15, 2, <Address>

PFTCH{cond} <Address>

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the addressing mode syntax.

Description

This instruction is executed only when cond is true. The cache line specified by EA is prefetched into the ARM7 data cache.

Operation

Refer to the ARM7 data sheet for how EA is calculated.

Exceptions: none.

STARTVP: Start vector processor

Format

[Figure: STARTVP instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

CDP{cond} p7, 0, c0, c0, c0

STARTVP{cond}

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}.

Description

This instruction is executed only when cond is true. It signals the vector processor to start execution and automatically clears VISRC<vjp> and VISRC<vip>. The ARM7 does not wait for the vector processor to start execution; it continues with the next instruction.

The state of the vector processor must be initialized to the desired state before this instruction is executed. The instruction has no effect if the vector processor is already in the VP_RUN state.

Bits 19:12, 7:5, and 3:0 are reserved.

Exceptions

Vector processor unavailable.

TESTSET: Test and set

Format

[Figure: TESTSET instruction format; bit ruler 30, 25, 20, 15, 10, 5, 0]

Assembler syntax

MRC{cond} p7, 0, Rd, c0, cER, 0

TESTSET{cond} Rd, RNAME

where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, ER = {0, ..., 15}, and RNAME is an architecturally specified register mnemonic (for example, UER1 or VASYNC).

Description

This instruction is executed only when cond is true. It returns the contents of UERx in Rd and sets UERx<30> to 1. If ARM7 register 15 is specified as the destination register, UERx<30> is returned in the Z bit of the CPSR so that a short busy-wait loop can be implemented.

Currently, only UER1 is defined to work with this instruction.
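The read-then-set behavior of TESTSET is what makes it usable as a lock primitive. The following Python sketch models it on a single register; the class and function names are hypothetical, and clearing the flag directly stands in for a release performed with MTER (or by the vector processor).

```python
SYNC_BIT = 1 << 30  # UERx<30>, the only defined bit of a user extended register


class UserExtendedRegister:
    """Toy model of UER1 (VASYNC)."""

    def __init__(self):
        self.value = 0

    def testset(self):
        """Return the old contents and set bit 30, as one indivisible step."""
        old = self.value
        self.value |= SYNC_BIT
        return old


def try_acquire(reg):
    # The flag was free only if bit 30 was clear before TESTSET set it.
    return (reg.testset() & SYNC_BIT) == 0


def release(reg):
    # Stands in for clearing the flag with MTER.
    reg.value &= ~SYNC_BIT


vasync = UserExtendedRegister()
first = try_acquire(vasync)   # flag was clear, so the caller now owns it
second = try_acquire(vasync)  # flag already set, so this attempt fails
release(vasync)
third = try_acquire(vasync)
print(first, second, third)
```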

Bits 19:17 and 7:5 are reserved.

Exceptions: none.

Appendix B

The architecture of multimedia processor 100 defines extended registers that processor 110 accesses with the MFER and MTER instructions. The extended registers comprise privileged extended registers and user extended registers.

The privileged extended registers are used mainly to control the operation of the multimedia signal processor. They are listed in Table B.1.

Table B.1: Privileged extended registers

Number  Mnemonic  Description
PER0    CTR       Control register
PER1    PVR       Processor type register
PER2    VIMSK     Vector interrupt mask register
PER3    AIABR     ARM7 instruction address breakpoint register
PER4    ADABR     ARM7 data address breakpoint register
PER5    SPREG     Scratchpad register
PER6    STR       Status register

The control register controls the operation of MSP 100. All bits in CTR are cleared on reset. The register is defined in Table B.2.

Table B.2: CTR definition

Bit    Mnemonic  Description
31-13  -         Reserved; always read as 0.
12     VDCI      Vector data cache invalidate bit. When set, invalidates the entire vector processor data cache. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
11     VDE       Vector data cache enable bit. When clear, disables the vector processor data cache.
10     VICI      Vector instruction cache invalidate bit. When set, invalidates the entire vector processor instruction cache. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
9      VICE      Vector instruction cache enable bit. When clear, disables the vector processor instruction cache.
8      ADCI      ARM7 data cache invalidate bit. When set, invalidates the entire ARM7 data cache. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
7      ADCE      ARM7 data cache enable bit. When clear, disables the ARM7 data cache.
6      AICI      ARM7 instruction cache invalidate bit. When set, invalidates the entire ARM7 instruction cache. Because a cache invalidate operation usually conflicts with normal cache operations, only one invalidation code sequence is supported.
5      AICE      ARM7 instruction cache enable bit. When clear, disables the ARM7 instruction cache.
4      APSE      ARM7 processor single-step enable bit. When set, causes an ARM7 processor single-step exception after the ARM7 processor executes one instruction. Single stepping is available only in user or supervisor mode.
3      SPAE      Scratchpad access enable bit. When set, allows the ARM7 processor to load from or store to the scratchpad. When clear, an attempt to load from or store to the scratchpad causes an ARM7 invalid data address exception.
2      VPSE      Vector processor single-step enable bit. When set, causes a vector processor single-step exception after the vector processor executes one instruction.
1      VPPE      Vector processor pipeline enable bit. When clear, configures the vector processor to operate in non-pipelined mode, in which only one instruction is active in the vector processor execution pipeline at a time.
0      VPAE      Vector processor access enable bit. When set, allows the ARM7 processor to execute the extended ARM7 instructions described above. When clear, prevents the ARM7 processor from executing the extended ARM7 instructions; any such attempt causes a vector processor unavailable exception.
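As a reading aid for the CTR layout, the Python sketch below names each defined bit and shows how a CTR value could be composed and decoded. The helper names `decode_ctr` and `ctr_with` are hypothetical; the bit positions and mnemonics are those of Table B.2, and the mask reflects that bits 31-13 always read as 0.

```python
CTR_BITS = {
    "VDCI": 12, "VDE": 11, "VICI": 10, "VICE": 9,
    "ADCI": 8, "ADCE": 7, "AICI": 6, "AICE": 5,
    "APSE": 4, "SPAE": 3, "VPSE": 2, "VPPE": 1, "VPAE": 0,
}
CTR_DEFINED_MASK = 0x1FFF  # bits 12..0; bits 31..13 are reserved


def decode_ctr(value):
    """Return a dict mapping each CTR mnemonic to True/False."""
    return {name: bool((value >> bit) & 1) for name, bit in CTR_BITS.items()}


def ctr_with(value, **changes):
    """Return `value` with the named bits set (True) or cleared (False)."""
    for name, enabled in changes.items():
        bit = 1 << CTR_BITS[name]
        value = (value | bit) if enabled else (value & ~bit)
    return value & CTR_DEFINED_MASK


# Starting from the reset value 0: enable vector processor access,
# pipelined operation, and the ARM7 instruction cache.
ctr = ctr_with(0, VPAE=True, VPPE=True, AICE=True)
print(hex(ctr), decode_ctr(ctr)["VPAE"])
```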

The status register indicates the status of MSP 100. All bits in STR are cleared on reset. The register is defined in Table B.3.

Table B.3: STR definition

Bit    Mnemonic  Description
31-23  -         Reserved; always read as 0.
22     ADAB      ARM7 data address breakpoint exception bit. Set when an ARM7 data address breakpoint match occurs. The exception is reported through the data abort interrupt.
21     AIDA      ARM7 invalid data address exception. Occurs when an ARM7 load or store instruction attempts to access an address that is undefined or not implemented in a particular MSP implementation, or attempts to access the scratchpad when that access is not permitted. The exception is reported through the data abort interrupt.
20     AIAB      ARM7 instruction address breakpoint exception bit. Set when an ARM7 instruction address breakpoint match occurs. The exception is reported through the prefetch abort interrupt.
19     AIIA      ARM7 invalid instruction address exception. The exception is reported through the prefetch abort interrupt.
18     ASTP      ARM7 single-step exception. The exception is reported through the prefetch abort interrupt.
17     APV       ARM7 protection violation. The exception is reported through the IRQ interrupt.
16     VPUA      Vector processor unavailable exception. The exception is reported through the coprocessor unavailable interrupt.
15-0   -         Reserved; always read as 0.

The processor type (version) register identifies the specific processor type within the multimedia signal processor family.

The vector processor interrupt mask register VIMSK controls the reporting of vector processor exceptions to processor 110. Each bit in VIMSK, when set together with the corresponding bit in the VISRC register, allows the exception to interrupt the ARM7. VIMSK does not affect how vector processor exceptions are detected; it affects only whether an exception interrupts the ARM7. All bits in VIMSK are cleared on reset. The register is defined in Table B.4.

Table B.4: VIMSK definition

Bit    Mnemonic  Description
31     DABE      Data address breakpoint interrupt enable
30     IABE      Instruction address breakpoint interrupt enable
29     SSTPE     Single-step interrupt enable
28-14  -         Reserved; always read as 0.
13     FOVE      Floating-point overflow interrupt enable
12     FINVE     Floating-point invalid operand interrupt enable
11     FDIVE     Floating-point divide-by-zero interrupt enable
10     IOVE      Integer overflow interrupt enable
9      IDIVE     Integer divide-by-zero interrupt enable
8-7    -         Reserved; always read as 0.
6      VIE       VCINT interrupt enable
5      VJE       VCJOIN interrupt enable
4-1    -         Reserved; always read as 0.
0      CSE       Context switch enable
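The masking rule can be stated compactly: an exception interrupts the ARM7 only when its bit is set in both VISRC and VIMSK. The Python sketch below illustrates this under the assumption that VISRC uses the same bit positions as VIMSK (the text speaks of the "corresponding bit"); the function names and the descriptive strings are hypothetical.

```python
VIMSK_SOURCES = {
    31: "data address breakpoint",
    30: "instruction address breakpoint",
    29: "single step",
    13: "floating-point overflow",
    12: "floating-point invalid operand",
    11: "floating-point divide by zero",
    10: "integer overflow",
    9: "integer divide by zero",
    6: "VCINT",
    5: "VCJOIN",
    0: "context switch",
}


def interrupting_sources(visrc, vimsk):
    """List the exception sources that are both pending and enabled."""
    active = visrc & vimsk
    return [name
            for bit, name in sorted(VIMSK_SOURCES.items(), reverse=True)
            if (active >> bit) & 1]


def arm7_interrupted(visrc, vimsk):
    return bool(interrupting_sources(visrc, vimsk))


# Two exceptions were detected, but only integer overflow is enabled.
visrc = (1 << 13) | (1 << 10)
vimsk = 1 << 10
print(interrupting_sources(visrc, vimsk), arm7_interrupted(visrc, vimsk))
```

The floating-point overflow in this example is still recorded in VISRC; masking changes only whether the ARM7 is interrupted, not whether the exception is detected.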

The ARM7 instruction address breakpoint register assists in debugging ARM7 programs. The register is defined in Table B.5.

Table B.5: AIABR definition

Bit   Mnemonic  Description
31-2  IADR      ARM7 instruction address.
1     -         Reserved; always read as 0.
0     IABE      Instruction address breakpoint enable. Cleared on reset. If set, an ARM7 instruction address breakpoint exception occurs when the ARM7 instruction access address matches AIABR<31:2> and VCSR<AIAB> is clear; VCSR<AIAB> is then set to indicate the exception. If VCSR<AIAB> is already set when a match occurs, VCSR<AIAB> is cleared and the match is ignored. The exception is reported before the instruction executes.

The ARM7 data address breakpoint register assists in debugging ARM7 programs. The register is defined in Table B.6.

Table B.6: ADABR definition

Bit   Mnemonic  Description
31-2  DADR      ARM7 data address. Undefined on reset.
1     SABE      Store address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of an ARM7 store access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the store instruction executes.
0     LABE      Load address breakpoint enable. Cleared on reset. If set, an ARM7 data address breakpoint exception occurs when the upper 30 bits of an ARM7 load access address match ADABR<31:2> and VCSR<ADAB> is clear; VCSR<ADAB> is then set to indicate the exception. If VCSR<ADAB> is already set when a match occurs, VCSR<ADAB> is cleared and the match is ignored. The exception is reported before the load instruction executes.
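The condition under which a data address breakpoint exception is raised can be written as a small predicate. The Python sketch below covers only that condition (enable bit set, upper 30 address bits equal, VCSR<ADAB> clear); it does not model the flag update. The function name and parameter names are hypothetical.

```python
def data_breakpoint_hit(access_addr, adabr, is_store, adab_flag_set):
    """Return True if the access raises a data address breakpoint exception.

    adabr layout: DADR in bits 31:2, SABE in bit 1, LABE in bit 0.
    """
    enable_bit = 1 if is_store else 0          # SABE for stores, LABE for loads
    enabled = (adabr >> enable_bit) & 1
    same_word = (access_addr >> 2) == (adabr >> 2)  # compare the upper 30 bits
    return bool(enabled) and same_word and not adab_flag_set


# Breakpoint on stores to the word at 0x1000 (SABE = 1, LABE = 0).
adabr = 0x1000 | 0b10
print(data_breakpoint_hit(0x1003, adabr, is_store=True, adab_flag_set=False))
```

Because only bits 31:2 are compared, any byte within the aligned word at 0x1000 triggers the breakpoint.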

The scratchpad register configures the address and size of the scratchpad formed from SRAM in cache subsystem 130. The register is defined in Table B.7.

Table B.7: SPREG definition

Bit    Mnemonic  Description
31-11  SPBASE    Scratchpad base address: the upper 21 bits of the scratchpad start address. This value must be at a 4 Mbyte offset from the value in the MSP_BASE register.
10-2   -         Reserved.
1-0    SPSIZE    Scratchpad size: 00 = 0K (with a 4K vector processor data cache); 01 = 2K (with a 2K vector processor data cache); 10 = 3K (with a 1K vector processor data cache); 11 = 4K (with no vector processor data cache).

The user extended registers are used mainly for synchronization of processors 110 and 120. The user extended registers are currently defined to have only one bit, mapped to bit 30, so that an instruction such as "MFER R15, UERx" returns the value of the bit in the Z flag. Bits UERx<31> and UERx<29:0> are always read as 0. The user extended registers are described in Table B.8.

Table B.8: User extended registers

Number  Mnemonic  Description
UER0    VPSTATE   Vector processor state flag. When set, bit 30 indicates that the vector processor is in the VP_RUN state and is executing instructions. When clear, it indicates that the vector processor is in the VP_IDLE state and has stopped, with VPC addressing the next instruction to be executed. VPSTATE<30> is cleared on reset.
UER1    VASYNC    Vector and ARM7 synchronization flag. Bit 30 provides producer/consumer type synchronization between the vector processor 120 and the ARM7 processor 110. Vector processor 120 can set or clear this flag with the VMOV instruction. The flag can also be set or cleared by the ARM7 processor using the MFER or MTER instruction. In addition, the flag can be read and set with the TESTSET instruction.

Table B.9 shows the state of the extended registers at power-on reset.

Table B.9: Extended register power-on state

Register  Reset state
CTR       0
PVR       TBD
VIMSK     0
AIABR     AIABR<0> = 0; all other bits undefined
ADABR     ADABR<0> = 0; all other bits undefined
STR       0
VPSTATE   VPSTATE<30> = 0; all other bits undefined
VASYNC    VASYNC<30> = 0; all other bits undefined

Appendix C

The architectural state of vector processor 120 includes 32 32-bit scalar registers; two banks of 32 288-bit vector registers; a pair of 576-bit vector accumulator registers; and a set of 32-bit special-purpose registers. The scalar, vector, and accumulator registers are intended for general-purpose programming and support many different data types.

The following notation is used here and in subsequent sections: VR denotes a vector register; VRi denotes the i-th vector register (zero-based); VR[i] denotes the i-th data element in vector register VR; VR<a:b> denotes bits a through b of vector register VR; and VR[i]<a:b> denotes bits a through b of the i-th data element in vector register VR.

The vector architecture adds a further dimension, data type and element length, for the multiple elements within a vector register. Because a vector register has a fixed size, the number of data elements it can hold depends on the element length. The MSP architecture defines the five element lengths shown in Table C.1.

Table C.1: Data element lengths

Length name  Length (bits)
Boolean      1
Byte         8
Byte9        9
Halfword     16
Word         32

MSP结构，根据指令中指定的数据类型和长度来解释向量数据。通常，大部分算术指令中字节、字节9、半字和字元素长度支持2的补码(整数)格式。此外，对大部分算术指令，字元素长度支持IEEE754单精度格式。MSP structure, which interprets vector data according to the data type and length specified in the instruction. In general, byte, byte9, halfword, and word element lengths are supported in 2's complement (integer) format for most arithmetic instructions. In addition, for most arithmetic instructions, the word element length supports IEEE754 single-precision format.

A programmer may interpret data in any way desired, as long as the instruction sequence produces meaningful results. For example, a programmer is free to use byte9 to store 8-bit unsigned numbers, and is equally free to store 8-bit unsigned numbers in byte data elements and operate on them with the provided 2's complement arithmetic instructions, as long as the program can handle the "false" overflow results.

There are 32 scalar registers, named SR0 through SR31. Each scalar register is 32 bits wide and can hold one data element of any of the defined lengths. Scalar register SR0 is special: it always reads as 32 bits of zero, and writes to it are ignored. Byte, byte9, and halfword data are stored in the least significant bits of a scalar register, and the remaining most significant bits have undefined values.

Since registers carry no data type indicator, the programmer must know the data type of each register used by each instruction. This differs from other architectures, in which a 32-bit register is assumed to contain a 32-bit value. The MSP architecture specifies that a result of data type A correctly modifies only the bits defined for data type A. For example, the result of a byte9 add modifies only the low 9 bits of the 32-bit destination scalar register. The values of the upper 23 bits are undefined unless an instruction indicates otherwise.
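The scalar register rules above can be illustrated with a small model. This is only a sketch: the class name `ScalarRegisters` and its methods are hypothetical, and because the text leaves the upper bits undefined, the choice here to preserve their old contents is an assumption made purely so the model is deterministic.

```python
class ScalarRegisters:
    """Toy model of SR0..SR31 (hypothetical helper, not part of the patent)."""

    # Defined bit widths per data type, taken from Table C.1.
    WIDTH = {"byte": 8, "byte9": 9, "halfword": 16, "word": 32}

    def __init__(self):
        self._regs = [0] * 32

    def write(self, index, value, dtype):
        if index == 0:
            return  # writes to SR0 are ignored
        mask = (1 << self.WIDTH[dtype]) - 1
        # Only the low bits for this data type are defined by the write.
        # Assumption: this sketch keeps the old upper bits; real hardware
        # leaves them undefined.
        upper = self._regs[index] & ~mask & 0xFFFFFFFF
        self._regs[index] = upper | (value & mask)

    def read(self, index, dtype):
        if index == 0:
            return 0  # SR0 always reads as zero
        return self._regs[index] & ((1 << self.WIDTH[dtype]) - 1)
```

A program that reads a register with a wider data type than the one last written to it is relying on undefined bits, which is exactly the situation the paragraph above warns against.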

The 64 vector registers are organized into 2 banks of 32 registers each. Bank 0 contains the first 32 registers, and bank 1 contains the next 32. One of the two banks is set as the current bank and the other as the alternate bank. All vector instructions use the registers in the current bank by default, except for load/store and register move instructions, which can also access vector registers in the alternate bank. The CBANK bit in the vector control and status register VCSR selects bank 0 or bank 1 as the current bank (the other becomes the alternate bank). Vector registers in the current bank are referred to as VR0 through VR31, and those in the alternate bank as VRA0 through VRA31.

Alternatively, the two banks can conceptually be merged to provide 32 double-size vector registers of 576 bits each. The VEC64 bit in control register VCSR selects this mode. In VEC64 mode there are no current and alternate banks, and a vector register number denotes a pair of corresponding 288-bit vector registers from the two banks, that is,

VRi<575:0> = VR₁i<287:0> : VR₀i<287:0>

Here VR₀i and VR₁i denote the vector registers with register number VRi in bank 0 and bank 1, respectively. The double-size vector registers are referred to as VR0 through VR31.
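As a rough illustration of the VEC64 pairing, the sketch below models each 288-bit register as a Python integer and joins the bank 1 register above the bank 0 register. The function names are hypothetical and chosen only for this example.

```python
BANK_BITS = 288  # width of one vector register in a single bank


def join_vec64(vr0: int, vr1: int) -> int:
    """Form the 576-bit VEC64 register: bank 1 in bits 575:288, bank 0 in bits 287:0."""
    assert 0 <= vr0 < (1 << BANK_BITS) and 0 <= vr1 < (1 << BANK_BITS)
    return (vr1 << BANK_BITS) | vr0


def split_vec64(vr: int):
    """Return (bank 0 part, bank 1 part) of a 576-bit VEC64 register."""
    return vr & ((1 << BANK_BITS) - 1), vr >> BANK_BITS
```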

A vector register can hold multiple elements of byte, byte9, halfword, or word length, as shown in Table C.2.

Table C.2: Number of elements per vector register

| Element length name | Element length (bits) | Maximum number of elements | Total bits used |
| --- | --- | --- | --- |
| Byte9 | 9 | 32 | 288 |
| Byte | 8 | 32 | 256 |
| Halfword | 16 | 16 | 256 |
| Word | 32 | 8 | 256 |

Mixing element lengths within one register is not supported. Except for byte9 elements, only 256 of the 288 bits are used; specifically, every ninth bit is unused. The 32 bits left unused by the byte, halfword, and word lengths are reserved, and programmers should make no assumptions about their values.
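The element counts in Table C.2 follow from treating the 288-bit register as 32 lanes of 9 bits. The sketch below reproduces the table under that reading; the names `elements_per_register` and `bits_used` are hypothetical.

```python
REGISTER_BITS = 288  # one vector register in VEC32 mode
LANE_BITS = 9        # each byte position carries a ninth bit

# Element lengths in bits, from Table C.2.
ELEMENT_BITS = {"byte9": 9, "byte": 8, "halfword": 16, "word": 32}


def elements_per_register(name: str, vec64: bool = False) -> int:
    """Maximum number of elements of the given length in one vector register."""
    lanes = REGISTER_BITS // LANE_BITS                    # 32 lanes
    lanes_per_element = -(-ELEMENT_BITS[name] // LANE_BITS)  # ceiling division
    count = lanes // lanes_per_element
    return count * 2 if vec64 else count


def bits_used(name: str) -> int:
    """Total register bits that carry data for the given element length."""
    return elements_per_register(name) * ELEMENT_BITS[name]
```

With `vec64=True` the counts double, since a VEC64 register spans both banks.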

The vector accumulator registers provide storage for intermediate results that have higher precision than the results in the destination registers. The vector accumulator consists of four 288-bit registers, named VAC1H, VAC1L, VAC0H, and VAC0L. The VAC0H:VAC0L pair is used by 3 instructions by default. Only in VEC64 mode is the VAC1H:VAC1L pair used, to emulate 64-element byte9 vector operations. Even if bank 1 is set as the current bank in VEC32 mode, the VAC0H:VAC0L pair is still used.

To produce an extended-precision result with the same number of elements as the source vector registers, each extended-precision element is held across a pair of registers, as shown in Table C.3.

Table C.3: Vector accumulator format

| Element length | Logical view | VAC format |
| --- | --- | --- |
| Byte9 | VAC[i]<17:0> | VAC0H[i]<8:0>:VAC0L[i]<8:0> for i = 0..31, and VAC1H[i-32]<8:0>:VAC1L[i-32]<8:0> for i = 32..63 |
| Byte | VAC[i]<15:0> | VAC0H[i]<7:0>:VAC0L[i]<7:0> for i = 0..31, and VAC1H[i-32]<7:0>:VAC1L[i-32]<7:0> for i = 32..63 |
| Halfword | VAC[i]<31:0> | VAC0H[i]<15:0>:VAC0L[i]<15:0> for i = 0..15, and VAC1H[i-16]<15:0>:VAC1L[i-16]<15:0> for i = 16..31 |
| Word | VAC[i]<63:0> | VAC0H[i]<31:0>:VAC0L[i]<31:0> for i = 0..7, and VAC1H[i-8]<31:0>:VAC1L[i-8]<31:0> for i = 8..15 |

The VAC1H:VAC1L pair is used only in VEC64 mode, in which the number of elements is 64 for byte9 (and byte), 32 for halfword, and 16 for word.
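The high:low pairing of Table C.3 can be sketched as follows for a single element. The helper names are hypothetical, and the example models only the bit layout, not any particular accumulate instruction.

```python
def split_accumulator(value: int, elem_bits: int):
    """Split a double-width accumulator element into (high, low) halves."""
    mask = (1 << elem_bits) - 1
    value &= (1 << (2 * elem_bits)) - 1  # keep only the 2*elem_bits logical view
    return (value >> elem_bits) & mask, value & mask


def join_accumulator(high: int, low: int, elem_bits: int) -> int:
    """Rebuild the logical VAC[i] value from its VACxH[i] and VACxL[i] parts."""
    mask = (1 << elem_bits) - 1
    return ((high & mask) << elem_bits) | (low & mask)
```

For example, a halfword product such as 300 * 300 = 90000 does not fit in 16 bits, so its upper bits land in the high register and its lower 16 bits in the low register.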

There are 33 special-purpose registers, which cannot be loaded directly from memory or stored directly to memory. Sixteen of them, named RASR0 through RASR15, form an internal return address stack used by the subroutine call and return instructions. The other 17 32-bit special-purpose registers are shown in Table C.4.

Table C.4: Special-purpose registers

| Number | Mnemonic | Description |
| --- | --- | --- |
| SP0 | VCSR | Vector control and status register |
| SP1 | VPC | Vector program counter |
| SP2 | VEPC | Vector exception program counter |
| SP3 | VISRC | Vector interrupt source register |
| SP4 | VIINS | Vector interrupt instruction register |
| SP5 | VCR1 | Vector count register 1 |
| SP6 | VCR2 | Vector count register 2 |
| SP7 | VCR3 | Vector count register 3 |
| SP8 | VGMR0 | Vector global mask register 0 |
| SP9 | VGMR1 | Vector global mask register 1 |
| SP10 | VOR0 | Vector overflow register 0 |
| SP11 | VOR1 | Vector overflow register 1 |
| SP12 | VLABR | Vector instruction address breakpoint register |
| SP13 | VDABR | Vector data address breakpoint register |
| SP14 | VMMR0 | Vector move mask register 0 |
| SP15 | VMMR1 | Vector move mask register 1 |
| SP16 | VASYNC | Vector and ARM7 synchronization register |

The vector control and status register VCSR is defined in Table C.5.

Table C.5: VCSR definition

| Bits | Mnemonic | Description |
| --- | --- | --- |
| 31:18 | | Reserved. |
| 17:13 | VSP<4:0> | Return address stack pointer. VSP is used by the branch-to-subroutine and return-from-subroutine instructions to track the top of the internal return address stack. Since the return address stack has only 16 entries, VSP<4> is used to detect the stack overflow condition. |
| 12 | SO | Summary overflow status flag. This bit is set when an arithmetic operation produces an overflow. Once set, the bit is sticky and is cleared only when 0 is written to it. |
| 11 | GT | Greater-than status flag. Set by the VSUBS instruction when SRa > SRb. |
| 10 | EQ | Equal status flag. Set by the VSUBS instruction when SRa = SRb. |
| 9 | LT | Less-than status flag. Set by the VSUBS instruction when SRa < SRb. |
| 8 | SMM | Select move mask. When this bit is set, the VMMR0/1 pair becomes the element mask for arithmetic operations. |
| 7 | CEM | Complement element mask. When this bit is set, the element mask is defined as the 1's complement of VGMR0/1 or VMMR0/1, whichever is configured as the element mask for arithmetic operations. This bit does not change the contents of VGMR0/1 or VMMR0/1; it only changes how these registers are used. The SMM:CEM encoding specifies: 00, use VGMR0/1 as the element mask for all instructions except VCMOVM; 01, use the 1's complement of VGMR0/1 as the element mask for all instructions except VCMOVM; 10, use VMMR0/1 as the element mask for all instructions except VCMOVM; 11, use the 1's complement of VMMR0/1 as the element mask for all instructions except VCMOVM. |
| 6 | OED | Overflow exception disable. When this bit is set, processor 120 continues execution after detecting an overflow condition. |
| 5 | ISAT | Integer saturation mode. The OED:ISAT combination specifies: 00, no saturation, and overflow exceptions are reported when they occur; X1, saturation, and no overflow is caused; 10, no saturation, and overflow exceptions are not reported when they occur. |
| 4:3 | RMODE | Rounding mode for IEEE 754 floating-point operations: 00, round toward negative infinity; 01, round toward zero; 10, round to nearest; 11, round toward positive infinity. |
| 2 | FSAT | Floating-point saturation mode bit (fast IEEE mode). |
| 1 | CBANK | Current bank bit. When set, bank 1 is the current bank; when cleared, bank 0 is the current bank. CBANK is ignored when the VEC64 bit is set. |
| 0 | VEC64 | 64-element byte9 vector mode bit. When set, the vector registers and accumulators are 576 bits wide. The default mode specifies a length of 32 byte9 elements and is called VEC32 mode. |
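To make the bit layout of Table C.5 concrete, the following sketch decodes a 32-bit VCSR value into named fields. The function names and the `~` prefix used to mark a complemented mask are conventions of this example only; the complemented reading of the SMM:CEM encodings with CEM set follows the CEM description in the table.

```python
# (high bit, low bit) of each VCSR field, from Table C.5.
VCSR_FIELDS = {
    "VSP": (17, 13), "SO": (12, 12), "GT": (11, 11), "EQ": (10, 10),
    "LT": (9, 9), "SMM": (8, 8), "CEM": (7, 7), "OED": (6, 6),
    "ISAT": (5, 5), "RMODE": (4, 3), "FSAT": (2, 2), "CBANK": (1, 1),
    "VEC64": (0, 0),
}


def decode_vcsr(value: int) -> dict:
    """Extract every named field from a 32-bit VCSR value."""
    fields = {}
    for name, (hi, lo) in VCSR_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields


def element_mask_source(fields: dict) -> str:
    """Name the register pair used as element mask (not for VCMOVM)."""
    base = "VMMR0/1" if fields["SMM"] else "VGMR0/1"
    return "~" + base if fields["CEM"] else base
```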

The vector program counter register VPC holds the address of the next instruction to be executed by the vector processor 120. The ARM7 processor 110 should load register VPC before issuing the STARTVP instruction that starts operation of the vector processor 120.

The vector exception program counter VEPC indicates the address of the instruction most likely to have caused the most recent exception. The MSP 100 does not support precise exceptions, hence the term "most likely".

The vector interrupt source register VISRC indicates the interrupt source to the ARM7 processor 110. The appropriate bits are set by hardware when an exception is detected. Software must clear register VISRC before the vector processor 120 resumes execution. Any bit set in register VISRC causes the vector processor 120 to enter state VP-IDLE. If the corresponding interrupt enable bit is set in VIMSK, an interrupt is issued to processor 110. Table C.6 defines the contents of register VISRC.

Table C.6: VISRC definition

| Bits | Mnemonic | Description |
| --- | --- | --- |
| 31 | DAB | Data address breakpoint exception |
| 30 | IAB | Instruction address breakpoint exception |
| 29 | SSTP | Single-step exception |
| 28-18 | | Reserved |
| 17 | IIA | Invalid instruction address exception |
| 16 | IINS | Invalid instruction exception |
| 15 | IDA | Invalid data address exception |
| 14 | UDA | Unaligned data access exception |
| 13 | FOV | Floating-point overflow exception |
| 12 | FINV | Floating-point invalid operand exception |
| 11 | FDIV | Floating-point divide-by-zero exception |
| 10 | IOV | Integer overflow exception |
| 9 | IDIV | Integer divide-by-zero exception |
| 8 | RASO | Return address stack overflow exception |
| 7 | RASU | Return address stack underflow exception |
| 6 | VIP | VCINT exception pending. Executing the STARTVP instruction clears this bit. |
| 5 | VJP | VCJOIN exception pending. Executing the STARTVP instruction clears this bit. |
| 4-0 | VPEV | Vector processor exception vector |

The vector interrupt instruction register VIINS is updated with the VCINT or VCJOIN instruction when that instruction is executed to interrupt the ARM7 processor 110.

The vector count registers VCR1, VCR2, and VCR3 are used by the "decrement and branch" instructions VD1CBR, VD2CBR, and VD3CBR, and are initialized with the number of loop iterations to execute. When the VD1CBR instruction is executed, register VCR1 is decremented by 1. If the count is not zero and the condition specified in the instruction matches VFLAG, the branch is taken; otherwise it is not. Register VCR1 is decremented by 1 in either case. Registers VCR2 and VCR3 are used in the same way.
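The decrement-and-branch behavior can be sketched as a small function. Two points are assumptions of this example rather than statements from the text: that the zero test applies to the count after the decrement, and that the 32-bit register wraps on underflow. The function names are hypothetical.

```python
def decrement_and_branch(count: int, condition_matches: bool):
    """One VD1CBR-style step: return (new VCR1 value, branch taken)."""
    # The count register is decremented regardless of the outcome.
    new_count = (count - 1) & 0xFFFFFFFF  # assumption: 32-bit wraparound
    # Assumption: the zero test uses the decremented value.
    return new_count, (new_count != 0 and condition_matches)


def run_loop(iterations: int) -> int:
    """Count how many times a loop body runs when VCR1 starts at `iterations`."""
    count, executed = iterations, 0
    while True:
        executed += 1  # loop body
        count, taken = decrement_and_branch(count, True)
        if not taken:
            return executed
```

Under these assumptions, initializing the count register with N and placing the decrement-and-branch at the bottom of the loop runs the body N times.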

The vector global mask register VGMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR<287:0> that are affected in VEC64 mode. Each bit in VGMR0 controls the update of 9 bits in the vector destination register. Specifically, VGMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR₀d<9i+8:9i> in VEC64 mode. Note that VR₀d denotes the destination register in bank 0 in VEC64 mode, whereas VRd denotes the destination register in the current bank, which in VEC32 mode can be either bank 0 or bank 1. VGMR0 is used in the execution of all instructions except the VCMOVM instruction.

The vector global mask register VGMR1 indicates the elements within VR<575:288> that are affected in VEC64 mode. Each bit in register VGMR1 controls the update of 9 bits in the vector destination register in bank 1. Specifically, VGMR1<i> controls the update of VR₁d<9i+8:9i>. Register VGMR1 is not used in VEC32 mode, but in VEC64 mode it affects the execution of all instructions except the VCMOVM instruction.
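The per-bit control of 9-bit slices can be illustrated by modeling a 288-bit register as a Python integer. This is a minimal sketch; `masked_update` is a hypothetical name, and the same function would apply to either mask register with its corresponding bank.

```python
LANE_BITS = 9   # each mask bit governs 9 bits of the destination
LANES = 32      # 32 mask bits cover one 288-bit register


def masked_update(old: int, new: int, mask: int) -> int:
    """Write new<9i+8:9i> into the destination only where mask bit i is set."""
    result = old
    for i in range(LANES):
        if (mask >> i) & 1:
            lane = 0x1FF << (LANE_BITS * i)
            result = (result & ~lane) | (new & lane)
    return result
```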

The vector overflow register VOR0 indicates the elements in VEC32 mode, and the elements within VR<287:0> in VEC64 mode, that contain overflow results after a vector arithmetic operation. This register is not modified by scalar arithmetic operations. Bit VOR0<i> being set indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word operation contains an overflow result. For example, bit 1 and bit 3 would be set to indicate overflow of the first halfword element and the first word element, respectively. The mapping of bits in VOR0 differs from the mapping of bits in VGMR0 or VGMR1.

The vector overflow register VOR1 indicates the elements within VR<575:288> in VEC64 mode that contain overflow results after a vector arithmetic operation. Register VOR1 is not used in VEC32 mode and is not modified by scalar arithmetic operations. Bit VOR1<i> being set indicates that the i-th element of a byte or byte9 operation, the (i idiv 2)-th element of a halfword operation, or the (i idiv 4)-th element of a word operation contains an overflow result. For example, bit 1 and bit 3 would be set to indicate overflow of the first halfword element and the first word element in VR<575:288>, respectively. The mapping of bits in VOR1 differs from the mapping of bits in VGMR0 or VGMR1.
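The bit-to-element mapping of the overflow registers can be sketched as an integer division of the bit index by the number of byte positions per element. The function name is hypothetical, and this example accepts any set bit within an element, whereas the text's examples suggest the hardware sets the highest bit of each element's group.

```python
# Number of byte positions occupied by one element of each data type.
POSITIONS_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def overflowed_elements(vor: int, dtype: str) -> list:
    """Return the indices of elements flagged as overflowed in a VOR value."""
    step = POSITIONS_PER_ELEMENT[dtype]
    return sorted({i // step for i in range(32) if (vor >> i) & 1})
```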

The vector instruction address breakpoint register VLABR assists in debugging vector programs. The register is defined in Table C.7.

Table C.7: VLABR definition

| Bits | Mnemonic | Description |
| --- | --- | --- |
| 31-2 | IADR | Vector instruction address. Undefined at reset. |
| 1 | | Reserved. |
| 0 | IABE | Instruction address breakpoint enable. Undefined at reset. If set, a "vector instruction address breakpoint" exception occurs when the vector instruction access address matches VLABR<31:2>, and bit VISRC<IAB> is set to indicate the exception. The exception is reported before the instruction is executed. |

The vector data address breakpoint register VDABR assists in debugging vector programs. The register is defined in Table C.8.

Table C.8: VDABR definition

| Bits | Mnemonic | Description |
| --- | --- | --- |
| 31-2 | DADR | Vector data address. Undefined at reset. |
| 1 | SABE | Store address breakpoint enable. Undefined at reset. If set, a "vector data address breakpoint" exception occurs when the vector store access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the store instruction is executed. |
| 0 | LABE | Load address breakpoint enable. Cleared at reset. If set, a "vector data address breakpoint" exception occurs when the vector load access address matches VDABR<31:2>, and bit VISRC<DAB> is set to indicate the exception. The exception is reported before the load instruction is executed. |
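The breakpoint comparison described in Tables C.7 and C.8 can be sketched as a match on address bits 31:2, gated by an enable bit. This is a minimal illustration; the function name and the way the enable bit is passed in are choices made for this example.

```python
def address_breakpoint_hit(register: int, address: int, enable_bit: int) -> bool:
    """True if the enable bit is set and address<31:2> equals register<31:2>.

    enable_bit is 0 for IABE (VLABR) or LABE (VDABR), and 1 for SABE (VDABR).
    """
    register &= 0xFFFFFFFF
    address &= 0xFFFFFFFF
    enabled = (register >> enable_bit) & 1
    return bool(enabled) and (register >> 2) == (address >> 2)
```

Because the two low address bits are not compared, any access within the same aligned 32-bit word triggers the breakpoint.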

The vector move mask register VMMR0 is used by the VCMOVM instruction at all times, and by all instructions when VCSR<SMM> = 1. Register VMMR0 indicates the elements of the destination vector register that are affected in VEC32 mode, and the elements within VR<287:0> that are affected in VEC64 mode. Each bit in VMMR0 controls the update of 9 bits in the vector destination register. Specifically, VMMR0<i> controls the update of VRd<9i+8:9i> in VEC32 mode and of VR₀d<9i+8:9i> in VEC64 mode. VR₀d denotes the destination register in bank 0 in VEC64 mode, whereas VRd denotes the destination register in the current bank, which in VEC32 mode can be either bank 0 or bank 1.

The vector move mask register VMMR1 is used by the VCMOVM instruction at all times, and by all instructions when VCSR<SMM> = 1. Register VMMR1 indicates the elements within VR<575:288> that are affected in VEC64 mode. Each bit in VMMR1 controls the update of 9 bits in the vector destination register in bank 1. Specifically, VMMR1<i> controls the update of VR₁d<9i+8:9i>. Register VMMR1 is not used in VEC32 mode.

The vector and ARM7 synchronization register VASYNC provides producer/consumer style synchronization between processors 110 and 120. Currently, only bit 30 is defined. The ARM7 processor can access register VASYNC with the MFER, MTER, and TESTSET instructions while the vector processor 120 is in state VP_RUN or VP_IDLE. The ARM7 processor cannot access register VASYNC through the TVP or MFVP instructions, because those instructions cannot reach beyond the first 16 special-purpose registers of the vector processor. The vector processor can access register VASYNC through the VMOV instruction.

Table C.9 shows the state of the vector processor at power-on reset.

Table C.9: Vector processor power-on reset state

| Register | Reset state |
| --- | --- |
| SR0 | 0 |
| All other registers | Undefined |

The special-purpose registers are initialized by the ARM7 processor 110 before the vector processor can execute instructions.

Appendix D

Each instruction implies or specifies the data types of its source and destination operands. Some instructions have semantics that apply equally to more than one data type. Some instructions take one data type for the source and produce a different data type for the result. This appendix describes the data types supported by the exemplary embodiment. Table 1 of this application describes the supported data types int8, int9, int16, int32, and floating point. Unsigned integer formats are not supported, and an unsigned integer value must first be converted to 2's complement format before it is used. Programmers are free to use the arithmetic instructions with unsigned integers or any other format of their choice, as long as they handle overflow appropriately. The architecture defines overflow only for the 2's complement integer and 32-bit floating-point data types. The architecture does not detect the carry out of 8-, 9-, 16-, or 32-bit operations, which would be needed to detect unsigned overflow. Table D.1 shows the data lengths supported by load operations.

Table D.1: Data lengths supported by load operations

| Data length in memory | Data length in register | Load operation |
| --- | --- | --- |
| 8-bit | 9-bit | Load 8 bits, sign-extend to 9 bits (for loading 8-bit 2's complement values) |
| 8-bit | 9-bit | Load 8 bits, zero-extend to 9 bits (for loading unsigned 8-bit values) |
| 16-bit | 16-bit | Load 16 bits (for loading 16-bit unsigned or 2's complement values) |
| 32-bit | 32-bit | Load 32 bits (for loading 32-bit unsigned values, 2's complement integers, or 32-bit floating point) |
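The two 8-bit load variants differ only in how the ninth bit is filled. The following sketch shows both and interprets the result as a 9-bit 2's complement value; the function names are hypothetical.

```python
def load_byte_sign_extended(b: int) -> int:
    """Load 8 bits and copy bit 7 into bit 8 (for 8-bit 2's complement data)."""
    b &= 0xFF
    return b | ((b & 0x80) << 1)


def load_byte_zero_extended(b: int) -> int:
    """Load 8 bits and clear bit 8 (for unsigned 8-bit data)."""
    return b & 0xFF


def as_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value
```

The zero-extended form lets an unsigned byte in the range 0 to 255 be handled as a non-negative 9-bit 2's complement value, which is how unsigned 8-bit data can be used with the signed arithmetic instructions.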

The architecture requires memory addresses to be aligned on data type boundaries. That is, bytes have no alignment requirement; halfwords must be aligned on halfword boundaries; and words must be aligned on word boundaries.

Table D.2 shows the data lengths supported by store operations.

Table D.2: Data lengths supported by store operations

| Data length in register | Data length in memory | Store operation |
| --- | --- | --- |
| 8-bit | 8-bit | Store 8 bits (stores 8-bit unsigned or 2's complement values) |
| 9-bit | 8-bit | Truncate to the low 8 bits and store 8 bits (stores a 9-bit 2's complement value whose unsigned value is between 0 and 255) |
| 16-bit | 16-bit | Store 16 bits (stores 16-bit unsigned or 2's complement values) |
| 32-bit | 32-bit | Store 32 bits |

Because more than one data type is mapped onto the registers, whether scalar or vector, some bits in the destination register may have no defined result for some data types. In fact, except for byte9 operations on a vector destination register and word operations on a scalar destination register, there are bits in the destination register whose values are not defined by the operation. For these bits, the architecture specifies that their values are undefined. Table D.3 shows the undefined bits for each data length.

Table D.3: Undefined bits for each data length

| Data length | Vector destination register | Scalar destination register |
| --- | --- | --- |
| Byte | VR<9i+8>, for i = 0 to 31 | SR<31:8> |
| Byte9 | None | SR<31:9> |
| Halfword | VR<9i+8>, for i = 0 to 31 | SR<31:16> |
| Word | VR<9i+8>, for i = 0 to 31 | None |

When programming, the programmer must know the data types of the source and destination registers or memory. Converting data from one element length to another potentially changes the number of elements stored in a vector register. For example, converting a vector register from the halfword to the word data type requires 2 vector registers to hold the same number of converted elements. Conversely, converting from the word data type, which may have a user-defined format in the vector register, to the halfword format yields the same number of elements in one half of a vector register, with the remaining bits in the other half. In both cases, the data type conversion raises a structural issue regarding the arrangement of the converted elements, whose length differs from that of the source elements.

As a matter of principle, the MSP architecture does not provide operations that implicitly change the number of elements as a result, since the programmer must be aware of the consequences of changing the number of elements in the destination register. The architecture only provides operations that convert from one data type to another data type of the same length, and when converting between data types of different lengths it requires the programmer to adjust for the difference in data length.

Special instructions such as VSHFLL and VUNSHFLL, described in Appendix E, simplify the conversion of a vector of one data length into a vector of a second data length. The basic steps for converting a 2's complement data type in vector register VRa from a smaller element length (such as int8) to a larger element length (such as int16) are:

1. Shuffle the elements in VRa with another vector VRb into two vectors VRc:VRd, using the byte data type. The elements of VRa are moved to the lower bytes of the int16 data elements in the double-size register VRc:VRd, and the elements of VRb, whose values are irrelevant, are moved to the upper bytes of VRc:VRd. This operation effectively moves half of the elements of VRa into VRc and the other half into VRd, while doubling the length of each element from byte to halfword.

2. Arithmetically shift the elements in VRc:VRd by 8 bits to sign-extend them.
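The two widening steps can be sketched on plain Python lists of unsigned byte values. Several details here are assumptions of this example, not statements from the text: the interleaving order of the shuffle, the use of zeros as the irrelevant VRb filler, and the realization of the "arithmetic shift by 8 bits" as a left shift by 8 followed by an arithmetic right shift by 8. The split of the result across VRc and VRd is not modeled, and all function names are hypothetical.

```python
def shuffle_bytes(vra, vrb):
    """Build 16-bit elements with a byte of vra low and a byte of vrb high."""
    return [((b & 0xFF) << 8) | (a & 0xFF) for a, b in zip(vra, vrb)]


def arithmetic_shift_right(value: int, shift: int, bits: int = 16) -> int:
    """Arithmetic right shift of a `bits`-wide 2's complement value."""
    signed = value - (1 << bits) if value & (1 << (bits - 1)) else value
    return (signed >> shift) & ((1 << bits) - 1)


def widen_int8_to_int16(vra):
    """Convert int8 elements to int16 by shuffling and then sign-extending."""
    filler = [0] * len(vra)  # stands in for VRb, whose values do not matter
    shuffled = shuffle_bytes(vra, filler)
    # Assumption: move the int8 sign bit up to bit 15, then shift it back
    # arithmetically so that it fills the upper byte.
    return [arithmetic_shift_right((h << 8) & 0xFFFF, 8) for h in shuffled]
```

The upper byte produced by the shuffle is overwritten by the sign extension, which is why the contents of the second source vector do not matter.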

The basic steps for converting 2's complement numbers in vector register VRa from a larger element length (such as int16) to a smaller length (such as int8) are:

1. Check that each element of the int16 data type can be represented in byte length. If necessary, saturate the element at both ends so that it fits the smaller length.

2. Unshuffle the elements in VRa with another vector VRb into two vectors VRc:VRd. The upper half of each element is moved to VRc and the lower half to VRd, which effectively collects the lower halves of all elements of VRa in the lower half of VRd.
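The narrowing steps can be sketched in the same list-based style. The function names are hypothetical, and the example models only the saturation and the separation of upper and lower halves, not the exact placement of the results within VRc and VRd.

```python
def saturate_int16_to_int8(value: int) -> int:
    """Clamp a 16-bit 2's complement value to the int8 range, kept in 16 bits."""
    signed = value - 0x10000 if value & 0x8000 else value
    return max(-128, min(127, signed)) & 0xFFFF


def unshuffle_halves(elements):
    """Separate each 16-bit element into its upper and lower byte."""
    highs = [(e >> 8) & 0xFF for e in elements]
    lows = [e & 0xFF for e in elements]
    return highs, lows


def narrow_int16_to_int8(elements):
    """Saturate each int16 element, then keep only the lower bytes."""
    saturated = [saturate_int16_to_int8(e) for e in elements]
    _, lows = unshuffle_halves(saturated)
    return lows
```

After saturation the upper byte of every element is just sign fill, so discarding it loses no information.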

Special instructions are provided for the following data type conversions: int32 to single-precision floating point; single-precision floating point to fixed point (X.Y notation); single-precision floating point to int32; int8 to int9; int9 to int16; and int16 to int9.

To provide flexibility in vector programming, most vector instructions use an element mask and operate only on selected elements within a vector register. The vector global mask registers VGMR0 and VGMR1 identify the elements that a vector instruction modifies in the destination register and in the vector accumulator. For byte and byte9 operations, each of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on: bit VGMR0<i> being set indicates that byte-length element i is affected, where i is 0 to 31. For halfword operations, each pair of the 32 bits in VGMR0 (or VGMR1) identifies one element to be operated on: bits VGMR0<2i:2i+1> being set indicate that element i is affected, where i is 0 to 15. If only one bit of a pair in VGMR0 is set for a halfword operation, only the bits in the corresponding byte are modified. For word operations, each group of four bits in VGMR0 (or VGMR1) identifies one element to be operated on: bits VGMR0<4i:4i+3> being set indicate that element i is affected, where i is 0 to 7. If not all bits of a four-bit group in VGMR0 are set for a word operation, only the bits in the corresponding bytes are modified.
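The relationship between selected elements and mask bits can be sketched by building a 32-bit mask value for a given data length. The function name `element_mask` is hypothetical.

```python
# Number of mask bits that identify one element of each data length.
MASK_BITS_PER_ELEMENT = {"byte": 1, "byte9": 1, "halfword": 2, "word": 4}


def element_mask(selected, dtype: str) -> int:
    """Build a 32-bit mask that selects whole elements of the given data length."""
    width = MASK_BITS_PER_ELEMENT[dtype]
    group = (1 << width) - 1  # all mask bits belonging to one element
    mask = 0
    for i in selected:
        mask |= group << (width * i)
    return mask
```

A mask that sets only part of a group, for example 0b0100 for a halfword operation, would update only one byte of element 1, as described in the paragraph above.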

VGMR0 and VGMR1 can be set with the VCMPV instruction, which compares a vector register with a vector register, a scalar register, or an immediate value and sets the mask appropriately for the specified data size. Because a scalar register is defined to contain only one data element, scalar operations (that is, operations whose destination register is scalar) are never affected by the element mask.

For flexibility in vector programming, most MSP instructions support three forms of vector and scalar operations:

1. vector = vector op vector

2. vector = vector op scalar

3. scalar = scalar op scalar

In case 2, where a scalar register is specified as the B operand, the single element in the scalar register is replicated as many times as needed to match the number of elements in the vector A operand. Each replicated element has the same value as the element in the specified scalar operand. The scalar operand can come from a scalar register or, as an immediate operand, from the instruction itself. For an immediate operand, appropriate sign extension is applied when the data size of the specified data type is larger than the available immediate field.
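Scalar replication and immediate sign extension can be illustrated with a short sketch. The helper names `sign_extend` and `vector_op_scalar` are hypothetical, and elements are modeled as plain Python integers rather than fixed-width register fields.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide field to to_bits; returns the unsigned bit pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):      # top bit of the field is set
        value -= 1 << from_bits       # reinterpret the field as a negative number
    return value & ((1 << to_bits) - 1)

def vector_op_scalar(op, vra, scalar):
    """vector = vector op scalar: replicate the scalar to match len(vra)."""
    replicated = [scalar] * len(vra)
    return [op(a, b) for a, b in zip(vra, replicated)]
```

For example, a 5-bit immediate field holding 0b11111 extends to 0xFFFF at halfword size, while 0b01111 extends to 0x000F.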

Many multimedia applications require close attention to the precision of the source immediates and of the final result. In addition, the integer multiply instructions produce "double precision" intermediate results that can be stored in two vector registers.

In general, the MSP architecture supports two's complement integer formats for 8-, 9-, 16-, and 32-bit elements and the IEEE 754 single-precision format for 32-bit elements. Overflow is defined as a result that exceeds the most positive or most negative value representable in the specified data type. When overflow occurs, the value written to the destination register is not a valid number. Underflow is defined only for floating-point operations.

Unless otherwise stated, all floating-point operations use one of the four rounding modes specified by the VCSR<RMODE> bits. Some instructions instead use the rounding mode known as round away from zero (round even); these instructions are explicitly noted.

Saturation is an important function in many multimedia applications. The MSP architecture supports saturation for all four integer data types and for floating-point operations. The ISAT bit in register VCSR selects integer saturation mode. Floating-point saturation mode, also known as fast IEEE mode, is selected by the FSAT bit in VCSR. When saturation mode is enabled, a result that exceeds the most positive or most negative value is set to the most positive or most negative value, respectively. In that case no overflow occurs and the overflow bit is not set.
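Integer saturation for the four element widths (8, 9, 16, and 32 bits) amounts to clamping to the two's complement range. The following is a minimal sketch; the function names are illustrative, and the floating-point (fast IEEE) mode is not modeled.

```python
def int_range(bits: int) -> tuple[int, int]:
    """Most negative and most positive two's complement values for a width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def overflows(value: int, bits: int) -> bool:
    """True when value is not representable in the given width."""
    lo, hi = int_range(bits)
    return not lo <= value <= hi

def saturate(value: int, bits: int) -> int:
    """Clamp value to the representable range instead of overflowing."""
    lo, hi = int_range(bits)
    return max(lo, min(hi, value))
```

With saturation enabled, an int8 result of 200 becomes 127 and an int9 result of -300 becomes -256; values already in range pass through unchanged.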

Table D.4 lists the precise exceptions, which are detected and reported before the faulting instruction executes. Exception vector addresses are given in hexadecimal.

Table D.4: Precise exceptions

Exception vector  Description
0x00000018        Vector processor instruction address breakpoint exception
0x00000018        Vector processor data address breakpoint exception
0x00000018        Vector processor invalid instruction exception
0x00000018        Vector processor single-step exception
0x00000018        Vector processor return address stack overflow exception
0x00000018        Vector processor return address stack underflow exception
0x00000018        Vector processor VCINT exception
0x00000018        Vector processor VCJOIN exception

Table D.5 lists the imprecise exceptions, which are detected and reported after some instructions that follow the faulting instruction in program order have already executed.

Table D.5: Imprecise exceptions

Exception vector  Description
0x00000018        Vector processor invalid instruction address exception
0x00000018        Vector processor invalid data address exception
0x00000018        Vector processor unaligned data access exception
0x00000018        Vector processor integer overflow exception
0x00000018        Vector processor floating-point overflow exception
0x00000018        Vector processor floating-point invalid operand exception
0x00000018        Vector processor floating-point divide-by-zero exception
0x00000018        Vector processor integer divide-by-zero exception

Appendix E

The vector processor instruction set comprises the eleven classes shown in Table E.1.

Table E.1: Summary of vector instruction classes

Class  Description
Control flow  Instructions that control program flow, including branches and the ARM7 interface instructions.
Logical (bitwise, masked)  Bitwise logical instructions. Although the data type is Boolean, logical instructions use the element mask to modify the result and therefore require a data type.
Shift and rotate (element-wise, masked)  Instructions that shift and rotate the bits within each element. This class distinguishes element sizes and is affected by the element mask.
Arithmetic (element-wise, masked)  Element-wise arithmetic instructions; that is, the result for element i is computed from element i of the sources. This class distinguishes element types and is affected by the element mask.
Multimedia (element-wise, masked)  Instructions optimized for multimedia applications. This class distinguishes element types and is affected by the element mask.
Data type conversion (element-wise, unmasked)  Instructions that convert elements from one data type to another. Each instruction supports a specific set of data types and is not subject to the element mask, because the architecture does not support more than one data type in a register.
Inter-element arithmetic  Instructions that take two elements from different positions in a vector to produce an arithmetic result.
Inter-element move  Instructions that take two elements from different positions in a vector to rearrange the elements.
Load/store  Instructions that load or store registers. These instructions are not affected by the element mask.
Cache operation  Instructions that control the instruction and data caches. These instructions are not affected by the element mask.
Register move  Instructions that transfer data between two registers. These instructions are generally not affected by the element mask, although some can optionally use it.

Table E.2 lists the control flow instructions.

Table E.2: Control flow instructions

Mnemonic  Description
VCBR      Conditional branch
VCBRI     Conditional branch indirect
VD1CBR    Decrement VCR1 and conditional branch
VD2CBR    Decrement VCR2 and conditional branch
VD3CBR    Decrement VCR3 and conditional branch
VCJSR     Conditional jump to subroutine
VCJSRI    Conditional jump to subroutine indirect
VCRSR     Conditional return from subroutine
VCINT     Conditional interrupt of ARM7
VCJOIN    Conditional join with ARM7
VCCS      Conditional context switch
VCBARR    Conditional barrier
VCHGCR    Change control register (VCSR)

The logical class supports the Boolean data type and is affected by the element mask. Table E.3 lists the logical instructions.

Table E.3: Logical instructions

Mnemonic  Description
VNOT      NOT: ~B
VAND      AND: (A & B)
VCAND     Complement AND: (~A & B)
VANDC     AND complement: (A & ~B)
VNAND     NAND: ~(A & B)
VOR       OR: (A | B)
VCOR      Complement OR: (~A | B)
VORC      OR complement: (A | ~B)
VNOR      NOR: ~(A | B)
VXOR      Exclusive OR: (A ^ B)
VXNOR     Exclusive NOR: ~(A ^ B)
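The eleven logical operations can be written out as ordinary bitwise expressions. The sketch below reads the complemented forms in the table as bitwise NOT and assumes a 32-bit operand width purely for illustration; the `logic` helper and the `LOGIC_OPS` table are hypothetical names.

```python
MASK32 = 0xFFFFFFFF  # assumed operand width for this illustration

LOGIC_OPS = {
    "VNOT":  lambda a, b: ~b,
    "VAND":  lambda a, b: a & b,
    "VCAND": lambda a, b: ~a & b,
    "VANDC": lambda a, b: a & ~b,
    "VNAND": lambda a, b: ~(a & b),
    "VOR":   lambda a, b: a | b,
    "VCOR":  lambda a, b: ~a | b,
    "VORC":  lambda a, b: a | ~b,
    "VNOR":  lambda a, b: ~(a | b),
    "VXOR":  lambda a, b: a ^ b,
    "VXNOR": lambda a, b: ~(a ^ b),
}

def logic(op: str, a: int, b: int) -> int:
    """Apply one logical operation and truncate the result to 32 bits."""
    return LOGIC_OPS[op](a, b) & MASK32
```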

The shift and rotate instructions operate on the int8, int9, int16, and int32 data types (not on the floating-point data type) and are affected by the element mask. Table E.4 lists the shift and rotate instructions.

Table E.4: Shift and rotate instructions

Mnemonic  Description
VDIV2N    Divide by a power of 2
VLSL      Logical shift left
VLSR      Logical shift right
VROL      Rotate left
VROR      Rotate right

In general, the arithmetic instructions support the int8, int9, int16, int32, and floating-point data types and are affected by the element mask. For specific restrictions on unsupported data types, see the detailed description of each instruction below. The VCMPV instruction is not affected by the element mask, because it operates on the element mask itself. Table E.5 lists the arithmetic instructions.

Table E.5: Arithmetic instructions

Mnemonic  Description
VASR      Arithmetic shift right
VADD      Add
VAVG      Average
VSUB      Subtract
VASUB     Absolute value of subtract
VMUL      Multiply
VMULA     Multiply to accumulator
VMULAF    Multiply to accumulator, fraction
VMULF     Multiply fraction
VMULFR    Multiply fraction and round
VMULL     Multiply low
VMAD      Multiply and add
VMADL     Multiply and add low
VADAC     Add and accumulate
VADACL    Add and accumulate low
VMAC      Multiply and accumulate
VMACF     Multiply and accumulate fraction
VMACL     Multiply and accumulate low
VMAS      Multiply and subtract from accumulator
VMASF     Multiply and subtract from accumulator, fraction
VMASL     Multiply and subtract from accumulator, low
VSATU     Saturate to upper limit
VSATL     Saturate to lower limit
VSUBS     Subtract scalar and set condition
VCMPV     Compare vector and set mask
VDIVI     Divide initialize
VDIVS     Divide
VASL      Arithmetic shift left
VASA      Arithmetic shift accumulator by one bit

The MPEG instructions are a class of instructions specifically suited to MPEG encoding and decoding, although they can be used in other ways. The MPEG instructions do not support the int8, int9, int16, and int32 data types and are affected by the element mask. Table E.6 lists the MPEG instructions.

Table E.6: MPEG instructions

Mnemonic  Description
VAAS3     Add and add sign of (-1, 0, 1)
VASS3     Add and subtract sign of (-1, 0, 1)
VEXTSGN2  Extract sign of (-1, 1)
VEXTSGN3  Extract sign of (-1, 0, 1)
VXORALL   XOR the least significant bit of all elements

Each data type conversion instruction supports specific data types and is not affected by the element mask, because the architecture does not support more than one data type in a register. Table E.7 lists the data type conversion instructions.

Table E.7: Data type conversion instructions

Mnemonic  Description
VCVTIF    Convert integer to floating point
VCVTFF    Convert floating point to fixed point
VROUND    Round floating point to integer (supports the four IEEE rounding modes)
VCNTLZ    Count leading zeros
VCVTB9    Convert byte9 data type

The inter-element arithmetic instructions support the int8, int9, int16, int32, and floating-point data types.

Table E.8 lists the inter-element arithmetic instructions.

Table E.8: Inter-element arithmetic instructions

Mnemonic  Description
VADDH     Add two adjacent elements
VAVGH     Average two adjacent elements
VAVGQ     Average four elements
VMAXE     Maximum, exchange even/odd elements

The inter-element move instructions support byte, byte9, halfword, and word data sizes. Table E.9 lists the inter-element move instructions.

Table E.9: Inter-element move instructions

Mnemonic  Description
VESL      Element shift left by 1
VESR      Element shift right by 1
VSHFL     Shuffle to even/odd elements
VSHFLH    Shuffle to even/odd elements, high
VSHFLL    Shuffle to even/odd elements, low
VUNSHFL   Unshuffle from even/odd elements
VUNSHFLH  Unshuffle from even/odd elements, high
VUNSHFLL  Unshuffle from even/odd elements, low

In addition to byte, halfword, and word data sizes, the load and store instructions support special byte9-related data size operations. Consistent with Table E.1, they are not affected by the element mask. Table E.10 lists the load and store instructions.

Table E.10: Load and store instructions

Mnemonic  Description
VL        Load
VLD       Load double
VLQ       Load quad
VLCB      Load from circular buffer
VLR       Load in reverse element order
VLWS      Load with stride
VST       Store
VSTD      Store double
VSTQ      Store quad
VSTCB     Store to circular buffer
VSTR      Store in reverse element order
VSTWS     Store with stride

Most register move instructions support the int8, int9, int16, int32, and floating-point data types and are not affected by the element mask; only the VCMOVM instruction is affected by the element mask. Table E.11 lists the register move instructions.

Table E.11: Register move instructions

Mnemonic  Description
VLI       Load immediate
VMOV      Move
VCMOV     Conditional move
VCMOVM    Conditional move with element mask
VEXTRT    Extract an element
VINSERT   Insert an element

Table E.12 lists the cache operation instructions, which control cache subsystem 130.

Table E.12: Cache operation instructions

Mnemonic  Description
VCACHE    Cache operation on the data or instruction cache
VPFTCH    Prefetch into the data cache
VWBACK    Write back from the data cache

Instruction description nomenclature

To simplify the description of the instruction set, special terminology is used throughout this appendix. For example, instruction operands are signed two's complement integers of byte, byte9, halfword, or word size unless otherwise noted. The term "register" refers to a general-purpose (scalar or vector) register; other kinds of registers are described explicitly. In assembly language syntax, the suffixes b, b9, h, and w denote both the data sizes (byte, byte9, halfword, and word) and the integer data types (int8, int9, int16, and int32). The terms and symbols used to describe instruction operands, operations, and assembly language syntax are as follows.

Rd  Destination register (vector, scalar, or special-purpose)

Ra, Rb  Source registers a and b (vector, scalar, or special-purpose)

Rc  Source or destination register c (vector or scalar)

Rs  Store data source register (vector or scalar)

S  32-bit scalar or special-purpose register

VR  Current bank vector registers

VRA  Alternate bank vector registers

VR0  Bank 0 vector registers

VR1  Bank 1 vector registers

VRd  Vector destination register (defaults to the current bank unless VRA is specified)

VRa, VRb  Vector source registers a and b

VRc  Vector source or destination register c

VRs  Vector store data source register

VAC0H  Vector accumulator register 0 high

VAC0L  Vector accumulator register 0 low

VAC1H  Vector accumulator register 1 high

VAC1L  Vector accumulator register 1 low

SRd  Scalar destination register

SRa, SRb  Scalar source registers a and b

SRb+  Update the base register with the effective address

SRs  Scalar store data source register

SP  Special-purpose register

VR[i]  The i-th element of vector register VR

VR[i]<a:b>  Bits a through b of the i-th element of vector register VR

VR[i]<msb>  The most significant bit of the i-th element of vector register VR

EA  Effective address for a memory access

MEM  Memory

BYTE[EA]  One byte in memory addressed by EA

HALF[EA]  The halfword in memory addressed by EA; the byte at address EA+1 holds bits <15:8>.

WORD[EA]  The word in memory addressed by EA; the byte at address EA+3 holds bits <31:24>.
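The HALF[EA] and WORD[EA] definitions describe little-endian byte order: the highest-numbered address holds the most significant byte. A minimal sketch, modeling memory as a Python `bytes` object (the function names are illustrative):

```python
def half(mem: bytes, ea: int) -> int:
    """HALF[EA]: the byte at EA+1 supplies bits <15:8>."""
    return mem[ea] | (mem[ea + 1] << 8)

def word(mem: bytes, ea: int) -> int:
    """WORD[EA]: the byte at EA+3 supplies bits <31:24>."""
    return int.from_bytes(mem[ea:ea + 4], "little")
```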

NumElem  The number of elements for the given data type. In VEC32 mode it is 32, 16, or 8 for byte and byte9, halfword, or word data sizes, respectively; in VEC64 mode it is 64, 32, or 16. NumElem is 0 for scalar operations.
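The NumElem values follow directly from the vector register length and the element size. In the sketch below, `num_elem` and `ELEM_BYTES` are hypothetical names, and a byte9 element is counted as occupying one byte slot of the register.

```python
# Byte slots occupied by one element of each data size.
ELEM_BYTES = {"b": 1, "b9": 1, "h": 2, "w": 4}

def num_elem(vecsize: int, size: str, scalar: bool = False) -> int:
    """Number of elements for a data size; vecsize is 32 (VEC32) or 64 (VEC64)."""
    if scalar:
        return 0  # NumElem is defined as 0 for scalar operations
    return vecsize // ELEM_BYTES[size]
```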

EMASK[i]  The element mask for the i-th element. For byte and byte9, halfword, or word data sizes, it represents 1, 2, or 4 bits, respectively, in VGMR0/1, ~VGMR0/1, VMMR0/1, or ~VMMR0/1. For scalar operations, the element mask is assumed to be set even if EMASK[i] = 0.

MMASK[i]  The element mask for the i-th element. For byte and byte9, halfword, or word data sizes, it represents 1, 2, or 4 bits, respectively, in VMMR0 or VMMR1.

VCSR  Vector control and status register

VCSR<x>  One or more bits of VCSR, where "x" is the field name

VPC  Vector processor program counter

VECSIZE  The vector register length: 32 in VEC32 mode and 64 in VEC64 mode.

SPAD  Scratchpad

C programming constructs are used to describe control flow operations. The exceptions are noted below:

=  Assignment

[symbol missing]  Concatenation

{X‖Y}  Selection between X and Y (not a logical OR)

sex  Sign-extend to the specified data size

sex_dp  Sign-extend to double precision of the specified data size

sign>>  Right shift with sign extension (arithmetic)

zex  Zero-extend to the specified data size

zero>>  Right shift with zero extension (logical)

<<  Left shift (zeros filled in)

trnc7  Truncate the leading 7 bits (from a halfword)

trnc1  Truncate the leading 1 bit (from a byte9)

%  Modulo operator

|expression|  Absolute value of the expression

/  Divide (for the floating-point data type, uses one of the four IEEE rounding modes)

//  Divide (rounds using the zero rounding mode)

Saturate()  For integer data types, saturates to the most negative or most positive value instead of generating an overflow; for the floating-point data type, saturation can be to positive infinity, positive zero, negative zero, or negative infinity.
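Several of the operators in this list can be modeled in a few lines. The following is a sketch under stated assumptions: values are Python integers, `sex` returns a signed value, and the helper names `asr` and `lsr` stand in for the two right-shift operators.

```python
def zex(value: int, bits: int) -> int:
    """Zero-extend: keep the low `bits` bits as an unsigned value."""
    return value & ((1 << bits) - 1)

def sex(value: int, bits: int) -> int:
    """Sign-extend: interpret the low `bits` bits as two's complement."""
    v = zex(value, bits)
    return v - (1 << bits) if v >> (bits - 1) else v

def asr(value: int, n: int, bits: int) -> int:
    """Arithmetic right shift: the sign bit is replicated."""
    return sex(value, bits) >> n

def lsr(value: int, n: int, bits: int) -> int:
    """Logical right shift: zeros are shifted in."""
    return zex(value, bits) >> n

def trnc7(halfword: int) -> int:
    """Drop the leading 7 bits of a halfword, leaving 9 bits."""
    return halfword & 0x1FF

def trnc1(byte9: int) -> int:
    """Drop the leading bit of a byte9, leaving 8 bits."""
    return byte9 & 0xFF
```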

The general instruction formats are shown in Figure 8 and described below.

The REAR format is used by the load, store, and cache operation instructions. The fields of the REAR format have the meanings given in Table E.13.

Table E.13: REAR format

Field  Meaning
OPC<4:0>  Opcode.
B  Bank identifier for the Rn register.
D  Destination/source scalar register. When set, Rn<4:0> designates a scalar register.
   Legal values of the B:D encoding in VEC32 mode:
     00  Rn is a vector register in the current bank
     01  Rn is a scalar register (in the current bank)
     10  Rn is a vector register in the alternate bank
     11  undefined
   Legal values of the B:D encoding in VEC64 mode:
     00  only 4, 8, 16, or 32 bytes of vector register Rn are used
     01  Rn is a scalar register
     10  all 64 bytes of vector register Rn are used
     11  undefined
TT<1:0>  Transfer type; indicates the specific load or store operation. See the LT and ST encoding tables below.
C  Cache off. Set this bit to bypass the data cache on a load. The bit is set by the cache-off form of the load and store mnemonics (append OFF to the mnemonic).
A  Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + SRi.
Rn<4:0>  Destination/source register number.
SRb<4:0>  Scalar base register number.
SRi<4:0>  Scalar index register number.
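The B:D encodings and the effective-address rule for the REAR format can be sketched as follows. The bit positions of the fields come from Figure 8 and are not modeled here; the function names and the 32-bit address wraparound are assumptions made for illustration.

```python
VEC32_BD = {
    (0, 0): "vector register, current bank",
    (0, 1): "scalar register",
    (1, 0): "vector register, alternate bank",
}
VEC64_BD = {
    (0, 0): "vector register, 4/8/16/32 bytes used",
    (0, 1): "scalar register",
    (1, 0): "vector register, all 64 bytes used",
}

def decode_bd(b: int, d: int, vec64: bool) -> str:
    """Interpret the B:D bits; the 11 encoding is undefined in both modes."""
    table = VEC64_BD if vec64 else VEC32_BD
    if (b, d) not in table:
        raise ValueError("undefined B:D encoding")
    return table[(b, d)]

def rear_address(srb: int, sri: int, update: bool) -> tuple[int, int]:
    """Return (EA, new SRb) where EA = SRb + SRi; SRb changes only if A is set."""
    ea = (srb + sri) & 0xFFFFFFFF  # assumed 32-bit wraparound
    return ea, (ea if update else srb)
```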

Bits 17:15 are reserved and should be zero to ensure compatibility with future extensions of the architecture. Some encodings of the B:D and TT fields are undefined. Programmers should not use these encodings, because the architecture does not specify the result when such an encoding is used. Table E.14 shows the scalar load operations supported in both VEC32 and VEC64 modes (encoded in the TT field as LT).

Table E.14: REAR scalar load operations in VEC32 and VEC64 modes

D:LT  Mnemonic  Meaning
100   .bs9      Load 8 bits into byte9 size, sign-extended
101   .h        Load 16 bits into halfword size
110   .bz9      Load 8 bits into byte9 size, zero-extended
111   .w        Load 32 bits into word size

Table E.15 shows the vector load operations supported in VEC32 mode (encoded in the TT field as LT), that is, when bit VCSR<0> is clear.

Table E.15: REAR vector load operations in VEC32 mode

D:LT  Mnemonic  Meaning
000   .4        Load 4 bytes from memory into the lower 4 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 4 byte9 elements is sign-extended from the corresponding 8th bit.
001   .8        Load 8 bytes from memory into the lower 8 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 8 byte9 elements is sign-extended from the corresponding 8th bit.
010   .16       Load 16 bytes from memory into the lower 16 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 16 byte9 elements is sign-extended from the corresponding 8th bit.
011   .32       Load 32 bytes from memory into the lower 32 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 32 byte9 elements is sign-extended from the corresponding 8th bit.
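A partial vector load into byte9 elements can be sketched as below. The register is modeled as a list of 9-bit integers, and "lower" elements are taken to be indices 0 through n-1; both choices, like the function name, are assumptions for illustration.

```python
def load_to_byte9(reg: list[int], mem: bytes, ea: int, n: int) -> list[int]:
    """Load n bytes into the lower n byte9 elements; others stay unchanged."""
    out = list(reg)
    for k in range(n):
        b = mem[ea + k]
        # Bit 8 (the 9th bit) is a copy of bit 7 (the 8th bit): sign extension.
        out[k] = b | ((b & 0x80) << 1)
    return out
```

Loading the bytes 0x80 and 0x7F this way yields the byte9 values 0x180 and 0x07F.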

The B bit indicates the current or the alternate bank.

Table E.16 shows the vector load operations supported in VEC64 mode (encoded in the TT field as LT), that is, when bit VCSR<0> is set.

Table E.16: REAR vector load operations in VEC64 mode

B:D:LT  Mnemonic  Meaning
0000    .4        Load 4 bytes from memory into the lower 4 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 4 byte9 elements is sign-extended from the corresponding 8th bit.
0001    .8        Load 8 bytes from memory into the lower 8 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 8 byte9 elements is sign-extended from the corresponding 8th bit.
0010    .16       Load 16 bytes from memory into the lower 16 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 16 byte9 elements is sign-extended from the corresponding 8th bit.
0011    .32       Load 32 bytes from memory into the lower 32 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 32 byte9 elements is sign-extended from the corresponding 8th bit.
1000              Undefined
1001              Undefined
1010              Undefined
1011    .64       Load 64 bytes from memory into the lower 64 byte9 elements of the register; the remaining byte9 elements are unchanged. The 9th bit of each of the 64 byte9 elements is sign-extended from the corresponding 8th bit.

Bit B indicates a 64-byte vector operation, because the concept of current and alternate banks does not exist in VEC64 mode.

Table E.17 lists the scalar store operations supported in both VEC32 and VEC64 modes (encoded in the TT field as ST).

Table E.17: REAR scalar store operations

D:ST  Mnemonic  Meaning
100   .b        Store byte or byte9 size as 8 bits (1 bit is truncated from a byte9)
101   .h        Store halfword size as 16 bits
110             Undefined
111   .w        Store word size as 32 bits

Table E.18 lists the vector store operations supported in VEC32 mode (encoded in the TT field as ST), that is, when bit VCSR<0> is clear.

Table E.18: REAR vector store operations in VEC32 mode

D:ST  Mnemonic  Meaning
000   .4        Store 4 bytes from the register to memory. The 9th bit of each of the 4 byte9 elements in the register is ignored.
001   .8        Store 8 bytes from the register to memory. The 9th bit of each of the 8 byte9 elements in the register is ignored.
010   .16       Store 16 bytes from the register to memory. The 9th bit of each of the 16 byte9 elements in the register is ignored.
011   .32       Store 32 bytes from the register to memory. The 9th bit of each of the 32 byte9 elements in the register is ignored.
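The vector store is the inverse of the byte9 load: the 9th bit of each element is dropped. A minimal sketch, again modeling the register as a list of 9-bit integers with the stored elements taken from indices 0 through n-1 (an assumption, as is the function name):

```python
def store_from_byte9(reg: list[int], n: int) -> bytes:
    """Return the n bytes written to memory; bit 8 of each byte9 is ignored."""
    return bytes(element & 0xFF for element in reg[:n])
```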

Table E.19 lists the vector store operations supported in VEC64 mode (encoded in the TT field as ST), that is, when bit VCSR<0> is set.

Table E.19: REAR vector store operations in VEC64 mode

B:D:ST  Mnemonic  Meaning
0000    .4        Store 4 bytes from the register to memory. The 9th bit of each of the 4 byte9 elements in the register is ignored.
0001    .8        Store 8 bytes from the register to memory. The 9th bit of each of the 8 byte9 elements in the register is ignored.
0010    .16       Store 16 bytes from the register to memory. The 9th bit of each of the 16 byte9 elements in the register is ignored.
0011    .32       Store 32 bytes from the register to memory. The 9th bit of each of the 32 byte9 elements in the register is ignored.
1000              Undefined
1001              Undefined
1010              Undefined
1011    .64       Store 64 bytes from the register to memory. The 9th bit of each of the 64 byte9 elements in the register is ignored.

Bit B indicates a 64-byte vector operation, because the concept of current and alternate banks does not exist in VEC64 mode.

The REAI format is used by the load, store, and cache operation instructions. Table E.20 shows the meaning of each field in the REAI format.

Table E.20: REAI format

Field  Meaning
OPC<4:0>  Opcode.
B  Bank identifier for the Rn register. When set in VEC32 mode, Rn<4:0> designates a vector register number in the alternate bank; when set in VEC64 mode, it indicates a full-vector (64-byte) operation.
D  Destination/source scalar register. When set, Rn<4:0> designates a scalar register.
   Legal values of the B:D encoding in VEC32 mode:
     00  Rn is a vector register in the current bank
     01  Rn is a scalar register (in the current bank)
     10  Rn is a vector register in the alternate bank
     11  undefined
   Legal values of the B:D encoding in VEC64 mode:
     00  only 4, 8, 16, or 32 bytes of vector register Rn are used
     01  Rn is a scalar register
     10  all 64 bytes of vector register Rn are used
     11  undefined
TT<1:0>  Transfer type; indicates the specific load or store operation. See the LT and ST encoding tables below.
C  Cache off. Set this bit to bypass the data cache on a load. The bit is set by the cache-off form of the load and store mnemonics (append OFF to the mnemonic).
A  Address update. Set this bit to update SRb with the effective address. The effective address is computed as SRb + IMM<7:0>.
Rn<4:0>  Destination/source register number.
SRb<4:0>  Scalar base register number.
IMM<7:0>  An 8-bit immediate offset, interpreted as a two's complement number.
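The REAI effective address differs from the REAR case only in that the offset is a signed 8-bit immediate. A sketch under the same assumptions as before (hypothetical function name, 32-bit address wraparound):

```python
def reai_address(srb: int, imm8: int, update: bool) -> tuple[int, int]:
    """Return (EA, new SRb) where EA = SRb + sign-extended IMM<7:0>."""
    imm8 &= 0xFF
    offset = imm8 - 256 if imm8 & 0x80 else imm8  # two's complement offset
    ea = (srb + offset) & 0xFFFFFFFF              # assumed 32-bit wraparound
    return ea, (ea if update else srb)
```

An immediate of 0xFC therefore addresses 4 bytes below the base, and with the A bit set the base register is updated to that address.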

REAR和REAI格式用于传送类型的相同编码。对进一步的编码细节参看REAR格式。The REAR and REAI formats are used for the same encoding of the transport type. See the REAR format for further encoding details.

The RRRM5 format provides three register operands, or two register operands and a 5-bit immediate operand. Table E.21 defines the fields of the RRRM5 format.

Table E.21: RRRM5 format fields
Field                  Meaning
OP<4:0>                Opcode.
D                      Destination scalar register. When set, Rd<4:0> denotes a scalar register; when clear, Rd<4:0> denotes a vector register.
S                      Scalar Rb register. When set, Rb<4:0> is a scalar register; when clear, Rb<4:0> is a vector register.
DS<1:0>                Data size, encoded as:
                         00  byte (for the int8 data type)
                         01  byte9 (for the int9 data type)
                         10  halfword (for the int16 data type)
                         11  word (for the int32 or floating-point data type)
M                      Modifier for the D:S bits. See the D:S:M encoding tables below.
Rd<4:0>                Destination D register number.
Ra<4:0>                Source A register number.
Rb<4:0> or IM5<4:0>    Source B register number or a 5-bit immediate, as determined by the D:S:M encoding. The 5-bit immediate is treated as an unsigned number.

Bits 19:15 are reserved and must be zero to ensure compatibility with future extensions.

All vector register operands refer to the current bank (which may be either bank 0 or bank 1) unless stated otherwise. Table E.22 lists the D:S:M encodings when DS<1:0> is 00, 01, or 10.

Table E.22: RRRM5 D:S:M encodings when DS is not 11
D:S:M   Rd    Ra    Rb/IM5   Comment
000     VRd   VRa   VRb      Three vector register operands
001     Undefined
010     VRd   VRa   SRb      B operand is a scalar register
011     VRd   VRa   IM5      B operand is a 5-bit immediate
100     Undefined
101     Undefined
110     SRd   SRa   SRb      Three scalar register operands
111     SRd   SRa   IM5      B operand is a 5-bit immediate

When DS<1:0> is 11, the D:S:M encodings have the following meanings:

Table E.23: RRRM5 D:S:M encodings when DS is 11
D:S:M   Rd    Ra    Rb/IM5   Comment
000     VRd   VRa   VRb      Three vector register operands (int32)
001     VRd   VRa   VRb      Three vector register operands (float)
010     VRd   VRa   SRb      B operand is a scalar register (int32)
011     VRd   VRa   IM5      B operand is a 5-bit immediate (int32)
100     VRd   VRa   SRb      B operand is a scalar register (float)
101     SRd   SRa   SRb      Three scalar register operands (float)
110     SRd   SRa   SRb      Three scalar register operands (int32)
111     SRd   SRa   IM5      B operand is a 5-bit immediate (int32)
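The two D:S:M tables can be read as a single lookup keyed on DS and D:S:M. The following Python sketch is only an illustration of that reading: the names `INT_MODES`, `FLOAT_MODES`, and `decode_dsm` are hypothetical, and the destination of encoding 101 is assumed to be SRd.

```python
# Hypothetical decoder for the RRRM5 D:S:M bits, transcribed from Tables E.22 and E.23.
INT_MODES = {
    0b000: ("VRd", "VRa", "VRb"),   # three vector register operands
    0b010: ("VRd", "VRa", "SRb"),   # B operand is a scalar register
    0b011: ("VRd", "VRa", "IM5"),   # B operand is a 5-bit immediate
    0b110: ("SRd", "SRa", "SRb"),   # three scalar register operands
    0b111: ("SRd", "SRa", "IM5"),   # scalar operands with a 5-bit immediate
}
FLOAT_MODES = {                     # defined only when DS == 0b11
    0b001: ("VRd", "VRa", "VRb"),
    0b100: ("VRd", "VRa", "SRb"),
    0b101: ("SRd", "SRa", "SRb"),   # assumption: destination is SRd
}
INT_TYPES = {0b00: "int8", 0b01: "int9", 0b10: "int16", 0b11: "int32"}


def decode_dsm(ds, dsm):
    """Return (operands, data_type), or None for an undefined encoding."""
    if dsm in INT_MODES:
        return INT_MODES[dsm], INT_TYPES[ds]
    if ds == 0b11 and dsm in FLOAT_MODES:
        return FLOAT_MODES[dsm], "float"
    return None
```

Encodings 001, 100, and 101 return None unless DS is 11, which matches the "Undefined" rows of Table E.22.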

The RRRR format provides four register operands. Table E.24 shows the fields of the RRRR format.

Table E.24: RRRR format fields
Field      Meaning
Op<4:0>    Opcode.
S          Scalar Rb register. When set, Rb<4:0> is a scalar register; when clear, Rb<4:0> is a vector register.
DS<1:0>    Data size, encoded as:
             00  byte (for the int8 data type)
             01  byte9 (for the int9 data type)
             10  halfword (for the int16 data type)
             11  word (for the int32 data type)
Rc<4:0>    Source/destination C register number.
Rd<4:0>    Destination D register number.
Ra<4:0>    Source A register number.
Rb<4:0>    Source B register number.

All vector register operands refer to the current bank (which may be either bank 0 or bank 1) unless stated otherwise.

The RI format is used only by the load immediate instruction. Table E.25 shows the fields of the RI format.

Table E.25: RI format fields
Field        Meaning
D            Destination scalar register. When set, Rd<4:0> denotes a scalar register; when clear, Rd<4:0> denotes a vector register in the current bank.
F            Floating-point data type. When set, denotes the floating-point data type and requires DS<1:0> to be 11.
DS<1:0>      Data size, encoded as:
               00  byte (for the int8 data type)
               01  byte9 (for the int9 data type)
               10  halfword (for the int16 data type)
               11  word (for the int32 or floating-point data type)
Rd<4:0>      Destination D register number.
IMM<18:0>    A 19-bit immediate value.

Certain encodings of the F:DS<1:0> field are undefined. Programmers should not use these encodings, because the architecture does not specify the result when they are used. The value loaded into Rd depends on the data type, as shown in Table E.26.

Table E.26: Values loaded in the RI format
Format   Data type                  Register operand
.b       byte (8 bits)              Rd<7:0> := Imm<7:0>
.b9      byte9 (9 bits)             Rd<8:0> := Imm<8:0>
.h       halfword (16 bits)         Rd<15:0> := Imm<15:0>
.w       word (32 bits)             Rd<31:0> := sign-extended Imm<18:0>
.f       floating point (32 bits)   Rd<31> := Imm<18> (sign)
                                    Rd<30:23> := Imm<17:10> (exponent)
                                    Rd<22:13> := Imm<9:0> (mantissa)
                                    Rd<12:0> := 0
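The .f row of Table E.26 spreads a 19-bit immediate across a 32-bit single-precision pattern. A minimal Python sketch of that mapping follows; the function names are hypothetical, and the exponent is assumed to come from Imm<17:10>, the only 8-bit slice consistent with the sign and mantissa fields.

```python
import struct


def ri_float_bits(imm19):
    """Expand a 19-bit RI immediate into a 32-bit single-precision bit pattern."""
    sign = (imm19 >> 18) & 0x1         # Imm<18>    -> Rd<31>
    exponent = (imm19 >> 10) & 0xFF    # Imm<17:10> -> Rd<30:23> (assumed slice)
    mantissa = imm19 & 0x3FF           # Imm<9:0>   -> Rd<22:13>
    return (sign << 31) | (exponent << 23) | (mantissa << 13)  # Rd<12:0> stay 0


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, the immediate 0x1FC00 (exponent field 127, all other bits zero) expands to the pattern 0x3F800000, which is 1.0. Only the top 10 mantissa bits can be specified, so an immediate load can express only values whose lower 13 mantissa bits are zero.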

The fields of the CT format are shown in Table E.27.

Table E.27: CT format fields
Field        Meaning
Opc<3:0>     Opcode.
Cond<2:0>    Branch condition:
               000  unconditional
               001  less than
               010  equal
               011  less than or equal
               100  greater than
               101  not equal
               110  greater than or equal
               111  overflow
IMM<22:0>    A 23-bit immediate word offset, expressed as a two's complement number.

The branch conditions use the VCSR[GT:EQ:LT] field. The overflow condition uses the VCSR[SO] bit, which, when set, takes precedence over the GT, EQ, and LT bits. VCCS and VCBARR interpret the Cond<2:0> field differently from the above; refer to their instruction descriptions for details.

The RRRM9 format specifies three register operands, or two register operands and a 9-bit immediate operand. Table E.28 shows the fields of the RRRM9 format.

Table E.28: RRRM9 format fields
Field                  Meaning
Opc<5:0>               Opcode.
D                      Destination scalar register. When set, Rd<4:0> denotes a scalar register; when clear, Rd<4:0> denotes a vector register.
S                      Scalar Rb register. When set, Rb<4:0> is a scalar register; when clear, Rb<4:0> is a vector register.
DS<1:0>                Data size, encoded as:
                         00  byte (for the int8 data type)
                         01  byte9 (for the int9 data type)
                         10  halfword (for the int16 data type)
                         11  word (for the int32 or floating-point data type)
M                      Modifier for the D:S bits. See the D:S:M encoding tables below.
Rd<4:0>                Destination register number.
Ra<4:0>                Source A register number.
Rb<4:0> or IM5<4:0>    Source B register number or a 5-bit immediate, as determined by the D:S:M encoding.
IM9<3:0>               Together with IM5<4:0>, provides a 9-bit immediate, as determined by the D:S:M encoding.

Bits 19:15 are reserved when the D:S:M encoding does not specify an immediate operand, and must be zero to ensure future compatibility.

All vector register operands refer to the current bank (which may be either bank 0 or bank 1) unless stated otherwise. The D:S:M encodings are identical to those shown in Tables E.22 and E.23 for the RRRM5 format, except that the immediate value is extracted from the immediate fields according to the DS<1:0> encoding, as shown in Table E.29.

Table E.29: Immediate values in the RRRM9 format
DS   Matching data type   B operand
00   int8                 Source B<7:0> := IM9<2:0>:IM5<4:0>
01   int9                 Source B<8:0> := IM9<3:0>:IM5<4:0>
10   int16                Source B<15:0> := sex(IM9<3:0>:IM5<4:0>)
11   int32                Source B<31:0> := sex(IM9<3:0>:IM5<4:0>)
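Table E.29 can be illustrated with a short Python sketch that concatenates the two immediate fields and then truncates or sign-extends by data size. The helper names are hypothetical; `sex` mirrors the sign-extension operator used in the table, and the int8 and int9 results are returned as raw bit patterns.

```python
def sex(value, bits):
    """Sign-extend the low `bits` bits of value (two's complement)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def rrrm9_immediate(ds, im9, im5):
    """Build the B operand from IM9<3:0> and IM5<4:0> following Table E.29."""
    raw = ((im9 & 0xF) << 5) | (im5 & 0x1F)   # IM9<3:0>:IM5<4:0>, 9 bits
    if ds == 0b00:
        return raw & 0xFF    # int8: IM9<2:0>:IM5<4:0>, raw 8-bit pattern
    if ds == 0b01:
        return raw           # int9: raw 9-bit pattern
    return sex(raw, 9)       # int16 and int32: sign-extended 9-bit value
```

With all nine immediate bits set, the int16 and int32 cases yield -1, while the int8 case keeps only the low eight bits (0xFF).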

The immediate format is not available with the floating-point data type.

The MSP vector instructions are described below in alphabetical order. Notes:

1. Unless otherwise noted, instructions are affected by the element mask. CT format instructions are not affected by the element mask. REAR and REAI format instructions, which comprise the load, store, and cache instructions, are also not affected by the element mask.

2. The 9-bit immediate operand is not available with the floating-point data type.

3. Only the vector form is given in the Operation descriptions. For scalar operations, assume that only one element, element 0, is defined.

4. For the RRRM5 and RRRM9 formats, the following encodings are used for the integer data types (b, b9, h, w):
   Mode        V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
   D:S:M       000        010         011         110         111
   Data type   int8 (b)   int9 (b9)   int16 (h)   int32 (w)
   DS          00         01          10          11

5. For the RRRM5 and RRRM9 formats, the following encodings are used for the floating-point data type:
   Mode        V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
   D:S:M       001        100         n/a         101         n/a
   Data type   float (f)
   DS          11

6. For all instructions that can cause overflow, results saturate to the maximum or minimum limit of int8, int9, int16, or int32 when the VCSR<ISAT> bit is set. Correspondingly, floating-point results saturate to -infinity, -0, +0, or +infinity when the VCSR<FSAT> bit is set.

7. Syntactically, .n can be used in place of .b9 to denote the byte9 data size.

8. For all instructions, the floating-point result returned to the destination register or to the vector accumulator is in IEEE 754 single-precision format. The floating-point result is written to the lower part of the accumulator, and the upper part is left unchanged.
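Note 6 on integer saturation can be illustrated with a small Python sketch of one element of an add. The function names are hypothetical. The text does not say what happens on overflow when VCSR<ISAT> is clear; two's complement wraparound is assumed here purely for contrast.

```python
def saturate(value, bits):
    """Clamp value to the signed range of a `bits`-wide integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap(value, bits):
    """Two's complement wraparound (assumed behavior when ISAT is clear)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_element(a, b, bits, isat):
    """Add two signed elements of width `bits` (8, 9, 16, or 32)."""
    total = a + b
    return saturate(total, bits) if isat else wrap(total, bits)
```

With ISAT set, adding 100 and 100 as int8 gives 127, the int8 maximum; the same sum as int9 fits and gives 200.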

VAAS3 Add and Add Sign of (-1, 0, 1)

Format

Assembler syntax

VAAS3.dt VRd, VRa, VRb

VAAS3.dt VRd, VRa, SRb

VAAS3.dt SRd, SRa, SRb

where dt = {b, b9, h, w}.

Supported modes
D:S:M   V<-V@V     V<-V@S      S<-S@S
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

The contents of vector/scalar register Ra are added to Rb to produce an intermediate result, to which the sign of Ra (-1, 0, or 1) is then added. The final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (Ra[i] > 0) extsgn3 = 1;

    else if (Ra[i] < 0) extsgn3 = -1;

    else extsgn3 = 0;

    Rd[i] = Ra[i] + Rb[i] + extsgn3;

}

Exceptions

Overflow.
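The VAAS3 operation can be sketched in Python as follows. This is an illustrative model only: the function name is hypothetical, and element masking, data-size limits, and the overflow exception are left out.

```python
def vaas3(ra, rb):
    """Per element: Rd[i] = Ra[i] + Rb[i] + sign(Ra[i])."""
    def sign3(x):
        return (x > 0) - (x < 0)   # 1, -1, or 0
    return [a + b + sign3(a) for a, b in zip(ra, rb)]
```

For Ra = [5, -5, 0] and Rb = [1, 1, 1], the result is [7, -5, 1]: the sum moves one step further from zero in the direction of Ra's sign.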

VADAC Add and Accumulate

Format

Assembler syntax

VADAC.dt VRc, VRd, VRa, VRb

VADAC.dt SRc, SRd, SRa, SRb

where dt = {b, b9, h, w}.

Supported modes
S    VR         SR
DS   int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Adds each element of the Ra and Rb operands to the corresponding double-precision element of the vector accumulator, and stores the double-precision sum of each element in both the vector accumulator and the destination registers Rc and Rd. Ra and Rb use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper half of each double-precision element is stored in VACH and Rc. If Rc = Rd, the result in Rc is undefined.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Aop[i] = {VRa[i] ‖ SRa};

    Bop[i] = {VRb[i] ‖ SRb};

    VACH[i]:VACL[i] = sex(Aop[i] + Bop[i]) + VACH[i]:VACL[i];

    Rc[i] = VACH[i];

    Rd[i] = VACL[i];

}
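The split of the double-width accumulator into VACH and VACL can be sketched for one element in Python. The function name is hypothetical; `a` and `b` are signed Python integers, so the sign extension of their sum is implicit, and the accumulator is modeled as an unsigned bit pattern of width 2 * bits.

```python
def vadac_element(a, b, acc, bits):
    """Return (new_acc, rc, rd) for one element of width `bits`.

    acc and new_acc are 2*bits-wide bit patterns (VACH:VACL).
    """
    wide_mask = (1 << (2 * bits)) - 1
    total = (acc + a + b) & wide_mask     # VACH:VACL = sex(a + b) + VACH:VACL
    vach = total >> bits                  # upper half, returned in Rc
    vacl = total & ((1 << bits) - 1)      # lower half, returned in Rd
    return total, vach, vacl
```

For int8 elements with a = b = 100 and an accumulator of 0x00F0, the new accumulator is 0x01B8, so Rc receives 0x01 and Rd receives 0xB8.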

VADACL Add and Accumulate Low

Format

Assembler syntax

VADACL.dt VRd, VRa, VRb

VADACL.dt VRd, VRa, SRb

VADACL.dt VRd, VRa, #IMM

VADACL.dt SRd, SRa, SRb

VADACL.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Supported modes
D:S:M   V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Adds each element of the Ra and Rb/immediate operands to the corresponding extended-precision element of the vector accumulator, stores the extended-precision sum in the vector accumulator, and returns the lower-precision part to the destination register Rd. Ra and Rb/immediate use the specified data type, whereas VAC uses the corresponding double-precision data type (16, 18, 32, and 64 bits for int8, int9, int16, and int32, respectively). The upper half of each extended-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    VACH[i]:VACL[i] = sex(Ra[i] + Bop[i]) + VACH[i]:VACL[i];

    Rd[i] = VACL[i];

}

VADD Add

Format

Assembler syntax

VADD.dt VRd, VRa, VRb

VADD.dt VRd, VRa, SRb

VADD.dt VRd, VRa, #IMM

VADD.dt SRd, SRa, SRb

VADD.dt SRd, SRa, #IMM

where dt = {b, b9, h, w, f}.

Supported modes
D:S:M   V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)   float (f)

Description

Adds the Ra and Rb/immediate operands and returns the sum to the destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i] = Ra[i] + Bop[i];

}

Exceptions

Overflow, floating-point invalid operand.

VADDH Add Two Adjacent Elements

Format

Assembler syntax

VADDH.dt VRd, VRa, VRb

VADDH.dt VRd, VRa, SRb

where dt = {b, b9, h, w, f}.

Supported modes
D:S:M   V<-V@V     V<-V@S
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)   float (f)

Description

Operation

for (i = 0; i < NumElem-1; i++) {

    Rd[i] = Ra[i] + Ra[i+1];

}

Rd[NumElem-1] = Ra[NumElem-1] + {VRb[0] ‖ SRb};

Exceptions

Overflow, floating-point invalid operand.

Programming note

This instruction is not affected by the element mask.
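The VADDH pairing can be sketched in Python: each element is added to its right-hand neighbor, and the last element is paired with element 0 of VRb (or with SRb). The function name is hypothetical, and overflow handling is omitted.

```python
def vaddh(ra, b0):
    """Rd[i] = Ra[i] + Ra[i+1]; the last element is paired with b0 (VRb[0] or SRb)."""
    neighbors = ra[1:] + [b0]
    return [x + y for x, y in zip(ra, neighbors)]
```

For Ra = [1, 2, 3, 4] and a B operand of 10, the result is [3, 5, 7, 14].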

VAND AND

Format

Assembler syntax

VAND.dt VRd, VRa, VRb

VAND.dt VRd, VRa, SRb

VAND.dt VRd, VRa, #IMM

VAND.dt SRd, SRa, SRb

VAND.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Supported modes
D:S:M   V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Performs a logical AND of the Ra and Rb/immediate operands and returns the result to the destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i]<k> = Ra[i]<k> & Bop[i]<k>, for all bits k in element i;

}

Exceptions

None.

VANDC AND Complement

Format

Assembler syntax

VANDC.dt VRd, VRa, VRb

VANDC.dt VRd, VRa, SRb

VANDC.dt VRd, VRa, #IMM

VANDC.dt SRd, SRa, SRb

VANDC.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Description

Performs a logical AND of Ra with the complement of the Rb/immediate operand and returns the result to the destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i]<k> = Ra[i]<k> & ~Bop[i]<k>, for all bits k in element i;

}

Exceptions

None.

VASA Arithmetic Shift Accumulator

Format

Assembler syntax

VASAL.dt

VASAR.dt

where dt = {b, b9, h, w} and R denotes the shift direction, left or right.

Supported modes
R    left       right
DS   int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Each data element of the vector accumulator register is shifted left by one bit position with zero fill from the right (if R = 0), or shifted right by one bit position with sign extension (if R = 1). The result is stored in the vector accumulator.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (R == 1)

        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] sign>> 1;

    else

        VACOH[i]:VACOL[i] = VACOH[i]:VACOL[i] << 1;

}

Exceptions

Overflow.

VASL Arithmetic Shift Left

Format

Assembler syntax

VASL.dt VRd, VRa, SRb

VASL.dt VRd, VRa, #IMM

VASL.dt SRd, SRa, SRb

VASL.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Supported modes
D:S:M   V<-V@S     V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Each data element of vector/scalar register Ra is shifted left, with zero fill from the right, by the shift amount given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. For elements that overflow, the result is saturated to the maximum positive or maximum negative value, depending on its sign. The shift amount is defined to be an unsigned integer.

Operation

shift_amount = {SRb % 32 ‖ IMM<4:0>};

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Rd[i] = saturate(Ra[i] << shift_amount);

}

Exceptions

None.

Programming note

Note that shift_amount is a 5-bit number taken from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.
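The saturating left shift can be sketched for one element in Python. The function name is hypothetical. The sketch follows the Operation pseudocode literally, that is, saturate(Ra[i] << shift_amount), and relies on Python's unbounded integers to detect overflow; it does not model shift amounts larger than the data size.

```python
def vasl_element(a, shift_amount, bits):
    """Saturating arithmetic left shift of one signed element of width `bits`."""
    shift_amount &= 0x1F                  # 5-bit amount: SRb % 32 or IMM<4:0>
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a << shift_amount))
```

As int8, shifting 3 left by 2 gives 12, while shifting 64 left by 1 overflows and saturates to 127.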

VASR Arithmetic Shift Right

Format

Assembler syntax

VASR.dt VRd, VRa, SRb

VASR.dt VRd, VRa, #IMM

VASR.dt SRd, SRa, SRb

VASR.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Supported modes
D:S:M   V<-V@S     V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

Each data element of vector/scalar register Ra is arithmetically shifted right, with sign extension at the most significant bit position, by the shift amount given in the least significant bits of scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd. The shift amount is defined to be an unsigned integer.

Operation

shift_amount = {SRb % 32 ‖ IMM<4:0>};

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Rd[i] = Ra[i] sign>> shift_amount;

}

Exceptions

None.

Programming note

Note that shift_amount is a 5-bit number taken from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with the sign bit.

VASS3 Add and Subtract Sign of (-1, 0, 1)

Format

Assembler syntax

VASS3.dt VRd, VRa, VRb

VASS3.dt VRd, VRa, SRb

VASS3.dt SRd, SRa, SRb

where dt = {b, b9, h, w}.

Description

The contents of vector/scalar register Ra are added to Rb to produce an intermediate result, from which the sign of Ra (-1, 0, or 1) is then subtracted. The final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    if (Ra[i] > 0) extsgn3 = 1;

    else if (Ra[i] < 0) extsgn3 = -1;

    else extsgn3 = 0;

    Rd[i] = Ra[i] + Rb[i] - extsgn3;

}

Exceptions

Overflow.

VASUB Absolute of Subtract

Format

Assembler syntax

VASUB.dt VRd, VRa, VRb

VASUB.dt VRd, VRa, SRb

VASUB.dt VRd, VRa, #IMM

VASUB.dt SRd, SRa, SRb

VASUB.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Supported modes
D:S:M   V<-V@V     V<-V@S      V<-V@I      S<-S@S      S<-S@I
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)   float (f)

Description

The contents of vector/scalar register Rb or the IMM field are subtracted from the contents of vector/scalar register Ra, and the absolute value of the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {Rb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i] = |Ra[i] - Bop[i]|;

}

Exceptions

Overflow, floating-point invalid operand.

Programming note

If the result of the subtraction is the most negative number, an overflow occurs after the absolute value operation. If saturation mode is enabled, the result of the absolute value operation is the most positive number.
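The corner case in the programming note can be sketched for one integer element in Python. The function name is hypothetical, saturation mode is assumed to be enabled, and the subtraction itself is also assumed to saturate before the absolute value is taken.

```python
def vasub_element(a, b, bits):
    """|a - b| for one signed element of width `bits`, with saturation enabled."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    diff = max(lo, min(hi, a - b))   # assumed: the subtraction saturates first
    return min(hi, abs(diff))        # |most negative| saturates to most positive
```

As int8, a difference of -128 has no positive counterpart, so the absolute value saturates to 127.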

VAVG Average Two Elements

Format

Assembler syntax

VAVG.dt VRd, VRa, VRb

VAVG.dt VRd, VRa, SRb

VAVG.dt SRd, SRa, SRb

where dt = {b, b9, h, w, f}. For integer data types, use VAVGT to specify the "truncate" rounding mode.

Supported modes
D:S:M   V<-V@V     V<-V@S      S<-S@S
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)   float (f)

Description

The contents of vector/scalar register Ra are added to the contents of vector/scalar register Rb to produce an intermediate result; the intermediate result is then divided by 2, and the final result is stored in vector/scalar register Rd. For integer data types, the rounding mode is truncate if T = 1 and round away from zero if T = 0 (the default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {Rb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i] = (Ra[i] + Bop[i]) // 2;

}

Exceptions

None.
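The two integer rounding modes of VAVG can be contrasted with a short Python sketch for one element. The function name is hypothetical, and "truncate" is assumed to mean rounding toward zero, which the text does not state explicitly.

```python
def vavg_element(a, b, truncate=False):
    """Average two signed integers: truncate (T = 1) or round away from zero (T = 0)."""
    total = a + b
    magnitude = abs(total)
    # Truncation drops the half; the default mode rounds the half up in magnitude.
    half = magnitude // 2 if truncate else (magnitude + 1) // 2
    return half if total >= 0 else -half
```

Averaging 3 and 4 gives 4 in the default mode and 3 with truncation; averaging -3 and -4 gives -4 and -3, respectively.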

VAVGH Average Two Adjacent Elements

Format

Assembler syntax

VAVGH.dt VRd, VRa, VRb

VAVGH.dt VRd, VRa, SRb

where dt = {b, b9, h, w, f}. For integer data types, use VAVGHT to specify the "truncate" rounding mode.

Supported modes
D:S:M   V<-V@V     V<-V@S
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)   float (f)

Description

For each element, averages the pair of two adjacent elements. For integer data types, the rounding mode is truncate if T = 1 and round away from zero if T = 0 (the default). For the floating-point data type, the rounding mode is specified by VCSR<RMODE>.

Operation

for (i = 0; i < NumElem-1; i++) {

    Rd[i] = (Ra[i] + Ra[i+1]) // 2;

}

Rd[NumElem-1] = (Ra[NumElem-1] + {VRb[0] ‖ SRb}) // 2;

Exceptions

None.

Programming note

This instruction is not affected by the element mask.

VAVGQ Average Quad

Format

Assembler syntax

VAVGQ.dt VRd, VRa, VRb

where dt = {b, b9, h, w}. For integer data types, use VAVGQT to specify the "truncate" rounding mode.

Supported modes
D:S:M   V<-V@V
DS      int8 (b)   int9 (b9)   int16 (h)   int32 (w)

Description

This instruction is not supported in VEC64 mode.

As shown in the figure below, the average of four elements is computed using the rounding mode specified by T (1 for truncate, 0 for round away from zero, the default). Note that the leftmost element (D_n-1) is undefined.

Operation

for (i = 0; i < NumElem-1; i++) {

    Rd[i] = (Ra[i] + Rb[i] + Ra[i+1] + Rb[i+1]) // 4;

}

Exceptions

None.
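The four-element window of VAVGQ can be sketched in Python. The function name is hypothetical. Only the truncate mode (VAVGQT) is modeled, with truncation assumed to round toward zero, and the undefined last element is represented by None.

```python
def vavgq_truncate(ra, rb):
    """Rd[i] = (Ra[i] + Rb[i] + Ra[i+1] + Rb[i+1]) / 4, truncated toward zero."""
    out = []
    for i in range(len(ra) - 1):
        total = ra[i] + rb[i] + ra[i + 1] + rb[i + 1]
        quarter = abs(total) // 4
        out.append(quarter if total >= 0 else -quarter)
    out.append(None)   # the last element is undefined
    return out
```

For Ra = [1, 2, 3] and Rb = [1, 2, 4], the window sums are 6 and 11, so the result is [1, 2, None].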

VCACHE Cache Operation

Format

Assembler syntax

VCACHE.fc SRb, SRi

VCACHE.fc SRb, #IMM

VCACHE.fc SRb+, SRi

VCACHE.fc SRb+, #IMM

where fc = {0, 1}.

Description

This instruction is provided for software management of the vector data cache. When part or all of the data cache is configured as scratch pad memory, this instruction has no effect on the scratch pad memory.

The following options are supported:
FC<2:0>   Meaning
000       Write back and invalidate the dirty cache line whose tag matches the EA. If the matching line contains clean data, the line is invalidated without a write-back. If no cache line is found to contain the EA, the data cache is left untouched.
001       Write back and invalidate the dirty cache line specified by the index of the EA. If the matching line contains clean data, the line is invalidated without a write-back.
others    Undefined

Operation

Exceptions

None.

Programming note

VCAND Complement AND

Format

Assembler syntax

VCAND.dt VRd, VRa, VRb

VCAND.dt VRd, VRa, SRb

VCAND.dt VRd, VRa, #IMM

VCAND.dt SRd, SRa, SRb

VCAND.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}. Note that .w and .f specify the same operation.

Description

Performs a logical AND of the complement of Ra with the Rb/immediate operand and returns the result to the destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};

    Rd[i]<k> = ~Ra[i]<k> & Bop[i]<k>, for all bits k in element i;

}

Exceptions

None.

VCBARR Conditional Barrier

Format

Assembler syntax

VCBARR.cond

where cond = {0-7}. Mnemonics will be assigned to each condition later.

Description

Stalls this instruction and all later instructions (those that appear later in program order) for as long as the condition holds. The Cond<2:0> field is interpreted differently from other conditional instructions in the CT format.

The following conditions are currently defined:
Cond<2:0>   Meaning
000         Wait for all earlier instructions (those that appear earlier in program order) to finish executing before executing any later instruction.
others      Undefined

Operation

while (Cond == true)

    stall all later instructions;

Exceptions

None.

Programming note

This instruction is provided for software to force serialized execution of instructions. It can be used to force precise reporting of an imprecise exception. For example, if this instruction is used immediately after an arithmetic instruction that could cause an exception, the exception is reported with the program counter addressing this instruction.

VCBR Conditional Branch

Format

Assembler syntax

VCBR.cond #Offset

where cond = {un, lt, eq, le, gt, ne, ge, ov}.

Description

Branch if Cond is true. This is not a delayed branch.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))
    VPC = VPC + sex(Offset<22:0>*4);
else VPC = VPC + 4;

Exceptions

Invalid instruction address.
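The branch target arithmetic in the VCBR operation can be sketched as follows. This is only an illustration: the helper names `sex` and `vcbr_next_pc`, the 32-bit wraparound of VPC, and passing the already evaluated condition as a boolean are assumptions, not details stated in the source.

```python
def sex(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def vcbr_next_pc(vpc, offset, cond_true):
    """Next VPC for VCBR: a 23-bit signed word offset scaled by 4, else VPC + 4."""
    if cond_true:
        return (vpc + sex(offset, 23) * 4) & 0xFFFFFFFF  # assumed 32-bit VPC
    return (vpc + 4) & 0xFFFFFFFF


print(hex(vcbr_next_pc(0x1000, 3, True)))         # forward branch by 3 words
print(hex(vcbr_next_pc(0x1000, 0x7FFFFF, True)))  # offset field encodes -1
```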

VCBRI Indirect Conditional Branch

Format

Assembler syntax

VCBRI.cond SRb

Description

Branch indirectly if Cond is true. This is not a delayed branch.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))
    VPC = SRb<31:2>:b'00;
else VPC = VPC + 4;

Exceptions

Invalid instruction address.

VCCS Conditional Context Switch

Format

Assembler syntax

VCCS #Offset

Description

If VIMSK<cse> is true, jump to the context switch subroutine. This is not a delayed branch.

If VIMSK<cse> is true, save VPC+4 (the return address) on the return address stack. If not, continue execution from VPC+4.

Operation

if (VIMSK<cse> == 1) {
    if (VSP<4:0> > 15) {
        VISRC<RASO> = 1;
        signal ARM7 with RASO exception;
        VP_STATE = VP_IDLE;
    } else {
        RSTACK[VSP<3:0>] = VPC + 4;
        VSP<4:0> = VSP<4:0> + 1;
        VPC = VPC + sex(Offset<22:0>*4);
    }
} else VPC = VPC + 4;

Exceptions

Return address stack overflow.

VCHGCR Change Control Register

Format

Assembler syntax

VCHGCR Mode

Description

This instruction changes the operating mode of the vector processor.

Each field in Mode is specified as follows:

Mode      Meaning
bits 1:0  These two bits control the VCSR<CBANK> bit. Encoding: 00 - no change; 01 - clear the VCSR<CBANK> bit; 10 - set the VCSR<CBANK> bit; 11 - toggle the VCSR<CBANK> bit.
bits 3:2  These two bits control the VCSR<SMM> bit. Encoding: 00 - no change; 01 - clear the VCSR<SMM> bit; 10 - set the VCSR<SMM> bit; 11 - toggle the VCSR<SMM> bit.
bits 5:4  These two bits control the VCSR<CEM> bit. Encoding: 00 - no change; 01 - clear the VCSR<CEM> bit; 10 - set the VCSR<CEM> bit; 11 - toggle the VCSR<CEM> bit.
Others    Undefined

Operation

Exceptions

None.

Programming notes

This instruction is provided for hardware to change the control bits in VCSR more efficiently than is possible with the VMOV instruction.
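The Mode encoding in the table above can be sketched in Python as follows. The dictionary representation of VCSR and the names `apply_code` and `vchgcr` are assumptions made for illustration; only the 2-bit codes and the three controlled bits come from the text.

```python
def apply_code(bit, code):
    """Apply one 2-bit Mode code to a single control bit."""
    if code == 0b00:
        return bit        # no change
    if code == 0b01:
        return 0          # clear
    if code == 0b10:
        return 1          # set
    return bit ^ 1        # 0b11: toggle


def vchgcr(vcsr, mode):
    """Return a new VCSR view with CBANK, SMM and CEM updated from Mode<5:0>."""
    return {
        "CBANK": apply_code(vcsr["CBANK"], mode & 0b11),
        "SMM": apply_code(vcsr["SMM"], (mode >> 2) & 0b11),
        "CEM": apply_code(vcsr["CEM"], (mode >> 4) & 0b11),
    }


# Mode = 11 01 10: toggle CEM, clear SMM, set CBANK.
print(vchgcr({"CBANK": 0, "SMM": 1, "CEM": 1}, 0b110110))
```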

VCINT Conditional Interrupt ARM7

Format

Assembler syntax

VCINT.cond #ICODE

Description

If Cond is true, halt execution and interrupt the ARM7, if enabled.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    VISRC<vip> = 1;
    VIINS = [VCINT.cond #ICODE instruction];
    VEPC = VPC;
    if (VIMSK<vie> == 1) signal ARM7 interrupt;
    VP_STATE = VP_IDLE;
}
else VPC = VPC + 4;

Exceptions

VCINT interrupt.

VCJOIN Conditional Join with ARM7 Task

Format

Assembler syntax

VCJOIN.cond #Offset

Description

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    VISRC<vjp> = 1;
    VIINS = [VCJOIN.cond #Offset instruction];
    VEPC = VPC;
    if (VIMSK<vje> == 1) signal ARM7 interrupt;
    VP_STATE = VP_IDLE;
}
else VPC = VPC + 4;

Exceptions

VCJOIN interrupt.

VCJSR Conditional Jump to Subroutine

Format

Assembler syntax

VCJSR.cond #Offset

Description

If Cond is true, jump to the subroutine. This is not a delayed branch.

If Cond is true, save VPC+4 (the return address) on the return address stack. If not, continue execution from VPC+4.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    if (VSP<4:0> > 15) {
        VISRC<RASO> = 1;
        signal ARM7 with RASO exception;
        VP_STATE = VP_IDLE;
    } else {
        RSTACK[VSP<3:0>] = VPC + 4;
        VSP<4:0> = VSP<4:0> + 1;
        VPC = VPC + sex(Offset<22:0>*4);
    }
} else VPC = VPC + 4;

Exceptions

Return address stack overflow.
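The return address stack handling described for VCJSR can be sketched as below. This is a simplified model under stated assumptions: a 16-entry stack, a 5-bit VSP compared as a whole against 15, a precomputed branch target, and the hypothetical function name `vcjsr`. Signalling the ARM7 and idling the processor are reduced to an overflow flag.

```python
def vcjsr(vpc, target, rstack, vsp, cond_true):
    """Return (new_vpc, new_vsp, overflow) for one conditional subroutine jump.

    rstack is a 16-entry list that is updated in place on a successful push.
    """
    if not cond_true:
        return vpc + 4, vsp, False       # fall through to the next instruction
    if vsp > 15:
        return vpc, vsp, True            # RASO: stack already holds 16 entries
    rstack[vsp & 0xF] = vpc + 4          # save the return address
    return target, (vsp + 1) & 0x1F, False


stack = [0] * 16
print(vcjsr(0x100, 0x400, stack, 0, True), hex(stack[0]))
```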

VCJSRI Indirect Conditional Jump to Subroutine

Format

Assembler syntax

VCJSRI.cond SRb

Description

If Cond is true, jump indirectly to the subroutine. This is not a delayed branch.

If Cond is true, save VPC+4 (the return address) on the return address stack. If not, continue execution from VPC+4.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    if (VSP<4:0> > 15) {
        VISRC<RASO> = 1;
        signal ARM7 with RASO exception;
        VP_STATE = VP_IDLE;
    } else {
        RSTACK[VSP<3:0>] = VPC + 4;
        VSP<4:0> = VSP<4:0> + 1;
        VPC = SRb<31:2>:b'00;
    }
} else VPC = VPC + 4;

Exceptions

Return address stack overflow.

VCMOV Conditional Move

Format

Assembler syntax

VCMOV.dt Rd, Rb, cond

VCMOV.dt Rd, #IMM, cond

where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type is not supported with the 9-bit immediate operand.

Supported modes

D:S:M  V<-V  V<-S  V<-I  S<-S  S<-I
DS     int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:

VR   current bank vector register
SR   scalar register
SY   synchronization register
VAC  vector accumulator register (see the VMOV description for the VAC register encoding)

D:S:M  ID<1:0>=00  ID<1:0>=01  ID<1:0>=10  ID<1:0>=11
V<-V   VR<-VR  VR<-VAC  VAC<-VR
V<-S   VR<-SR  VAC<-SR
V<-I   VR<-I
S<-S   SR<-SR
S<-I   SR<-I

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un))
    for (i = 0; i < NumElem; i++)
        Rd[i] = {Rb[i] || SRb || sex(IMM<8:0>)};

Exceptions

None.

Programming notes

This instruction is not affected by the element mask; VCMOVM is affected by the element mask.

For 8 elements, the extended floating-point precision representation in the vector accumulator uses all 576 bits. Therefore, vector register moves involving the accumulator must specify the .b9 data size.

VCMOVM Conditional Move with Element Mask

Format

Assembler syntax

VCMOVM.dt Rd, Rb, cond

VCMOVM.dt Rd, #IMM, cond

where dt = {b, b9, h, w, f} and cond = {un, lt, eq, le, gt, ne, ge, ov}. Note that .f and .w specify the same operation, except that the .f data type is not supported with the 9-bit immediate operand.

Supported modes

D:S:M  V<-V  V<-S  V<-I
DS     int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

If Cond is true, the contents of register Rb are moved to register Rd. ID<1:0> further specifies the source and destination registers:

VR   current bank vector register
SR   scalar register
VAC  vector accumulator register (see the VMOV description for the VAC register encoding)

D:S:M  ID<1:0>=00  ID<1:0>=01  ID<1:0>=10  ID<1:0>=11
V<-V   VR<-VR  VR<-VAC  VAC<-VR
V<-S   VR<-SR  VAC<-SR
V<-I   VR<-I
S<-S
S<-I

Operation

for (i = 0; i < NumElem && MMASK[i]; i++)
    Rd[i] = {Rb[i] || SRb || sex(IMM<8:0>)};

Exceptions

None.

Programming notes

This instruction is affected by the VMMR element mask; VCMOV is not affected by the element mask.

For 8 elements, the extended floating-point precision representation in the vector accumulator uses all 576 bits. Therefore, vector register moves involving the accumulator must specify the .b9 data size.

VCMPV Compare and Set Mask

Format

Assembler syntax

VCMPV.dt VRa, VRb, cond, mask

VCMPV.dt VRa, SRb, cond, mask

where dt = {b, b9, h, w, f}, cond = {lt, eq, le, gt, ne, ge}, and mask = {VGMR, VMMR}. If no mask is specified, VGMR is assumed.

Supported modes

D:S:M  M<-V@V  M<-V@S
DS     int8 (b)  int9 (b9)  int16 (h)  int32 (w)  float (f)

Description

The contents of vector registers VRa and VRb are compared element-wise by performing a subtraction (VRa[i] - VRb[i]). If the result of the comparison matches the Cond field of the VCMPV instruction, the corresponding bit #i in the VGMR (if K = 0) or VMMR (if K = 1) register is set. For example, if the Cond field is less than (LT), the VGMR[i] (or VMMR[i]) bit is set when VRa[i] < VRb[i].

Operation

for (i = 0; i < NumElem; i++) {
    Bop[i] = {Rb[i] || SRb || sex(IMM<8:0>)};
    relationship[i] = Ra[i] ? Bop[i];
    if (K == 1)
        MMASK[i] = (relationship[i] == Cond) ? True : False;
    else
        EMASK[i] = (relationship[i] == Cond) ? True : False;
}

Exceptions

None.

Programming notes

This instruction is not affected by the element mask.
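A minimal Python sketch of the compare-and-set-mask behaviour follows. The function name `vcmpv`, the use of Python integer comparisons in place of the hardware subtraction, and returning the mask as a list of booleans (rather than writing VGMR or VMMR) are assumptions made for illustration.

```python
import operator

# The six relations named by the cond field.
CONDS = {
    "lt": operator.lt, "eq": operator.eq, "le": operator.le,
    "gt": operator.gt, "ne": operator.ne, "ge": operator.ge,
}


def vcmpv(vra, bop, cond):
    """Return the per-element mask bits for `vra cond bop`.

    bop may be a list (vector operand) or a single int (scalar operand,
    broadcast to every element).
    """
    if isinstance(bop, int):
        bop = [bop] * len(vra)
    test = CONDS[cond]
    return [test(a, b) for a, b in zip(vra, bop)]


print(vcmpv([1, 5, 3], [2, 5, 1], "lt"))  # vector-vector compare
print(vcmpv([1, 5, 3], 3, "ge"))          # vector-scalar compare
```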

VCNTLZ Count Leading Zeros

Format

Assembler syntax

VCNTLZ.dt VRd, VRb

VCNTLZ.dt SRd, SRb

where dt = {b, b9, h, w}.

Supported modes

S   V<-V  S<-S
DS  int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Count the number of leading zeros in each element of Rb and return the count in Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = number of leading zeros (Rb[i]);
}

Exceptions

None.

Programming notes

If all bits in an element are zero, the result equals the element size (8, 9, 16, or 32 for byte, byte9, halfword, or word, respectively).

The leading-zero count is inversely related to the element position index (if used after a VCMPR instruction). To convert to an element position, subtract the VCNTLZ result from NumElem for the given data type.
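The leading-zero count for a single element can be sketched in Python as follows. The function name `vcntlz` and the treatment of the element as an unsigned bit pattern of the given width are assumptions for this example.

```python
def vcntlz(elem, width):
    """Count leading zeros of a `width`-bit element (8, 9, 16 or 32)."""
    elem &= (1 << width) - 1          # keep only the element's bits
    return width - elem.bit_length()  # an all-zero element yields `width`


for value, width in [(0, 8), (1, 8), (0x80, 8), (0x00FF, 16)]:
    print(width, hex(value), vcntlz(value, width))
```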

VCOR Complement OR

Format

Assembler syntax

VCOR.dt VRd, VRa, VRb

VCOR.dt VRd, VRa, SRb

VCOR.dt VRd, VRa, #IMM

VCOR.dt SRd, SRa, SRb

VCOR.dt SRd, SRa, #IMM

Description

Logically OR the complement of Ra with the Rb/immediate operand, and return the result in destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i]<k> = ~Ra[i]<k> | Bop[i]<k>, for all bits k in element i;
}

Exceptions

None.

VCRSR Conditional Return from Subroutine

Format

Assembler syntax

VCRSR.cond

Description

If Cond is true, return from the subroutine. This is not a delayed branch.

If Cond is true, continue execution from the return address saved on the return address stack. If not, continue execution from VPC+4.

Operation

if ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)) {
    if (VSP<4:0> == 0) {
        VISRC<RASU> = 1;
        signal ARM7 with RASU exception;
        VP_STATE = VP_IDLE;
    } else {
        VSP<4:0> = VSP<4:0> - 1;
        VPC = RSTACK[VSP<3:0>];
        VPC<1:0> = b'00;
    }
} else VPC = VPC + 4;

Exceptions

Invalid instruction address, return address stack underflow.
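The pop side of the return address stack can be sketched as below, mirroring the VCRSR operation. The function name `vcrsr`, the list-based stack, and reporting underflow through a flag instead of signalling the ARM7 are assumptions for illustration.

```python
def vcrsr(vpc, rstack, vsp, cond_true):
    """Return (new_vpc, new_vsp, underflow) for one conditional return."""
    if not cond_true:
        return vpc + 4, vsp, False
    if vsp == 0:
        return vpc, vsp, True             # RASU: nothing to pop
    vsp -= 1
    target = rstack[vsp & 0xF] & ~0b11    # force VPC<1:0> to zero
    return target, vsp, False


stack = [0] * 16
stack[0] = 0x2007                         # deliberately misaligned entry
print(vcrsr(0x500, stack, 1, True))
```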

VCVTB9 Convert byte9 Data Type

Format

Assembler syntax

VCVTB9.md VRd, VRb

VCVTB9.md SRd, SRb

where md = {bb9, b9h, hb9}.

Supported modes

S   V<-V  S<-S
MD  bb9  b9h  hb9

Description

Each element in Rb is converted from byte to byte9 (bb9), from byte9 to halfword (b9h), or from halfword to byte9 (hb9).

Operation

if (md<1:0> == 0) {        // bb9: byte to byte9 conversion
    VRd = VRb;
    VRd<9i+8> = VRb<9i+7>, i = 0 to 31 (or 63 in VEC64 mode)
}
else if (md<1:0> == 2) {   // b9h: byte9 to halfword conversion
    VRd = VRb;
    VRd<18i+16:18i+9> = VRb<18i+8>, i = 0 to 15 (or 31 in VEC64 mode)
}
else if (md<1:0> == 3)     // hb9: halfword to byte9 conversion
    VRd<18i+8> = VRb<18i+9>, i = 0 to 15 (or 31 in VEC64 mode)
else VRd = undefined;

Exceptions

None.

Programming notes

Before using this instruction with the b9h mode, the programmer must adjust for the reduced number of elements in the vector register with a shuffle operation. After using this instruction with the hb9 mode, the programmer must adjust for the increased number of elements in the destination vector register with an unshuffle operation. This instruction is not affected by the element mask.

VCVTFF Convert Floating Point to Fixed Point

Format

Assembler syntax

VCVTFF VRd, VRa, SRb

VCVTFF VRd, VRa, #IMM

VCVTFF SRd, SRa, SRb

VCVTFF SRd, SRa, #IMM

Supported modes

D:S:M  V<-V,S  V<-V,I  S<-S,S  S<-S,I

Description

The contents of vector/scalar register Ra are converted from 32-bit floating point to a fixed-point real number of format <X, Y>, where the width of Y is specified by Rb (modulo 32) or the IMM field, and the width of X is (32 - width of Y). X denotes the integer part and Y the fractional part. The result is stored in vector/scalar register Rd.

Operation

Y_size = {SRb % 32 || IMM<4:0>};
for (i = 0; i < NumElem; i++) {
    Rd[i] = convert to <32-Y_size.Y_size> format (Ra[i]);
}

Exceptions

Overflow.

Programming notes

This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask. For integer data types, this instruction uses the round-away-from-zero rounding mode.

VCVTIF Convert Integer to Floating Point

Format

Assembler syntax

VCVTIF VRd, VRb

VCVTIF VRd, SRb

VCVTIF SRd, SRb

Supported modes

D:S:M  V<-V  V<-S  S<-S

Description

The contents of vector/scalar register Rb are converted from int32 to the floating-point data type, and the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem; i++) {
    Rd[i] = convert to floating point format (Rb[i]);
}

Exceptions

None.

Programming notes

This instruction supports only the word data size. Because the architecture does not support multiple data types within a register, this instruction does not use the element mask.

VD1CBR Decrement VCR1 and Conditional Branch

Format

Assembler syntax

VD1CBR.cond #Offset

Description

Decrement VCR1 and branch if Cond is true. This is not a delayed branch.

Operation

VCR1 = VCR1 - 1;
if ((VCR1 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)))
    VPC = VPC + sex(Offset<22:0>*4);
else VPC = VPC + 4;

Exceptions

Invalid instruction address.

Programming notes

Note that VCR1 is decremented before the branch condition is checked. Executing this instruction when VCR1 is 0 effectively sets the loop count to 2^32 - 1.
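The decrement-then-test ordering in the programming note can be sketched in Python as follows. The function name `vd1cbr`, the modelling of VCR1 as an unsigned 32-bit counter, and passing the offset as an already sign-extended word count are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF


def vd1cbr(vcr1, vpc, offset_words, cond_true):
    """Return (new_vcr1, new_vpc): decrement first, then test counter and condition."""
    vcr1 = (vcr1 - 1) & MASK32               # 0 wraps around to 2**32 - 1
    if vcr1 > 0 and cond_true:
        vpc = (vpc + offset_words * 4) & MASK32
    else:
        vpc = (vpc + 4) & MASK32
    return vcr1, vpc


print(vd1cbr(3, 0x100, -4, True))   # counter 3 -> 2, branch taken
print(vd1cbr(1, 0x100, -4, True))   # counter 1 -> 0, loop exits
print(vd1cbr(0, 0x100, -4, True))   # counter 0 wraps, branch taken
```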

VD2CBR Decrement VCR2 and Conditional Branch

Format

Assembler syntax

VD2CBR.cond #Offset

Description

Decrement VCR2 and branch if Cond is true. This is not a delayed branch.

Operation

VCR2 = VCR2 - 1;
if ((VCR2 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)))
    VPC = VPC + sex(Offset<22:0>*4);
else VPC = VPC + 4;

Exceptions

Invalid instruction address.

Programming notes

Note that VCR2 is decremented before the branch condition is checked. Executing this instruction when VCR2 is 0 effectively sets the loop count to 2^32 - 1.

VD3CBR Decrement VCR3 and Conditional Branch

Format

Assembler syntax

VD3CBR.cond #Offset

Description

Decrement VCR3 and branch if Cond is true. This is not a delayed branch.

Operation

VCR3 = VCR3 - 1;
if ((VCR3 > 0) & ((Cond == VCSR[SO,GT,EQ,LT]) | (Cond == un)))
    VPC = VPC + sex(Offset<22:0>*4);
else VPC = VPC + 4;

Exceptions

Invalid instruction address.

Programming notes

Note that VCR3 is decremented before the branch condition is checked. Executing this instruction when VCR3 is 0 effectively sets the loop count to 2^32 - 1.

VDIV2N Divide by 2^n

Format

Assembler syntax

VDIV2N.dt VRd, VRa, SRb

VDIV2N.dt VRd, VRa, #IMM

VDIV2N.dt SRd, SRa, SRb

VDIV2N.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

The contents of vector/scalar register Ra are divided by 2^n, where n is the positive integer content of scalar register Rb or IMM, and the final result is stored in vector/scalar register Rd. This instruction uses truncation (rounding toward zero) as the rounding mode.

Operation

N = {SRb % 32 || IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] / 2^N;
}

Exceptions

None.

Programming notes

Note that N is taken as a 5-bit number from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a value of N that is less than or equal to the precision of the data size. If N is greater than the precision of the specified data size, the elements are filled with the sign bit. This instruction uses the round-toward-zero rounding mode.
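The round-toward-zero behaviour differs from a plain arithmetic right shift for negative inputs, which the following Python sketch illustrates. The function name `vdiv2n` is hypothetical, and the sketch assumes N stays within the precision of the element, so the sign-fill case from the note above is not modelled.

```python
def vdiv2n(value, n):
    """Divide a signed integer by 2**n, truncating toward zero."""
    n &= 0x1F                      # N is a 5-bit quantity
    quotient = abs(value) >> n     # shift the magnitude, then restore the sign
    return -quotient if value < 0 else quotient


print(vdiv2n(-7, 1), -7 >> 1)      # truncation vs. arithmetic shift
```

For -7 and N = 1 the truncating division yields -3, whereas an arithmetic right shift alone would round toward negative infinity and give -4.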

VDIV2N.F Floating-Point Divide by 2^n

Format

Assembler syntax

VDIV2N.f VRd, VRa, SRb

VDIV2N.f VRd, VRa, #IMM

VDIV2N.f SRd, SRa, SRb

VDIV2N.f SRd, SRa, #IMM

Supported modes

D:S:M  V<-V@S  V<-V@I  S<-S@S  S<-S@I

Description

The contents of vector/scalar register Ra are divided by 2^n, where n is the positive integer content of scalar register Rb or IMM, and the final result is stored in vector/scalar register Rd.

Operation

N = {SRb % 32 || IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] / 2^N;
}

Exceptions

None.

Programming notes

Note that N is taken as a 5-bit number from SRb or IMM<4:0>.

VDIVI Incomplete Divide Initialize

Format

Assembler syntax

VDIVI.ds VRb

VDIVI.ds SRb

where ds = {b, b9, h, w}.

Supported modes

S   VRb  SRb
DS  int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Perform the initial step of a non-restoring signed integer division. The dividend is a double-precision signed integer in the accumulator. If the dividend is single precision, it must be sign-extended to double precision and stored in VACOH and VACOL. The divisor is a single-precision signed integer in Rb.

If the sign of the dividend is the same as the sign of the divisor, Rb is subtracted from the accumulator high part. Otherwise, Rb is added to the accumulator high part.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] || SRb};
    if (VACOH[i]<msb> == Bop[i]<msb>)
        VACOH[i] = VACOH[i] - Bop[i];
    else
        VACOH[i] = VACOH[i] + Bop[i];
}

Exceptions

None.

Programming notes

The programmer is responsible for detecting overflow and divide-by-zero cases before the divide steps.

VDIVS Incomplete Divide Step

Format

Assembler syntax

VDIVS.ds VRb

VDIVS.ds SRb

where ds = {b, b9, h, w}.

Description

Perform one iterative step of a non-restoring signed division. This instruction must be executed as many times as the data size (for example, 8 times for int8, 9 for int9, 16 for int16, and 32 for int32 data types). The VDIVI instruction must be used once before the divide steps to produce the initial partial remainder in the accumulator. The divisor is a single-precision signed integer in Rb. Each step extracts one quotient bit and shifts it into the least significant bit of the accumulator.

If the sign of the partial remainder in the accumulator is the same as the sign of the divisor in Rb, Rb is subtracted from the accumulator high part. Otherwise, Rb is added to the accumulator high part.

The quotient bit is 1 if the sign of the resulting partial remainder (the result of the addition or subtraction) in the accumulator is the same as the sign of the divisor. Otherwise, the quotient bit is 0. The accumulator is shifted left by one bit position, with the quotient bit filled in.

At the end of the divide steps, the remainder is in the accumulator high part and the quotient is in the accumulator low part. The quotient is in one's complement form.

Operation

VESL Element Shift Left by One

Format

Assembler syntax

VESL.dt SRc, VRd, VRa, SRb

where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Supported modes

S   SRb
DS  int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Shift the elements in vector register Ra left by one position, filling in from scalar register Rb. The leftmost element being shifted out is returned in scalar register Rc, and the other elements are returned in vector register Rd.

Operation

VRd[0] = SRb;
for (i = 1; i < NumElem; i++)
    VRd[i] = VRa[i-1];
SRc = VRa[NumElem-1];

Exceptions

None.

Programming notes

VESR Element Shift Right by One

Format

Assembler syntax

VESR.dt SRc, VRd, VRa, SRb

Description

Shift the elements in vector register Ra right by one position, filling in from scalar register Rb. The rightmost element being shifted out is returned in scalar register Rc, and the other elements are returned in vector register Rd.

Operation

SRc = VRa[0];
for (i = 0; i < NumElem-1; i++)
    VRd[i] = VRa[i+1];
VRd[NumElem-1] = SRb;

Exceptions

None.

Programming notes
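As an illustration, the VESL and VESR element shifts can be modelled on Python lists as shown below. The function names and the convention that index 0 is the rightmost element (so a left shift moves elements toward higher indices) are assumptions drawn from the pseudocode, not a definitive statement of the hardware layout.

```python
def vesl(vra, srb):
    """Shift elements toward higher indices; return (SRc, VRd)."""
    return vra[-1], [srb] + vra[:-1]


def vesr(vra, srb):
    """Shift elements toward lower indices; return (SRc, VRd)."""
    return vra[0], vra[1:] + [srb]


print(vesl([10, 20, 30, 40], 99))
print(vesr([10, 20, 30, 40], 99))
```

Chaining the two functions with the shifted-out scalar as the fill value restores the original vector, which is a convenient sanity check for either direction.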

VEXTRT Extract an Element

Format

Assembler syntax

VEXTRT.dt SRd, VRa, SRb

VEXTRT.dt SRd, VRa, #IMM

Supported modes

D:S:M  S<-S  S<-I
DS     int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Extract the element of vector register Ra whose index is specified by scalar register Rb or the IMM field, and store it in scalar register Rd.

Operation

index32 = {SRb % 32 || IMM<4:0>};
index64 = {SRb % 64 || IMM<5:0>};
index = (VCSR<vec64>) ? index64 : index32;
SRd = VRa[index];

Exceptions

None.

Programming notes
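The index selection used by VEXTRT can be sketched as follows. The names `element_index` and `vextrt`, and modelling the vector register as a list of byte-sized elements (32, or 64 in VEC64 mode), are assumptions for this example.

```python
def element_index(srb, vec64):
    """Reduce the index operand modulo 64 in VEC64 mode, else modulo 32."""
    return srb % 64 if vec64 else srb % 32


def vextrt(vra, srb, vec64=False):
    """Return the element of vra selected by the wrapped index."""
    return vra[element_index(srb, vec64)]


print(vextrt(list(range(32)), 35))               # 35 wraps to index 3
print(vextrt(list(range(64)), 35, vec64=True))   # index 35 is valid in VEC64 mode
```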

VEXTSGN2 Extract Sign (1, -1)

Format

Assembler syntax

VEXTSGN2.dt VRd, VRa

VEXTSGN2.dt SRd, SRa

where dt = {b, b9, h, w}.

Supported modes

S   V<-V  S<-S
DS  int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Compute the sign value of the contents of vector/scalar register Ra element-wise, and store the result in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = (Ra[i] < 0) ? -1 : 1;
}

Exceptions

None.

VEXTSGN3 Extract Sign (1, 0, -1)

Format

Assembler syntax

VEXTSGN3.dt VRd, VRa

VEXTSGN3.dt SRd, SRa

where dt = {b, b9, h, w}.

Description

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    if (Ra[i] > 0) Rd[i] = 1;
    else if (Ra[i] < 0) Rd[i] = -1;
    else Rd[i] = 0;
}

Exceptions

None.
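The difference between the two sign-extraction instructions shows up only for zero elements, as the following Python sketch illustrates. The function names are hypothetical, and the element mask is omitted for brevity.

```python
def vextsgn2(ra):
    """Two-valued sign: -1 for negative elements, 1 otherwise (zero maps to 1)."""
    return [-1 if x < 0 else 1 for x in ra]


def vextsgn3(ra):
    """Three-valued sign: -1, 0 or 1."""
    return [(x > 0) - (x < 0) for x in ra]


print(vextsgn2([-5, 0, 7]))
print(vextsgn3([-5, 0, 7]))
```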

VINSRT Insert an Element

Format

Assembler syntax

VINSRT.dt VRd, SRa, SRb

VINSRT.dt VRd, SRa, #IMM

Supported modes

D:S:M  V<-S  V<-I
DS     int8 (b)  int9 (b9)  int16 (h)  int32 (w)

Description

Insert the element in scalar register Ra into vector register Rd at the index specified by scalar register Rb or the IMM field.

Operation

index32 = {SRb % 32 || IMM<4:0>};
index64 = {SRb % 64 || IMM<5:0>};
index = (VCSR<vec64>) ? index64 : index32;
VRd[index] = SRa;

Exceptions

None.

Programming notes

This instruction is not affected by the element mask.

VL Load

Format

Assembler syntax

VL.lt Rd, SRb, SRi

VL.lt Rd, SRb, #IMM

VL.lt Rd, SRb+, SRi

VL.lt Rd, SRb+, #IMM

where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLOFF for cache-off loads.

Description

Loads a vector register in the current or alternate bank, or a scalar register.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
Rd = see the table below:

LT      Load operation
.b      SR_d<7:0> = BYTE[EA]
.bz9    SR_d<8:0> = zex BYTE[EA]
.bs9    SR_d<8:0> = sex BYTE[EA]
.h      SR_d<15:0> = HALF[EA]
.w      SR_d<31:0> = WORD[EA]
.4      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 3
.8      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 7
.16     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 15
.32     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
.64     VR_0d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
        VR_1d<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 31

Exceptions

Invalid data address, unaligned access.

Programming notes
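As an illustrative sketch only, the vector rows of the VL load table can be modeled in Python. The function names, the use of a Python integer to stand for the vector register, and the choice to leave all unwritten bits at zero are assumptions made for the illustration and are not taken from the instruction definition.

```python
def sign_extend_byte_to_9(byte):
    """Sign-extend an 8-bit value to a 9-bit field (bit 8 copies bit 7)."""
    return byte | 0x100 if byte & 0x80 else byte


def vl_vector_load(memory, ea, count):
    """Pack `count` bytes starting at `ea` into 9-bit fields VR<9i+8:9i>.

    The vector register is modeled as one Python integer (an assumption);
    bits that the table does not mention are simply left at zero here.
    """
    vr = 0
    for i in range(count):
        vr |= sign_extend_byte_to_9(memory[ea + i]) << (9 * i)
    return vr


def field(vr, i):
    """Read back the 9-bit field of element i."""
    return (vr >> (9 * i)) & 0x1FF
```

For example, loading the bytes 0x01, 0x80, 0x7F, 0xFF with count 4 yields the 9-bit fields 0x001, 0x180, 0x07F, 0x1FF in elements 0 to 3.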

VLCB Load from circular buffer

Format

Assembler syntax

VLCB.lt Rd, SRb, SRi

VLCB.lt Rd, SRb, #IMM

VLCB.lt Rd, SRb+, SRi

VLCB.lt Rd, SRb+, #IMM

where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLCBOFF for cache-off loads.

Description

Loads a vector or scalar register from the circular buffer bounded by the BEGIN pointer in SR_b+1 and the END pointer in SR_b+2.

If the effective address is greater than the END address, it is adjusted before the load and the address update are performed. In addition, for .h and .w scalar loads, the circular buffer boundaries must be aligned on halfword and word boundaries, respectively.

Operation

EA = SR_b + {SRi ‖ sex(IMM<7:0>)};
BEGIN = SR_b+1;
END = SR_b+2;
cbsize = END - BEGIN;
if (EA > END) EA = BEGIN + (EA - END);
if (A == 1) SR_b = EA;
R_d = see the table below:

LT      Load operation
.bz9    SR_d<8:0> = zex BYTE[EA]
.bs9    SR_d<8:0> = sex BYTE[EA]
.h      SR_d<15:0> = HALF[EA]
.w      SR_d<31:0> = WORD[EA]
.4      VR_d<9i+8:9i> = sex BYTE[(EA+i > END) ? EA+i-cbsize : EA+i], i = 0 to 3
.8      VR_d<9i+8:9i> = sex BYTE[(EA+i > END) ? EA+i-cbsize : EA+i], i = 0 to 7
.16     VR_d<9i+8:9i> = sex BYTE[(EA+i > END) ? EA+i-cbsize : EA+i], i = 0 to 15
.32     VR_d<9i+8:9i> = sex BYTE[(EA+i > END) ? EA+i-cbsize : EA+i], i = 0 to 31
.64     VR_0d<9i+8:9i> = sex BYTE[(EA+i > END) ? EA+i-cbsize : EA+i], i = 0 to 31
        VR_1d<9i+8:9i> = sex BYTE[(EA+32+i > END) ? EA+32+i-cbsize : EA+32+i], i = 0 to 31

Exceptions

Programming notes

The programmer must ensure that the following condition holds for this instruction to work as expected:

BEGIN < EA < 2*END - BEGIN

That is, EA > BEGIN and EA - END < END - BEGIN.
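The address wrap-around described for VLCB can be sketched in Python as follows. This is a literal transcription of the pseudocode under stated assumptions: the function name and the plain-integer addresses are hypothetical, and the comparison is kept as `>` exactly as written in the operation, so the byte at address END is itself read before wrapping.

```python
def vlcb_byte_addresses(ea, begin, end, count):
    """Return the byte addresses touched by a `count`-byte circular load."""
    cbsize = end - begin
    # Initial adjustment, performed before the load and the address update.
    if ea > end:
        ea = begin + (ea - end)
    # Per-byte wrap, as in the vector rows of the load table.
    return [ea + i - cbsize if ea + i > end else ea + i for i in range(count)]
```

With BEGIN = 100, END = 116, and EA = 112, an 8-byte load reads addresses 112 to 116 and then continues at 101, 102, 103.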

VLD Double load

Format

Assembler syntax

VLD.lt Rd, SRb, SRi

VLD.lt Rd, SRb, #IMM

VLD.lt Rd, SRb+, SRi

VLD.lt Rd, SRb+, #IMM

where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLDOFF for cache-off loads.

Description

Loads two vector registers in the current or alternate bank, or two scalar registers.

Operation

EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;

R_d:R_d+1 = see the table below:

LT      Load operation
.bz9    SR_d<8:0> = zex BYTE[EA]
        SR_d+1<8:0> = zex BYTE[EA+1]
.bs9    SR_d<8:0> = sex BYTE[EA]
        SR_d+1<8:0> = sex BYTE[EA+1]
.h      SR_d<15:0> = HALF[EA]
        SR_d+1<15:0> = HALF[EA+2]
.w      SR_d<31:0> = WORD[EA]
        SR_d+1<31:0> = WORD[EA+4]
.4      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 3
        VR_d+1<9i+8:9i> = sex BYTE[EA+4+i], i = 0 to 3
.8      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 7
        VR_d+1<9i+8:9i> = sex BYTE[EA+8+i], i = 0 to 7
.16     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 15
        VR_d+1<9i+8:9i> = sex BYTE[EA+16+i], i = 0 to 15
.32     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
        VR_d+1<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 31
.64     VR_0d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
        VR_1d<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 31
        VR_0d+1<9i+8:9i> = sex BYTE[EA+64+i], i = 0 to 31
        VR_1d+1<9i+8:9i> = sex BYTE[EA+96+i], i = 0 to 31

Exceptions

Programming notes

VLI Load immediate

Format

Assembler syntax

VLI.dt VRd, #IMM

VLI.dt SRd, #IMM

where dt = {b, b9, h, w, f}.

Description

Loads an immediate value into a scalar or vector register.

For a scalar register load, a byte, byte9, halfword, or word is loaded according to the data type. For the byte, byte9, and halfword data types, the bytes (byte9s) that are not affected are left unchanged.

Operation

Rd = see the table below:

DT      Scalar load                            Vector load
.i8     SR_d<7:0> = IMM<7:0>                   VR_d = 32 int8 elements
.i9     SR_d<8:0> = IMM<8:0>                   VR_d = 32 int9 elements
.i16    SR_d<15:0> = IMM<15:0>                 VR_d = 16 int16 elements
.i32    SR_d<31:0> = sex IMM<18:0>             VR_d = 8 int32 elements
.f      SR_d<31> = IMM<18> (sign)              VR_d = 8 float elements
        SR_d<30:23> = IMM<17:10> (exponent)
        SR_d<22:13> = IMM<9:0> (mantissa)
        SR_d<12:0> = zeroes

Exceptions

None.
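The .i32 and .f rows of the VLI table can be illustrated with a short Python sketch. It assumes that the 32-bit pattern built for .f is interpreted as an IEEE 754 single-precision value, which the table layout (1 sign bit, 8 exponent bits, mantissa) suggests but does not state; the function names are hypothetical.

```python
import struct


def vli_int32(imm19):
    """.i32 row: sign-extend the 19-bit immediate IMM<18:0>."""
    imm19 &= 0x7FFFF
    return imm19 - (1 << 19) if imm19 & (1 << 18) else imm19


def vli_float_bits(imm19):
    """.f row: spread IMM<18:0> over a 32-bit pattern, low 13 bits zero."""
    sign = (imm19 >> 18) & 0x1
    exponent = (imm19 >> 10) & 0xFF
    mantissa = imm19 & 0x3FF
    return (sign << 31) | (exponent << 23) | (mantissa << 13)


def vli_float(imm19):
    """Interpret the pattern as single precision (assumed IEEE 754)."""
    return struct.unpack(">f", struct.pack(">I", vli_float_bits(imm19)))[0]
```

Under that assumption, the immediate with exponent field 127 and zero mantissa (0x1FC00) expands to the bit pattern 0x3F800000, that is, 1.0.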

VLQ Quad load

Format

Assembler syntax

VLQ.lt Rd, SRb, SRi

VLQ.lt Rd, SRb, #IMM

VLQ.lt Rd, SRb+, SRi

VLQ.lt Rd, SRb+, #IMM

where lt = {b, bz9, bs9, h, w, 4, 8, 16, 32, 64} and Rd = {VRd, VRAd, SRd}. Note that .b and .bs9 specify the same operation, and that .64 and VRAd cannot be specified together. Use VLQOFF for cache-off loads.

Description

Loads four vector registers in the current or alternate bank, or four scalar registers.

Operation

EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;

R_d:R_d+1:R_d+2:R_d+3 = see the table below:

LT      Load operation
.bz9    SR_d<8:0> = zex BYTE[EA]
        SR_d+1<8:0> = zex BYTE[EA+1]
        SR_d+2<8:0> = zex BYTE[EA+2]
        SR_d+3<8:0> = zex BYTE[EA+3]
.bs9    SR_d<8:0> = sex BYTE[EA]
        SR_d+1<8:0> = sex BYTE[EA+1]
        SR_d+2<8:0> = sex BYTE[EA+2]
        SR_d+3<8:0> = sex BYTE[EA+3]
.h      SR_d<15:0> = HALF[EA]
        SR_d+1<15:0> = HALF[EA+2]
        SR_d+2<15:0> = HALF[EA+4]
        SR_d+3<15:0> = HALF[EA+6]
.w      SR_d<31:0> = WORD[EA]
        SR_d+1<31:0> = WORD[EA+4]
        SR_d+2<31:0> = WORD[EA+8]
        SR_d+3<31:0> = WORD[EA+12]
.4      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 3
        VR_d+1<9i+8:9i> = sex BYTE[EA+4+i], i = 0 to 3
        VR_d+2<9i+8:9i> = sex BYTE[EA+8+i], i = 0 to 3
        VR_d+3<9i+8:9i> = sex BYTE[EA+12+i], i = 0 to 3
.8      VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 7
        VR_d+1<9i+8:9i> = sex BYTE[EA+8+i], i = 0 to 7
        VR_d+2<9i+8:9i> = sex BYTE[EA+16+i], i = 0 to 7
        VR_d+3<9i+8:9i> = sex BYTE[EA+24+i], i = 0 to 7
.16     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 15
        VR_d+1<9i+8:9i> = sex BYTE[EA+16+i], i = 0 to 15
        VR_d+2<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 15
        VR_d+3<9i+8:9i> = sex BYTE[EA+48+i], i = 0 to 15
.32     VR_d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
        VR_d+1<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 31
        VR_d+2<9i+8:9i> = sex BYTE[EA+64+i], i = 0 to 31
        VR_d+3<9i+8:9i> = sex BYTE[EA+96+i], i = 0 to 31
.64     VR_0d<9i+8:9i> = sex BYTE[EA+i], i = 0 to 31
        VR_1d<9i+8:9i> = sex BYTE[EA+32+i], i = 0 to 31
        VR_0d+1<9i+8:9i> = sex BYTE[EA+64+i], i = 0 to 31
        VR_1d+1<9i+8:9i> = sex BYTE[EA+96+i], i = 0 to 31
        VR_0d+2<9i+8:9i> = sex BYTE[EA+128+i], i = 0 to 31
        VR_1d+2<9i+8:9i> = sex BYTE[EA+160+i], i = 0 to 31
        VR_0d+3<9i+8:9i> = sex BYTE[EA+192+i], i = 0 to 31
        VR_1d+3<9i+8:9i> = sex BYTE[EA+224+i], i = 0 to 31

Exceptions

Programming notes

VLR Load reversed

Format

Assembler syntax

VLR.lt Rd, SRb, SRi

VLR.lt Rd, SRb, #IMM

VLR.lt Rd, SRb+, SRi

VLR.lt Rd, SRb+, #IMM

where lt = {4, 8, 16, 32, 64} and Rd = {VRd, VRAd}. Note that .64 and VRAd cannot be specified together. Use VLROFF for cache-off loads.

Description

Loads a vector register in reverse element order. This instruction does not support a scalar destination register.

Operation

EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
Rd = see the table below:

LT      Load operation
.4      VR_d[31-i]<8:0> = sex BYTE[EA+i], i = 0 to 3
.8      VR_d[31-i]<8:0> = sex BYTE[EA+i], i = 0 to 7
.16     VR_d[31-i]<8:0> = sex BYTE[EA+i], i = 0 to 15
.32     VR_d[31-i]<8:0> = sex BYTE[EA+i], i = 0 to 31
.64     VR_0d[31-i]<8:0> = sex BYTE[EA+32+i], i = 0 to 31
        VR_1d[31-i]<8:0> = sex BYTE[EA+i], i = 0 to 31

Exceptions

Invalid data address, unaligned access.

Programming notes
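The reversed element order of the VLR table can be sketched in Python. The function names are hypothetical, the register is modeled as a 32-entry list, and elements that the table does not write are shown as None, which is an assumption made only to keep the untouched positions visible.

```python
def sign_extend_byte(byte):
    """Return the signed value of an 8-bit byte (the 'sex' in the table)."""
    return byte - 256 if byte & 0x80 else byte


def vlr(memory, ea, count):
    """Model VR_d[31-i] = sex BYTE[EA+i] for i = 0 to count-1."""
    vr = [None] * 32  # None marks elements the table does not write
    for i in range(count):
        vr[31 - i] = sign_extend_byte(memory[ea + i])
    return vr
```

For a .4 load of the bytes 1, 2, 3, 0xFF, elements 31, 30, 29, 28 receive 1, 2, 3, -1, so the first byte in memory lands in the highest element.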

VLSL Logical shift left

Format

Assembler syntax

VLSL.dt VRd, VRa, SRb

VLSL.dt VRd, VRa, #IMM

VLSL.dt SRd, SRa, SRb

VLSL.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

Each element of vector/scalar register Ra is logically shifted left, with zeros filled into the least significant bit (LSB) positions. The shift amount is given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.

Operation

shift_amount = {SRb % 32 ‖ IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] << shift_amount;
}

Exceptions

None.

Programming notes

Note that shift_amount is a 5-bit number taken from SRb or IMM<4:0>. For the byte, byte9, and halfword data types, the programmer is responsible for specifying a shift amount that is less than or equal to the number of bits in the data size. If the shift amount is greater than the specified data size, the elements are filled with zeros.
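A minimal Python sketch of the VLSL operation follows. It reads `EMASK[i]` as a per-element enable, and it assumes that a disabled element keeps the previous contents of Rd; neither the function name nor that treatment of masked elements is stated in the text above.

```python
ELEMENT_BITS = {"b": 8, "b9": 9, "h": 16, "w": 32}


def vlsl(rd, ra, shift_source, dt, emask):
    """Shift each enabled element of `ra` left; keep `rd` where masked off."""
    bits = ELEMENT_BITS[dt]
    shift_amount = shift_source % 32          # 5-bit shift amount
    keep = (1 << bits) - 1                    # truncate to the element width
    return [
        (a << shift_amount) & keep if enabled else old
        for old, a, enabled in zip(rd, ra, emask)
    ]
```

The truncation to the element width reproduces the behavior described in the programming note: shifting a byte element by 9 leaves only zeros.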

VLSR Logical shift right

Format

Assembler syntax

VLSR.dt VRd, VRa, SRb

VLSR.dt VRd, VRa, #IMM

VLSR.dt SRd, SRa, SRb

VLSR.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

Each element of vector/scalar register Ra is logically shifted right, with zeros filled into the most significant bit (MSB) positions. The shift amount is given in scalar register Rb or the IMM field, and the result is stored in vector/scalar register Rd.

Operation

shift_amount = {SRb % 32 ‖ IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] zero>> shift_amount;
}

Exceptions

None.

Programming notes

VLWS Load with stride

Format

Assembler syntax

VLWS.dt Rd, SRb, SRi

VLWS.dt Rd, SRb, #IMM

VLWS.dt Rd, SRb+, SRi

VLWS.dt Rd, SRb+, #IMM

where dt = {4, 8, 16, 32} and Rd = {VRd, VRAd}. Note that the .64 mode is not supported; use VL instead. Use VLWSOFF for cache-off loads.

Description

Starting at the effective address, loads 32 bytes from memory into vector register VRd, using scalar register SRb+1 as the stride control register.

LT specifies the block size, that is, the number of consecutive bytes loaded for each block. SRb+1 specifies the stride, that is, the number of bytes separating the starts of two consecutive blocks.

The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.

Operation

EA = SR_b + {SR_i ‖ sex(IMM<7:0>)};
if (A == 1) SR_b = EA;
Block_size = {4 ‖ 8 ‖ 16 ‖ 32};
Stride = SR_b+1<31:0>;
for (i = 0; i < VECSIZE/Block_size; i++)
    for (j = 0; j < Block_size; j++)
        VRd[i*Block_size+j]<8:0> = sex BYTE{EA + i*Stride + j};

Exceptions

Invalid data address, unaligned access.
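The nested loop of the VLWS operation can be sketched in Python as follows. VECSIZE is taken to be 32 because the description speaks of loading 32 bytes; that value, the function names, and the use of a list for the register are assumptions made for the illustration.

```python
def sign_extend_byte(byte):
    """Return the signed value of an 8-bit byte."""
    return byte - 256 if byte & 0x80 else byte


def vlws(memory, ea, block_size, stride, vecsize=32):
    """Gather vecsize bytes as blocks of block_size, one block every stride bytes."""
    if stride < block_size:
        raise ValueError("stride must be equal to or greater than block size")
    vr = []
    for i in range(vecsize // block_size):
        for j in range(block_size):
            vr.append(sign_extend_byte(memory[ea + i * stride + j]))
    return vr
```

With a block size of 4 and a stride of 8, the load takes bytes 0 to 3, skips 4 to 7, takes 8 to 11, and so on, which is the access pattern for reading a narrow column of a wider two-dimensional array.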

VMAC Multiply and accumulate

Format

Assembler syntax

VMAC.dt VRa, VRb

VMAC.dt VRa, SRb

VMAC.dt VRa, #IMM

VMAC.dt SRa, SRb

VMAC.dt SRa, #IMM

where dt = {b, h, w, f}.

Supported modes

D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
DS      int8(b)   int16(h)   int32(w)   float(f)

Description

Multiplies each element of Ra by each element of Rb to produce a double-precision intermediate result; adds each double-precision element of the intermediate result to each double-precision element of the vector accumulator; and stores the double-precision sum of each element in the vector accumulator.

Ra and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in VACH.

For the float data type, all operands and results are single precision.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Aop[i] = {VRa[i] ‖ SRa};
    Bop[i] = {VRb[i] ‖ SRb};
    if (dt == float) VACL[i] = Aop[i] * Bop[i] + VACL[i];
    else VACH[i]:VACL[i] = Aop[i] * Bop[i] + VACH[i]:VACL[i];
}

Exceptions

Overflow, floating-point invalid operand.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.
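The integer branch of the VMAC operation, with the accumulator split into a high half (VACH) and a low half (VACL), can be sketched in Python. The function name is hypothetical, operands are passed as signed Python integers, and the sketch wraps on overflow instead of raising the overflow exception listed above; both are simplifications.

```python
def vmac_int(vra, vrb, vach, vacl, bits):
    """Accumulate a*b into a 2*bits-wide accumulator held as (high, low) halves."""
    elem_mask = (1 << bits) - 1
    acc_mask = (1 << (2 * bits)) - 1
    new_h, new_l = [], []
    for a, b, h, l in zip(vra, vrb, vach, vacl):
        acc = (h << bits) | l             # VACH[i]:VACL[i] as one wide value
        acc = (acc + a * b) & acc_mask    # wraps here; hardware may flag overflow
        new_h.append(acc >> bits)
        new_l.append(acc & elem_mask)
    return new_h, new_l
```

For int8 operands, 100 * 100 = 10000 = 0x2710 is stored as VACH = 0x27 and VACL = 0x10, and -15 is stored as 0xFF and 0xF1.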

VMACF Multiply and accumulate fraction

Format

Assembler syntax

VMACF.dt VRa, VRb

VMACF.dt VRa, SRb

VMACF.dt VRa, #IMM

VMACF.dt SRa, SRb

VMACF.dt SRa, #IMM

where dt = {b, h, w}.

Supported modes

D:S:M   V<-V@V   V<-V@S   V<-V@I   S<-S@S   S<-S@I
DS      int8(b)   int16(h)   int32(w)

Description

Multiplies each element of VRa by each element of Rb to produce a double-precision intermediate result; shifts the double-precision intermediate result left by one bit; adds each double-precision element of the shifted intermediate result to each double-precision element of the vector accumulator; and stores the double-precision sum of each element in the vector accumulator.

VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    VACH[i]:VACL[i] = ((VRa[i] * Bop[i]) << 1) + VACH[i]:VACL[i];
}

Exceptions

Overflow.

Programming notes
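The one-bit left shift in VMACF is consistent with signed fixed-point fractions that have one sign bit and n-1 fraction bits. That number format is an assumption here, since the text above only says "fraction"; under it, the product of two such values carries two sign bits, and the shift realigns it to the double-width fraction format. A hedged Python sketch, with hypothetical helper names:

```python
def to_q(value, bits):
    """Encode a real value as a signed fraction with bits-1 fraction bits."""
    return int(round(value * (1 << (bits - 1))))


def from_q(raw, bits):
    """Decode a signed fraction with bits-1 fraction bits."""
    return raw / (1 << (bits - 1))


def vmacf_step(a, b, acc):
    """One element of VMACF: acc + ((a * b) << 1), all values as signed integers."""
    return acc + ((a * b) << 1)
```

For 8-bit operands, 0.5 is encoded as 64 and -0.25 as -32; the shifted product is -4096, which decodes as -0.125 in the 16-bit accumulator format.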

VMACL Multiply and accumulate low

Format

Assembler syntax

VMACL.dt VRd, VRa, VRb

VMACL.dt VRd, VRa, SRb

VMACL.dt VRd, VRa, #IMM

VMACL.dt SRd, SRa, SRb

VMACL.dt SRd, SRa, #IMM

where dt = {b, h, w, f}.

Description

Multiplies each element of VRa by each element of Rb to produce a double-precision intermediate result; adds each double-precision element of the intermediate result to each double-precision element of the vector accumulator; stores the double-precision sum of each element in the vector accumulator; and returns the low-order portion to destination register VRd.

VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb};
    if (dt == float) VACL[i] = VRa[i] * Bop[i] + VACL[i];
    else VACH[i]:VACL[i] = VRa[i] * Bop[i] + VACH[i]:VACL[i];
    VRd[i] = VACL[i];
}

Exceptions

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMAD Multiply and add

Format

Assembler syntax

VMAD.dt VRc, VRd, VRa, VRb

VMAD.dt SRc, SRd, SRa, SRb

where dt = {b, h, w}.

Supported modes

S       VR   SR
DS      int8(b)   int16(h)   int32(w)

Description

Multiplies each element of Ra by each element of Rb to produce a double-precision intermediate result; adds each double-precision element of the intermediate result to each element of Rc; and stores the double-precision sum of each element in the destination registers Rd+1:Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Aop[i] = {VRa[i] ‖ SRa};
    Bop[i] = {VRb[i] ‖ SRb};
    Cop[i] = {VRc[i] ‖ SRc};
    Rd+1[i]:Rd[i] = Aop[i] * Bop[i] + sex_dp(Cop[i]);
}

Exceptions

None.
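A possible Python model of the VMAD operation is sketched below. It assumes that the operands are signed two's-complement bit patterns and that `sex_dp` means sign extension of Cop to the double-width type; the helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vmad(ra, rb, rc, bits):
    """Return (Rd+1, Rd): high and low halves of a*b + sex_dp(c) per element."""
    elem_mask = (1 << bits) - 1
    hi, lo = [], []
    for a, b, c in zip(ra, rb, rc):
        # Converting c to a signed Python int plays the role of sex_dp.
        total = to_signed(a, bits) * to_signed(b, bits) + to_signed(c, bits)
        total &= (1 << (2 * bits)) - 1
        hi.append(total >> bits)
        lo.append(total & elem_mask)
    return hi, lo
```

For int8 patterns, 0x10 * 0x10 + 0xFF is 256 - 1 = 255, giving Rd+1 = 0x00 and Rd = 0xFF; without the sign extension of Rc the sum would instead be 511.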

VMADL Multiply and add low

Format

Assembler syntax

VMADL.dt VRc, VRd, VRa, VRb

VMADL.dt SRc, SRd, SRa, SRb

where dt = {b, h, w, f}.

Supported modes

S       VR   SR
DS      int8(b)   float(f)   int16(h)   int32(w)

Description

Multiplies each element of Ra by each element of Rb to produce a double-precision intermediate result; adds each double-precision element of the intermediate result to each element of Rc; and returns the low-order portion of the double-precision sum of each element to destination register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Aop[i] = {VRa[i] ‖ SRa};
    Bop[i] = {VRb[i] ‖ SRb};
    Cop[i] = {VRc[i] ‖ SRc};
    if (dt == float) Lo[i] = Aop[i] * Bop[i] + Cop[i];
    else Hi[i]:Lo[i] = Aop[i] * Bop[i] + sex_dp(Cop[i]);
    Rd[i] = Lo[i];
}

Exceptions

VMAS Multiply and subtract from accumulator

Format

Assembler syntax

VMAS.dt VRa, VRb

VMAS.dt VRa, SRb

VMAS.dt VRa, #IMM

VMAS.dt SRa, SRb

VMAS.dt SRa, #IMM

where dt = {b, h, w, f}.

Description

Multiplies each element of Ra by each element of Rb to produce a double-precision intermediate result; subtracts each double-precision element of the intermediate result from each double-precision element of the vector accumulator; and stores the double-precision result of each element in the vector accumulator.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb};
    if (dt == float) VACL[i] = VACL[i] - VRa[i] * Bop[i];
    else VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
}

Exceptions

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead.

VMASF Multiply and subtract fraction from accumulator

Format

Assembler syntax

VMASF.dt VRa, VRb

VMASF.dt VRa, SRb

VMASF.dt VRa, #IMM

VMASF.dt SRa, SRb

VMASF.dt SRa, #IMM

where dt = {b, h, w}.

Description

Multiplies each element of VRa by each element of Rb to produce a double-precision intermediate result; shifts the double-precision intermediate result left by one bit; subtracts each double-precision element of the shifted intermediate result from each double-precision element of the vector accumulator; and stores the double-precision result of each element in the vector accumulator.

VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
}

Exceptions

Overflow.

Programming notes

VMASL Multiply and subtract from accumulator low

Format

Assembler syntax

VMASL.dt VRd, VRa, VRb

VMASL.dt VRd, VRa, SRb

VMASL.dt VRd, VRa, #IMM

VMASL.dt SRd, SRa, SRb

VMASL.dt SRd, SRa, #IMM

where dt = {b, h, w, f}.

Description

Multiplies each element of VRa by each element of Rb to produce a double-precision intermediate result; subtracts each double-precision element of the intermediate result from each double-precision element of the vector accumulator; stores the double-precision result of each element in the vector accumulator; and returns the low-order portion to destination register VRd.

VRa and Rb use the specified data type, whereas VAC uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in VACH.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb};
    if (dt == float) VACL[i] = VACL[i] - VRa[i] * Bop[i];
    else VACH[i]:VACL[i] = VACH[i]:VACL[i] - VRa[i] * Bop[i];
    VRd[i] = VACL[i];
}

Exceptions

Programming notes

VMAXE Pairwise maximum and exchange

Format

Assembler syntax

VMAXE.dt VRd, VRb

where dt = {b, b9, h, w, f}.

Supported modes

D:S:M   V<-V
DS      int8(b)   int9(b9)   int16(h)   int32(w)   float(f)

Description

VRa should be the same as VRb. When VRa differs from VRb, the result is undefined.

The even/odd data elements of vector register Rb are compared in pairs. The larger value of each pair is stored in the even position of vector register Rd, and the smaller value of each pair is stored in the odd position.

Operation

for (i = 0; i < NumElem && EMASK[i]; i = i + 2) {
    VRd[i] = (VRb[i] > VRb[i+1]) ? VRb[i] : VRb[i+1];
    VRd[i+1] = (VRb[i] > VRb[i+1]) ? VRb[i+1] : VRb[i];
}

Exceptions

None.
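The pairwise compare-and-exchange of VMAXE can be sketched in Python. The function name is hypothetical, the element mask is omitted for brevity, and a list stands in for the vector register.

```python
def vmaxe(vrb):
    """Place the larger of each even/odd pair at the even index."""
    vrd = list(vrb)
    for i in range(0, len(vrb) - 1, 2):
        a, b = vrb[i], vrb[i + 1]
        vrd[i], vrd[i + 1] = (a, b) if a > b else (b, a)
    return vrd
```

Applied to the elements 3, 7, 9, 2, the result is 7, 3, 9, 2: the first pair is exchanged and the second is already in order. This is the basic step of a sorting network over adjacent element pairs.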

VMOV Move

Format

Assembler syntax

VMOV.dt Rd, Rb

where dt = {b, b9, h, w, f}. Rd and Rb denote architecturally specified register names.

Note that .w and .f specify the same operation.

Supported modes

Description

Moves the contents of register Rb to register Rd. The Group field specifies the source and destination register groups. The register groups are denoted as follows:

VR      current-bank vector register
VRA     alternate-bank vector register
SR      scalar register
SP      special-purpose register
RASR    return address stack register
VAC     vector accumulator register (see the VAC register encoding table below)

Group<3:0>   Source group   Destination group   Comments
0000         reserved
0001         VR             VRA
0010         VRA            VR
0011         VRA            VRA
0100         reserved
0101         reserved
0110         VRA            VAC
0111         VAC            VRA
1000         reserved
1001         SR             VRA
1010         reserved
1011         reserved
1100         SR             SP
1101         SP             SR
1110         SR             RASR
1111         RASR           SR

Note that a vector register cannot be moved to a scalar register with this instruction. The VEXTRT instruction is provided for that purpose.

Use the following table for VAC register encoding:

R<2:0>   Register   Comments
000      undefined
001      VAC0L
010      VAC0H
011      VAC0       Specifies both VAC0H:VAC0L. If specified as the source, the VRd+1:VRd register pair is updated. VRd must be an even-numbered register.
100      undefined
101      VAC1L
110      VAC1H
111      VAC1       Specifies both VAC1H:VAC1L. If specified as the source, the VRd+1:VRd register pair is updated. VRd must be an even-numbered register.
Others   undefined

Operation

Rd = Rb

Exceptions

Setting an exception status bit in VCSR or VISRC causes the corresponding exception.

Programming notes

This instruction is not affected by the element mask. Note that, because the alternate bank does not exist in VEC64 mode, this instruction cannot be used to move to or from an alternate-bank register in VEC64 mode.
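For illustration, the Group<3:0> encoding table of VMOV can be expressed as a lookup in Python. The dictionary and function names are hypothetical, and returning None for a reserved encoding is a choice made for this sketch; the text does not say how hardware treats reserved values.

```python
# (source group, destination group) for each defined Group<3:0> value.
VMOV_GROUPS = {
    0b0001: ("VR", "VRA"),
    0b0010: ("VRA", "VR"),
    0b0011: ("VRA", "VRA"),
    0b0110: ("VRA", "VAC"),
    0b0111: ("VAC", "VRA"),
    0b1001: ("SR", "VRA"),
    0b1100: ("SR", "SP"),
    0b1101: ("SP", "SR"),
    0b1110: ("SR", "RASR"),
    0b1111: ("RASR", "SR"),
}


def decode_vmov_group(group):
    """Return (source, destination) for a 4-bit group field, or None if reserved."""
    return VMOV_GROUPS.get(group & 0xF)
```

A decoder built this way makes the restriction noted above visible: no entry pairs a vector-register source with a scalar-register destination.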

VMUL Multiply

Format

Assembler syntax

VMUL.dt VRc, VRd, VRa, VRb

VMUL.dt SRc, SRd, SRa, SRb

where dt = {b, h, w}.

Description

Multiplies each element of Ra by each element of Rb to produce a double-precision result, and returns the double-precision result of each element to the destination registers Rc:Rd.

Ra and Rb use the specified data type, whereas Rc:Rd uses the appropriate double-precision data type (16, 32, and 64 bits for int8, int16, and int32, respectively). The high-order portion of each double-precision element is stored in Rc.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Aop[i] = {VRa[i] ‖ SRa};
    Bop[i] = {VRb[i] ‖ SRb};
    Hi[i]:Lo[i] = Aop[i] * Bop[i];
    Rc[i] = Hi[i];
    Rd[i] = Lo[i];
}

Exceptions

None.

Programming notes

This instruction does not support the int9 data type; use the int16 data type instead. It also does not support the float data type, because the extended result is not a supported data type.

VMULA 乘到累加器VMULA multiply to accumulator

格式Format

汇编器句法assembler syntax

VMULA.dt VRa，VRbVMULA.dt VRa, VRb

VMULA.dt VRa，SRbVMULA.dt VRa, SRb

VMULA.dt VRa，#IMMVMULA.dt VRa, #IMM

VMULA.dt SRa，SRbVMULA.dt SRa, SRb

VMULA.dt SRa，#IMMVMULA.dt SRa, #IMM

其中dt＝{b，h，w，f}。where dt = {b, h, w, f}.

支持的模式 D：S：M V@V V@S V@I S@S S@I DS int8(b) int16(h) int32(w) float(f) supported modes D:S:M V@V V@S V@I S@S S@I DS int8(b) int16(h) int32(w) float(f)

说明illustrate

将VRa的每个元素与Rb中的每个元素相乘以产生一双精度的结果；将此结果写到累加器。Multiply each element of VRa by each element of Rb to produce a double-precision result; write this result to the accumulator.

浮点数据类型，全部操作数和结果都是单精度的。Floating point data type, all operands and results are single precision.

操作operate

for(i＝0；i＜NumElem && EMASK[i]；i++){for(i=0; i<NumElem &&EMASK[i]; i++){

Bop[i]＝{VRb[i]‖SRb}；Bop[i]={VRb[i]‖SRb};

if(dt＝＝float)VACL[i]＝VRa[i]*Bop[i]；If(dt==float)VACL[i]=VRa[i]*Bop[i];

else VACH[i]：VACL[i]＝VRa[i]*Bop[i]；Else VACH[i]: VACL[i]=VRa[i]*Bop[i];

}}

异常abnormal

无。none.

编程注解programming notes

VMULAF Multiply to Accumulator Fraction

Format

Assembler syntax

VMULAF.dt VRa, VRb

VMULAF.dt VRa, SRb

VMULAF.dt VRa, #IMM

VMULAF.dt SRa, SRb

VMULAF.dt SRa, #IMM

where dt = {b, h, w}.

Supported modes

D:S:M   V@V        V@S        V@I        S@S        S@I
DS      int8 (b)   int16 (h)  int32 (w)

Description

Multiply each element of VRa by the corresponding element of Rb to produce a double-precision intermediate result; shift this intermediate result left by one bit; write the result to the accumulator.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    VACH[i]:VACL[i] = (VRa[i] * Bop[i]) << 1;
}

Exceptions

None.

Programming notes

VMULF Multiply Fraction

Format

Assembler syntax

VMULF.dt VRd, VRa, VRb

VMULF.dt VRd, VRa, SRb

VMULF.dt VRd, VRa, #IMM

VMULF.dt SRd, SRa, SRb

VMULF.dt SRd, SRa, #IMM

where dt = {b, h, w}.

Description

Multiply each element of VRa by the corresponding element of Rb to produce a double-precision intermediate result; shift this intermediate result left by one bit; return the high-order part of the result to destination register VRd+1 and the low-order part to destination register VRd. VRd must be an even-numbered register.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Hi[i]:Lo[i] = (VRa[i] * Bop[i]) << 1;
    VRd+1[i] = Hi[i];
    VRd[i] = Lo[i];
}

Exceptions

None.

Programming notes

VMULFR Multiply Fraction and Round

Format

Assembler syntax

VMULFR.dt VRd, VRa, VRb

VMULFR.dt VRd, VRa, SRb

VMULFR.dt VRd, VRa, #IMM

VMULFR.dt SRd, SRa, SRb

VMULFR.dt SRd, SRa, #IMM

where dt = {b, h, w}.

Description

Multiply each element of VRa by the corresponding element of Rb to produce a double-precision intermediate result; shift this intermediate result left by one bit; round the shifted intermediate result into the high-order part; return the high-order part to destination register VRd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    Hi[i]:Lo[i] = (VRa[i] * Bop[i]) << 1;
    if (Lo[i]<msb> == 1) Hi[i] = Hi[i] + 1;
    VRd[i] = Hi[i];
}
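The shift-and-round sequence can be illustrated with a small Python model. It is a sketch under stated assumptions: the name `vmulfr` is hypothetical, elements are taken to be signed fixed-point fractions (for 16 bits, the common Q15 reading in which 0x4000 represents 0.5), and results are returned as unsigned bit patterns.

```python
def vmulfr(ra, rb, bits=16):
    """Fractional multiply with rounding (hypothetical model)."""
    elem_mask = (1 << bits) - 1
    wide_mask = (1 << (2 * bits)) - 1
    out = []
    for a, b in zip(ra, rb):
        shifted = ((a * b) << 1) & wide_mask  # double-width product, shifted left by one
        hi = shifted >> bits
        lo = shifted & elem_mask
        if lo >> (bits - 1):                  # Lo<msb> == 1: round the high part up
            hi = (hi + 1) & elem_mask
        out.append(hi)
    return out


result = vmulfr([0x4000, 3], [0x4000, 0x4000])
```

The left shift removes the duplicated sign bit that a product of two signed fractions carries, so 0.5 * 0.5 lands on 0x2000 (0.25) in the high half. In the second element, 3 * 0.5 = 1.5 leaves 0x8000 in the low half and is therefore rounded up to 2.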

Exceptions

None.

Programming notes

VMULL Multiply Low

Format

Assembler syntax

VMULL.dt VRd, VRa, VRb

VMULL.dt VRd, VRa, SRb

VMULL.dt VRd, VRa, #IMM

VMULL.dt SRd, SRa, SRb

VMULL.dt SRd, SRa, #IMM

where dt = {b, h, w, f}.

Description

Multiply each element of VRa by the corresponding element of Rb to produce a double-precision result; return the low-order part of the result to destination register VRd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb};
    if (dt == float) Lo[i] = VRa[i] * Bop[i];
    else Hi[i]:Lo[i] = VRa[i] * Bop[i];
    VRd[i] = Lo[i];
}

Exceptions

Programming notes

VNAND NAND

Format

Assembler syntax

VNAND.dt VRd, VRa, VRb

VNAND.dt VRd, VRa, SRb

VNAND.dt VRd, VRa, #IMM

VNAND.dt SRd, SRa, SRb

VNAND.dt SRd, SRa, #IMM

Description

Perform a logical NAND of each bit of each element in Ra with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    Rd[i]<k> = ~(Ra[i]<k> & Bop[i]<k>), for k = all bits in element i;
}

Exceptions

None.

VNOR NOR

Format

Assembler syntax

VNOR.dt VRd, VRa, VRb

VNOR.dt VRd, VRa, SRb

VNOR.dt VRd, VRa, #IMM

VNOR.dt SRd, SRa, SRb

VNOR.dt SRd, SRa, #IMM

Supported modes

D:S:M   V<-V@V     V<-V@S     V<-V@I     S<-S@S     S<-S@I
DS      int8 (b)   int9 (b9)  int16 (h)  int32 (w)

Description

Perform a logical NOR of each bit of each element in Ra with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i]<k> = ~(Ra[i]<k> | Bop[i]<k>), for k = all bits in element i;
}

Exceptions

None.

VOR OR

Format

Assembler syntax

VOR.dt VRd, VRa, VRb

VOR.dt VRd, VRa, SRb

VOR.dt VRd, VRa, #IMM

VOR.dt SRd, SRa, SRb

VOR.dt SRd, SRa, #IMM

Description

Perform a logical OR of each bit of each element in Ra with the corresponding bit in Rb or the immediate operand; the result is returned in Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i]<k> = Ra[i]<k> | Bop[i]<k>, for k = all bits in element i;
}

Exceptions

None.

VORC OR Complement

Format

Assembler syntax

VORC.dt VRd, VRa, VRb

VORC.dt VRd, VRa, SRb

VORC.dt VRd, VRa, #IMM

VORC.dt SRd, SRa, SRb

VORC.dt SRd, SRa, #IMM

Description

Perform a logical OR of each bit of each element in Ra with the complement of the corresponding bit in Rb or the immediate operand; the result is returned in Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Bop[i] = {VRb[i] ‖ SRb ‖ sex(IMM<8:0>)};
    Rd[i]<k> = Ra[i]<k> | ~Bop[i]<k>, for k = all bits in element i;
}
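The three complementing logical operations (VNAND, VNOR and VORC) differ only in where the bitwise complement is applied. The Python sketch below makes that explicit; the function names and the `bits` parameter are illustrative assumptions, and elements are modelled as unsigned bit patterns masked to the element width.

```python
def vnand(ra, bop, bits):
    mask = (1 << bits) - 1
    return [~(a & b) & mask for a, b in zip(ra, bop)]   # complement of the AND


def vnor(ra, bop, bits):
    mask = (1 << bits) - 1
    return [~(a | b) & mask for a, b in zip(ra, bop)]   # complement of the OR


def vorc(ra, bop, bits):
    mask = (1 << bits) - 1
    return [(a | ~b) & mask for a, b in zip(ra, bop)]   # OR with the complemented operand


a, b = [0b00001111], [0b00111100]
```

Masking after the complement matters in Python because `~` on an integer yields a negative number; the mask restores the fixed element width that the hardware has implicitly.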

Exceptions

None.

VPFTCH Prefetch

Format

Assembler syntax

VPFTCH.ln SRb, SRi

VPFTCH.ln SRb, #IMM

VPFTCH.ln SRb+, SRi

VPFTCH.ln SRb+, #IMM

where ln = {1, 2, 4, 8}.

Description

Prefetch multiple vector data cache lines starting at the effective address. The number of cache lines is specified as follows:

LN<1:0> = 00: prefetch 1 64-byte cache line
LN<1:0> = 01: prefetch 2 64-byte cache lines
LN<1:0> = 10: prefetch 4 64-byte cache lines
LN<1:0> = 11: prefetch 8 64-byte cache lines

If the effective address does not fall on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary.
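The address truncation and line count can be modelled in a few lines of Python. This is a sketch, not the hardware behaviour: `prefetch_lines` is a hypothetical helper, and it assumes that the prefetched lines are the consecutive 64-byte lines following the aligned address.

```python
LINE_BYTES = 64


def prefetch_lines(ea, ln):
    """Return the start addresses of the cache lines to prefetch.

    ea: 32-bit effective byte address; ln: the 2-bit LN field (0..3).
    """
    count = 1 << ln                                # 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8
    base = ea & ~(LINE_BYTES - 1) & 0xFFFFFFFF     # truncate to a 64-byte boundary
    return [base + k * LINE_BYTES for k in range(count)]


lines = prefetch_lines(0x1234, 1)
```

Here the unaligned address 0x1234 is truncated to 0x1200, and LN = 01 selects two lines.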

Operation

Exceptions

Invalid data address exception.

Programming notes

EA<31:0> specifies a byte address in local memory.

VPFTCHSP Prefetch to Scratch Pad

Format

Assembler syntax

VPFTCHSP.ln SRp, SRb, SRi

VPFTCHSP.ln SRp, SRb, #IMM

VPFTCHSP.ln SRp, SRb+, SRi

VPFTCHSP.ln SRp, SRb+, #IMM

where ln = {1, 2, 4, 8}. Note that VPFTCH and VPFTCHSP have the same opcode.

Description

Transfer multiple 64-byte blocks from memory to the scratch pad memory. The effective address gives the starting address in memory, and SRp gives the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:

LN<1:0> = 00: transfer 1 64-byte block
LN<1:0> = 01: transfer 2 64-byte blocks
LN<1:0> = 10: transfer 4 64-byte blocks
LN<1:0> = 11: transfer 8 64-byte blocks

If the effective address does not fall on a 64-byte boundary, it is first truncated so that it is aligned to a 64-byte boundary. If the scratch pad pointer address in SRp does not fall on a 64-byte boundary, it is likewise truncated to a 64-byte boundary. The aligned scratch pad pointer address is incremented by the number of bytes transferred.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
Num_bytes = {64 ‖ 128 ‖ 256 ‖ 512};
Mem_adrs = EA<31:6>:6b'000000;
SRp = SRp<31:6>:6b'000000;
for (i = 0; i < Num_bytes; i++)
    SPAD[SRp++] = MEM[Mem_adrs + i];
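The transfer above can be sketched in Python as a plain byte copy. The function name `vpftchsp`, the use of `bytes`/`bytearray` objects for memory and scratch pad, and the returned pointer are assumptions for illustration; address-update mode (the A bit) is left out.

```python
def vpftchsp(mem, spad, srp, ea, ln):
    """Copy 64 << ln bytes from mem to spad; return the updated pointer."""
    num_bytes = 64 << ln          # 64, 128, 256 or 512
    mem_adrs = ea & ~0x3F         # EA<31:6>:6b'000000
    srp &= ~0x3F                  # align the scratch pad pointer as well
    for i in range(num_bytes):
        spad[srp] = mem[mem_adrs + i]
        srp += 1
    return srp                    # advanced by the number of bytes transferred


mem = bytes(range(256)) * 2
spad = bytearray(256)
new_srp = vpftchsp(mem, spad, srp=70, ea=130, ln=0)
```

With these inputs both addresses are truncated (130 to 128, 70 to 64), one 64-byte block is copied, and the pointer ends at 128.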

Exceptions

Invalid data address exception.

VROL Rotate Left

Format

Assembler syntax

VROL.dt VRd, VRa, SRb

VROL.dt VRd, VRa, #IMM

VROL.dt SRd, SRa, SRb

VROL.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

Each data element of vector/scalar register Ra is rotated left by the number of bits given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operation

rotate_amount = {SRb % 32 ‖ IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] rotate_left rotate_amount;
}

Exceptions

None.

Programming notes

Note that rotate_amount is a 5-bit number taken from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is greater than the specified data size, the result is undefined.

Note that rotating left by n bits is equivalent to rotating right by ElemSize-n bits, where ElemSize is the number of bits in the given data size.
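The equivalence between left and right rotation is easy to check with a small Python model. The helper names are hypothetical, and reducing the amount modulo the element size is a choice made for this sketch only; the instruction itself leaves amounts greater than the data size undefined.

```python
def rotate_left(x, n, size):
    """Rotate the `size`-bit value x left by n bits."""
    mask = (1 << size) - 1
    n %= size
    return ((x << n) | (x >> (size - n))) & mask


def rotate_right(x, n, size):
    """Rotate the `size`-bit value x right by n bits."""
    mask = (1 << size) - 1
    n %= size
    return ((x >> n) | (x << (size - n))) & mask


x = 0b10010110
left3 = rotate_left(x, 3, 8)
right5 = rotate_right(x, 8 - 3, 8)
```

Both calls give 0b10110100, so a rotate right by ElemSize-n can stand in for a rotate left by n.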

VROR Rotate Right

Format

Assembler syntax

VROR.dt VRd, SRa, SRb

VROR.dt VRd, SRa, #IMM

VROR.dt SRd, SRa, SRb

VROR.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

Each data element of vector/scalar register Ra is rotated right by the number of bits given in scalar register Rb or in the IMM field, and the result is stored in vector/scalar register Rd.

Operation

rotate_amount = {SRb % 32 ‖ IMM<4:0>};
for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = Ra[i] rotate_right rotate_amount;
}

Exceptions

None.

Programming notes

Note that rotate_amount is a 5-bit number taken from SRb or IMM<4:0>. For the byte, byte9 and halfword data types, the programmer is responsible for specifying a rotate amount that is less than or equal to the number of bits in the data size. If the rotate amount is greater than the specified data size, the result is undefined.

Note that rotating right by n bits is equivalent to rotating left by ElemSize-n bits, where ElemSize is the number of bits in the given data size.

VROUND Round Floating Point to Integer

Format

Assembler syntax

VROUND.rm VRd, VRb

VROUND.rm SRd, SRb

where rm = {ninf, zero, near, pinf}.

Supported modes

D:S:M   V<-V   S<-S

Description

The contents of vector/scalar register Rb, in floating-point data format, are rounded to the nearest 32-bit integer (word), and the result is stored in vector/scalar register Rd. The rounding mode is specified in RM.

RM<1:0>  Mode  Meaning
00       ninf  round toward -∞
01       zero  round toward zero
10       near  round to nearest even
11       pinf  round toward +∞

Operation

for (i = 0; i < NumElem; i++) {
    Rd[i] = Convert to int32(Rb[i]);
}
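The four rounding modes can be compared side by side with the Python standard library. The function `vround` below is a hypothetical model: it maps the mode mnemonics to `math.floor`, `math.trunc`, the built-in `round` (which rounds halfway cases to the even neighbour) and `math.ceil`, and it does not model what happens when the value lies outside the int32 range.

```python
import math


def vround(x, rm):
    """Round the float x to an integer using mode rm (hypothetical model)."""
    if rm == "ninf":
        return math.floor(x)   # toward -infinity
    if rm == "zero":
        return math.trunc(x)   # toward zero
    if rm == "near":
        return round(x)        # to nearest, ties to even
    if rm == "pinf":
        return math.ceil(x)    # toward +infinity
    raise ValueError(f"unknown rounding mode: {rm}")


modes = {rm: vround(-2.5, rm) for rm in ("ninf", "zero", "near", "pinf")}
```

For the input -2.5 the modes give -3, -2, -2 and -2 respectively; the "near" result is -2 rather than -3 because -2 is the even neighbour.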

Exceptions

None.

Programming notes

VSATL Saturate to Lower Limit

Format

Assembler syntax

VSATL.dt VRd, VRa, VRb

VSATL.dt VRd, VRa, SRb

VSATL.dt VRd, VRa, #IMM

VSATL.dt SRd, SRa, SRb

VSATL.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}. Note that the .f data type is not supported with the 9-bit immediate.

Description

Each data element of vector/scalar register Ra is checked against its corresponding lower limit in vector/scalar register Rb or in the IMM field. If the value of the data element is less than the lower limit, it is set equal to the lower limit, and the final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = (Ra[i] < Bop[i]) ? Bop[i] : Ra[i];
}
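Saturating to a lower limit is an element-wise clamp, and VSATU is its mirror image for the upper limit. A minimal Python sketch follows; the function names are illustrative, the limits are passed as lists of per-element values, and element masking is omitted for brevity.

```python
def vsatl(ra, low):
    """Raise every element below its lower limit up to that limit."""
    return [lo if a < lo else a for a, lo in zip(ra, low)]


def vsatu(ra, high):
    """Lower every element above its upper limit down to that limit."""
    return [hi if a > hi else a for a, hi in zip(ra, high)]


data = [-5, 0, 9]
clamped = vsatu(vsatl(data, [-2, -2, -2]), [4, 4, 4])
```

Applying the two operations in sequence confines each element to the range [-2, 4], which is how a pair of saturate instructions can implement a full range clamp.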

Exceptions

None.

VSATU Saturate to Upper Limit

Format

Assembler syntax

VSATU.dt VRd, SRa, SRb

VSATU.dt VRd, SRa, #IMM

VSATU.dt SRd, SRa, SRb

VSATU.dt SRd, SRa, #IMM

where dt = {b, b9, h, w, f}. Note that the .f data type is not supported with the 9-bit immediate.

Description

Each data element of vector/scalar register Ra is checked against its corresponding upper limit in vector/scalar register Rb or in the IMM field. If the value of the data element is greater than the upper limit, it is set equal to the upper limit, and the final result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {
    Rd[i] = (Ra[i] > Bop[i]) ? Bop[i] : Ra[i];
}

Exceptions

None.

VSHFL Shuffle

Format

Assembler syntax

VSHFL.dt VRc, VRd, VRa, VRb

VSHFL.dt VRc, VRd, VRa, SRb

Supported modes

S       VRb        SRb
DS      int8 (b)   int9 (b9)  int16 (h)  int32 (w)

Description

The contents of vector register Ra are shuffled with Rb, and the result is stored in vector registers Rc:Rd, as shown in the following figure:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VSHFLH Shuffle High

Format

Assembler syntax

VSHFLH.dt VRd, VRa, VRb

VSHFLH.dt VRd, VRa, SRb

where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Description

The contents of vector register Ra are shuffled with Rb, and the high-order part of the result is stored in vector register Rd, as shown in the following figure:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VSHFLL Shuffle Low

Format

Assembler syntax

VSHFLL.dt VRd, VRa, VRb

VSHFLL.dt VRd, VRa, SRb

where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Description

The contents of vector register Ra are shuffled with Rb, and the low-order part of the result is stored in vector register Rd, as shown in the following figure:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VST Store

Format

Assembler syntax

VST.st Rs, SRb, SRi

VST.st Rs, SRb, #IMM

VST.st Rs, SRb+, SRi

VST.st Rs, SRb+, #IMM

where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}.

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTOFF for a cache-off store.

Description

Store a vector or scalar register.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = see the table below:

ST    Store operation
.b    BYTE[EA] = SRs<7:0>
.h    HALF[EA] = SRs<15:0>
.w    WORD[EA] = SRs<31:0>
.4    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3
.8    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7
.16   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15
.32   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31
.64   BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31
      BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31
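The bit-field notation <9i+7:9i> in the table indicates that each byte element occupies a 9-bit slot in the vector register and that only the low 8 bits of each slot are written to memory. The Python sketch below models this under explicit assumptions: the register is represented as a single integer of 32 nine-bit slots (a width inferred from the indices in the table), and `vst_bytes` is a hypothetical helper name.

```python
def vst_bytes(vr, count):
    """Return the `count` bytes a .4/.8/.16/.32 store would write.

    vr: integer holding 32 nine-bit slots, slot i at bits <9i+8:9i>.
    """
    return bytes((vr >> (9 * i)) & 0xFF for i in range(count))


# Build a register whose slot i holds the value i with the ninth bit set.
vr = sum(((0x100 | i) << (9 * i)) for i in range(32))
stored = vst_bytes(vr, 4)
```

The ninth bit of every slot is set in this example, yet the stored bytes are simply 0, 1, 2, 3: the store drops bit 8 of each slot.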

Exceptions

Programming notes

VSTCB Store to Circular Buffer

Format

Assembler syntax

VSTCB.st Rs, SRb, SRi

VSTCB.st Rs, SRb, #IMM

VSTCB.st Rs, SRb+, SRi

VSTCB.st Rs, SRb+, #IMM

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTCBOFF for a cache-off store.

Description

Store a vector or scalar register to a circular buffer whose bounds are given by the BEGIN pointer in SRb+1 and the END pointer in SRb+2.

Before the store and the address update, the effective address is adjusted if it is greater than the END address. In addition, for .h and .w scalar stores the circular buffer bounds must be aligned to halfword and word boundaries, respectively.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
BEGIN = SRb+1;
END = SRb+2;
cbsize = END - BEGIN;
if (EA > END) EA = BEGIN + (EA - END);
if (A == 1) SRb = EA;
MEM[EA] = see the table below:

ST    Store operation
.b    BYTE[EA] = SRs<7:0>;
.h    HALF[EA] = SRs<15:0>;
.w    WORD[EA] = SRs<31:0>;
.4    BYTE[(EA+i > END) ? EA+i-cbsize : EA+i] = VRs<9i+7:9i>, i = 0 to 3
.8    BYTE[(EA+i > END) ? EA+i-cbsize : EA+i] = VRs<9i+7:9i>, i = 0 to 7
.16   BYTE[(EA+i > END) ? EA+i-cbsize : EA+i] = VRs<9i+7:9i>, i = 0 to 15
.32   BYTE[(EA+i > END) ? EA+i-cbsize : EA+i] = VRs<9i+7:9i>, i = 0 to 31
.64   BYTE[(EA+i > END) ? EA+i-cbsize : EA+i] = VR0s<9i+7:9i>, i = 0 to 31
      BYTE[(EA+32+i > END) ? EA+32+i-cbsize : EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31

Exceptions

Programming notes

The programmer must ensure that the following condition holds for this instruction to work as expected:

BEGIN < EA < 2*END - BEGIN

That is, EA > BEGIN and EA - END < END - BEGIN.
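The wrap-around addressing can be traced with a short Python model that follows the pseudocode literally. The helper names are hypothetical, and the model only computes addresses; it does not touch memory or update SRb.

```python
def cb_address(ea, begin, end):
    """Adjust an effective address that lies past END."""
    if ea > end:
        ea = begin + (ea - end)
    return ea


def cb_byte_addresses(ea, begin, end, count):
    """Addresses written by a `count`-byte vector store to the buffer."""
    cbsize = end - begin
    ea = cb_address(ea, begin, end)
    return [ea + i - cbsize if ea + i > end else ea + i
            for i in range(count)]


addrs = cb_byte_addresses(114, begin=100, end=116, count=4)
```

With BEGIN = 100 and END = 116, a 4-byte store starting at 114 writes addresses 114, 115 and 116 and then wraps to 101, because the comparison in the pseudocode is a strict EA+i > END.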

VSTD Store Double

Format

Assembler syntax

VSTD.st Rs, SRb, SRi

VSTD.st Rs, SRb, #IMM

VSTD.st Rs, SRb+, SRi

VSTD.st Rs, SRb+, #IMM

where st = {b, b9t, h, w, 4, 8, 16, 32, 64} and Rs = {VRs, VRAs, SRs}. Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTDOFF for a cache-off store.

Description

Store two vector registers from the current or alternate bank, or two scalar registers.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = see the table below:

ST    Store operation
.b    BYTE[EA] = SRs<7:0>
      BYTE[EA+1] = SRs+1<7:0>
.h    HALF[EA] = SRs<15:0>
      HALF[EA+2] = SRs+1<15:0>
.w    WORD[EA] = SRs<31:0>
      WORD[EA+4] = SRs+1<31:0>
.4    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3
      BYTE[EA+4+i] = VRs+1<9i+7:9i>, i = 0 to 3
.8    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7
      BYTE[EA+8+i] = VRs+1<9i+7:9i>, i = 0 to 7
.16   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15
      BYTE[EA+16+i] = VRs+1<9i+7:9i>, i = 0 to 15
.32   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31
      BYTE[EA+32+i] = VRs+1<9i+7:9i>, i = 0 to 31
.64   BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31
      BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31
      BYTE[EA+64+i] = VR0s+1<9i+7:9i>, i = 0 to 31
      BYTE[EA+96+i] = VR1s+1<9i+7:9i>, i = 0 to 31

Exceptions

Programming notes

VSTQ Store Quad

Format

Assembler syntax

VSTQ.st Rs, SRb, SRi

VSTQ.st Rs, SRb, #IMM

VSTQ.st Rs, SRb+, SRi

VSTQ.st Rs, SRb+, #IMM

Note that .b and .b9t specify the same operation, and that .64 and VRAs cannot be specified together. Use VSTQOFF for a cache-off store.

Description

Store four vector registers from the current or alternate bank, or four scalar registers.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = see the table below:

ST    Store operation
.b    BYTE[EA] = SRs<7:0>
      BYTE[EA+1] = SRs+1<7:0>
      BYTE[EA+2] = SRs+2<7:0>
      BYTE[EA+3] = SRs+3<7:0>
.h    HALF[EA] = SRs<15:0>
      HALF[EA+2] = SRs+1<15:0>
      HALF[EA+4] = SRs+2<15:0>
      HALF[EA+6] = SRs+3<15:0>
.w    WORD[EA] = SRs<31:0>
      WORD[EA+4] = SRs+1<31:0>
      WORD[EA+8] = SRs+2<31:0>
      WORD[EA+12] = SRs+3<31:0>
.4    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 3
      BYTE[EA+4+i] = VRs+1<9i+7:9i>, i = 0 to 3
      BYTE[EA+8+i] = VRs+2<9i+7:9i>, i = 0 to 3
      BYTE[EA+12+i] = VRs+3<9i+7:9i>, i = 0 to 3
.8    BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 7
      BYTE[EA+8+i] = VRs+1<9i+7:9i>, i = 0 to 7
      BYTE[EA+16+i] = VRs+2<9i+7:9i>, i = 0 to 7
      BYTE[EA+24+i] = VRs+3<9i+7:9i>, i = 0 to 7
.16   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 15
      BYTE[EA+16+i] = VRs+1<9i+7:9i>, i = 0 to 15
      BYTE[EA+32+i] = VRs+2<9i+7:9i>, i = 0 to 15
      BYTE[EA+48+i] = VRs+3<9i+7:9i>, i = 0 to 15
.32   BYTE[EA+i] = VRs<9i+7:9i>, i = 0 to 31
      BYTE[EA+32+i] = VRs+1<9i+7:9i>, i = 0 to 31
      BYTE[EA+64+i] = VRs+2<9i+7:9i>, i = 0 to 31
      BYTE[EA+96+i] = VRs+3<9i+7:9i>, i = 0 to 31
.64   BYTE[EA+i] = VR0s<9i+7:9i>, i = 0 to 31
      BYTE[EA+32+i] = VR1s<9i+7:9i>, i = 0 to 31
      BYTE[EA+64+i] = VR0s+1<9i+7:9i>, i = 0 to 31
      BYTE[EA+96+i] = VR1s+1<9i+7:9i>, i = 0 to 31
      BYTE[EA+128+i] = VR0s+2<9i+7:9i>, i = 0 to 31
      BYTE[EA+160+i] = VR1s+2<9i+7:9i>, i = 0 to 31
      BYTE[EA+192+i] = VR0s+3<9i+7:9i>, i = 0 to 31
      BYTE[EA+224+i] = VR1s+3<9i+7:9i>, i = 0 to 31

Exceptions

Programming notes

VSTR Store Reversed

Format

Assembler syntax

VSTR.st Rs, SRb, SRi

VSTR.st Rs, SRb, #IMM

VSTR.st Rs, SRb+, SRi

VSTR.st Rs, SRb+, #IMM

where st = {4, 8, 16, 32, 64} and Rs = {VRs, VRAs}. Note that .64 and VRAs cannot be specified together. Use VSTROFF for a cache-off store.

Description

Store a vector register in reverse element order. This instruction does not support a scalar data source register.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
MEM[EA] = see the table below:

ST    Store operation
.b    BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 31
.h    HALF[EA+i] = VRs[15-i]<15:0>, i = 0 to 15
.w    WORD[EA+i] = VRs[7-i]<31:0>, i = 0 to 7
.4    BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 3
.8    BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 7
.16   BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 15
.32   BYTE[EA+i] = VRs[31-i]<7:0>, i = 0 to 31
.64   BYTE[EA+32+i] = VR0s[31-i]<7:0>, i = 0 to 31
      BYTE[EA+i] = VR1s[31-i]<7:0>, i = 0 to 31
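The index arithmetic in the table, BYTE[EA+i] = VRs[31-i], can be illustrated with a short Python sketch. The name `vstr` and the list representation of a 32-element byte vector are assumptions for the example.

```python
def vstr(elements, count):
    """Return the first `count` bytes written by a reversed store.

    elements: the byte elements of the source vector, element 0 first.
    """
    n = len(elements)
    return [elements[n - 1 - i] & 0xFF for i in range(count)]


vector = list(range(32))
written = vstr(vector, 4)
```

For a vector holding 0 to 31, a .4 store writes 31, 30, 29, 28: the store always begins at the last element, so a partial store takes the tail of the register rather than its head.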

Exceptions

Programming notes

VSTWS Store with Stride

Format

Assembler syntax

VSTWS.st Rs, SRb, SRi

VSTWS.st Rs, SRb, #IMM

VSTWS.st Rs, SRb+, SRi

VSTWS.st Rs, SRb+, #IMM

where st = {8, 16, 32} and Rs = {VRs, VRAs}. Note that the .64 mode is not supported; use VST instead. Use VSTWSOFF for a cache-off store.

Description

Starting at the effective address, store 32 bytes from vector register VRs to memory, using scalar register SRb+1 as the stride control register.

ST specifies the block size, that is, the number of consecutive bytes stored in each block. SRb+1 specifies the stride, that is, the number of bytes separating the starts of two consecutive blocks.

The stride must be equal to or greater than the block size. EA must be aligned to the data size. The stride and the block size must be multiples of the data size.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};
if (A == 1) SRb = EA;
Block_size = {4 ‖ 8 ‖ 16 ‖ 32};
Stride = SRb+1<31:0>;
for (i = 0; i < VECSIZE/Block_size; i++)
    for (j = 0; j < Block_size; j++)
        BYTE[EA + i*Stride + j] = VRs{i*Block_size + j}<7:0>;
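The nested loop above scatters consecutive blocks of the vector to addresses that are a fixed stride apart. A minimal Python sketch follows, assuming a `bytearray` for memory and a list of byte elements for the register; the name `vstws` and its parameters are illustrative.

```python
def vstws(mem, ea, vrs, block_size, stride):
    """Store the bytes of vrs in blocks of block_size, stride bytes apart."""
    for i in range(len(vrs) // block_size):
        for j in range(block_size):
            mem[ea + i * stride + j] = vrs[i * block_size + j] & 0xFF


mem = bytearray(128)
vstws(mem, ea=0, vrs=list(range(1, 33)), block_size=8, stride=16)
```

With a block size of 8 and a stride of 16, the 32 bytes land in four 8-byte blocks at offsets 0, 16, 32 and 48, and the 8 bytes between blocks are left untouched. This is the access pattern needed to write a narrow tile into a wider two-dimensional array.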

异常abnormal

VSUB 减VSUB minus

格式Format

汇编器句法assembler syntax

VSUB.dt VRd，VRa，VRbVSUB.dt VRd, VRa, VRb

VSUB.dt VRd，VRa，SRbVSUB.dt VRd, VRa, SRb

VSUB.dt VRd，VRa，#IMMVSUB.dt VRd, VRa, #IMM

VSUB.dt SRd，SRa，SRbVSUB.dt SRd, SRa, SRb

VSUB.dt SRd，SRa，#IMMVSUB.dt SRd, SRa, #IMM

其中dt＝{b，b9，h，w，f}。where dt = {b, b9, h, w, f}.

Description

The contents of vector/scalar register Rb are subtracted from the contents of vector/scalar register Ra, and the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Rd[i] = Ra[i] - Bop[i];

}
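As a rough illustration of the masked element-wise subtraction, the following Python sketch models registers as lists. It assumes that EMASK acts as a per-element enable and that masked-off elements of Rd keep their previous value; it does not model overflow, saturation or the floating-point type. The function names `vsub` and `broadcast` are hypothetical.

```python
def broadcast(scalar, num_elem):
    """Replicate a scalar operand (SRb or #IMM) across all element positions."""
    return [scalar] * num_elem


def vsub(ra, bop, emask, rd):
    """Return the new Rd: rd[i] = ra[i] - bop[i] wherever emask[i] is set.

    Assumption: elements whose mask bit is clear retain their old Rd value.
    """
    out = list(rd)
    for i in range(len(ra)):
        if emask[i]:
            out[i] = ra[i] - bop[i]
    return out


print(vsub([10, 20, 30, 40], [1, 2, 3, 4], [1, 0, 1, 1], [0, 0, 0, 0]))
# → [9, 0, 27, 36]
```

The vector-scalar and vector-immediate forms can be modelled by passing `broadcast(value, len(ra))` as the second operand.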

Exceptions

VSUBS Subtract and Set

Format

Assembler syntax

VSUBS.dt SRd, SRa, SRb

VSUBS.dt SRd, SRa, #IMM

where dt = {b, b9, h, w, f}.

Supported modes

D:S:M: S<-S@S, S<-S@I

DS: int8 (b), int9 (b9), int16 (h), int32 (w), float (f)

Description

SRb is subtracted from SRa; the result is stored in SRd, and the VFLAG bits in VCSR are set.

Operation

Bop = {SRb ‖ sex(IMM<8:0>)};

SRd = SRa - Bop;

VCSR<lt, eq, gt> = status(SRa - Bop);
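The subtract-and-set operation can be sketched in Python as follows. This is a simplified model: it assumes that status() reports the sign of the difference as the lt, eq and gt flags, uses unbounded Python integers (so no wraparound), and treats `sign_extend` and `vsubs` as hypothetical helper names.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value, as sex(IMM<8:0>) does for bits=9."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def vsubs(sra, bop):
    """Return (SRd, flags) for SRd = SRa - Bop.

    Assumption: the lt/eq/gt flags reflect the sign of the difference.
    """
    result = sra - bop
    flags = {"lt": result < 0, "eq": result == 0, "gt": result > 0}
    return result, flags


print(vsubs(3, sign_extend(0x1FF, 9)))  # immediate 0x1FF is -1, so 3 - (-1) = 4
print(vsubs(7, 7))                      # equal operands set only the eq flag
```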

Exceptions

VUNSHFL Unshuffle

Format

Assembler syntax

VUNSHFL.dt VRc, VRd, VRa, VRb

VUNSHFL.dt VRc, VRd, VRa, SRb

where dt = {b, b9, h, w, f}. Note that .w and .f specify the same operation.

Description

The contents of vector register VRa are unshuffled with Rb and placed in vector registers VRc:VRd, as shown below:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VUNSHFLH Unshuffle High

Format

Assembler syntax

VUNSHFLH.dt VRd, VRa, VRb

VUNSHFLH.dt VRd, VRa, SRb

Description

The contents of vector register VRa are unshuffled with Rb; the upper half of the result is returned to vector register VRd, as shown below:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VUNSHFLL Unshuffle Low

Format

Assembler syntax

VUNSHFLL.dt VRd, VRa, VRb

VUNSHFLL.dt VRd, VRa, SRb

Description

The contents of vector register VRa are unshuffled with Rb; the lower half of the result is returned to vector register VRd, as shown below:

Operation

Exceptions

None.

Programming notes

This instruction does not use the element mask.

VWBACKSP Write Back from Scratch Pad

Format

Assembler syntax

VWBACKSP.ln SRp, SRb, SRi

VWBACKSP.ln SRp, SRb, #IMM

VWBACKSP.ln SRp, SRb+, SRi

VWBACKSP.ln SRp, SRb+, #IMM

where ln = {1, 2, 4, 8}. Note that VWBACK and VWBACKSP use the same opcode.

Description

Transfers multiple 64-byte blocks from the scratch pad to memory. The effective address gives the starting address in memory, and SRp gives the starting address in the scratch pad. The number of 64-byte blocks is specified as follows:

If the effective address does not fall on a 64-byte boundary, it is first truncated to align it to a 64-byte boundary. If the scratch pad pointer address in SRp does not fall on a 64-byte boundary, it is likewise truncated and aligned to a 64-byte boundary. The aligned scratch pad pointer address is incremented by the number of bytes transferred.

Operation

EA = SRb + {SRi ‖ sex(IMM<7:0>)};

if (A = 1) SRb = EA;

Num_bytes = {64 ‖ 128 ‖ 256 ‖ 512};

Mem_adrs = EA<31:6>:6b'000000;

SRp = SRp<31:6>:6b'000000;

for (i = 0; i < Num_bytes; i++)

MEM[Mem_adrs + i] = SPAD[SRp++];
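A minimal Python sketch of this transfer is given below. It follows the direction stated in the description (scratch pad to memory), models both storage areas as `bytearray` objects, and assumes that the `ln` suffix selects 64 × ln bytes, consistent with the Num_bytes values above. The function name `vwbacksp` and its parameters are hypothetical.

```python
LINE = 64  # transfer granularity in bytes


def vwbacksp(mem, spad, ea, srp, ln):
    """Copy ln 64-byte blocks from the scratch pad to memory.

    Returns the updated scratch pad pointer (aligned, then advanced by the
    number of bytes transferred). Assumption: ln is one of 1, 2, 4, 8.
    """
    if ln not in (1, 2, 4, 8):
        raise ValueError("ln must be 1, 2, 4 or 8")
    num_bytes = LINE * ln
    mem_adrs = ea & ~(LINE - 1)   # truncate EA to a 64-byte boundary
    srp &= ~(LINE - 1)            # truncate the scratch pad pointer likewise
    for i in range(num_bytes):
        mem[mem_adrs + i] = spad[srp]
        srp += 1
    return srp


spad = bytearray(range(256))
mem = bytearray(512)
new_srp = vwbacksp(mem, spad, ea=0x47, srp=0x85, ln=1)
print(hex(new_srp))  # → 0xc0
```

In this example the unaligned addresses 0x47 and 0x85 are truncated to 0x40 and 0x80 before the 64 bytes are copied.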

Exceptions

Invalid data address exception.

VXNOR XNOR (Exclusive NOR)

Format

Assembler syntax

VXNOR.dt VRd, VRa, VRb

VXNOR.dt VRd, VRa, SRb

VXNOR.dt VRd, VRa, #IMM

VXNOR.dt SRd, SRa, SRb

VXNOR.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

The contents of vector/scalar register Ra are logically XNORed with the contents of vector/scalar register Rb, and the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Rd[i]<k> = ~(Ra[i]<k> ^ Bop[i]<k>), for k = all bits in element i;

}
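The bitwise XNOR can be sketched in Python as shown below. The sketch assumes that each data type has the element width its name suggests (8, 9, 16 or 32 bits), that EMASK is a per-element enable, and that masked-off elements keep their previous value. `WIDTH` and `vxnor` are hypothetical names.

```python
WIDTH = {"b": 8, "b9": 9, "h": 16, "w": 32}  # assumed element widths in bits


def vxnor(ra, bop, emask, rd, dt):
    """Return the new Rd with rd[i] = ~(ra[i] ^ bop[i]) limited to the element width."""
    keep = (1 << WIDTH[dt]) - 1   # Python ints are unbounded, so mask the result
    out = list(rd)
    for i in range(len(ra)):
        if emask[i]:
            out[i] = ~(ra[i] ^ bop[i]) & keep
    return out


print(vxnor([0x0F, 0xFF], [0x0F, 0x00], [1, 1], [0, 0], "b"))  # → [255, 0]
```

Equal bits produce 1 and differing bits produce 0, so identical operands give an all-ones element, and dropping the complement gives the VXOR result.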

Exceptions

None.

VXOR XOR (Exclusive OR)

Format

Assembler syntax

VXOR.dt VRd, VRa, VRb

VXOR.dt VRd, VRa, SRb

VXOR.dt VRd, VRa, #IMM

VXOR.dt SRd, SRa, SRb

VXOR.dt SRd, SRa, #IMM

where dt = {b, b9, h, w}.

Description

The contents of vector/scalar register Ra are logically XORed with the contents of vector/scalar register Rb, and the result is stored in vector/scalar register Rd.

Operation

for (i = 0; i < NumElem && EMASK[i]; i++) {

Rd[i]<k> = Ra[i]<k> ^ Bop[i]<k>, for k = all bits in element i;

}

Exceptions

None.

VXORALL XOR All Elements

Format

Assembler syntax

VXORALL.dt SRd, VRb

where dt = {b, b9, h, w}. Note that .b and .b9 specify the same operation.

Supported modes

DS: int8 (b), int9 (b9), int16 (h), int32 (w)

Description

The least significant bits of all elements in VRb are XORed together, and the 1-bit result is returned in the least significant bit of SRd. This instruction is not affected by the element mask.
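The reduction described above amounts to computing the parity of the elements' least significant bits. A brief Python sketch follows; it returns only the 1-bit result and makes no assumption about the remaining bits of SRd. The function name `vxorall` is hypothetical.

```python
def vxorall(vrb):
    """XOR together the least significant bit of every element of vrb."""
    bit = 0
    for element in vrb:
        bit ^= element & 1
    return bit


print(vxorall([1, 2, 3, 4]))  # two odd elements, so the result is 0
print(vxorall([1, 2, 4]))     # one odd element, so the result is 1
```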

Operation

Exceptions

None.

VWBACK Write Back

Format

Assembler syntax

VWBACK.ln SRb, SRi

VWBACK.ln SRb, #IMM

VWBACK.ln SRb+, SRi

VWBACK.ln SRb+, #IMM

where ln = {1, 2, 4, 8}.

Description

The cache line whose index is specified by the EA in the vector data cache (as opposed to the line whose tag matches the EA) is written back to memory if it contains modified data. If more than one cache line is specified, the subsequent sequential cache lines are written back to memory if they contain modified data. The number of cache lines is specified as follows:

LN<1:0> = 00: write 1 64-byte cache line

LN<1:0> = 01: write 2 64-byte cache lines

LN<1:0> = 10: write 4 64-byte cache lines

LN<1:0> = 11: write 8 64-byte cache lines

If the effective address does not fall on a 64-byte boundary, it is first truncated to align it to a 64-byte boundary.
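The LN encoding and the address truncation can be sketched in Python as follows. The sketch only computes which line-aligned addresses are covered; it assumes that sequential cache lines sit at consecutive 64-byte addresses and does not model the index/tag lookup or the modified-data check. The names `LINES_FOR_LN` and `vwback_line_addresses` are hypothetical.

```python
LINE_BYTES = 64
LINES_FOR_LN = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}  # LN<1:0> to line count


def vwback_line_addresses(ea, ln_field):
    """Return the aligned start address of each cache line selected by EA and LN."""
    base = ea & ~(LINE_BYTES - 1)  # truncate EA to a 64-byte boundary
    return [base + k * LINE_BYTES for k in range(LINES_FOR_LN[ln_field])]


print([hex(a) for a in vwback_line_addresses(0x1234, 0b01)])
# → ['0x1200', '0x1240']
```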

Operation

Exceptions

Invalid data address exception.

Programming notes

EA<31:0> indicates a byte address in local memory.

Claims

1. A processor comprising:

a scalar register, suitable for storing a single scalar value;

a vector register suitable for storing multiple data elements; and

processing circuitry coupled to said scalar register and said vector register, wherein the processing circuitry performs a plurality of operations in parallel in response to a single instruction, each operation combining a data element in said vector register with the scalar value in said scalar register.

2. A method of operating a processing circuit to execute instructions, comprising:

reading, from a register, data elements that constitute the components of a vector value; and

performing parallel operations that combine a scalar value with each of the data elements to produce a vector result.

3. The method of claim 2, wherein the parallel operations performed include multiplying the scalar value by each of the data elements to produce a vector data result.

4. The method of claim 2, wherein said parallel operations performed include adding said scalar value to each of said data elements to produce a vector data result.

5. The method of claim 2, further comprising reading the scalar value from another register adapted to store a single scalar value to combine with the data element.

6. The method of claim 2, further comprising extracting the scalar value from an instruction to combine with the data element.

7. A method of operating a processor, comprising:

providing a plurality of scalar registers and a plurality of vector registers in said processor, wherein each scalar register is adapted to store a single scalar value and each vector register is adapted to store a plurality of data elements constituting a vector component;

assigning each scalar register a register number that is different from the register numbers assigned to the other scalar registers;

assigning each vector register a register number different from register numbers assigned to other vector registers, wherein at least some of the register numbers assigned to the vector registers are the same as the register numbers assigned to the scalar registers;

forming an instruction comprising a first operand and a second operand, wherein the first operand is a register number identifying a scalar register and the second operand is a register number identifying a vector register; and

executing the instruction to transfer data between the scalar register identified by the first operand and a data element in the vector register identified by the second operand.

8. The method of claim 7, wherein:

the instruction formed also includes a third operand identifying a data element in a vector; and wherein

executing the instruction transfers data between the scalar register identified by the first operand and the data element, identified by the third operand, in the vector register identified by the second operand.

9. The method of claim 7, wherein:

said instruction formed also includes a third operand identifying another scalar register; and wherein

executing the instruction transfers data between the scalar register identified by the first operand and a data element that is in the vector register identified by the second operand and is identified by a value stored in the other scalar register.