JP2005322274A - Microcomputer

Info

Publication number: JP2005322274A
Application number: JP2005222618A
Authority: JP
Inventors: Hiroshi Osuga; 宏大須賀; Atsushi Kiuchi; 淳木内; Hironori Hasegawa; 博宣長谷川; Toru Umaji; 徹馬路; Yoshiki Noguchi; 孝樹野口; Yasushi Akao; 泰赤尾; Shiro Baba; 志朗馬場
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 1995-05-02
Filing date: 2005-08-01
Publication date: 2005-11-17
Anticipated expiration: 2015-12-14
Also published as: JP3733137B2

Abstract

PROBLEM TO BE SOLVED: To speed up digital signal processing in a microcomputer in which a DSP engine is mounted on a single LSI chip together with a CPU core.

SOLUTION: A built-in memory is divided into two types, first memories 5 and 7 and second memories 4 and 6, which are accessible in parallel over second buses YAB and YDB and third buses XAB and XDB, respectively. Operand access and the address calculation needed for data processing by the DSP engine are carried out by the CPU. The CPU core 2 can simultaneously transfer two data values from the built-in memory to the DSP engine 3. The third buses XAB and XDB and the second buses YAB and YDB are separate from first buses IAB and IDB, so the CPU core 2 can execute instruction fetching and the like over the first buses IAB and IDB in parallel with accesses to the second memories 4 and 6 and the first memories 5 and 7.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a logic LSI, formed as a semiconductor integrated circuit, that has a central processing unit and a digital signal processing unit, and in particular to a technique that is effective when applied to a microcomputer requiring high-speed arithmetic processing.

Japanese Patent Application No. 4-296778 and US Patent Application No. 145157 describe examples of a microcomputer in which a multiplier is mounted on the same chip as an arithmetic logic unit. According to these, a logic LSI chip such as a microcomputer comprises a central processing unit, a bus, a memory, and a multiplier, and in particular has a command signal line over which, while data is being read from the memory, the command of a multiplication instruction relating to the read data is transferred from the central processing unit to the multiplier. Because the central processing unit transfers the multiplication command to the multiplier while it reads the data from the memory, data can be transferred directly between the memory and the multiplier.

Japanese Patent Application No. 4-296778; US Patent Application No. 145157

The present inventors studied mounting a digital signal processing unit together with a central processing unit on a single LSI in order to speed up digital signal processing. The prior art described above speeds up multiplication in that data can be transferred directly from the memory to the multiplier. However, assuming pipelined instruction execution by the central processing unit, it does not consider the situation in which the fetch cycle of an instruction to be executed by the central processing unit conflicts with a memory access cycle for the multiplication. Nor does it consider reading a plurality of operands for addition or multiplication from memory in parallel to speed up arithmetic processing. The inventors further found that, in that case, the usability of the microcomputer suffers unless the relationship with external access by the central processing unit is also taken into account. They also found that, when a digital signal processing unit is mounted on a single LSI together with a central processing unit, the code assignment of CPU instructions and DSP instructions and the format of DSP instructions must be devised so as to suppress, as far as possible, an increase in the logic scale of the instruction decode circuit and the like.

An object of the present invention is to speed up digital signal processing by mounting a digital signal processing unit together with a central processing unit on a single LSI. Another object of the present invention is to suppress, as far as possible, the increase in physical scale that results when a digital signal processing unit is mounted on a single LSI together with a central processing unit.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

The following is a brief outline of representative aspects of the invention disclosed in the present application.

That is, the microcomputer is formed as a semiconductor integrated circuit that includes, on one chip: a central processing unit (2); first to third address buses (IAB, YAB, XAB) to which addresses are selectively transmitted from the central processing unit; a first memory (5, 7) connected to the first address bus (IAB) and the second address bus (YAB) and accessed by an address from the central processing unit; a second memory (4, 6) connected to the first address bus (IAB) and the third address bus (XAB) and accessed by an address from the central processing unit; a first data bus (IDB) connected to the first and second memories and to the central processing unit, over which data is transmitted; a second data bus (YDB) connected to the first memory, over which data is transmitted; a third data bus (XDB) connected to the second memory, over which data is transmitted; an external interface circuit (12) connected to the first address bus and the first data bus; a digital signal processing unit (3) connected to the first to third data buses and operated in synchronization with the central processing unit; and a control signal line that transmits a DSP control signal (20), which controls the operation of the digital signal processing unit, from the central processing unit to the digital signal processing unit.

With the above means, the built-in memory is split into two planes, the first memory (5, 7) and the second memory (4, 6), in consideration of product-sum operations by the digital signal processing unit (3), and the central processing unit (2) can access the first memory and the second memory in parallel over the second bus (YAB, YDB) and the third bus (XAB, XDB), respectively. Two data items can therefore be transferred simultaneously from the built-in memory to the digital signal processing unit. Furthermore, because the third bus (XAB, XDB) and the second bus (YAB, YDB) are also separate from the first bus (IAB, IDB), which is interfaced to the outside, the central processing unit can access external memory in parallel with accesses to the second memory (4, 6) and the first memory (5, 7). Because there are thus three kinds of address bus (IAB, XAB, YAB) and data bus (IDB, XDB, YDB), each connected to the central processing unit (2), different memory access operations can be executed in the same clock cycle using the three kinds of internal bus. Consequently, faster arithmetic processing can be achieved readily even when programs or data reside in external memory.
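The benefit of the split X/Y memory for a product-sum operation can be illustrated with a small behavioral model. This is only a sketch: the function name, the list-based memories, and the cycle-counting rule (one operand per bus per cycle) are assumptions made for illustration and are not taken from the specification.

```python
def multiply_accumulate(x_mem, y_mem, x_addr, y_addr, taps, parallel_xy=True):
    """Return (sum of products, modeled operand-read cycles).

    Assumption for illustration: each bus delivers one operand per cycle.
    With separate X and Y buses both operands of a tap arrive together;
    with a single shared bus they would need two read cycles per tap.
    """
    acc = 0
    cycles = 0
    for i in range(taps):
        x = x_mem[x_addr + i]  # operand read over the X-side bus (XAB/XDB)
        y = y_mem[y_addr + i]  # operand read over the Y-side bus (YAB/YDB)
        acc += x * y
        cycles += 1 if parallel_xy else 2
    return acc, cycles
```

Under these assumptions a three-tap product-sum needs three operand-read cycles with parallel X/Y access and six with a single shared bus, while the first bus (IAB, IDB) remains free for instruction fetch or external access.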

To improve the usability of the microcomputer, each of the first memory and the second memory is preferably composed of a RAM and a ROM.

To speed up address generation for repetitive operations such as product-sum operations, the central processing unit preferably includes a modulo address output unit (200). In that case, it is desirable that the address generated by the modulo address output unit can be selectively output to the second or third address bus.
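This passage does not say how the modulo address output unit (200) computes its addresses. As a rough illustration of modulo (circular) addressing in general, the sketch below wraps an address within a fixed window; the function name and the `start`/`length`/`step` parameters are hypothetical and do not describe the actual circuit.

```python
def next_modulo_address(addr, step, start, length):
    """Advance addr by step, wrapping inside [start, start + length).

    Generic circular-buffer addressing; not the patent's circuit.
    """
    return start + (addr - start + step) % length


def modulo_sequence(start, length, step, count):
    """List the first `count` addresses of a circular walk from `start`."""
    addrs = []
    addr = start
    for _ in range(count):
        addrs.append(addr)
        addr = next_modulo_address(addr, step, start, length)
    return addrs
```

For example, walking an 8-byte window at 0x100 in steps of 2 yields 0x100, 0x102, 0x104, 0x106 and then returns to 0x100, so a repeated product-sum loop can reuse the same coefficient block without extra address-reset instructions.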

The digital signal processing unit can be configured to include: first to third data buffer means (MDBI, MDBY, MDBX) individually interfaced with the first to third data buses (IDB, YDB, XDB); a plurality of register means (305 to 308) connectable to the respective data buffer means via an internal bus; a multiplier (304) and an arithmetic logic unit (302) connected to the internal bus; and a decoder (34) that decodes the DSP control signal and controls the operation of the data buffer means, the multiplier, the arithmetic logic unit, and the register means.

From the standpoint of instruction decoding, the microcomputer is formed as a semiconductor integrated circuit that includes, on one chip, a central processing unit (2), memories (4 to 7) whose access is controlled by the central processing unit, and a digital signal processing unit (3) that exchanges data with the memories and the central processing unit and operates in synchronization with the central processing unit. The instruction set executable by this microcomputer includes CPU instructions to be executed by the central processing unit (2) and DSP instructions to be executed by the digital signal processing unit (3), with part of the processing, such as address calculation for data fetching, borne by the central processing unit. The central processing unit can be configured to include an instruction register (25) that fetches, via the data bus, fixed-length 16-bit CPU instructions and 16-bit or 32-bit DSP instructions, and a decoder (24) that distinguishes CPU instructions from DSP instructions on the basis of a plurality of bits forming part of the instruction fetched into the instruction register and, according to the result, generates a DSP control signal (20) for controlling the operation of the digital signal processing unit and a CPU control signal for controlling the operation of the central processing unit.

For example, CPU instructions are assigned to the range in which the most significant 4 bits of the instruction code are "0000" to "1110". DSP instructions are assigned to the range in which the most significant 4 bits are "1111". Among these, instructions whose most significant 6 bits are "111100" or "111101" are DSP instructions with a 16-bit instruction code, and instructions whose most significant 6 bits are "111110" have a 32-bit instruction code. No instruction is assigned to the range in which the most significant 6 bits are "111111"; that range is left unused. With these rules for assigning codes to instructions of up to 32 bits, decoding only part of each instruction code, for example the most significant 6 bits, is enough for a decoder of small logic scale to determine whether the instruction is a CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction; it is not necessary always to decode all 32 bits at once.
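The code-assignment rule above can be expressed as a classifier that looks only at the most significant 6 bits of the first 16-bit word. The sketch below is a software illustration of that rule, not the decoder circuit itself; the function name and the returned labels are chosen here for illustration.

```python
def classify_instruction(first_halfword):
    """Classify an instruction from the top 6 bits of its first 16-bit word."""
    top6 = (first_halfword >> 10) & 0x3F
    if (top6 >> 2) != 0b1111:
        return "CPU16"    # top 4 bits "0000" to "1110": 16-bit CPU instruction
    if top6 in (0b111100, 0b111101):
        return "DSP16"    # 16-bit DSP instruction
    if top6 == 0b111110:
        return "DSP32"    # 32-bit DSP instruction
    return "UNUSED"       # "111111": no instruction assigned
```

Because the result depends on 6 bits only, the remaining bits of a 32-bit instruction never have to take part in determining the instruction type.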

The decoder includes a first decode circuit (240) that decodes the upper 16 bits of the instruction register to generate the CPU decode signal (243) and the DSP decode signal (244), and a code conversion circuit (242) that outputs a signal encoding the lower 16 bits of the instruction register when the first decode circuit identifies a 32-bit DSP instruction, and outputs a code meaning that the output is invalid when any other instruction is identified. The DSP decode signal and the output of the code conversion circuit together form the DSP control signal (20).

From the standpoint of the instruction format of DSP instructions, the microcomputer is formed as a semiconductor integrated circuit that includes a central processing unit (2), a digital signal processing unit (3) operated in synchronization with the central processing unit, and an internal bus (IDB) to which the central processing unit and the digital signal processing unit are commonly connected. The central processing unit comprises execution control means for executing instructions of a first format and instructions of a second format. A first-format instruction has a first code area (bits 9 to 0 of the 16-bit DSP instruction illustrated in FIG. 18) that specifies, to the central processing unit, a data transfer to or from the digital signal processing unit. A second-format instruction has a second code area (the A field of the 32-bit DSP instruction illustrated in FIGS. 20 and 21) in the same format as the first code area, and a third code area (the B field of the 32-bit DSP instruction illustrated in FIGS. 20 and 21) that specifies, to the digital signal processing unit, arithmetic processing using the transfer data specified by the second code area.

As a result, when executing instructions of the first and second formats, the execution control means can use decoding means with decode logic common to the first code area and the second code area, which helps reduce the logic scale of the microcomputer.

The first-format and second-format instructions each have a fourth code area indicating whether the instruction is of the first or the second format (for example, bits 15 to 10 of a 16-bit DSP instruction and bits 31 to 26 of a 32-bit DSP instruction).

The execution control means can be configured to include an instruction register (25) used in common for first-format and second-format instructions; decoding means (240) that decodes the first and fourth code areas, or the second and fourth code areas, contained in the instruction fetched into the instruction register; and execution means that performs address calculation according to the decoding result and carries out the data transfer control.

The instruction register has an upper area (UIR) shared for holding either the first and fourth code areas or the second and fourth code areas, and a lower area (LIR) used for holding the third code area. The decoding means outputs, on the basis of the decoding result of the fourth code area, a control signal (248) indicating that the instruction register holds a second-format instruction, and can include means (242, 242A, 242B) for supplying the code data of the third code area from the lower area to the digital signal processing unit on the basis of that control signal.
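The field layout described above can be sketched as follows. The sketch assumes that the fourth code area occupies the top 6 bits of the upper halfword (UIR), that the first or second code area occupies its low 10 bits, and that the third code area is the whole lower halfword (LIR). It also reuses the "111110" pattern given earlier for 32-bit DSP instructions. `INVALID_CODE` and the dictionary keys are hypothetical names standing in for the "output invalid" code and the hardware signals.

```python
INVALID_CODE = None  # hypothetical stand-in for the "output is invalid" code


def split_dsp_instruction(uir, lir):
    """Split the instruction register halves into the code areas named in the text."""
    fourth = (uir >> 10) & 0x3F             # format selector (fourth code area)
    transfer = uir & 0x3FF                  # first code area, or A field (second)
    is_second_format = fourth == 0b111110   # plays the role of control signal 248
    # The lower halfword is forwarded to the DSP side only for the second format.
    third = lir if is_second_format else INVALID_CODE
    return {
        "fourth": fourth,
        "transfer": transfer,
        "second_format": is_second_format,
        "third": third,
    }
```

Since `transfer` is extracted the same way for both formats, one transfer-field decoder can serve 16-bit and 32-bit DSP instructions, which is the sharing the text credits with reducing logic scale.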

The effects obtained by representative aspects of the invention disclosed in the present application are briefly described below.

That is, because the built-in memory is split into two planes, the first memory and the second memory, in consideration of product-sum operations by the digital signal processing unit, and these can be accessed in parallel over the second bus and the third bus, respectively, the central processing unit can transfer two data items simultaneously from the built-in memory to the digital signal processing unit.

Furthermore, because the third bus and the second bus are also separate from the first bus, which is interfaced to the outside, the central processing unit can access external memory in parallel with accesses to the second memory and the first memory.

Because there are thus three kinds of address bus and data bus, each connected to the central processing unit, different memory access operations can be executed in the same clock cycle using the three kinds of internal bus, so faster arithmetic processing can be achieved readily even when programs or data reside in external memory.

Furthermore, the built-in memory is split into two planes, the first memory and the second memory, each of which has a ROM and a RAM. Using the RAM as data memory and the ROM as program memory also makes it possible to separate data memory from program memory, so that two data items can be transferred to the digital signal processing unit in parallel and instruction fetch, data transfer, and computation can be carried out efficiently by parallel pipeline processing.

Therefore, digital signal processing can be sped up when a digital signal processing unit is mounted on a single LSI together with a central processing unit.

For an instruction stream in which CPU instructions and DSP instructions are mixed, instruction codes are assigned so that decoding part of the instruction code identifies whether the instruction is a CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction. The instruction type can therefore be determined by a decoder of small logic scale, and it is not necessary always to decode all 32 bits at once. Consequently, the increase in physical scale that results when a digital signal processing unit is mounted on a single LSI together with a central processing unit can be suppressed as far as possible.

As the instruction format of DSP instructions, the invention adopts a first-format instruction having a first code area (bits 9 to 0 of the 16-bit DSP instruction illustrated in FIG. 18) that specifies, to the central processing unit, a data transfer to or from the digital signal processing unit, and a second-format instruction having a second code area (the A field of the 32-bit DSP instruction illustrated in FIGS. 20 and 21) in the same format as the first code area together with a third code area (the B field of the 32-bit DSP instruction illustrated in FIGS. 20 and 21) that specifies, to the digital signal processing unit, arithmetic processing using the transfer data specified by the second code area. The means for executing instructions of the first and second formats can therefore use decoding means with decode logic common to the first and second code areas, which in this respect too reduces the logic scale of the microcomputer.

FIG. 1 shows an overall block diagram of a microcomputer 1 according to an embodiment of the present invention. The microcomputer shown in the figure is formed on a single semiconductor substrate, such as single-crystal silicon, by semiconductor integrated circuit manufacturing technology. The microcomputer 1 comprises a CPU core 2 as the central processing unit, a DSP engine 3 as the digital signal processing unit, an X-ROM 4, a Y-ROM 5, an X-RAM 6, a Y-RAM 7, an interrupt controller 8, a bus state controller 9, built-in peripheral circuits 10 and 11, an external memory interface 12, and a clock pulse generator (CPG) 13. The X-ROM 4 and Y-ROM 5 are read-only memories, either strictly read-only or electrically rewritable, for storing instructions, constant data, and the like; the X-RAM 6 and Y-RAM 7 are random access memories used for temporary data storage or as work areas for the CPU core 2 and the DSP engine 3. The X-ROM 4 and X-RAM 6 are collectively called the X memory for internal instructions and data (Internal Instruction/Data X Mem.), and the Y-ROM 5 and Y-RAM 7 are collectively called the Y memory for internal instructions and data (Internal Instruction/Data Y Mem.).

As its bus configuration, the microcomputer 1 of this embodiment has an internal address bus IAB and an internal data bus IDB coupled to the external memory interface 12; an internal address bus XAB and an internal data bus XDB not coupled to the external memory interface 12; an internal address bus YAB and an internal data bus YDB not coupled to the external memory interface 12; and a peripheral address bus PAB and a peripheral data bus PDB for the built-in peripheral circuits 10 and 11. Control buses are not shown, but one is provided for each address bus and data bus pair.

The CPU core 2 is connected to the data bus IDB, which can be connected to the outside of the chip through the external memory interface 12, and receives an interrupt signal 80 from the interrupt controller 8. The CPU core 2 supplies the DSP engine 3 with a control signal 20 for controlling it. The CPU core 2 also outputs address signals to the address bus IAB, which can be connected to the outside of the chip through the external memory interface 12, and to the address buses XAB and YAB, which are not connected to the external memory interface 12. The CPU core 2 operates using the non-overlapping two-phase clock signals Clock1 and Clock2 output from the clock pulse generator (CPG) 13 as its operation reference clock signals. The CPU core 2 is described in detail later; FIG. 1 representatively shows a register file 21, an arithmetic logic unit (ALU) 22, an address adder (Add-ALU) 23, a decoder 24, and an instruction register (IR) 25. The register file 21 is used freely as address registers and data registers, and also includes a program counter, control registers, and the like. The decoder 24 decodes the instruction fetched into the instruction register 25 and generates internal control signals (not shown in FIG. 1) and the control signal 20. The instruction register (IR) 25 consists of an upper area (UIR) and a lower area (LIR) of 16 bits each. As described in detail later, the value of the lower area (LIR) can be selectively shifted into the upper area (UIR). A sequence control circuit that controls the instruction execution procedure when an exception such as an interrupt occurs, and that controls in hardware the saving and restoring of internal state on an exception, is not shown.

The DSP engine 3 is connected to the data buses IDB, XDB, and YDB and operates using the clock signals Clock1 and Clock2 as its operation reference clock signals. The DSP engine 3 is described in detail later; FIG. 1 representatively shows a data register file 31, an arithmetic logic unit and shifter (ALU/Shifter) 32, a multiplier (MAC) 33, and a decoder 34. The data register file 31 is used for product-sum operations and the like. The decoder 34 decodes the control signal 20 supplied from the CPU core 2 and generates internal control signals of the DSP engine 3 (not shown in FIG. 1).

The X-ROM 4 and X-RAM 6 are connected to the address buses IAB and XAB and to the data buses IDB and XDB. The Y-ROM 5 and Y-RAM 7 are connected to the address buses IAB and YAB and to the data buses IDB and YDB. The built-in memory is split into two planes, the X memory 4, 6 and the Y memory 5, 7, in consideration of product-sum operations by the DSP engine 3, and the two planes can be accessed in parallel over the internal buses XAB, XDB and YAB, YDB, respectively. Furthermore, because the internal buses XAB, XDB and YAB, YDB are also separate from the externally interfaced buses IAB, IDB, external memory can be accessed in parallel with accesses to the X memory 4, 6 and the Y memory 5, 7. The X memory 4, 6 and the Y memory 5, 7 are used as temporary data storage areas for product-sum operations by the DSP engine 3, as storage areas for constant data, and so on. Needless to say, the X-RAM and Y-RAM can also be used as temporary data storage areas or work areas for the CPU core 2.

The interrupt controller 8 receives interrupt request signals (Interrupts) 81 from the built-in peripheral circuits 10, 11 and the like, arbitrates and accepts interrupt requests according to information for prioritizing and masking the various requests, outputs an interrupt vector 82 corresponding to the accepted request to the address bus IAB, and outputs the interrupt signal 80 to the CPU core 2.

The bus state controller 9 is connected to the address buses IAB and PAB and to the data buses IDB and PDB, and controls the interface between the CPU core 2 and the built-in peripheral circuits 10 and 11, which are connected to the address bus PAB and the data bus PDB.

The external memory interface 12 is connected to the address bus IAB and the data bus IDB, and also to an address bus and a data bus (not shown) outside the chip of the microcomputer 1, and controls the interface with the outside.

FIG. 2 shows an example of the address map of the microcomputer 1. The microcomputer 1 of this embodiment manages an address space defined by 32 bits, and the address bus IAB is 32 bits wide. The address space includes an exception processing vector area, an X-ROM space (the address space assigned to the X-ROM 4), an X-RAM space (the address space assigned to the X-RAM 6), a Y-ROM space (the address space assigned to the Y-ROM 5), a Y-RAM space (the address space assigned to the Y-RAM 7), a built-in peripheral circuit allocation space (the address space to which the built-in peripheral circuits 10 and 11 are assigned), and the like. In the example of FIG. 2, 24 KB is assigned to the X-ROM 4, 4 KB to the X-RAM 6, 24 KB to the Y-ROM 5, and 4 KB to the Y-RAM 7.

According to FIG. 2, in hexadecimal notation, an exception processing vector area is allocated to the 256B area in the space H'00000000 to H'000003FF. A normal space usable by the user is allocated to H'00000400 to H'01FFFFFF. The normal space is a memory area that can be connected externally to the microcomputer 1. The X-ROM space is allocated to H'02000000 to H'02005FFF, and the X-RAM space to H'02006000 to H'02006FFF. H'02007000 to H'02007FFF is an X-RAM_Mirror space; an access here actually accesses the X-RAM space at H'02006000 to H'02006FFF. H'02008000 to H'0200FFFF is an X-ROM, RAM_Mirror space; an access here actually accesses the X-ROM space and X-RAM space at H'02000000 to H'02007FFF. The Y-ROM space is allocated to H'02010000 to H'02015FFF, and the Y-RAM space to H'02016000 to H'02016FFF. H'02017000 to H'02017FFF is a Y-RAM_Mirror space; an access here actually accesses the Y-RAM space at H'02016000 to H'02016FFF. H'02018000 to H'0201FFFF is a Y-ROM, RAM_Mirror space; an access here actually accesses the Y-ROM space and Y-RAM space at H'02010000 to H'02017FFF. A normal space is allocated to H'02020000 to H'07FFFFFF. A reserved area is allocated to H'08000000 to H'1FFFFFFF. This reserved area is inaccessible on a user chip (real chip); on an evaluation chip (a chip used for emulation and the like) it is assigned as an ASE space (a control space for emulation). A normal space is allocated to H'20000000 to H'27FFFFFF. A reserved area is allocated to H'28000000 to H'FFFFFDFF. H'FFFFFE00 to H'FFFFFFFF is the built-in peripheral circuit allocation area, to which the register addresses of the built-in peripheral circuits are assigned.
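
The mirror behavior of the X and Y spaces can be illustrated with a short sketch. It assumes the X space occupies the 64 KB block at H'02000000 and the Y space the 64 KB block at H'02010000, with 24 KB of ROM followed by 4 KB of RAM in each, as in the FIG. 2 example. The function name `resolve_onchip` and the returned tuple are illustrative only and are not taken from the specification.

```python
X_BASE = 0x02000000  # start of the X-ROM space in the FIG. 2 example
Y_BASE = 0x02010000  # start of the Y-ROM space in the FIG. 2 example


def resolve_onchip(addr):
    """Map an X/Y-space address to (region, address actually accessed).

    Returns None for addresses outside the two 64 KB on-chip blocks.
    Hypothetical helper: a software model of the aliasing, not the hardware decoder.
    """
    for name, base in (("X", X_BASE), ("Y", Y_BASE)):
        if base <= addr <= base + 0xFFFF:
            offset = addr - base
            offset &= 0x7FFF        # ROM,RAM_Mirror: upper 32 KB aliases the lower 32 KB
            if offset >= 0x7000:    # RAM_Mirror: H'7000-H'7FFF aliases H'6000-H'6FFF
                offset -= 0x1000
            kind = "ROM" if offset < 0x6000 else "RAM"
            return f"{name}-{kind}", base + offset
    return None


print(resolve_onchip(0x02007010))  # X-RAM_Mirror access lands in the X-RAM
```

Under these assumptions, any address in either mirror range resolves to the same physical location as its counterpart in the base ROM or RAM range.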

FIG. 3 is a block diagram of the CPU core 2 showing the modulo address output unit in detail. The portion enclosed by the broken line in FIG. 3 is the modulo address output unit 200. The modulo address output unit 200 is a circuit block that performs operations such as an address update output operation, in which the value output from a modulo address register (for example, A0X) is output to an address bus (for example, XAB) through a buffer (for example, MABX) while, at the same time, that value is incremented by adding means (for example, the ALU) and stored back in the modulo address register (A0X). It thereby sequentially updates and generates the data access addresses for repeated operations such as product-sum operations. The circuit block labeled random logic circuit (Random Logic Circuit) 201 includes the decoder 24 of FIG. 1, the sequence control circuit, control registers, status registers, and the like.

In FIG. 3, C1, C2, DR, A1, B1, A2, B2, and DW are representative buses inside the CPU core 2. The CPU core 2 interfaces with the data bus IDB through the instruction register (IR) 25 and the data buffer (Data Buffer) 203. An instruction fetched into the instruction register (IR) 25 is supplied to the decoder 24 and other circuits included in the random logic circuit (Random Logic Circuit) 201. The CPU core 2 interfaces with the address bus IAB through the program counter (PC) 204 and the address buffer (Address Buffer) 205. It interfaces with the address bus XAB through the memory address buffer (MABX) 206 and with the address bus YAB through the memory address buffer (MABY) 207. The input path of address information to the address buffer 205 can be selected from among the buses C1, A1, and A2, and the input path of address information to the memory address buffers 206 and 207 can be selected from among the buses C1, C2, A1, and A2. The arithmetic unit (AU) 208 is used to increment the value of the program counter 204. Reference numeral 209 denotes general-purpose registers (Reg.), 210 an index register (Ix) used for index modification of addresses, 211 an index register (Iy) likewise used for index modification, 212 an adder (PAU) dedicated to address calculation, and 213 an arithmetic logic unit (ALU).

The control bit MXY specifies whether the modulo operation is applied to addresses on the address bus XAB or on the address bus YAB: a logical value "1" specifies the address bus XAB and a logical value "0" specifies the address bus YAB. The control bit DM specifies whether the modulo operation is performed: a logical value "1" enables it and a logical value "0" disables it. The modulo start address register (MS) 214 stores the modulo operation start address, and the modulo end address register (ME) 215 stores the modulo operation end address.

The modulo address register (A0x, A1x) 216 is a current address register that stores the current modulo address. Reference numeral 217 denotes a comparator (CMP) that compares the value of the modulo end address register (ME) 215 with the value of the modulo address register (A0x, A1x) 216; 218 denotes an AND gate that takes the logical product of three inputs, namely the output of the comparator 217 and the control bits MXY and DM; and 219 denotes a selector that selects between the value on the bus C1 and the value of the modulo start address register (MS) 214. These are used for the modulo operation on the address bus XAB. When the AND gate 218 outputs a logical value "1", the selector 219 selects the value of the register (MS) 214 and supplies it to the modulo address register (A0x, A1x) 216. Either A0x or A1x of the modulo address register 216 is selected for use.

The modulo address register (A0y, A1y) 226 is a current address register that stores the current modulo address. Reference numeral 227 denotes a comparator (CMP) that compares the value of the modulo end address register (ME) 215 with the value of the modulo address register (A0y, A1y) 226; 228 denotes an AND gate that takes the logical product of three inputs, namely the output of the comparator 227, the inverted bit of the control bit MXY, and the control bit DM; and 229 denotes a selector that selects between the value on the bus C2 and the value of the modulo start address register (MS) 214. These are used for the modulo operation on the address bus YAB. When the AND gate 228 outputs a logical value "1", the selector 229 selects the value of the register (MS) 214 and supplies it to the modulo address register (A0y, A1y) 226. Either A0y or A1y of the modulo address register 226 is selected for use.

Note that "OP Code" shown in the random logic circuit 201 means the instruction code supplied from the instruction register 25, and "CONST" means a constant value.

As an example of the modulo operation in the CPU core 2, the following describes how the address information to be supplied to the address bus XAB is generated by the modulo operation using the modulo address register (A0x) 216.

First, the modulo operation start address is written to the modulo start address register (MS) 214, and the modulo operation end address is written to the modulo end address register (ME) 215. The address value at which the modulo operation is to start is written to the modulo address register (A0x). Next, because the modulo operation is to be applied to addresses on the address bus XAB, a logical value "1" is written to the control bit MXY, which determines whether the modulo operation applies to XAB or YAB addresses (to apply the modulo operation to the address bus YAB, a logical value "0" is written to the control bit MXY). Finally, a logical value "1" is written to the control bit DM, which determines whether the modulo operation is performed.

The modulo operation instruction is, for example, MOVS.W @Ax, Dx. In this instruction notation, Ax is the modulo address register (A0x) 216 or the modulo address register (A1x) 216, and Dx corresponds to a register in the DSP engine 3. Dx is not shown in FIG. 3. When the modulo operation instruction is executed, a value is read from the modulo address register (A0x) 216 and input to the memory address buffer (MABX) 206 and the arithmetic logic unit (ALU) 213. The value input to the memory address buffer (MABX) 206 is output unchanged to the address bus XAB and specifies an address in the X-ROM 4 or the X-RAM 6. Meanwhile, the value of the index register (Ix) 210 or a constant (Const) is added to the value of the modulo address register (A0x) 216 input to the arithmetic logic unit (ALU) 213. The index register (Ix) 210 is added when an instruction such as MOVS.W @(Ax, Ix), Dx is executed, and the constant is added when an instruction such as MOVS.W @Ax, Dx is executed. The addition result is output from the arithmetic logic unit (ALU) 213 and enters the selector 219. The other input of the selector 219 is the modulo operation start address stored in the modulo start address register (MS) 214.

Whether the output of the selector 219 is the output of the arithmetic logic unit (ALU) 213 or the value of the modulo start address register (MS) 214 is determined as follows. The value of the modulo address register (A0x) 216 and the value of the modulo end address register (ME) 215 are constantly compared by the comparator (CMP) 217, which outputs a logical value "1" if they match and a logical value "0" if they do not. The output of the comparator (CMP) 217 is ANDed with the control bits DM and MXY by the AND gate 218 (in this example DM and MXY are both logical value "1", so the value of the comparator 217 is output unchanged from the AND gate 218) and is input to the selector 219. The selector 219 selects the value of the modulo start address register (MS) 214 when the value input from the AND gate 218 is a logical value "1", and selects the output value of the arithmetic logic unit (ALU) 213 when it is a logical value "0".

While the value input from the AND gate 218 is a logical value "0", the selector 219 keeps selecting the output value of the arithmetic logic unit (ALU) 213, so the value output to the address bus XAB is updated sequentially. When the value of the modulo address register (A0x) 216 matches the value of the modulo end address register (ME) 215, the value input from the AND gate 218 to the selector 219 becomes a logical value "1", and the selector 219 selects the value of the modulo start address register (MS) 214. The modulo address register (A0x) 216 is thereby reinitialized with the value of the modulo start address register (MS) 214.
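
The wrap-around behavior described above can be modeled in software as a rough sketch. The class `ModuloAddressUnit`, its field names, and the increment of 2 (chosen here on the assumption that MOVS.W steps through 16-bit words) are illustrative and not taken from the specification; only the comparator, AND gate, and selector relationships follow the text.

```python
from dataclasses import dataclass


@dataclass
class ModuloAddressUnit:
    ms: int       # modulo start address register (MS) 214
    me: int       # modulo end address register (ME) 215
    a0x: int      # current modulo address register (A0x) 216
    dm: int = 1   # control bit DM: 1 enables the modulo operation
    mxy: int = 1  # control bit MXY: 1 selects the XAB side

    def step(self, increment):
        """Return the address driven onto XAB, then update A0x."""
        out = self.a0x                              # passed unchanged through MABX
        alu = (self.a0x + increment) & 0xFFFFFFFF   # ALU 213: A0x + Ix or constant
        cmp_out = 1 if self.a0x == self.me else 0   # comparator 217
        sel = cmp_out & self.dm & self.mxy          # AND gate 218
        self.a0x = self.ms if sel else alu          # selector 219
        return out


unit = ModuloAddressUnit(ms=0x02006000, me=0x02006006, a0x=0x02006000)
print([hex(unit.step(2)) for _ in range(6)])
```

With these example values the generated addresses run from the start address to the end address and then return to the start address, forming a four-word circular buffer. With `dm=0` the AND gate output stays at 0 and the address simply keeps incrementing.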

The above description covered operation using the modulo address register (A0x) 216, but Ax in the modulo operation instruction MOVS.W @Ax, Dx can also be designated as the modulo address register (A1x) 216. If a logical value "0" is set in the control bit MXY, the modulo operation can be applied to the address bus YAB. In that case, Ax in the modulo operation instruction MOVS.W @Ax, Dx must be changed to Ay, which designates the modulo address register (A0y) 226 or (A1y) 226. Setting the control bit DM to 0 disables execution of the modulo operation.

FIG. 4 is a block diagram of an example of the DSP engine 3. The circuit block labeled random logic circuit (Random Logic Circuit) 301 includes the decoder 34 of FIG. 1, a control circuit, control registers, status registers, and the like. The DSP engine 3 further includes an arithmetic logic unit (ALU) 302, a shifter (SFT) 303, a multiplier (MAC) 304, registers (Reg.) 305, registers (A0, A1) 306, registers (Y0, Y1) 307, registers (X0, X1) 308, a memory data buffer (MDBI) 309, a memory data buffer (MDBX) 310, and a memory data buffer (MDBY) 311. The memory data buffer (MDBY) 311 connects the data bus YDB to the bus D2. The memory data buffer (MDBX) 310 connects the data bus XDB to the bus D1. The memory data buffer (MDBI) 309 is connected to the data bus IDB and to the buses C1, D1, A1, and B1. The multiplier (MAC) 304 receives data from the buses A1 and B1 and outputs the multiplication result to the buses C1 and D1. The shifter (SFT) 303 receives data from the bus A2 and outputs the shift result to the bus C2. The arithmetic logic unit (ALU) 302 receives data from the buses A2 and B2 and outputs the operation result to the bus C2.

FIG. 5 shows an example of the formats and instruction codes of instructions included in the instruction set of the microcomputer 1. The microcomputer 1 supports two types of instructions, CPU instructions and DSP instructions. All CPU instructions and some DSP instructions have 16-bit instruction codes, and the remaining DSP instructions have 32-bit instruction codes. A CPU instruction is an instruction executed solely by the CPU core 2 without operating the DSP engine 3. A DSP instruction is an instruction executed by the DSP engine 3 with part of the processing, such as address calculation or operand access, handled by the CPU core 2.

CPU instructions are assigned to the space in which the most significant 4 bits of the instruction code are "0000" to "1110". DSP instructions are all assigned to the space in which the most significant 4 bits are "1111". Among these, instructions whose most significant 6 bits are "111100" or "111101" are DSP instructions with 16-bit instruction codes, and instructions whose most significant 6 bits are "111110" have 32-bit instruction codes. No instructions are assigned to the space in which the most significant 6 bits are "111111"; it is an unused area (undefined instruction area) that can be used to extend the instruction codes in the future. As this instruction format makes clear, decoding the most significant 6 bits of each instruction code is enough to determine whether the instruction is a CPU instruction, a 16-bit DSP instruction, a 32-bit DSP instruction, or an undefined instruction, so the determination can be made by a decoder of small logic scale. In the CPU instruction format of FIG. 5, nnnn is the destination operand field, ssss is the source operand field, dddd is the displacement field, and iiiiiiii is the immediate value field. For instructions such as ADD, nnnn is also a source operand field, and the operation result is stored in nnnn. The modulo operation instruction described with reference to FIG. 3 corresponds to the instruction MOVS.W @R2, A0 in FIG. 5, although the operand notation in FIG. 5 differs from that used in the description of FIG. 3. This is merely a difference in form; the substance is the same.
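
The classification by the most significant 6 bits can be written out as a small sketch. The function name `classify` and the returned labels are illustrative assumptions; the bit patterns are those given in the text.

```python
def classify(code16):
    """Classify a 16-bit instruction code unit by its most significant 6 bits."""
    top6 = (code16 >> 10) & 0x3F
    if (top6 >> 2) != 0b1111:          # top 4 bits "0000" to "1110"
        return "CPU16"
    if top6 in (0b111100, 0b111101):   # 16-bit DSP instruction
        return "DSP16"
    if top6 == 0b111110:               # first half of a 32-bit DSP instruction
        return "DSP32"
    return "UNDEFINED"                 # "111111": unused, reserved for extension


for code in (0x6123, 0xF000, 0xF400, 0xF800, 0xFC00):
    print(hex(code), classify(code))
```

Only six bits are examined, which is what allows the decision to be made by a small decoder before the rest of the instruction code is interpreted.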

FIG. 6 shows an example of the connection between the decoder 24 of the CPU core 2 and the decoder 34 of the DSP engine 3. The microcomputer 1 fetches instructions into the instruction register (IR) 25 in 32-bit units. The decoder 24 includes a first decoding circuit 240, a second decoding circuit 241, and a code conversion circuit 242. The first decoding circuit 240 decodes the value of the upper 16-bit area (UIR) of the instruction register (IR) 25 and, depending on whether the instruction is a CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction, generates a CPU decode signal 243, a DSP decode signal 244, a code conversion control signal 245, and a shift control signal 246. The second decoding circuit 241 decodes the CPU decode signal 243 and generates various internal control signals (CPU control signals) 247 that select arithmetic units, registers, and the like inside the CPU core 2. When activated by the code conversion control signal 245, the code conversion circuit 242 outputs the information held in the lower 16-bit area (LIR) of the instruction register (IR) 25, either with its number of bits compressed or unchanged; when deactivated by the code conversion control signal 245, it outputs information indicating that its output is invalid (a non-operation code). In the sense that it outputs the non-operation code in place of the value of the lower 16-bit area (LIR) when the signal 245 is inactive, the code conversion circuit 242 can also be implemented as a selector. The DSP decode signal 244 and the output of the code conversion circuit 242 are supplied to the decoder 34 of the DSP engine 3 as the DSP control signal 20. By decoding the most significant 6 bits stored in the upper 16-bit area (UIR) of the instruction register (IR) 25, the first decoding circuit 240 can determine whether the instruction code is a CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction.

When the decoded instruction is a 16-bit instruction, the code conversion control signal 245 is deactivated, and the code conversion circuit 242 therefore outputs the non-operation code indicating that its output is invalid. In addition, when the decoded instruction is a 16-bit instruction, the shift control signal 246 is activated; on receiving it, the instruction register (IR) 25 shifts the value of its lower 16-bit area (LIR) into the upper 16-bit area (UIR), and the shifted code is used as all or part of the next instruction to be executed. Consider, for example, the case where a 16-bit CPU instruction is stored in the upper 16-bit area UIR of the instruction register IR and the upper 16 bits of the instruction code of a 32-bit DSP instruction are stored in the lower 16-bit area LIR. First, the 16-bit CPU instruction stored in the upper 16-bit area UIR is decoded by the first decoding circuit 240 and the CPU core 2 executes the instruction according to the result, while the upper 16 bits of the 32-bit DSP instruction stored in the lower 16-bit area LIR are transferred to the upper 16-bit area UIR. At this time, the random logic circuit 201 causes the arithmetic unit AU 208 to calculate the address to be stored in the program counter PC, and the program counter PC stores the address given by that calculation. According to the address stored in the program counter PC, the lower 16 bits of the instruction code of the 32-bit DSP instruction are transferred from the instruction memory that stores them to the lower 16-bit area LIR of the instruction register IR. The 32-bit DSP instruction is thereby held in the instruction register IR, and it is supplied to the decoder 34 of the DSP engine 3 via the decoder 24. As another method, although not shown, a plurality of instruction prefetch buffers are provided in the CPU core 2. The instruction prefetch buffers prefetch instructions that are to be executed several cycles after the instruction currently being executed. With such prefetch buffers, when the upper 16 bits of the 32-bit DSP instruction are transferred from the lower 16-bit area LIR to the upper 16-bit area UIR as described above, the random logic circuit 201 selects the instruction prefetch buffer into which the lower 16 bits of the 32-bit DSP instruction have been prefetched. The lower 16 bits of the 32-bit DSP instruction are read from the selected instruction prefetch buffer and stored in the lower 16-bit area LIR of the instruction register IR.
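
The shifting of the instruction register can be sketched as a simple software model that walks a stream of 16-bit code units. The function `issue_instructions` and its list-based "instruction memory" are hypothetical; fetch timing, the program counter, and prefetch buffers are not modeled, and the undefined pattern "111111" is treated like a 16-bit instruction purely to keep the sketch short.

```python
def issue_instructions(halfwords):
    """Yield (UIR, LIR or None) for each instruction in a stream of 16-bit units.

    LIR is reported only for 32-bit DSP instructions, where it carries the
    lower 16 bits of the instruction code.
    """
    stream = iter(halfwords)
    uir = next(stream, None)   # upper 16-bit area of IR
    lir = next(stream, None)   # lower 16-bit area of IR
    while uir is not None:
        if (uir >> 10) == 0b111110:      # 32-bit DSP instruction: UIR and LIR used together
            yield (uir, lir)
            uir = next(stream, None)
            lir = next(stream, None)
        else:                            # 16-bit instruction: shift LIR into UIR, refill LIR
            yield (uir, None)
            uir = lir
            lir = next(stream, None)


# A 16-bit CPU instruction followed by a 32-bit DSP instruction, as in the example in the text.
for issued in issue_instructions([0x6123, 0xF800, 0x0001, 0xF000, 0x300C]):
    print(issued)
```

In this model the 32-bit DSP instruction that initially straddles the register boundary ends up with its upper half in UIR and its lower half refilled into LIR before it is issued.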

When the decoded instruction is a 16-bit CPU instruction, the DSP decode signal 244 is a code meaning non-operation. When the decoded instruction is a 16-bit DSP instruction, the CPU control signals 247 are generated by the second decoding circuit 241 on the basis of the CPU decode signal 243, and the control signals inside the DSP engine 3 are generated by the decoder 34 decoding, in effect, the DSP decode signal 244. When the decoded instruction is a 32-bit DSP instruction, the CPU control signals 247 are generated by the second decoding circuit 241 on the basis of the CPU decode signal 243, and the control signals inside the DSP engine 3 are generated by the decoder 34 decoding the DSP decode signal 244 and the output of the code conversion circuit 242.
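
The three cases can be summarized as a sketch of what the first decoding circuit 240 asserts for each instruction class. The type `FirstDecode`, its field names, and the choice to raise an error for the undefined pattern are assumptions made for illustration. The shift control signal is assumed to be inactive for 32-bit instructions, which the text implies but does not state outright.

```python
from typing import NamedTuple


class FirstDecode(NamedTuple):
    dsp_decode_is_nop: bool       # DSP decode signal 244 carries a non-operation code
    code_conversion_active: bool  # code conversion control signal 245
    shift_active: bool            # shift control signal 246


def first_decode(uir):
    """Derive the first-stage control outputs from the upper 16 bits of IR."""
    top6 = (uir >> 10) & 0x3F
    if top6 == 0b111111:
        # Assumption: the sketch rejects the unassigned pattern outright.
        raise ValueError("undefined instruction")
    is_dsp = (top6 >> 2) == 0b1111
    is_32bit = top6 == 0b111110
    return FirstDecode(
        dsp_decode_is_nop=not is_dsp,     # CPU instruction: DSP engine does nothing
        code_conversion_active=is_32bit,  # LIR is forwarded only for 32-bit DSP instructions
        shift_active=not is_32bit,        # 16-bit instruction: LIR shifts into UIR
    )


print(first_decode(0x6123))
print(first_decode(0xF000))
print(first_decode(0xF800))
```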

The instruction set of the microcomputer 1 includes instructions with 16-bit and 32-bit instruction code lengths, and, as described above, 16-bit and 32-bit instructions are processed differently. The operation in each case is therefore described in detail below.

First, the case of a 16-bit instruction is described. The first decode circuit 240 decodes the upper 16 bits of the 32-bit instruction code fetched into the instruction register (IR) 25. When the most significant 6 bits of the instruction code are other than "111110" and "11111", the first decode circuit 240 recognizes a 16-bit instruction. In that case, together with outputting the CPU decode signal 243 and the DSP decode signal 244, it activates the shift control signal 246, which shifts the instruction code data in the lower 16-bit area LIR of the instruction register (IR) 25 into the upper 16-bit area UIR. On receiving the activated shift control signal 246, the instruction register (IR) 25 shifts the instruction code held in the lower 16-bit area LIR into the upper 16-bit area UIR. The shifted instruction code is decoded next by the first decode circuit 240. Of the signals output from the decoder 24, the CPU decode signal 243 is supplied to the second decode circuit 241, and the DSP decode signal 244 is supplied to the DSP engine 3. On recognizing a 16-bit instruction, the first decode circuit 240 also deactivates the code conversion control signal 245, so that the code conversion circuit 242 generates, as part of the DSP control signal 20, a code indicating that the lower 16 bits of the instruction code are invalid. On the DSP engine 3 side, the DSP decode signal 244 output from the first decode circuit 240 and the code signal output from the code conversion circuit 242 are received together as the DSP control signal 20, which the decoder 34 decodes. For a 16-bit DSP instruction, the code output from the code conversion circuit 242 indicates invalid, so the decoder 34 relies on the DSP decode signal 244 to output control signals for the multiplier (MAC) 304, the arithmetic logic unit (ALU) 302, the shifter (SFT) 303, and other units in the DSP engine 3. The DSP engine 3 performs arithmetic processing according to these control signals.
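The shift of the lower area LIR into the upper area UIR for 16-bit instructions can be illustrated with a short sketch. It is not taken from the patent: the function names are hypothetical, only the prefix "111110" is treated as marking a 32-bit instruction, the second excluded prefix mentioned in the text is not modeled, and filling the vacated lower half with zeros is an assumption.

```python
def is_32bit(upper16):
    """Return True when the top 6 bits of the upper halfword are 111110."""
    return ((upper16 >> 10) & 0x3F) == 0b111110


def step_instruction_register(ir):
    """Model one decode of a 32-bit instruction register value.

    Returns (next_ir, shifted). For a 16-bit instruction the lower half
    (LIR) moves to the upper half (UIR), as the shift control signal 246
    would request. For a 32-bit instruction the register is left as is.
    """
    upper = (ir >> 16) & 0xFFFF
    if is_32bit(upper):
        return ir & 0xFFFFFFFF, False
    lower = ir & 0xFFFF
    # Assumption: the vacated lower half is filled with zeros.
    return (lower << 16) & 0xFFFFFFFF, True
```

For example, a register value holding two 16-bit instructions yields the second instruction in the upper half after one step, while a value beginning with the 32-bit prefix is left untouched.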

Next, the case of a 32-bit instruction is described. In the CPU core 2, the 32-bit instruction code is stored in the instruction register (IR) 25. The first decode circuit 240 decodes the upper 16 bits and outputs the decode signals 243 and 244. When the most significant 6 bits of the instruction code are "111110", the first decode circuit 240 recognizes a 32-bit instruction and activates the code conversion control signal 245, whereupon the code conversion circuit 242 converts the instruction code in the lower 16 bits of the instruction register (IR) 25. The converted information is supplied to the DSP engine 3, together with the DSP decode signal 244, as the DSP control signal 20. The decoder 34 decodes the DSP control signal 20 to generate the internal control signals of the DSP engine 3. The decoders 24 and 34 can be implemented with, for example, random logic circuits.

FIG. 17 shows another embodiment corresponding to FIG. 6. In the embodiment of FIG. 6, the instruction data in the lower area LIR of the instruction register 25 was described as being shifted into the upper area UIR. The embodiment of FIG. 17 provides, between the instruction register 25 and the internal data bus IDB, two serially connected instruction prefetch buffers 250 and 251 that form an instruction prefetch queue; the data held in the instruction prefetch buffers 250 and 251 is selected by a selector 252 and supplied to the instruction register 25. Each of the instruction prefetch buffers 250 and 251 and the instruction register 25 holds data in 32-bit units, and its holding operation is controlled by the control signals φ1 to φ3 (synchronized with CLK1). Although not specifically illustrated, each of them has a master-slave configuration: the master stage latches its input in synchronization with the rising edge of the corresponding control signal, and the slave stage latches its input in synchronization with the falling edge of that signal. As a result, the two serial instruction prefetch buffers 250 and 251 hold successive prefetched instruction data.

The selector 252 selects, according to the selection control signal φ4, either the 32-bit instruction data supplied to port Pa or the 32-bit instruction data supplied to port Pb, and supplies it to the instruction register 25. Port Pa receives 32-bit instruction data whose upper half is the lower 16-bit area LPB2 of the instruction prefetch buffer 251 and whose lower half is the upper 16-bit area UPB1 of the instruction prefetch buffer 250. Port Pb receives the 32-bit instruction data stored in the instruction prefetch buffer 251 unchanged.

Consequently, when the instruction prefetch buffer 251 holds a 32-bit DSP instruction, the selector 252 can set that instruction in the instruction register 25 by selecting the output of port Pb. When the instruction prefetch buffer 251 holds a 16-bit DSP instruction or a 16-bit CPU instruction in its upper area UPB2, the selector 252 can set that 16-bit instruction in the upper area UIR of the instruction register 25 by selecting the output of port Pb. When the instruction prefetch buffer 251 holds a 16-bit DSP instruction or a 16-bit CPU instruction in its lower area LPB2, the selector 252 can set that 16-bit instruction in the upper area UIR of the instruction register 25 by selecting the output of port Pa. When the instruction prefetch buffer 251 holds the upper 16 bits of a 32-bit DSP instruction in its lower area LPB2 and the instruction prefetch buffer 250 holds the lower 16 bits of that instruction in its upper area UPB1, the selector 252 can set the complete 32-bit DSP instruction in the instruction register 25 by selecting the output of port Pa.
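As a minimal sketch of how the two selector inputs are composed, the following models the prefetch buffers as 32-bit integers. The function names and the string encoding of the selection control signal φ4 are hypothetical; the patent does not specify how φ4 is encoded.

```python
def port_pa(pb250, pb251):
    """Port Pa: LPB2 (lower half of buffer 251) on the upper side,
    UPB1 (upper half of buffer 250) on the lower side."""
    lpb2 = pb251 & 0xFFFF
    upb1 = (pb250 >> 16) & 0xFFFF
    return (lpb2 << 16) | upb1


def port_pb(pb251):
    """Port Pb: the 32-bit contents of buffer 251, unchanged."""
    return pb251 & 0xFFFFFFFF


def selector_252(phi4, pb250, pb251):
    """Return the value presented to the instruction register 25.

    phi4 is modeled as the string "Pa" or "Pb" (a hypothetical encoding).
    """
    if phi4 == "Pa":
        return port_pa(pb250, pb251)
    if phi4 == "Pb":
        return port_pb(pb251)
    raise ValueError("phi4 must be 'Pa' or 'Pb'")
```

With buffer 250 holding 0xAAAABBBB and buffer 251 holding 0xCCCCDDDD, port Pa presents 0xDDDDAAAA and port Pb presents 0xCCCCDDDD, so in both cases the next unexecuted halfword lands in the upper area UIR.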

In FIG. 17, reference numeral 253 denotes control logic that generates the latch control signals φ1 and φ2 for the instruction prefetch buffers, the latch control signal φ3 for the instruction register 25, and the selection control signal φ4. The control logic 253 generates the control signals φ1 to φ4 according to the control signal 248, which indicates whether an instruction is a 16-bit instruction or a 32-bit instruction, and according to which areas of the instruction prefetch buffers 250 and 251 still hold unexecuted instruction code. The control logic 253 forms part of the control logic for instruction fetch. The control signal 248 is generated by the first decode circuit 240 by decoding the upper 6 bits of the instruction code data supplied from the upper area UIR of the instruction register 25; its details are described later.

The control logic 253 sets instruction code data in the instruction register 25 as follows. An instruction fetch from outside is performed at an instruction fetch timing of the CPU core 2 (for example, the instruction fetch stage IF among the pipeline stages described later) when the instruction prefetch buffer 250 has room to store new 32-bit instruction code data. When an instruction fetch is performed at that timing, unexecuted instructions still remain in the instruction prefetch buffer 251. In a first state, in which neither of the instruction codes stored in the areas UPB2 and LPB2 of the instruction prefetch buffer 251 has been executed, the 32-bit output of the instruction prefetch buffer 251 is selected by the selector 252 through port Pb and set in the instruction register 25. In a second state, in which only the instruction code stored in the lower area LPB2 of the instruction prefetch buffer 251 has yet to be executed, the instruction code data in the upper area UPB1 prefetched into the instruction prefetch buffer 250 and in the lower area LPB2 of the instruction prefetch buffer 251 is set in the instruction register 25 through port Pa.

In the first state, if decoding by the decode circuit 240 shows that the instruction code data set in the upper area UIR of the instruction register 25 belongs to a 32-bit instruction, the 32-bit instruction code data prefetched into the instruction prefetch buffer 250 is then transferred unchanged to the instruction prefetch buffer 251. If decoding detects a 16-bit instruction, no data shift from the instruction prefetch buffer 250 to the next-stage buffer 251 takes place.

In the second state, after the data has been set in the instruction register 25 through port Pa, the 32-bit instruction code data prefetched into the instruction prefetch buffer 250 is shifted unchanged into the instruction prefetch buffer 251. If, after this shift, the instruction prefetch buffer 250 holds no unexecuted instruction code data, new instruction code data is prefetched into it at the next instruction fetch timing.
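The interplay of the first and second states can be illustrated with a simplified behavioral sketch. It ignores clocking and the latch signals φ1 to φ3, assumes a new 32-bit word is always available whenever buffer 250 is refilled, and treats only the prefix "111110" as a 32-bit instruction. The function and variable names are hypothetical.

```python
def is_32bit(upper16):
    """Return True when the top 6 bits of the upper halfword are 111110."""
    return ((upper16 >> 10) & 0x3F) == 0b111110


def issue_sequence(halfwords):
    """Return the successive 32-bit values set in the instruction register.

    halfwords is the instruction stream as a list of 16-bit values.
    """
    # Pack the stream into 32-bit fetch words, padded with zeros.
    padded = list(halfwords) + [0] * (len(halfwords) % 2) + [0] * 4
    words = [(padded[i] << 16) | padded[i + 1] for i in range(0, len(padded), 2)]
    fetch = iter(words)
    pb251 = next(fetch)   # older buffer, next to the selector
    pb250 = next(fetch)   # newer buffer, next to the data bus
    both_pending = True   # first state: UPB2 and LPB2 both unexecuted
    consumed = 0
    issued = []
    while consumed < len(halfwords):
        if both_pending:
            ir = pb251                                    # port Pb
        else:
            ir = ((pb251 & 0xFFFF) << 16) | (pb250 >> 16)  # port Pa
        issued.append(ir)
        long_insn = is_32bit(ir >> 16)
        consumed += 2 if long_insn else 1
        if both_pending:
            if long_insn:
                # First state, 32-bit: buffer 250 moves into buffer 251.
                pb251, pb250 = pb250, next(fetch, 0)
            else:
                # First state, 16-bit: no shift, LPB2 is still pending.
                both_pending = False
        else:
            # Second state: buffer 250 always moves into buffer 251.
            pb251, pb250 = pb250, next(fetch, 0)
            both_pending = not long_insn
    return issued
```

In this model the upper half of every issued value is the first halfword of the next instruction, including the case where a 32-bit DSP instruction straddles two fetch words.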

With this control, instruction code data that has not yet been processed is set in the instruction register 25 after each instruction fetch timing. Whether the instruction to be executed is a 16-bit CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction, its upper 16 bits are always supplied to the first decode circuit 240.

The code conversion circuit 242 described with reference to FIG. 6 is made up, in FIG. 17, of a selector 242A and code conversion logic 242B. In the description of FIG. 6, the first decode circuit 240 generated the control signals 245 and 246, whose levels depend on whether the decoded instruction code is a 16-bit instruction. In the example of FIG. 17, it outputs a control signal 248 that identifies whether the decoded instruction code is a 16-bit instruction or a 32-bit instruction (in this embodiment, a 32-bit instruction is a DSP instruction). When the control signal 248 indicates a 16-bit instruction, the selector 242A selects the no-operation code NOP and supplies it to the code conversion logic 242B; when the control signal 248 indicates a 32-bit DSP instruction, the selector 242A supplies the instruction code in the lower area LIR of the instruction register 25 to the code conversion logic 242B. Although not limited to this, the code conversion logic 242B modifies part of the instruction code data from the lower area LIR, for example the code information for register selection, into a form suited to the decoder 34 of the DSP engine 3 and outputs it.
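The selection made by the selector 242A can be sketched as follows. The NOP encoding is a placeholder, since the patent does not give its value, and the code conversion logic 242B is modeled as a pass-through because its actual mapping is not described. All names here are hypothetical.

```python
NOP_CODE = 0x0000  # placeholder value; the real NOP encoding is not given


def selector_242a(ir, is_32bit_dsp):
    """Choose the input of the code conversion logic 242B.

    is_32bit_dsp stands in for the control signal 248.
    """
    if is_32bit_dsp:
        return ir & 0xFFFF   # lower area LIR of the instruction register
    return NOP_CODE


def code_conversion_242b(code):
    """Stand-in for 242B; the real logic rewrites part of the code
    (for example register-select bits) for the decoder 34."""
    return code


def dsp_lower_control(ir, is_32bit_dsp):
    """Lower part of the DSP control signal 20 in this simplified model."""
    return code_conversion_242b(selector_242a(ir, is_32bit_dsp))
```

The point of the sketch is only the data path: a 16-bit instruction never exposes the lower half of the instruction register to the DSP engine.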

In the embodiment of FIG. 17, the first decode circuit 240 decodes the 16-bit instruction code data held in the upper area UIR of the instruction register 25, supplies the resulting CPU decode signal 243 to the second decode circuit 241, and supplies the DSP decode signal 244 to the decoder 34. The CPU decode signal 243 is significant for both CPU instructions and DSP instructions. The second decode circuit 241 decodes the CPU decode signal 243 and outputs control information for the address operations and data operations to be performed by the CPU core 2, as well as selection control information for the address buses and data buses used to access the internal memories X-ROM 4, Y-ROM 5, X-RAM 6, and Y-RAM 7 and the external memory. As described above, for DSP instructions too, the CPU core 2 performs the necessary address operations and data path selection.

As described above, the DSP decode signal 244 is significant when the instruction code supplied to the first decode circuit 240 is code data for a DSP instruction. A significant DSP decode signal 244 includes, for example, information designating the registers in the DSP engine 3 that exchange data with the memory accessed according to the address operation performed in the CPU core 2. When the instruction code supplied to the first decode circuit 240 is a CPU instruction, the DSP decode signal 244 is set to a code meaning invalid.

The codes of the DSP instructions included in the instruction set of the microcomputer 1 are now described in more detail. FIGS. 18 and 19 each show instruction codes of 16-bit DSP instructions, and FIGS. 20 and 21 show instruction codes of 32-bit DSP instructions. As described above, DSP instructions are assigned "1111" in the most significant 4 bits of the instruction code. Instructions whose most significant 6 bits are "111100" or "111101" are 16-bit DSP instructions, and instructions whose most significant 6 bits are "111110" are 32-bit DSP instructions.
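The prefix assignments above amount to a small classifier over the upper halfword. The sketch below is illustrative only: the enum and function names are hypothetical, treating every code whose top 4 bits are not "1111" as a 16-bit CPU instruction is an assumption drawn from the surrounding description, and the prefix "111111" is reported as unspecified because this passage does not assign it.

```python
from enum import Enum


class Kind(Enum):
    CPU_16 = "16-bit CPU instruction"
    DSP_16 = "16-bit DSP instruction"
    DSP_32 = "32-bit DSP instruction"
    UNSPECIFIED = "prefix not assigned in this passage"


def classify(upper16):
    """Classify an instruction from the upper 16 bits of its code."""
    top4 = (upper16 >> 12) & 0xF
    top6 = (upper16 >> 10) & 0x3F
    if top4 != 0b1111:
        return Kind.CPU_16          # assumption: non-DSP codes are CPU codes
    if top6 in (0b111100, 0b111101):
        return Kind.DSP_16
    if top6 == 0b111110:
        return Kind.DSP_32
    return Kind.UNSPECIFIED         # top6 == 0b111111
```

Only the top 4 bits are needed to decide whether the DSP decode signal is meaningful, while the top 6 bits decide the instruction length.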

The instruction format shown in the first column (X Side of Data Transfer) of FIG. 18 is a 16-bit DSP data transfer instruction between the X memory (X-ROM 4, X-RAM 6) and the internal registers of the DSP engine 3, and the instruction format shown in the second column (Y Side of Data Transfer) is a data transfer instruction between the Y memory (Y-ROM 5, Y-RAM 7) and the internal registers of the DSP engine 3. In these formats, Ax and Ay designate registers in the register array 209 (see FIG. 3) of the CPU core 2: Ax = "0" designates register R4, Ax = "1" designates register R5, Ay = "0" designates register R6, and Ay = "1" designates register R7. Dx, Dy, and Da designate registers in the DSP engine: Dx = "0" designates register X0, Dx = "1" designates register X1, Dy = "0" designates register Y0, Dy = "1" designates register Y1, Da = "0" designates register A0, and Da = "1" designates register A1. Ix and Iy denote immediate values.
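The register designations above can be written as lookup tables. The sketch takes the single-bit values as already extracted, because the bit positions of Ax, Ay, Dx, Dy, and Da appear only in FIG. 18 and are not stated in the text. The table and function names are hypothetical.

```python
# Value-to-register mappings as stated in the description of FIG. 18.
AX = {0: "R4", 1: "R5"}   # X-side address pointer in the CPU core
AY = {0: "R6", 1: "R7"}   # Y-side address pointer in the CPU core
DX = {0: "X0", 1: "X1"}   # X-side data register in the DSP engine
DY = {0: "Y0", 1: "Y1"}   # Y-side data register in the DSP engine
DA = {0: "A0", 1: "A1"}   # accumulator-side register in the DSP engine


def resolve_registers(ax, ay, dx, dy, da):
    """Map the five select bits to register names."""
    return {
        "Ax": AX[ax],
        "Ay": AY[ay],
        "Dx": DX[dx],
        "Dy": DY[dy],
        "Da": DA[da],
    }
```

For instance, `resolve_registers(0, 1, 1, 0, 1)` names R4 and R7 as the address pointers and X1, Y0, and A1 as the DSP-side registers.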

The instruction format of the 16-bit DSP instruction shown in FIG. 19 is a data transfer instruction between a memory (not shown) connected outside the microcomputer 1 and the internal registers of the DSP engine 3. As designates a register in the register array 209 (see FIG. 3) built into the CPU core 2, and Ds designates one of the registers X1, X0, Y1, Y0, A1, and A0 built into the DSP engine or a register in the register array 305 (see FIG. 4).

The format of a 32-bit DSP instruction is broadly divided into the area holding the code "111110", which identifies a 32-bit DSP instruction (bits 31 to 26), the A field (bits 25 to 16), and the B field (bits 15 to 0). FIG. 20 shows the codes of the A field and their corresponding mnemonics, and FIG. 21 shows the codes of the B field and their corresponding mnemonics.
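The field boundaries given above translate directly into shifts and masks. The following sketch splits a 32-bit DSP instruction word into its A and B fields; the function name and the choice to raise an error on a wrong prefix are illustrative assumptions.

```python
DSP32_PREFIX = 0b111110  # bits 31 to 26 of a 32-bit DSP instruction


def split_dsp32(insn):
    """Return (a_field, b_field) for a 32-bit DSP instruction word.

    a_field is bits 25 to 16 (10 bits); b_field is bits 15 to 0 (16 bits).
    """
    prefix = (insn >> 26) & 0x3F
    if prefix != DSP32_PREFIX:
        raise ValueError("not a 32-bit DSP instruction")
    a_field = (insn >> 16) & 0x3FF
    b_field = insn & 0xFFFF
    return a_field, b_field
```

Because the prefix and the A field together occupy the upper 16 bits, both are visible to the first decode circuit, while the B field occupies the lower 16 bits that are routed toward the DSP engine.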

The A field codes shown in FIG. 20 are identical to the codes in bits 9 to 0 of the 16-bit DSP instructions shown in FIG. 18. The A field codes in the first column (X Side of Data Transfer) of FIG. 20 specify data transfer between the X memory (X-ROM 4, X-RAM 6) and the internal registers of the DSP engine 3, and the A field codes in the second column (Y Side of Data Transfer) specify data transfer between the Y memory (Y-ROM 5, Y-RAM 7) and the internal registers of the DSP engine 3. The bits Ax, Ay, Dx, Dy, and Da in the A field have exactly the same meaning as in FIG. 18.

The B field codes shown in FIG. 21 specify processing performed inside the DSP engine 3, such as arithmetic operations, logical operations, shift operations, and loads and stores between registers. Examples are operations such as multiplication (PMULS), subtraction (PSUB), addition (PADD), rounding (PRND), shift (PSHL), logical AND (PAND), exclusive OR (XOR), logical OR (OR), increment (PINC), decrement (PDEC), and clear (CLR), as well as load (PLDS) and store (PSTS) performed inside the DSP engine 3. The third column (3 Operand Operation with Condition) of FIG. 21 lists conditional codes; as the condition (if cc), either logical value of the DC (data complete) bit, which indicates completion of data processing, can be selected, or the bit can be ignored.

An actual 32-bit DSP instruction is written as an arbitrary combination of a B field code and an A field code. That is, a 32-bit DSP instruction specifies fetching the operands to be operated on from inside or outside the microcomputer 1 and operating on them inside the DSP engine 3. As is clear from the description above, the address operations and data path selection for the operand fetch are performed by the CPU core 2. The A field code that specifies the operand fetch in a 32-bit DSP instruction is the same as in a 16-bit DSP instruction. 16-bit DSP instructions are used, for example, to initialize the registers in the DSP engine 3.

As is also clear from the configuration shown in FIG. 17 and elsewhere, the A field code data of a 32-bit DSP instruction is set in the upper area UIR of the instruction register 25. A 16-bit DSP instruction, which has the same format as the A field, is likewise set in the upper area UIR. In either case, therefore, the CPU core 2 can perform the necessary address operations and select the data path needed for the data fetch (or operand fetch) in the same way. In other words, the decode circuits 240 and 241 needed for the data fetch (or operand fetch) of a 32-bit DSP instruction and for that of a 16-bit DSP instruction are shared, which also helps reduce the logic scale of the microcomputer 1. The information designating internal registers of the DSP engine 3 in the A field of a 32-bit DSP instruction, and the corresponding information in a 16-bit DSP instruction, is supplied to the DSP engine 3 as the DSP decode signal 244. Whether the DSP decode signal 244 is made significant is determined by the first decode circuit 240 by decoding the most significant 4 bits of the upper area UIR.

Next, the operation control in the microcomputer of this embodiment is described with reference to the instruction execution timing charts of FIGS. 7 to 16. The microcomputer 1 of this embodiment performs a five-stage pipeline operation consisting of the IF, ID, EX, MA, and WB/DSP stages. IF is the instruction fetch stage, ID is the instruction decode stage, EX is the operation execution stage, MA is the memory access stage, and WB/DSP is the stage in which data obtained from memory is taken into a register of the CPU core 2 or in which the DSP engine 3 executes a DSP instruction. In each figure, Instruction/Data Access denotes a memory access through the internal buses IAB and IDB; its target can be the external memory of the microcomputer 1 as well as the built-in memories 4 to 7. X,Y Mem. Access denotes a memory access through the internal buses XAB and XDB or YAB and YDB; its target is limited to the built-in memories 4 to 7. Isnt.Fetch denotes the timing of an instruction fetch into the instruction register (IR) 25, Fetch.Reg denotes the instruction register (IR) 25, Source Data Out denotes source data output, Destination In denotes the input timing of destination data, and Destination Register denotes the destination register. Pointer Reg. denotes a pointer register, Address Calc. denotes an address operation, Data Fetch denotes a data fetch, and DSP Control signal Decord Timing denotes the timing at which the decoder 34 decodes the DSP control signal 20.

FIG. 7 shows an execution timing chart of an ALU operation instruction inside the CPU core 2. Here, ADD Rm, Rn is taken as an example of an ALU operation instruction.

In synchronization with the rising edge of the clock signal Clock2 immediately before the IF stage, the address at which the instruction to be executed (ADD Rm, Rn) is stored is output to the address bus IAB. As Instruction/Data Mem. Access, a memory access operation is performed in the IF stage. Specifically, the address specified on the address bus IAB is decoded in the period from the rising edge of the clock signal Clock1 to the rising edge of the clock signal Clock2, and the instruction is accessed in the period from the rising edge of Clock2 in the IF stage to the next rising edge of Clock1. The instruction is therefore output to the data bus IDB from the rising edge of Clock2 in the IF stage. The instruction output to the data bus IDB is taken into the instruction register (IR) 25 in synchronization with the rising edge of Clock1 in the ID stage. In the ID stage, the data taken into the instruction register (IR) 25 is decoded. In synchronization with the rising edge of Clock1 in the EX stage, the registers holding the source data are accessed and their values are output to the internal buses A1 and B1 of the CPU core 2. For the instruction ADD Rm, Rn, the registers designated by Rm and Rn are the source registers. Rm and Rn can designate any register inside the CPU core 2 (in FIG. 3, any register in the register array 209 as well as A0x, A1x, Ix, A0y, A1y, and Iy can be designated as Rm and Rn). The arithmetic logic unit (ALU) 213 adds the data output to the internal buses A1 and B1, and the result is output to the internal bus C1 of the CPU core 2. The result on the internal bus C1 is stored in the destination register in synchronization with the rising edge of Clock2 in the EX stage (for ADD Rm, Rn, the destination register is the register designated by Rn). Thus, an ALU operation instruction inside the CPU core 2 completes its execution in the three pipeline stages IF, ID, and EX.

FIG. 8 shows a timing chart of a data read operation from memory to the CPU core 2. The operation is described using MOV.L @Rm, Rn as an example of an instruction that reads data from memory into the CPU core 2. The operations up to instruction fetch (IF) and instruction decode (ID) are the same as in FIG. 7, so their detailed description is omitted.

In synchronization with the rising edge of the clock signal Clock1 in the EX stage, the data of the register serving as the address pointer is output to the internal bus A1 of the CPU core 2. In this example, the address pointer is the register designated by Rm. Rm can designate any register in the CPU core 2 (in FIG. 3, any register in Reg. as well as A0x, A1x, Ix, A0y, A1y, and Iy can be designated as Rm). The data output to the internal bus A1 is stored in the address buffer 205 and output to the address bus IAB in synchronization with the rising edge of Clock2 in the EX stage. Meanwhile, the data output to the internal bus A1 is also operated on by the arithmetic logic unit (ALU) 213, which in this case performs an add-zero operation. The result is output to the internal bus C1 of the CPU core 2 and is stored in the pointer register (here, the register designated by Rm) in synchronization with the rising edge of Clock2 in the EX stage. As Instruction/Data Mem. Access, the address that was output to the address bus IAB at the rising edge of Clock2 in the EX stage is decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 in the MA stage, and the data is accessed in the period from the rising edge of Clock2 in the MA stage to the next rising edge of Clock1. The data is therefore output to the data bus IDB from the rising edge of Clock2 in the MA stage. The data output to the data bus IDB is taken into the CPU core 2 in synchronization with the rising edge of Clock1 in the WB/DSP stage and output to the internal bus DW of the CPU core 2. In synchronization with the rising edge of Clock2 in the WB/DSP stage, the data on the internal bus DW is stored in the destination register, which completes the operation. In this example, the destination register is the register designated by Rn. Rn can designate any register in the CPU core 2 (in FIG. 3, any register in Reg. as well as A0x, A1x, Ix, A0y, A1y, and Iy can be designated as Rn). Thus, an instruction that reads data from memory into the CPU core 2 completes its execution in the five pipeline stages IF, ID, EX, MA, and WB/DSP.
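The stage counts described so far (three stages for an ALU operation, five for a memory read) can be laid out as a staggered pipeline chart. The sketch assumes that one instruction enters the pipeline per cycle and that no stalls or bus conflicts occur, which the timing charts may not guarantee in general. The table and function names are hypothetical.

```python
# Pipeline stages occupied by each instruction class, per the description.
STAGES = {
    "alu": ["IF", "ID", "EX"],                    # e.g. ADD Rm, Rn
    "load": ["IF", "ID", "EX", "MA", "WB/DSP"],   # e.g. MOV.L @Rm, Rn
}


def pipeline_chart(kinds):
    """Return one row per instruction, offset by its issue cycle."""
    rows = []
    for start, kind in enumerate(kinds):
        rows.append(["--"] * start + STAGES[kind])
    return rows


def completion_cycle(start, kind):
    """Number of cycles elapsed when an instruction issued at cycle
    `start` leaves its last stage (no stalls assumed)."""
    return start + len(STAGES[kind])
```

Under these assumptions, an ADD issued at cycle 0 finishes after 3 cycles, and a MOV.L read issued one cycle later finishes after 6.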

図９はＣＰＵコア２からメモリへのデータ書込み動作命令のタイムチャートを示す。ＣＰＵコア２からメモリへのデータ書込み動作命令の一例として、ＭＯＶ.ＬＲｍ, ＠Ｒｎを例にとって動作を説明する。命令フェッチ（ＩＦ）、命令デコード（ＩＤ）の動作は図８と同じなので、その部分の詳細な説明は省略する。 FIG. 9 shows a time chart of a data write operation instruction from the CPU core 2 to the memory. The operation will be described by taking MOV.L Rm, @Rn as an example of a data write operation command from the CPU core 2 to the memory. Since the instruction fetch (IF) and instruction decode (ID) operations are the same as those in FIG. 8, a detailed description thereof will be omitted.

ＥＸステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、アドレスポインタとなるレジスタのデータがＣＰＵコア２の内部バスＡ１に出力される。この例では、アドレスポインタとなるレジスタは、Ｒｎで指定したレジスタになる。Ｒｎに指定できるレジスタは、ＣＰＵコア２に含まれる任意のレジスタ（図３では、Ｒｅｇ.内の任意のレジスタ，Ａ０ｘ，Ａ１ｘ，Ｉｘ，Ａ０ｙ，Ａ１ｙ，ＩｙがＲｎとして指定可能）である。ＣＰＵコア２の内部バスＡ１に出力されたデータは、アドレスバッファ２０５に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＩＡＢに出力される。一方ＣＰＵコア２の内部バスＡ１に出力されたデータは算術論理演算器（ＡＬＵ）２１３で演算が行われる。この場合、算術論理演算器（ＡＬＵ）２１３は０加算演算を行なう。その演算結果はＣＰＵコア２の内部バスＣ１に出力される。ＣＰＵコア２の内部バスＣ１に出力された演算結果は、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してポインタレジスタ（この場合、Ｒｎで指定したレジスタ）に格納される。命令ＭＯＶ.ＬＲｍ, ＠Ｒｎの場合、ＥＸステージでアドレス演算を行なうと同時に、メモリへ書き込むべきデータをデータバスＩＤＢに出力する準備が行われる。ＥＸステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、メモリへ書き込むべきデータが格納されているレジスタよりＣＰＵコア２の内部バスＤＲへ値が出力される。この例の場合、メモリへ書き込むべきデータが格納されているレジスタは、Ｒｍで指定したレジスタになる。Ｒｍに指定できるレジスタは、ＣＰＵコア２に含まれる任意のレジスタ（図３では、Ｒｅｇ.内の任意のレジスタ，Ａ０ｘ，Ａ１ｘ，Ｉｘ，Ａ０ｙ，Ａ１ｙ，ＩｙがＲｍとして指定可能）である。ＣＰＵコア２の内部バスＤＲへ出力された値は、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してデータバスＩＤＢに出力される。Instruction/Data Mem. Accessでは、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＩＡＢに出力されたアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してデータバスＩＤＢに出力されたデータが書込まれて、動作を終了する。メモリからＣＰＵコア２へのデータ書込み動作命令では、ＣＰＵコア２としてはデータバスＩＤＢにデータを出力した時点で動作が終了するので、ＩＦ，ＩＤ，ＥＸ，ＭＡの４段のパイプラインステージで動作が完了する。 Data in a register serving as an address pointer is output to the internal bus A1 of the CPU core 2 in synchronization with the rising timing of the clock signal Clock1 of the EX stage. In this example, the register serving as the address pointer is the register specified by Rn. Registers that can be designated as Rn are any registers included in the CPU core 2 (in FIG. 3, any register in Reg., A0x, A1x, Ix, A0y, A1y, and Iy can be designated as Rn). The data output to the internal bus A1 of the CPU core 2 is stored in the address buffer 205 and output to the address bus IAB in synchronization with the rising timing of the clock signal Clock2 of the EX stage. On the other hand, the data output to the internal bus A1 of the CPU core 2 is operated by an arithmetic logic unit (ALU) 213. 
In this case, the arithmetic logic unit (ALU) 213 adds zero, so the pointer value passes through unchanged. The calculation result is output to the internal bus C1 of the CPU core 2 and is stored in the pointer register (in this case, the register designated by Rn) in synchronization with the rising edge of the clock signal Clock2 of the EX stage. For the instruction MOV.L Rm, @Rn, the EX stage performs the address calculation and, at the same time, prepares to output the data to be written to the memory onto the data bus IDB. In synchronization with the rising edge of Clock1 of the EX stage, the value of the register holding the data to be written is output to the internal bus DR of the CPU core 2. In this example, that register is the register designated by Rm. Registers that can be designated as Rm are any registers included in the CPU core 2 (in FIG. 3, any register in Reg., A0x, A1x, Ix, A0y, A1y, and Iy can be designated as Rm). The value on the internal bus DR is output to the data bus IDB in synchronization with the rising edge of Clock2 of the MA stage. In Instruction/Data Mem. Access, the address that was output to the address bus IAB in synchronization with the rising edge of Clock2 of the EX stage is decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 of the MA stage; the data output to the data bus IDB is then written in synchronization with the rising edge of Clock2 of the MA stage, and the operation ends. For an instruction that writes data from the CPU core 2 to the memory, the CPU core 2 is finished once it has output the data to the data bus IDB, so the operation completes in the four pipeline stages IF, ID, EX, and MA.

図１０はＤＳＰ命令を実行するときのタイムチャートを示す。ＤＳＰ命令の一例として、ＰＡＤＤＣＳｘ, Ｓｙ, ＤｚＮＯＰＸＮＯＰＹを例にとって動作説明を行う。この命令は、ＤＳＰエンジン３内のレジスタに格納されているデータの加算を行ない、ＤＳＰエンジン３とＸ-ＲＯＭ４やＸ-ＲＡＭ６、及びＹ-ＲＯＭ５やＹ-ＲＡＭ７との間でのデータ転送は行なわないという命令である。 FIG. 10 shows a time chart for the execution of a DSP instruction. The operation will be described using PADDC Sx, Sy, Dz NOPX NOPY as an example of a DSP instruction. This instruction adds data stored in registers in the DSP engine 3 and performs no data transfer between the DSP engine 3 and the X-ROM 4 or X-RAM 6, or the Y-ROM 5 or Y-RAM 7.

命令フェッチの動作は図７と同じなのでその部分の詳細な説明は省略する。ＩＤステージでは、クロック信号Ｃｌｏｃｋ１からクロック信号Ｃｌｏｃｋ２の期間でＣＰＵコア２で取り込んだ命令コードのデコードが行なわれ、ＩＤステージのクロック信号Ｃｌｏｃｋ２のタイミングで命令コードをデコードした結果がＤＳＰ制御信号２０としてＤＳＰエンジン３に出力される。ＤＳＰエンジン３では、ＣＰＵコア２よりＤＳＰ制御信号２０を入力すると、ＭＡステージまでの期間で入力したＤＳＰ制御信号２０をデコードする。ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、ソースデータが格納されているレジスタがアクセスされ、ＤＳＰエンジン３の内部バスＡ２，Ｂ２にレジスタの値が出力される。この例では、ソースデータが格納されているレジスタは、ＳｘおよびＳｙで指定したレジスタになる。ＳｘおよびＳｙに指定できるレジスタは、ＤＳＰエンジン３内部の任意のレジスタ（図４では、Ｒｅｇ.内の任意のレジスタがＳｘおよびＳｙとして指定可能）である。ＤＳＰエンジン３の内部バスＡ２，Ｂ２に出力されたデータは算術論理演算器（ＡＬＵ）３０２で演算が行なわれ、その結果はＤＳＰエンジン３の内部バスＣ２に出力される。ＤＳＰエンジン３の内部バスＣ２に出力された演算結果は、ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してディステネーションレジスタに格納される。この例では、ディステネーションレジスタは、Ｄｚで指定されたレジスタになる。Ｄｚに指定できるレジスタは、ＤＳＰエンジン３内部の任意のレジスタ（図４では、Ｒｅｇ.内の任意のレジスタ）である。以上のようなＤＳＰ命令では、ＩＦ，ＩＤ，ＥＸ，ＭＡ，ＷＢ/ＤＳＰの５段のパイプラインステージで動作が完了する。 The instruction fetch operation is the same as in FIG. 7, so a detailed description is omitted. In the ID stage, the instruction code fetched by the CPU core 2 is decoded in the period from the clock signal Clock1 to the clock signal Clock2, and the decoded result is output to the DSP engine 3 as the DSP control signal 20 at the timing of Clock2 of the ID stage. On receiving the DSP control signal 20 from the CPU core 2, the DSP engine 3 decodes it in the period up to the MA stage. In synchronization with the rising edge of Clock1 of the WB/DSP stage, the registers storing the source data are accessed, and their values are output to the internal buses A2 and B2 of the DSP engine 3. In this example, the registers storing the source data are the registers designated by Sx and Sy. Any register in the DSP engine 3 can be designated as Sx and Sy (in FIG. 4, any register in Reg.). The data output to the internal buses A2 and B2 is operated on by the arithmetic logic unit (ALU) 302, and the result is output to the internal bus C2 of the DSP engine 3. The result on the internal bus C2 is stored in the destination register in synchronization with the rising edge of Clock2 of the WB/DSP stage. In this example, the destination register is the register designated by Dz. Any register in the DSP engine 3 can be designated as Dz (in FIG. 4, any register in Reg.). A DSP instruction of this kind completes in the five pipeline stages IF, ID, EX, MA, and WB/DSP.

図１１はＸ，Ｙメモリ４〜７からＤＳＰエンジン３へのデータ読み込み動作命令のタイムチャートを示す。Ｘ，Ｙメモリ４〜７からＤＳＰエンジン３へのデータ読み込み動作命令の一例として、ＭＯＶＸ.Ｗ＠Ａｘ, ＤｘＭＯＶＹ.Ｗ＠Ａｙ, Ｄｙを例にとってその動作を説明する。この命令は、ＡｘおよびＡｙで指定したアドレスに格納されているデータをＤｘおよびＤｙで指定したレジスタに転送するという命令である。命令フェッチ、命令デコードの動作は図１０と同じなのでその部分の詳細な説明は省略する。 FIG. 11 shows a time chart of data read operation commands from the X and Y memories 4 to 7 to the DSP engine 3. As an example of a data read operation command from the X, Y memories 4 to 7 to the DSP engine 3, the operation will be described by taking MOVX.W @Ax, Dx MOVY.W @Ay, Dy as an example. This instruction is an instruction to transfer data stored at an address designated by Ax and Ay to a register designated by Dx and Dy. Since the instruction fetch and instruction decode operations are the same as those in FIG. 10, a detailed description thereof will be omitted.

Ｘ，Ｙメモリ４〜７からＤＳＰエンジン３へのデータ読み込み動作命令を実行する場合、アクセスするメモリのアドレスはＣＰＵコア２が生成する。そのためＥＸステージにおけるクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、アクセスすべきアドレスが格納されているレジスタがアクセスされ、ＣＰＵコア２の内部バスＡ１〜Ａ２にレジスタの値が出力される。この例では、アクセスすべきアドレスが格納されているレジスタは、Ａｘ，Ａｙで指定したレジスタになる。Ａｘに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｘ，Ａ１ｘであり、Ａｙに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｙ，Ａ１ｙである。ＣＰＵコア２の内部バスＡ１〜Ａ２に出力されたデータは、メモリアドレスバッファ（ＭＡＢＸ，ＭＡＢＹ）に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＸＡＢ，ＹＡＢに出力される。一方ＣＰＵコア２の内部バスＡ１〜Ａ２に出力されたデータはＡＬＵ２１３，ＰＡＵ２１２でアドレス演算が行なわれる。この場合、ＡＬＵ２１３およびＰＡＵ２１２は０加算演算を行なう。その演算結果はＣＰＵコア２の内部バスＣ１及びＣ２に出力される。ＣＰＵコア２の内部バスＣ１及びＣ２に出力された演算結果は、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してポインタレジスタ（この場合、ＡｘおよびＡｙで指定したレジスタ）に格納される。Ｘ，Ｙメモリ４〜７では、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングでアドレスバスＸＡＢ，ＹＡＢに出力されたアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから次のクロック信号Ｃｌｏｃｋ１の立ち上がりの期間でデータアクセスが行なわれる。そのためＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりからデータバスＸＤＢ，ＹＤＢにデータが出力される。データバスＸＤＢ，ＹＤＢに出力されたデータは、ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期してＤＳＰエンジン３に取り込まれ、ＤＳＰエンジン３の内部バスＤ１，Ｄ２にデータが供給される。ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してＤＳＰエンジン３の内部バスＤ１，Ｄ２上のデータがディステネーションレジスタに格納されて、動作を終了する。この例では、ディステネーションレジスタはＤｘおよびＤｙに指定したレジスタになる。Ｄｘに指定できるレジスタは、ＤＳＰエンジン３に含まれるレジスタＸ０，Ｘ１であり、Ｄｙに指定できるレジスタは、ＤＳＰエンジン３に含まれるレジスタＹ０，Ｙ１である。以上のようにメモリからＤＳＰエンジン３へのデータ読み込み動作命令では、ＩＦ，ＩＤ，ＥＸ，ＭＡ，ＷＢ/ＤＳＰの５段のパイプラインステージで動作が完了する。斯る並列的なデータ読込み動作は、相互に独立したバスＸＡＢ，ＸＤＢとＹＡＢ，ＹＤＢとを介してＣＰＵコア２がＸ，Ｙメモリ４〜７をアクセスできるようになっているからである。 When a data read operation instruction from the X, Y memories 4 to 7 to the DSP engine 3 is executed, the CPU core 2 generates an address of the memory to be accessed. Therefore, in synchronization with the rising timing of the clock signal Clock 1 in the EX stage, the register storing the address to be accessed is accessed, and the value of the register is output to the internal buses A 1 to A 2 of the CPU core 2. In this example, the register storing the address to be accessed is a register designated by Ax and Ay. 
Registers that can be specified as Ax are registers A0x and A1x included in the CPU core 2, and registers that can be specified as Ay are registers A0y and A1y included in the CPU core 2. The data output to the internal buses A1 and A2 of the CPU core 2 is stored in the memory address buffers (MABX, MABY) and output to the address buses XAB and YAB in synchronization with the rising edge of the clock signal Clock2 of the EX stage. Meanwhile, the data output to the internal buses A1 and A2 is subjected to address calculation by the ALU 213 and the PAU 212. In this case, the ALU 213 and the PAU 212 both add zero. The calculation results are output to the internal buses C1 and C2 of the CPU core 2 and are stored in the pointer registers (in this case, the registers designated by Ax and Ay) in synchronization with the rising edge of Clock2 of the EX stage. In the X and Y memories 4 to 7, the addresses that were output to the address buses XAB and YAB at the rising edge of Clock2 of the EX stage are decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 of the MA stage, and the data access is performed in the period from the rising edge of Clock2 of the MA stage to the rising edge of the next Clock1. Data is therefore output to the data buses XDB and YDB from the rising edge of Clock2 of the MA stage. The data on the data buses XDB and YDB is taken into the DSP engine 3 in synchronization with the rising edge of Clock1 of the WB/DSP stage and is supplied to the internal buses D1 and D2 of the DSP engine 3. In synchronization with the rising edge of Clock2 of the WB/DSP stage, the data on the internal buses D1 and D2 is stored in the destination registers, and the operation ends. In this example, the destination registers are the registers designated by Dx and Dy. Registers that can be specified as Dx are registers X0 and X1 included in the DSP engine 3, and registers that can be specified as Dy are registers Y0 and Y1 included in the DSP engine 3. As described above, an instruction that reads data from the memory into the DSP engine 3 completes in the five pipeline stages IF, ID, EX, MA, and WB/DSP. Such parallel data reading is possible because the CPU core 2 can access the X and Y memories 4 to 7 via the mutually independent buses XAB, XDB and YAB, YDB.
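The paired X/Y read and its register restrictions can be sketched as follows. This is a minimal Python illustration, not the patented circuit: the function `dual_read`, the `ALLOWED` table, and the dictionary-based memories are hypothetical names introduced here. The permitted register names (A0x, A1x, A0y, A1y, X0, X1, Y0, Y1) are the ones listed in the text.

```python
# Toy model of a paired X/Y read such as MOVX.W @Ax, Dx  MOVY.W @Ay, Dy.
# The register-name constraints follow the text; everything else is illustrative.

ALLOWED = {
    "Ax": {"A0x", "A1x"},   # X pointer registers in the CPU core
    "Ay": {"A0y", "A1y"},   # Y pointer registers in the CPU core
    "Dx": {"X0", "X1"},     # X destination registers in the DSP engine
    "Dy": {"Y0", "Y1"},     # Y destination registers in the DSP engine
}

def dual_read(cpu_regs, dsp_regs, x_mem, y_mem, ax, dx, ay, dy):
    """Read one word from each memory in the same pass and update dsp_regs."""
    for field, name in (("Ax", ax), ("Dx", dx), ("Ay", ay), ("Dy", dy)):
        if name not in ALLOWED[field]:
            raise ValueError(f"{name} cannot be used as {field}")
    xab, yab = cpu_regs[ax], cpu_regs[ay]    # EX: addresses onto XAB and YAB
    xdb, ydb = x_mem[xab], y_mem[yab]        # MA: both memories respond together
    dsp_regs[dx], dsp_regs[dy] = xdb, ydb    # WB/DSP: D1 -> Dx, D2 -> Dy
    return dsp_regs


cpu_regs = {"A0x": 0x10, "A1x": 0, "A0y": 0x20, "A1y": 0}
dsp_regs = {"X0": 0, "X1": 0, "Y0": 0, "Y1": 0}
result = dual_read(cpu_regs, dsp_regs, {0x10: 11}, {0x20: 22},
                   "A0x", "X0", "A0y", "Y1")
```

Because the two lookups use separate address and data paths, neither waits for the other in this model, which mirrors the reason the text gives for the parallel read.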

図１２はＤＳＰエンジン３からＸ，Ｙメモリ６，７へのデータ書込み動作のタイムチャートを示す。ＤＳＰエンジン３からＸ，Ｙメモリ６，７へのデータ書込み動作命令の一例として、ＭＯＶＸ.ＷＤａ, ＠ＡｘＭＯＶＹ.ＷＤａ, ＠Ａｙを例にとってその動作を説明をする。この命令は、Ｄａで指定したレジスタに格納されているデータをＡｘおよびＡｙで指定したレジスタに格納されているアドレスに転送するという命令である。 FIG. 12 shows a time chart of a data write operation from the DSP engine 3 to the X and Y memories 6 and 7. As an example of a data write operation instruction from the DSP engine 3 to the X, Y memories 6 and 7, the operation will be described by taking MOVX.W Da, @Ax MOVY.W Da, @Ay as an example. This instruction is an instruction to transfer the data stored in the register designated by Da to the address stored in the register designated by Ax and Ay.

命令フェッチ、命令デコードの動作は図１１と同じなのでその部分の詳細な説明は省略する。ＤＳＰエンジン３からＸ，Ｙメモリ６，７へのデータ書込み動作命令を実行する場合、アクセスされるべきメモリアドレスはＣＰＵコア２が生成する。そのためＥＸステージにおけるクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、アクセスすべきアドレスが格納されているレジスタがアクセスされ、ＣＰＵコア２の内部バスＡ１〜Ａ２にレジスタの値が出力される。この例では、アクセスすべきアドレスが格納されているレジスタは、Ａｘ，Ａｙで指定したレジスタになる。Ａｘに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｘ，Ａ１ｘであり、Ａｙに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｙ，Ａ１ｙである。ＣＰＵコア２の内部バスＡ１，Ａ２に出力されたデータは、メモリアドレスバッファ（ＭＡＢＸ，ＭＡＢＹ）に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＸＡＢ，ＹＡＢに出力される。 The instruction fetch and instruction decode operations are the same as in FIG. 11, so a detailed description is omitted. When an instruction that writes data from the DSP engine 3 to the X and Y memories 6 and 7 is executed, the CPU core 2 generates the memory addresses to be accessed. Therefore, in synchronization with the rising edge of the clock signal Clock1 of the EX stage, the registers storing the addresses to be accessed are read, and their values are output to the internal buses A1 and A2 of the CPU core 2. In this example, the registers storing the addresses to be accessed are the registers designated by Ax and Ay. Registers that can be specified as Ax are registers A0x and A1x included in the CPU core 2, and registers that can be specified as Ay are registers A0y and A1y included in the CPU core 2. The data output to the internal buses A1 and A2 of the CPU core 2 is stored in the memory address buffers (MABX, MABY) and output to the address buses XAB and YAB in synchronization with the rising edge of Clock2 of the EX stage.

ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、転送されるべきデータが格納されているＤＳＰエンジン３の内部レジスタがアクセスされ、ＤＳＰエンジン３の内部バスＤ１，Ｄ２に当該レジスタの値が出力され、それらがメモリデータバッファ（ＭＤＢＸ，ＭＤＢＹ）に格納される。この例の場合、転送されるべきデータが格納されているＤＳＰエンジン３の内部レジスタはＤａで指定されたレジスタになる。Ｄａで指定できるレジスタは、ＤＳＰエンジン３に含まれるレジスタＡ０及びＡ１である。ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期して、メモリデータバッファ（ＭＤＢＸ，ＭＤＢＹ）に格納されたデータはデータバスＸＤＢ，ＹＤＢに出力される。Ｘ，Ｙメモリ６，７では、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングでアドレスバスＸＡＢ，ＹＡＢに出力されたアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから次のクロック信号Ｃｌｏｃｋ１の立ち上がりの期間でデータアクセスが行なわれる。そのため、データバスＸＤＢ，ＹＤＢに出力されたデータはＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから書込まれる。以上のようにＤＳＰエンジン３からＸ，Ｙメモリ６，７へのデータ書込み動作命令では、ＩＦ，ＩＤ，ＥＸ，ＭＡの４段のパイプラインステージで動作が完了する。斯る並列的なデータ書込み動作は、相互に独立したバスＸＡＢ，ＸＤＢとＴＡＢ，ＹＤＢとを介してＣＰＵコア２がＸ，Ｙメモリ４，６をアクセスできるようになっているからである。 In synchronization with the rise timing of the clock signal Clock 1 of the MA stage, the internal register of the DSP engine 3 storing the data to be transferred is accessed, and the value of the register is stored in the internal buses D 1 and D 2 of the DSP engine 3. They are output and stored in memory data buffers (MDBX, MDBY). In this example, the internal register of the DSP engine 3 in which data to be transferred is stored is a register designated by Da. Registers that can be designated by Da are the registers A0 and A1 included in the DSP engine 3. The data stored in the memory data buffers (MDBX, MDBY) are output to the data buses XDB, YDB in synchronization with the rising timing of the clock signal Clock2 of the MA stage. In the X and Y memories 6 and 7, the addresses output to the address buses XAB and YAB are decoded at the rising timing of the EX stage clock signal Clock2 during the rising period of the clock signal Clock2 from the rising edge of the clock signal Clock1 of the MA stage. The data access is performed during the period from the rise of the clock signal Clock2 of the MA stage to the rise of the next clock signal Clock1. 
Therefore, the data output to the data buses XDB and YDB is written from the rising edge of the clock signal Clock2 of the MA stage. As described above, an instruction that writes data from the DSP engine 3 to the X and Y memories 6 and 7 completes in the four pipeline stages IF, ID, EX, and MA. Such parallel data writing is possible because the CPU core 2 can access the X and Y memories 6 and 7 via the mutually independent buses XAB, XDB and YAB, YDB.

図１３はメモリからＤＳＰエンジン３へのデータ読み込み動作のタイムチャートを示す。メモリからＤＳＰエンジン３へのデータ読み込み動作命令の一例として、ＭＯＶＳ.Ｌ＠Ａｓ, Ｄｓを例にとってその動作を説明をする。この命令は、Ａｓで指定したアドレスに格納されているデータをＤｓで指定したレジスタに転送するという命令である。 FIG. 13 shows a time chart of the data reading operation from the memory to the DSP engine 3. As an example of a data read operation command from the memory to the DSP engine 3, the operation will be described by taking MOVS.L @As, Ds as an example. This instruction is an instruction to transfer data stored at an address designated by As to a register designated by Ds.

基本動作は、図１１に示したＸ，Ｙメモリ４〜７からＤＳＰエンジン３へのデータ読み込み動作と同じである。図１１と図１３の違いは、図１１では対象となるメモリがＸ，Ｙメモリ４〜７なのでＸバス，Ｙバスを使用するのに対し、図１３では対象となるメモリはマイクロコンピュータ１がサポートする空間に接続されているメモリなので、バスＩＡＢ，ＩＤＢを使用するということである。ＥＸステージクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、アクセスすべきアドレスを保有しているレジスタがアクセスされ、ＣＰＵコア２の内部バスＡ１にレジスタの値が出力される。この例では、アクセスすべきアドレスが格納されているレジスタは、Ａｓで指定したレジスタになる。Ａｓで指定可能なレジスタはＣＰＵコア２に含まれるＲｅｇ.内の任意のレジスタである。ＣＰＵコア２の内部バスＡ１に出力されたデータは、アドレスバッファ２０５に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＩＡＢに出力される。一方ＣＰＵコア２の内部バスＡ１に出力されたデータは算術論理演算器（ＡＬＵ）２１３でアドレス演算が行なわれる。この場合、算術論理演算器（ＡＬＵ）２１３は０加算演算を行なう。その演算結果はＣＰＵコア２の内部バスＣ１に出力される。ＣＰＵコア２の内部バスＣ１に出力された演算結果は、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してポインタレジスタ（この場合、Ａｓで指定したレジスタ）に格納される。アクセス対象となるメモリでは、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングでアドレスバスＩＡＢに出力されたアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから次のクロック信号Ｃｌｏｃｋ１の立ち上がりの期間でデータアクセスが行なわれる。そのためＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりからデータバスＩＤＢにデータが出力される。データバスＩＤＢに出力されたデータは、ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期してＤＳＰエンジン３に取り込まれ、当該データがＤＳＰエンジン３の内部バスＤ１に供給される。ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してＤＳＰエンジン３の内部バスＤ１上のデータがディステネーションレジスタに格納されて、動作を終了する。この例では、ディステネーションレジスタはＤｓに指定したレジスタになる。Ｄｓに指定できるレジスタは、ＤＳＰエンジン３内の任意のレジスタである。以上のようにメモリからＤＳＰエンジン３へのデータ読み込み動作命令では、ＩＦ，ＩＤ，ＥＸ，ＭＡ，ＷＢ/ＤＳＰの５段のパイプラインステージで動作が完了する。 The basic operation is the same as the data reading operation from the X, Y memories 4 to 7 to the DSP engine 3 shown in FIG. The difference between FIG. 11 and FIG. 13 is that the target memory is X and Y memories 4 to 7 in FIG. 11 and the X bus and Y bus are used, whereas in FIG. 13, the target memory is supported by the microcomputer 1. This means that the buses IAB and IDB are used because the memory is connected to the space. In synchronization with the rising timing of the EX stage clock signal Clock 1, the register having the address to be accessed is accessed, and the value of the register is output to the internal bus A 1 of the CPU core 2. In this example, the register storing the address to be accessed is the register specified by As. 
Any register in Reg. included in the CPU core 2 can be designated by As. The data output to the internal bus A1 of the CPU core 2 is stored in the address buffer 205 and output to the address bus IAB in synchronization with the rising edge of the clock signal Clock2 of the EX stage. Meanwhile, the data output to the internal bus A1 is subjected to address calculation by the arithmetic logic unit (ALU) 213. In this case, the ALU 213 adds zero. The calculation result is output to the internal bus C1 of the CPU core 2 and is stored in the pointer register (in this case, the register designated by As) in synchronization with the rising edge of Clock2 of the EX stage. In the memory to be accessed, the address that was output to the address bus IAB at the rising edge of Clock2 of the EX stage is decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 of the MA stage, and the data access is performed in the period from the rising edge of Clock2 of the MA stage to the rising edge of the next Clock1. Data is therefore output to the data bus IDB from the rising edge of Clock2 of the MA stage. The data on the data bus IDB is taken into the DSP engine 3 in synchronization with the rising edge of Clock1 of the WB/DSP stage and is supplied to the internal bus D1 of the DSP engine 3. In synchronization with the rising edge of Clock2 of the WB/DSP stage, the data on the internal bus D1 is stored in the destination register, and the operation ends. In this example, the destination register is the register designated by Ds. Any register in the DSP engine 3 can be designated as Ds. As described above, an instruction that reads data from the memory into the DSP engine 3 completes in the five pipeline stages IF, ID, EX, MA, and WB/DSP.

図１４はＤＳＰエンジン３からメモリへのデータ書込み動作のタイムチャートを示す。ＤＳＰエンジン３からメモリへのデータ書込み動作命令の一例として、ＭＯＶＳ.ＬＤｓ, ＠Ａｓを例にとってその動作を説明する。この命令は、Ｄｓで指定したレジスタに格納されているデータをＡｓで指定したアドレスに転送するという命令である。 FIG. 14 shows a time chart of the data write operation from the DSP engine 3 to the memory. As an example of a data write operation command from the DSP engine 3 to the memory, the operation will be described by taking MOVS.L Ds, @As as an example. This instruction is an instruction to transfer the data stored in the register designated by Ds to the address designated by As.

基本動作は図１２に示したＤＳＰエンジン３からＸ，Ｙメモリへのデータ書込み動作と同じである。図１２と図１４の違いは、図１２では対象となるメモリがＸ，ＹメモリであるのでバスＸＡＢ，ＸＤＢ、バスＹＡＢ，ＹＤＢを使用するのに対し、図１４では対象となるメモリがマイクロコンピュータ１がサポートする空間に接続されているメモリなので、バスＩＡＢ，ＩＤＢを使用するということである。 The basic operation is the same as the data write operation from the DSP engine 3 to the X and Y memories shown in FIG. 12. The difference between FIG. 12 and FIG. 14 is that in FIG. 12 the target memories are the X and Y memories, so the buses XAB, XDB and YAB, YDB are used, whereas in FIG. 14 the target memory is a memory connected to the address space supported by the microcomputer 1, so the buses IAB and IDB are used.

ＥＸステージクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、転送先のアドレスを保有しているレジスタがアクセスされ、ＣＰＵコア２の内部バスＡ１にレジスタの値が出力される。この例では、アクセスすべきアドレスが格納されているレジスタは、Ａｓで指定したレジスタになる。Ａｓで指定可能なレジスタはＣＰＵコア２に含まれるレジスタＲｅｇ.内の任意のレジスタである。ＣＰＵコア２の内部バスＡ１に出力されたデータは、アドレスバッファ２０５に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＩＡＢに出力される。一方ＣＰＵコア２の内部バスＡ１に出力されたデータは算術論理演算器（ＡＬＵ）２１３でアドレス演算が行なわれる。この場合、算術論理演算器（ＡＬＵ）２１３は０加算演算を行なう。その演算結果はＣＰＵコア２の内部バスＣ１に出力される。ＣＰＵコア２のバスＣ１に出力された演算結果は、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してポインタレジスタ（この場合、Ａｓで指定したレジスタ）に格納される。 In synchronization with the rise timing of the EX stage clock signal Clock 1, the register holding the transfer destination address is accessed, and the value of the register is output to the internal bus A 1 of the CPU core 2. In this example, the register storing the address to be accessed is the register specified by As. The register that can be specified by As is an arbitrary register in the register Reg. Included in the CPU core 2. The data output to the internal bus A1 of the CPU core 2 is stored in the address buffer 205 and output to the address bus IAB in synchronization with the rising timing of the clock signal Clock2 of the EX stage. On the other hand, the data output to the internal bus A1 of the CPU core 2 is subjected to an address operation by an arithmetic logic unit (ALU) 213. In this case, the arithmetic logic unit (ALU) 213 performs 0 addition operation. The calculation result is output to the internal bus C1 of the CPU core 2. The calculation result output to the bus C1 of the CPU core 2 is stored in a pointer register (in this case, a register designated by As) in synchronization with the rising timing of the clock signal Clock2 of the EX stage.

ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、転送すべきデータを格納しているＤＳＰエンジン３内部のレジスタの値がＤＳＰエンジン３の内部バスＤ１に出力され、メモリデータバッファ（ＭＤＢＩ）に格納される。ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期して、メモリデータバッファ（ＭＤＢＩ）に格納されたデータがデータバスＩＤＢに出力される。この例では、転送すべきデータを格納しているＤＳＰエンジン３内部のレジスタはＤｓに指定したレジスタになる。Ｄｓに指定できるレジスタは、ＤＳＰエンジン３内の任意のレジスタである。アクセス対象となるメモリでは、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングでアドレスバスＩＡＢに出力したアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから次のクロック信号Ｃｌｏｃｋ１の立ち上がりの期間でデータアクセスが行なわれる。そのためＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりタイミングで、ＤＳＰエンジン３より出力されたデータがメモリに書込まれる。以上のようにＤＳＰエンジン３から外部メモリへのデータ書込み動作命令では、ＩＦ，ＩＤ，ＥＸ，ＭＡの４段のパイプラインステージで動作が完了する。 In synchronization with the rising edge of the clock signal Clock1 of the MA stage, the value of the register in the DSP engine 3 that holds the data to be transferred is output to the internal bus D1 of the DSP engine 3 and stored in the memory data buffer (MDBI). In synchronization with the rising edge of Clock2 of the MA stage, the data stored in the memory data buffer (MDBI) is output to the data bus IDB. In this example, the register holding the data to be transferred is the register designated by Ds. Any register in the DSP engine 3 can be designated as Ds. In the memory to be accessed, the address that was output to the address bus IAB at the rising edge of Clock2 of the EX stage is decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 of the MA stage, and the data access is performed in the period from the rising edge of Clock2 of the MA stage to the rising edge of the next Clock1. The data output from the DSP engine 3 is therefore written into the memory at the rising edge of Clock2 of the MA stage. As described above, an instruction that writes data from the DSP engine 3 to the external memory completes in the four pipeline stages IF, ID, EX, and MA.
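The stage counts stated for the individual instruction classes can be collected in one small table. The Python sketch below is only a summary aid: the keys such as `cpu_load` and `xy_write` and the helper names are informal labels invented for this example, while the stage lists themselves are the ones given in the description.

```python
# Pipeline stages used by each instruction class, as stated in the text.
# The dictionary keys are informal labels chosen for this sketch.
FULL = ("IF", "ID", "EX", "MA", "WB/DSP")
TO_MA = ("IF", "ID", "EX", "MA")

STAGES = {
    "cpu_load": FULL,      # memory -> CPU core register
    "cpu_store": TO_MA,    # CPU core -> memory (FIG. 9)
    "dsp_op": FULL,        # DSP register-to-register operation (FIG. 10)
    "xy_read": FULL,       # X/Y memory -> DSP engine (FIG. 11)
    "xy_write": TO_MA,     # DSP engine -> X/Y memory (FIG. 12)
    "movs_read": FULL,     # memory -> DSP engine over IAB/IDB (FIG. 13)
    "movs_write": TO_MA,   # DSP engine -> memory over IAB/IDB (FIG. 14)
}

def stage_count(kind):
    """Number of pipeline stages the given instruction class occupies."""
    return len(STAGES[kind])

def ends_in_register(kind):
    """True when the instruction finishes by writing a register in WB/DSP."""
    return STAGES[kind][-1] == "WB/DSP"
```

Read this way, every instruction whose result lands in a register needs the WB/DSP stage, while every write to memory is finished at the MA stage.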

次に、ＤＳＰ演算命令の一例として、ＰＡＤＤＳｘ, Ｓｙ, Ｄu ＰＭＵＬＳe, Ｓf, Ｄg ＭＯＶＸ.Ｗ＠Ａｘ, ＤｘＭＯＶＹ.Ｗ＠Ａｙ, Ｄｙを例にとり、図１５を用いてその動作説明をする。この命令は、ＤＳＰエンジン３内のレジスタに格納されているデータの加算、乗算を行ない、Ｘ-ＲＯＭ４やＸ-ＲＡＭ６及びＹ-ＲＯＭ５やＹ-ＲＡＭ７からＤＳＰエンジン３へのデータ転送を行なうという命令であり、図１０と図１１の動作を合わせた動作である。命令フェッチ、命令デコードの動作は図１０と同じなのでその部分の詳細な説明は省略する。 Next, the operation of a DSP operation instruction will be described with reference to FIG. 15, taking PADD Sx, Sy, Du PMUL Se, Sf, Dg MOVX.W @Ax, Dx MOVY.W @Ay, Dy as an example. This instruction adds and multiplies data stored in registers in the DSP engine 3 and transfers data from the X-ROM 4 or X-RAM 6 and the Y-ROM 5 or Y-RAM 7 to the DSP engine 3. Its operation is a combination of the operations of FIG. 10 and FIG. 11. The instruction fetch and instruction decode operations are the same as in FIG. 10, so a detailed description is omitted.

Ｘ，ＹメモリからＤＳＰエンジン３へのデータ読み込み動作命令を実行する場合、アクセスすべきメモリのアドレスはＣＰＵコア２が生成する。そのためＥＸステージにおけるクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期して、アクセスすべきアドレスを保有するレジスタがアクセスされ、ＣＰＵコア２の内部バスＡ１，Ａ２にレジスタの値が出力される。この例では、アクセスすべきアドレスが格納されているレジスタは、Ａｘ，Ａｙで指定したレジスタになる。Ａｘに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｘ，Ａ１ｘであり，Ａｙに指定できるレジスタはＣＰＵコア２に含まれるレジスタＡ０ｙ，Ａ１ｙである。ＣＰＵコア２の内部バスＡ１，Ａ２に出力されたデータは、メモリアドレスバッファ（ＭＡＢＸ，ＭＡＢＹ）に格納され、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してアドレスバスＸＡＢ，ＹＡＢに出力される。一方ＣＰＵ内部バスＡ１，Ａ２に出力されたデータはＡＬＵ２１３，ＰＡＵ２１２でアドレス演算が行なわれ（この場合、ＡＬＵ２１３およびＰＡＵ２１２は０加算演算を行なう）、その結果はＣＰＵコア２の内部バスＣ１及びＣ２に出力される。ＣＰＵコア２の内部バスＣ１及びＣ２に出力された演算結果は、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してポインタレジスタ（この場合、ＡｘおよびＡｙで指定したレジスタ）に格納される。Ｘ，Ｙメモリでは、ＭＡステージのクロック信号Ｃｌｏｃｋ１の立ち上がりからクロック信号Ｃｌｏｃｋ２の立ち上りの期間で、ＥＸステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングでアドレスバスＸＡＢ，ＹＡＢに出力されたアドレスのデコードが行なわれ、ＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりから次のクロック信号Ｃｌｏｃｋ１の立ち上がりの期間でデータアクセスが行なわれる。そのためＭＡステージのクロック信号Ｃｌｏｃｋ２の立ち上がりからデータバスＸＤＢ，ＹＤＢにデータが出力される。データバスＸＤＢ，ＹＤＢに出力されたデータは、ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ１の立ち上がりのタイミングに同期してＤＳＰエンジン３に取り込まれ、ＤＳＰエンジン３の内部バスＤ１，Ｄ２にデータが出力される。ＷＢ/ＤＳＰステージのクロック信号Ｃｌｏｃｋ２の立ち上がりのタイミングに同期してＤＳＰエンジン３の内部バスＤ１，Ｄ２上のデータがディステネーションレジスタ（Destination Reg.）に格納されて、動作を終了する。この例では、ディステネーションレジスタはＤｘおよびＤｙに指定したレジスタになる。Ｄｘに指定できるレジスタは、ＤＳＰエンジン３内のＸ０，Ｘ１、Ｄｙに指定できるレジスタは、ＤＳＰエンジン３内のＹ０，Ｙ１である。 When a data read operation instruction from the X, Y memory to the DSP engine 3 is executed, the CPU core 2 generates an address of the memory to be accessed. Therefore, in synchronization with the rising timing of the clock signal Clock 1 in the EX stage, the register having the address to be accessed is accessed, and the value of the register is output to the internal buses A 1 and A 2 of the CPU core 2. In this example, the register storing the address to be accessed is a register designated by Ax and Ay. Registers that can be specified as Ax are registers A0x and A1x included in the CPU core 2, and registers that can be specified as Ay are registers A0y and A1y included in the CPU core 2. 
The data output to the internal buses A1 and A2 of the CPU core 2 is stored in the memory address buffers (MABX, MABY) and output to the address buses XAB and YAB in synchronization with the rising edge of the clock signal Clock2 of the EX stage. Meanwhile, the data output to the CPU internal buses A1 and A2 is subjected to address calculation by the ALU 213 and the PAU 212 (in this case, both add zero), and the results are output to the internal buses C1 and C2 of the CPU core 2. The calculation results on the internal buses C1 and C2 are stored in the pointer registers (in this case, the registers designated by Ax and Ay) in synchronization with the rising edge of Clock2 of the EX stage. In the X and Y memories, the addresses that were output to the address buses XAB and YAB at the rising edge of Clock2 of the EX stage are decoded in the period from the rising edge of Clock1 to the rising edge of Clock2 of the MA stage, and the data access is performed in the period from the rising edge of Clock2 of the MA stage to the rising edge of the next Clock1. Data is therefore output to the data buses XDB and YDB from the rising edge of Clock2 of the MA stage. The data on the data buses XDB and YDB is taken into the DSP engine 3 in synchronization with the rising edge of Clock1 of the WB/DSP stage and is output to the internal buses D1 and D2 of the DSP engine 3. In synchronization with the rising edge of Clock2 of the WB/DSP stage, the data on the internal buses D1 and D2 is stored in the destination registers (Destination Reg.), and the operation ends. In this example, the destination registers are the registers designated by Dx and Dy. Registers that can be specified for Dx are X0 and X1 in the DSP engine 3, and registers that can be specified for Dy are Y0 and Y1 in the DSP engine 3.

A DSP operation is performed in parallel with the data transfer described above. In synchronization with the rising edge of the clock signal Clock1 in the WB/DSP stage, the registers holding the source data are accessed and their values are output to the internal buses A1, A2, B1, and B2 of the DSP engine 3. In this example, the source registers are those designated by Sx and Sy for the ADD (addition) operation and those designated by Se and Sf for the MUL (multiplication) operation. Any register in the DSP engine 3 can be designated for Sx, Sy, Se, and Sf. The data output to the internal buses A1 and B1 are multiplied by the MAC 304, and the product is output to the internal bus C1 of the DSP engine 3. The data output to the internal buses A2 and B2 are added by the ALU 302, and the sum is output to the internal bus C2 of the DSP engine 3. The results on the internal buses C1 and C2 are stored in the destination registers in synchronization with the rising edge of the clock signal Clock2 in the WB/DSP stage. In this example, the destination register is the one designated by Du for the ADD operation and the one designated by Dg for the MUL operation. Any register in the DSP engine 3 can be designated for Du and Dg.
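The read-then-write ordering of the WB/DSP stage can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: registers hold plain Python integers rather than the fixed-point formats of the DSP engine 3, and the function name `dsp_stage` and the tuple layout of its arguments are hypothetical. The point it shows is that both operations sample their source registers before either result is written back, so an operation may use one of its own source registers as its destination.

```python
def dsp_stage(regs, add_op, mul_op):
    """Model one WB/DSP stage.

    add_op = (Sx, Sy, Du) and mul_op = (Se, Sf, Dg) are register names.
    Returns a new register dictionary; the input is left unchanged.
    """
    sx, sy, du = add_op
    se, sf, dg = mul_op
    # Clock1 rising edge: source registers drive the internal buses.
    a1, b1 = regs[se], regs[sf]  # buses A1/B1 feed the multiplier
    a2, b2 = regs[sx], regs[sy]  # buses A2/B2 feed the ALU
    # Both units work on the sampled values during the stage.
    c1 = a1 * b1  # multiplier result on bus C1
    c2 = a2 + b2  # ALU result on bus C2
    # Clock2 rising edge: results are stored in the destination registers.
    out = dict(regs)
    out[dg] = c1
    out[du] = c2
    return out


regs = {"A0": 10, "A1": 3, "M0": 5, "X0": 4}
# Models PADD A0, M0, A0 issued together with PMUL A1, X0, A1.
after = dsp_stage(regs, ("A0", "M0", "A0"), ("A1", "X0", "A1"))
print(after)
```

The sketch does not model what happens when Du and Dg name the same register, since the passage above does not specify it.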

As described above, an instruction that adds and multiplies data held in registers of the DSP engine 3 and also transfers data from the X-ROM 4 or X-RAM 6 and from the Y-ROM 5 or Y-RAM 7 to the DSP engine 3 completes in five pipeline stages: IF, ID, EX, MA, and WB/DSP.

As a second example of a DSP operation instruction, the operation of the following four consecutive instructions is described with reference to FIG. 16.
Inst1: PADD A0, M0, A0  PMUL A1, X0, A1  MOVX.W @R4, X1  MOVY.W @R6, Y0
Inst2: ADD R8, R9
Inst3: ADD R10, R11
Inst4: ADD R12, R13
These four instructions are an example in which different operations are carried out in the same clock cycle by using the address buses IAB, XAB, and YAB simultaneously. The individual operations of Inst1 to Inst4 are the same as those in FIGS. 7 and 15, so a detailed description of them is omitted.

First, Inst1 is fetched in the IF stage of Inst1. While Inst1 is in the ID stage, Inst2 is in the IF stage, so Inst2 is fetched.

While Inst1 is in the EX stage, performing the address calculation for accessing the X and Y memories, Inst2 is in the ID stage and is decoded, and Inst3 is in the IF stage and is fetched.

In the MA stage of Inst1, the addresses calculated in the EX stage are output to the address buses XAB and YAB (the addresses are actually driven from the rising edge of the clock signal Clock2 in the EX stage), and data are taken in from the data buses XDB and YDB. At this time Inst2 is in the EX stage, so it performs the ADD operation on R8 and R9 and completes, and Inst3 is in the ID stage and is decoded. Inst4 is in the IF stage, so the address at which Inst4 is stored is output to the address bus IAB. That address is actually driven onto the address bus IAB from the rising edge of the clock signal Clock2 half a cycle before the IF stage of Inst4. This is the same timing at which Inst1 outputs addresses to the address buses XAB and YAB (the second half of the EX stage and the first half of the MA stage). In other words, the address buses XAB and YAB are used for data transfer while the address bus IAB is used for instruction fetch. Because the microcomputer 1 has internal address buses IAB, XAB, and YAB and internal data buses IDB, XDB, and YDB, each connected to the CPU core 2, different memory access operations can be executed in the same clock cycle using these three sets of internal buses.

Inst1 then performs the DSP operation in the WB/DSP stage and completes. In the same cycle Inst2 has already completed, Inst3 is in the EX stage and completes by performing the ADD operation on R10 and R11, and Inst4 is in the ID stage and is decoded.

In the next cycle only the EX stage of Inst4 is executed: the ADD operation on R12 and R13 is performed and the instruction completes.
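The cycle-by-cycle overlap of Inst1 to Inst4 can be tabulated with a short Python sketch. It assumes one instruction enters the pipeline per cycle with no stalls, that the CPU ADD instructions finish in the EX stage, and that address bus use can be attributed to whole stages (the text notes that addresses are actually driven half a cycle earlier, which this model ignores). The names `schedule` and `address_buses` are hypothetical helpers for this illustration.

```python
CPU_STAGES = ["IF", "ID", "EX"]
DSP_STAGES = ["IF", "ID", "EX", "MA", "WB/DSP"]


def schedule(program):
    """Return {cycle: {instruction name: stage}}, with cycles numbered from 1."""
    table = {}
    for issue, (name, stages) in enumerate(program):
        for offset, stage in enumerate(stages):
            table.setdefault(issue + offset + 1, {})[name] = stage
    return table


def address_buses(row, xy_insts=("Inst1",)):
    """Return the address buses in use for one row of the schedule."""
    used = set()
    for name, stage in row.items():
        if stage == "IF":
            used.add("IAB")  # instruction fetch address
        if stage == "MA" and name in xy_insts:
            used.update({"XAB", "YAB"})  # X and Y memory operand addresses
    return used


program = [
    ("Inst1", DSP_STAGES),
    ("Inst2", CPU_STAGES),
    ("Inst3", CPU_STAGES),
    ("Inst4", CPU_STAGES),
]
table = schedule(program)
for cycle in sorted(table):
    print(cycle, table[cycle], sorted(address_buses(table[cycle])))
```

In the fourth cycle the table shows Inst1 in MA and Inst4 in IF, which is the cycle in which all three address buses are in use at once.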

This embodiment provides the following effects. With the product-sum operation of the DSP engine 3 in mind, the built-in memory is split into two banks, the X memories 4 and 6 and the Y memories 5 and 7, and the CPU core 2 can access the X memories 4 and 6 and the Y memories 5 and 7 in parallel via the internal buses XAB and XDB and the internal buses YAB and YDB, respectively. Two data items can therefore be transferred from the built-in memories 4 to 7 to the DSP engine 3 at the same time. Furthermore, the internal buses XAB and XDB and the internal buses YAB and YDB are separate from the internal buses IAB and IDB, which are interfaced to the outside, so the CPU core 2 can also access external memory in parallel with accesses to the X memories 4 and 6 and the Y memories 5 and 7. Because three sets of address buses (IAB, XAB, YAB) and data buses (IDB, XDB, YDB) are connected to the CPU core 2, different memory access operations can be executed in the same clock cycle using these three sets of internal buses. Arithmetic processing can thus be sped up even when programs or data reside in external memory.
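The benefit of delivering two operands at once can be made concrete with a simple count, sketched here in Python. The single-bus case is a hypothetical baseline introduced only for comparison and is not described in the embodiment. The function name `product_sum` is an assumption, and the count covers operand accesses only, ignoring pipelining and instruction fetch.

```python
def product_sum(x_mem, y_mem, dual_bus):
    """Return (sum of x*y over all terms, number of operand access cycles).

    dual_bus=True models one X word and one Y word arriving in the same
    cycle over separate buses; dual_bus=False models both operands sharing
    a single bus, which needs two accesses per term.
    """
    cycles_per_term = 1 if dual_bus else 2
    acc = 0
    cycles = 0
    for x, y in zip(x_mem, y_mem):
        cycles += cycles_per_term
        acc += x * y
    return acc, cycles


coeffs = [1, 2, 3]   # e.g. held in X memory
samples = [4, 5, 6]  # e.g. held in Y memory
print(product_sum(coeffs, samples, dual_bus=True))
print(product_sum(coeffs, samples, dual_bus=False))
```

Under these assumptions an N-term product-sum needs N operand access cycles with separate X and Y buses and 2N with a shared bus.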

Configuring each of the X memories 4 and 6 and the Y memories 5 and 7 from a RAM and a ROM further improves the usability of the microcomputer.

As described above, the built-in memory is split into the X memories 4 and 6 and the Y memories 5 and 7, and each bank comprises a ROM and a RAM. Using the RAM as data memory and the ROM as program memory also makes it possible to separate data memory from program memory, so that two data items can be transferred to the DSP engine 3 in parallel, and instruction fetch, data transfer, and computation can be carried out efficiently by parallel pipeline processing.

Because the CPU core 2 includes the modulo address output unit 200, address generation in the CPU core 2 for repetitive operations such as product-sum operations can be sped up.
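Modulo addressing of this kind can be sketched in Python as a generator that walks from a start address to an end address and then wraps. The wrap rule used here (reload the start address after the end address has been output) and the name `modulo_addresses` are assumptions made for illustration; they stand in for the modulo start address register 214 and modulo end address register 215 without claiming to reproduce the circuit.

```python
def modulo_addresses(start, end, step, count):
    """Yield `count` addresses, returning to `start` after `end` is output.

    Assumes (end - start) is a multiple of `step`, so the sequence lands
    exactly on `end`.
    """
    addr = start
    for _ in range(count):
        yield addr
        addr = start if addr == end else addr + step


# A four-word circular buffer of 16-bit data, read six times.
seq = list(modulo_addresses(0x100, 0x106, 2, 6))
print([hex(a) for a in seq])
```

A circular buffer addressed this way needs no compare-and-branch in the program loop, which is why it suits repeated product-sum operations.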

CPU instructions are assigned to the space in which the most significant 4 bits of the instruction code are "0000" to "1110". All DSP instructions are assigned to the space in which the most significant 4 bits are "1111". Instructions assigned to the spaces in which the most significant 6 bits are "111100" and "111101" are DSP instructions with a 16-bit instruction code. Instructions whose most significant 6 bits are "111110" have a 32-bit instruction code. No instructions are assigned to the space in which the most significant 6 bits are "111111"; it is an unused area. With these rules for assigning codes to instructions of up to 32 bits, decoding the most significant 6 bits of an instruction code is enough to determine whether the instruction is a CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction. The determination can be made by a decoder of small logic scale, and it is not necessary to always decode all 32 bits at once.
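The code assignment rule can be expressed as a short Python function that inspects only the most significant 6 bits of the upper 16 bits of an instruction code. The function name `classify` and its tuple return value are assumptions for this sketch; the bit patterns are the ones given above.

```python
def classify(upper16):
    """Classify an instruction from the upper 16 bits of its code.

    Returns (kind, length in bits); length is None for the unassigned space.
    """
    top4 = (upper16 >> 12) & 0xF
    top6 = (upper16 >> 10) & 0x3F
    if top4 != 0b1111:
        return ("CPU", 16)          # "0000" to "1110"
    if top6 in (0b111100, 0b111101):
        return ("DSP", 16)          # 16-bit DSP instruction
    if top6 == 0b111110:
        return ("DSP", 32)          # 32-bit DSP instruction
    return ("unassigned", None)     # "111111" is unused


for code in (0x300C, 0xF000, 0xF400, 0xF800, 0xFC00):
    print(hex(code), classify(code))
```

Because the decision never looks below bit 10, the same check works whether or not the lower 16 bits of a 32-bit instruction have arrived.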

As described with reference to FIG. 17, after the instruction fetch timing, instruction code data that has not yet been processed is set in the instruction register 25. Whether the instruction to be executed is a 16-bit CPU instruction, a 16-bit DSP instruction, or a 32-bit DSP instruction, its upper 16 bits can always be supplied to the first decode circuit 240.
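One way to picture this is as a walk over a stream of 16-bit halfwords in which the halfword at the current position always plays the role of the upper 16 bits. The Python sketch below is a software analogy only, not a model of the instruction register 25 or its prefetch buffers. The names `instruction_length` and `split_stream` are hypothetical, and codes in the unassigned "111111" space are treated as one halfword long purely so the walk can continue.

```python
def instruction_length(upper16):
    """Return the instruction length in halfwords from the top 6 bits."""
    return 2 if (upper16 >> 10) & 0x3F == 0b111110 else 1


def split_stream(halfwords):
    """Group a list of 16-bit halfwords into complete instruction codes."""
    codes = []
    i = 0
    while i < len(halfwords):
        n = instruction_length(halfwords[i])
        if i + n > len(halfwords):
            raise ValueError("stream ends inside a 32-bit instruction")
        code = halfwords[i]
        if n == 2:
            # The lower halfword is attached without being examined
            # as an instruction in its own right.
            code = (code << 16) | halfwords[i + 1]
        codes.append(code)
        i += n
    return codes


stream = [0x300C, 0xF800, 0x1234, 0xF000]
print([hex(c) for c in split_stream(stream)])
```

After a 16-bit instruction is consumed, the next halfword becomes the new upper 16 bits, which corresponds to the shift of the lower 16 bits to the upper side mentioned for the instruction register.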

The code of the A field of a 32-bit DSP instruction is set in the upper area UIR of the instruction register 25, and a 16-bit DSP instruction, which has the same format as the A field, is also set in the upper area UIR. In either case the CPU core 2 can therefore perform the required address calculation and select the data path required for the data fetch in the same way. That is, the decode circuits 240 and 241 can be shared between the data fetch for executing a 32-bit DSP instruction and the data fetch for executing a 16-bit DSP instruction, which also reduces the logic scale of the microcomputer 1.

The invention made by the present inventors has been described specifically on the basis of the embodiment, but it goes without saying that the invention is not limited to it and can be modified in various ways without departing from its gist. For example, identification of CPU instructions, 16-bit DSP instructions, and 32-bit DSP instructions is not limited to the most significant 6 bits of the instruction; the number of bits can be increased or decreased according to the number of instruction codes. The function of shifting the lower 16 bits of the instruction register to the upper side can be replaced with another function. The number of registers and the types of arithmetic units in the CPU core and the DSP engine are not limited to those of the embodiment and can be changed as appropriate. The number of memories is not limited to two and can be increased, and the number of address buses and data buses connected to the memories can be increased to match. For example, a Z memory may be provided in addition to the X and Y memories, with an address bus ZAB connected between the CPU and the Z memory and a data bus ZDB connected between the DSP engine and the Z memory. With this configuration, during a product-sum operation, data can be taken from the X and Y memories into the DSP engine while, at the same time, data whose computation finished before the currently executing instruction is written to the Z memory via the Z bus. Because a single instruction can both fetch operand data and write to memory, the throughput of the microcomputer as a whole improves further. The present invention is well suited to microcomputers for embedded control applied to information compression and decompression and filtering in mobile communication equipment, servo control, image processing in printers, and the like.

FIG. 1 is an overall block diagram of a microcomputer according to an embodiment of the present invention.
FIG. 2 is an example address map of the microcomputer.
FIG. 3 is a block diagram of the CPU core showing the modulo address output unit in detail.
FIG. 4 is an example block diagram of the DSP engine.
FIG. 5 is an explanatory diagram of an example of the instruction formats and instruction codes of the microcomputer.
FIG. 6 is a block diagram showing the connection between the decoder of the CPU core and the decoder of the DSP engine.
FIG. 7 is an execution time chart of an ALU operation instruction inside the CPU core.
FIG. 8 is an execution time chart of an instruction that reads data from memory into the CPU core.
FIG. 9 is an execution time chart of an instruction that writes data from the CPU core to memory.
FIG. 10 is an example time chart for executing a DSP instruction.
FIG. 11 is an execution time chart of an instruction that reads data from the X and Y memories into the DSP engine.
FIG. 12 is an execution time chart of an instruction that writes data from the DSP engine to the X and Y memories.
FIG. 13 is an execution time chart of an instruction that reads data from memory into the DSP engine.
FIG. 14 is an execution time chart of an instruction that writes data from the DSP engine to memory.
FIG. 15 is an example execution time chart of a DSP operation instruction.
FIG. 16 is an example time chart for executing DSP operation instructions consecutively.
FIG. 17 is a block diagram showing another embodiment corresponding to FIG. 6.
FIG. 18 is an instruction format diagram showing the codes of 16-bit DSP instructions that specify data transfer between the built-in memory of the microcomputer and the internal registers of the DSP engine 3.
FIG. 19 is an instruction format diagram showing the codes of 16-bit DSP instructions that specify data transfer between the external memory of the microcomputer and the internal registers of the DSP engine 3.
FIG. 20 is an instruction format diagram showing, for the A field of a 32-bit DSP instruction, the codes of the field and the corresponding mnemonics.
FIG. 21 is an instruction format diagram showing, for the B field of a 32-bit DSP instruction, the codes of the field and the corresponding mnemonics.

Explanation of symbols

1 Microcomputer
2 CPU core (central processing unit)
20 DSP control signal
24 Decoder
240 First decode circuit
241 Second decode circuit
242 Code conversion circuit
243 CPU decode signal
244 DSP decode signal
245 Code conversion control signal
247 CPU control signal
25 Instruction register
250, 251 Instruction prefetch buffers
200 Modulo address output unit
206, 207 Memory address buffers
212 Address arithmetic unit
213 Arithmetic logic unit
214 Modulo start address register
215 Modulo end address register
216, 226 Modulo address register
3 DSP engine (digital signal processing unit)
34 Decoder
302 Arithmetic logic unit
304 Multiplier
309, 310, 311 Memory data buffers
4 X-ROM (second memory)
5 Y-ROM (first memory)
6 X-RAM (second memory)
7 Y-RAM (first memory)
12 External memory interface

Claims

A first processor having an address generator;
A second processor;
First to third address buses connected to the first processor;
First to third data buses connected to the second processor;
A first memory connected to the first and second address buses and the first and second data buses;
A second memory connected to the first and third address buses and the first and third data buses;
The first processor is connected to the first data bus;
The address generator can generate first to third address signals to be output to the first to third address buses;
The first processor is capable of reading the first data designated by the first address signal via the first data bus,
The first data includes an instruction,
When the first processor executes the instruction, the address generator can generate the second and third address signals to be output to the second and third address buses.
The microprocessor, wherein the second processor can read, via the second data bus, the second data of the first memory designated by the second address signal output by the first processor, and can read, via the third data bus, the third data of the second memory designated by the third address signal.

In claim 1,
The address generator includes a first address generation unit and a second address generation unit,
The first address generation unit can generate the first address signal to be output to the first address bus.
The microprocessor, wherein the second address generation unit is capable of generating the second and third address signals for accessing the first and second memories.

In claim 2,
The microprocessor, wherein the second processor can acquire the second data and the third data at the same time using the second address signal and the third address signal output from the first processor.

In claim 3,
The microprocessor, wherein the second processor is capable of executing arithmetic processing using the second data and the third data signal acquired via the second data bus and the third data bus.

In claim 2,
The microprocessor, wherein the second processor can read data from at least one of the first memory and the second memory using the second address signal and the third address signal, and can execute arithmetic processing using the read data.

In claim 3,
The second processor has an execution unit,
The microprocessor, wherein the execution unit is capable of executing an operation using the second data and the third data.

A CPU having an address generator;
A DSP whose operation can be controlled by the CPU;
First to third address buses connected to the CPU;
First to third data buses connected to the DSP;
A first memory connected to the first and second address buses and the first and second data buses;
A second memory connected to the first and third address buses and the first and third data buses;
The address generator can generate a first address output to the first address bus, a second address output to the second address bus, and a third address output to the third address bus,
The CPU can read the first data designated by the first address signal via the first data bus,
The DSP can read the second data from the first memory specified by the second address, and can read the third data from the second memory specified by the third address,
The microprocessor, wherein the DSP is capable of performing arithmetic processing using the second data and the third data.

In claim 7,
The microcomputer, wherein the CPU includes:
An address register; and
An address output unit that outputs the value in the address register to the second or third address bus and generates addresses from a start address to an end address by repeatedly updating the value in the address register.

In claim 7,
The microcomputer includes an address output circuit for supplying the second and third addresses to the second and third address buses.

In claim 9,
The microcomputer, wherein the address output circuit includes a first address buffer connected to the second address bus, a second address buffer connected to the third address bus, and computing means for computing address information to be supplied to the first and second address buffers.

In claim 7,
The CPU generates a control signal for controlling the DSP by decoding an instruction included in the first data,
The microcomputer, wherein the address generator generates the second address or the third address in response to the control signal.

A central processing unit;
A digital signal processing unit operating in synchronization with the central processing unit;
First to third address buses connected to the central processing unit;
A first data bus connected to the central processing unit and the digital signal processing unit;
Second and third data buses connected to the digital signal processing unit;
A first memory connected to the first and second address buses and the first and second data buses;
A second memory connected to the first and third address buses and the first and third data buses;
The central processing unit obtains first data specified by a first address signal via an interface circuit connected to the first address bus and the first data bus;
The central processing unit includes address supply means capable of supplying second and third address signals for accessing the first and second memories in parallel to the second and third address buses, respectively,
The microcomputer, wherein the digital signal processing unit includes first and second data buffer means for receiving, via the second and third data buses respectively, the data output from the first and second memories in response to the second and third address signals.

In claim 12,
The central processing unit includes an instruction register that stores an instruction, and an instruction decoding circuit that decodes the instruction stored in the instruction register and supplies a control signal based on the decoding result;
The address supply means supplies the second and third address signals to the corresponding second and third address buses in response to the control signal,
The microcomputer, wherein the digital signal processing unit includes first and second data buffer means for taking in the second and third data output from the first and second memories via the corresponding second and third data buses, a multiplier capable of operating on the second and third data supplied from the first and second data buffer means, and arithmetic logic operation means.

In claim 13,
The microcomputer, wherein the digital signal processing unit includes a decoding circuit capable of supplying an internal control signal for controlling the multiplier and the arithmetic logic operation means in response to a control signal output from the instruction decoding circuit.

In claim 14,
The microcomputer, wherein the central processing unit includes a general-purpose register in which the second and third address signals are stored.

CPU,
A DSP that operates according to the CPU and includes a multiplier;
First to third address buses to which addresses are selectively supplied from the CPU;
A first data bus connected to the CPU and the DSP;
Second and third data buses connected to the DSP;
A first memory connected to the first and second address buses and the first and second data buses and accessed by an address supplied from the CPU;
A second memory connected to the first and third address buses and the first and third data buses and accessed by an address supplied from the CPU;
The CPU can output an address signal to the first address bus, fetch an instruction from the first data bus, and supply to the DSP, based on the fetched instruction, a first control signal for controlling DSP operation,
The CPU can generate, based on the fetched instruction, an address signal for outputting data held in at least one of the first memory and the second memory to the second data bus or the third data bus, and can output the generated address signal to at least one of the second address bus and the third address bus;
The microcomputer, wherein the DSP is capable of executing arithmetic processing using data acquired from at least one of the second data bus or the third data bus based on the first control signal.