JPH10149341A

JPH10149341A - Asymmetric single-chip dual multiprocessor matching and synchronization

Info

Publication number: JPH10149341A
Application number: JP22410597A
Authority: JP
Inventors: Moataz A. Mohamed (モアタズ・エー・モハメド); Heonchul Park (ヒーオン・チュル・パク); Le Trong Nguyen (ル・トロン・ングイェン)
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 1996-08-26
Filing date: 1997-08-20
Publication date: 1998-06-02

Abstract

(57) [Abstract] [Problem] To provide an integrated multiprocessor architecture that simplifies the synchronization of multiple processors. [Solution] The multiprocessor consists of a general-purpose processor 110 and a vector processor 120 with a single-instruction multiple-data (SIMD) architecture. All of the parallel processing units in vector processor 120 process instructions simultaneously. General-purpose processor 110 controls vector processor 120 and forms a fork in the program flow by starting vector processor 120. The two processors 110 and 120 then execute separate programs in parallel until the control processor stops vector processor 120, an exception occurs, or vector processor 120 completes its program and enters the idle state.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] This application is a continuation-in-part of the co-pending U.S. application entitled "Single Instruction Multiple Data Processing in a Multimedia Signal Processor," filed on August 19, 1996. The present invention relates to multiprocessors and, in particular, to systems and methods for coordinating or synchronizing separate program threads that execute in parallel.

[0002]

[Prior Art] A multiprocessor includes multiple processors that work together to complete a task. A relatively simple multiprocessor system includes a processor such as the 80386 and a floating-point coprocessor such as the 80387 math coprocessor. In such a system, when the processor receives an instruction that requires a floating-point operation, it activates the coprocessor, which executes the instruction. A math coprocessor such as the 80387 is limited in that it executes a single instruction only when directed to, and sits idle between floating-point instructions. There are also significant limits on how far the processing power provided by such a coprocessor can be increased.

[0003] Other multiprocessor schemes use two or more identical processors that execute separate but coordinated program threads. In such schemes, however, the time needed to complete a portion of a program thread varies, which makes it difficult to keep the threads coordinated or synchronized, for example when passing results from one thread to another. Cache hits and misses and instruction-dependent variables, for instance, can delay an execution path and change the number of cycles needed to complete an instruction. As a result, the instruction sequences in different program threads sometimes fall out of synchronization with each other.

[0004] To maintain proper synchronization, hardware coupled between the processors stalls or idles the processors so that the program threads can be synchronized with each other. In a system with multiple identical processors, each program thread may have to delay itself or delay other threads. Such systems often have complex synchronization hardware and require complex software to keep the program threads synchronized and coordinated. Hardware implementing such a complex synchronization scheme increases the chip size of an integrated system, and complex synchronization also makes software development longer and more difficult.

[0005]

[Problem to Be Solved by the Invention] Accordingly, the present invention is intended to overcome these problems of the prior art. Its object is to provide a multiprocessor system that offers high processing power, accommodates multiple separate program threads, and synchronizes and coordinates those threads by a simple method.

[0006]

[Means for Solving the Problem] According to one embodiment of the present invention, an integrated, asymmetric, program-controlled multiprocessor including two processors is provided. The first processor, called the control processor, executes a program thread continuously and starts or stops a second program thread on the second processor, called the coprocessor. Both processors share a set of extension registers that facilitates communication for synchronization. The control processor can access the coprocessor's registers so that it can initialize the coprocessor for a program thread that will be started later. The coprocessor has no circuitry for controlling or accessing the first processor, and none is required.

[0007] The processors have asymmetric instruction sets and architectures. For example, the control processor executes instructions for reading and writing the coprocessor's registers, instructions for accessing the extension registers, an instruction for starting the coprocessor, and an instruction for interrupting the coprocessor. The coprocessor executes instructions for accessing the extension registers but cannot access the control processor's registers. A coprocessor instruction that ends a program thread signals the end of the thread to the control processor by setting a status flag in the extension registers indicating that the coprocessor is idle and by raising an interrupt.
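The asymmetric access rules above can be sketched as a small Python model. This is an illustration, not the patented circuitry: the class names, the register names, and the choice to reject register accesses while the coprocessor is running are assumptions. The structural point it shows is that the coprocessor object holds the shared extension registers and an interrupt line, but no reference to the control processor.

```python
from typing import Callable, Dict, List

VP_RUN, VP_IDLE = "VP_RUN", "VP_IDLE"


class Coprocessor:
    """Controlled processor: it can touch its own registers and the shared
    extension registers, and raise an interrupt, but nothing else."""

    def __init__(self, ext: Dict[str, str], raise_irq: Callable[[str], None]) -> None:
        self.ext = ext                   # shared extension registers
        self.regs: Dict[str, int] = {}   # coprocessor register set
        self._raise_irq = raise_irq      # interrupt line to the control processor

    def end_thread(self) -> None:
        # Hypothetical thread-ending instruction: flag "idle", then interrupt.
        self.ext["VPSTATE"] = VP_IDLE
        self._raise_irq("coprocessor idle")


class ControlProcessor:
    """Control processor: starts/stops the coprocessor and accesses its registers."""

    def __init__(self) -> None:
        self.ext: Dict[str, str] = {"VPSTATE": VP_IDLE}
        self.irq_log: List[str] = []
        self.cop = Coprocessor(self.ext, self.irq_log.append)

    def _require_idle(self) -> None:
        # Sketch-only policy: refuse register access while the coprocessor runs.
        if self.ext["VPSTATE"] != VP_IDLE:
            raise RuntimeError("coprocessor registers are only accessible while idle")

    def write_cop_reg(self, name: str, value: int) -> None:
        self._require_idle()
        self.cop.regs[name] = value

    def read_cop_reg(self, name: str) -> int:
        self._require_idle()
        return self.cop.regs[name]

    def start_cop(self) -> None:
        self.ext["VPSTATE"] = VP_RUN

    def interrupt_cop(self) -> None:
        self.ext["VPSTATE"] = VP_IDLE


if __name__ == "__main__":
    cp = ControlProcessor()
    cp.write_cop_reg("R0", 42)   # initialize before starting the thread
    cp.start_cop()
    cp.cop.end_thread()          # coprocessor finishes: idle flag plus interrupt
    print(cp.ext["VPSTATE"], cp.irq_log, cp.read_cop_reg("R0"))
```

Because only the control processor owns start, stop, and register-access methods, there is exactly one place where synchronization decisions are made.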

[0008] The control processor can either poll the extension registers or use an interrupt mechanism to determine when the coprocessor has completed its task. In addition, synchronization flags in the extension registers can be polled by the control processor or by the coprocessor to determine whether the other processor has met the conditions required for the polling processor's program thread to continue. Polling allows the control processor and the coprocessor to synchronize without stopping the coprocessor. Special control-processor instructions that test and set flags in the extension registers facilitate polling for synchronization.
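Flag polling that synchronizes the two threads without ever stopping the coprocessor can be modelled with two Python threads. This is a hedged sketch: the two flag bits (`INPUT_READY`, `RESULT_READY`), the list-based "shared memory", and the lock standing in for an atomic register access are assumptions for illustration. The coprocessor thread is started once and never stopped; each side spins on a flag that the other side sets.

```python
import threading
import time
from typing import List


class ExtRegister:
    """Hypothetical shared extension register holding flag bits.
    The lock stands in for an atomic hardware read-modify-write."""

    INPUT_READY = 1 << 0   # set by control processor, cleared by coprocessor
    RESULT_READY = 1 << 1  # set by coprocessor, cleared by control processor

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def test(self, mask: int) -> bool:
        with self._lock:
            return bool(self._value & mask)

    def set(self, mask: int) -> None:
        with self._lock:
            self._value |= mask

    def clear(self, mask: int) -> None:
        with self._lock:
            self._value &= ~mask


def poll_until(reg: ExtRegister, mask: int) -> None:
    while not reg.test(mask):
        time.sleep(0)  # yield to the other thread; hardware would simply spin


def coprocessor(reg: ExtRegister, inbox: List[List[int]], outbox: List[int], rounds: int) -> None:
    for i in range(rounds):
        poll_until(reg, ExtRegister.INPUT_READY)   # wait for the control processor
        reg.clear(ExtRegister.INPUT_READY)
        outbox.append(sum(inbox[i]))               # stand-in for vector work
        reg.set(ExtRegister.RESULT_READY)          # signal: result available


def control(blocks: List[List[int]]) -> List[int]:
    reg = ExtRegister()
    inbox: List[List[int]] = []
    outbox: List[int] = []
    worker = threading.Thread(target=coprocessor, args=(reg, inbox, outbox, len(blocks)))
    worker.start()                                 # started once, never stopped
    results = []
    for i, block in enumerate(blocks):
        inbox.append(block)
        reg.set(ExtRegister.INPUT_READY)
        poll_until(reg, ExtRegister.RESULT_READY)  # wait loop on the flag
        reg.clear(ExtRegister.RESULT_READY)
        results.append(outbox[i])
    worker.join()
    return results
```

For example, `control([[1, 2, 3], [4, 5], [10]])` returns the per-block sums in order. Indexing by round number rather than reading the last list entry keeps each side from seeing data that belongs to the next round.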

[0009] According to one embodiment of the present invention, the control processor is a general-purpose processor and the coprocessor is a vector processor with a SIMD architecture. This arrangement is highly efficient: the vector processor's computing power, which would be used inefficiently if spent on synchronization functions, is devoted to computation, while the control processor that performs synchronization can have a narrower data path than the vector processor. The dual-processor architecture of this embodiment uses the multiple processing units in the vector processor to provide high processing power over a wide data path, accommodates two separate program threads, and simplifies software synchronization, which is provided mainly through the control processor.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the accompanying drawings; the same reference numerals denote the same parts throughout. A multiprocessor according to an embodiment of the invention includes a pair of processors that execute separate program threads in parallel. Execution control and synchronization are asymmetric: one processor is the master, or control, processor, and the other is the slave, or controlled, processor. The control processor executes a continuous program thread, which forks parallel processing by starting a second, parallel program thread on the controlled processor. When the controlled processor completes the second program thread and becomes idle, the second program thread is joined with the control processor's program thread. The instruction set of the controlled processor includes an instruction that, when executed, ends the second program thread and signals its completion by sending an interrupt request to the control processor.
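The fork/join pattern described above can be sketched with Python threads. This is a hedged model, not the hardware: `ControlledProcessor`, `fork_join`, and the use of a `threading.Event` as the "interrupt" are assumptions for illustration. The control thread forks work onto the controlled processor, keeps executing its own code, and joins only when the thread-ending instruction sets the idle state and raises the interrupt.

```python
import threading
from typing import Any, Callable, List, Optional, Tuple

VP_RUN, VP_IDLE = "VP_RUN", "VP_IDLE"


class ControlledProcessor:
    def __init__(self, irq_handler: Callable[[], None]) -> None:
        self.vpstate = VP_IDLE
        self.result: Any = None
        self._irq = irq_handler
        self._thread: Optional[threading.Thread] = None

    def start(self, program: Callable[[List[int]], Any], data: List[int]) -> None:
        """Fork: the control processor launches a second thread here."""
        self.vpstate = VP_RUN
        self._thread = threading.Thread(target=self._run, args=(program, data))
        self._thread.start()

    def _run(self, program: Callable[[List[int]], Any], data: List[int]) -> None:
        self.result = program(data)
        # Hypothetical thread-ending instruction: go idle, then interrupt.
        self.vpstate = VP_IDLE
        self._irq()


def fork_join(vector_work: Callable[[List[int]], Any],
              scalar_work: Callable[[List[int]], Any],
              data: List[int]) -> Tuple[Any, Any]:
    joined = threading.Event()                       # set by the "interrupt"
    vp = ControlledProcessor(irq_handler=joined.set)
    vp.start(vector_work, data)                      # fork
    own = scalar_work(data)                          # control thread keeps running
    joined.wait()                                    # join on the idle interrupt
    return own, vp.result
```

For example, `fork_join(lambda xs: [x * 2 for x in xs], len, [1, 2, 3])` returns the control thread's own result together with the controlled processor's result. Only the controlled processor's idle transition triggers the join, so its own program thread needs no synchronization code.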

[0011] Information can pass between the program threads through several channels: the shared address space and memory of the processors; the register set of the controlled processor, which the control processor can access while the controlled processor is idle; and "extension" registers that both processors can access. For example, the extension registers can include one or more flag bits that the controlled processor sets to indicate that a particular operation has completed. Another flag bit indicates whether the controlled processor is executing a previously started task or is idle. Using these flag bits, the control processor's program thread can include a wait loop that polls a flag bit to determine whether a result from the controlled processor is available. The controlled processor generally needs no software synchronization within its own program thread, so software synchronization requires only minimal overhead.
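The flag bits and the wait loop can be illustrated with plain integer bit operations on a 32-bit register value. The bit positions and names below are hypothetical, since the text does not give a register layout, and `read_reg` stands in for whatever instruction reads the extension register.

```python
from typing import Callable

MASK32 = 0xFFFF_FFFF
# Hypothetical bit assignments; the text does not give a register layout.
DONE_OP_A = 1 << 0   # set by the controlled processor when operation A finishes
DONE_OP_B = 1 << 1   # set when operation B finishes
RUNNING = 1 << 31    # 1 = executing a previously started task, 0 = idle


def set_bits(reg: int, mask: int) -> int:
    return (reg | mask) & MASK32


def clear_bits(reg: int, mask: int) -> int:
    return reg & ~mask & MASK32


def wait_for(read_reg: Callable[[], int], mask: int, max_polls: int = 1_000_000) -> int:
    """Control-processor wait loop: poll until every bit in `mask` is set.
    Returns how many reads it took; gives up after `max_polls` reads."""
    for polls in range(1, max_polls + 1):
        if (read_reg() & mask) == mask:
            return polls
    raise TimeoutError("flag never became set")
```

Feeding `wait_for` a sequence of register snapshots such as `RUNNING`, then `RUNNING | DONE_OP_A`, shows the loop exiting on the second read, as soon as the completion bit appears, even though the controlled processor is still running.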

[0012] FIG. 1 is a block diagram of a multiprocessor 100 according to an embodiment of the present invention. Multiprocessor 100 includes a general-purpose processor 110 and a vector processor 120 integrated on a monolithic semiconductor chip. General-purpose processor 110 and vector processor 120 are connected to the other on-chip elements of multiprocessor 100 through a cache subsystem 130, which includes SRAMs 160 and 190, a ROM 170, and a cache control unit 180. Multiprocessor 100 organizes SRAM 160 as an instruction cache 162 and a data cache 164 for general-purpose processor 110, and organizes SRAM 190 as caches 192 and 194 for vector processor 120. Portions of SRAMs 160 and 190 can alternatively be configured as scratch-pad memory within the address space shared by general-purpose processor 110 and vector processor 120.

[0013] On-chip ROM 170 holds data and firmware for general-purpose processor 110 and vector processor 120 and is accessed through the cache. ROM 170 typically contains reset and initialization routines, self-test diagnostic routines, and interrupt and exception handlers. In one embodiment, multiprocessor 100 is used for multimedia signal processing and is also referred to herein as a multimedia signal processor, or MSP. According to an embodiment of the invention, ROM 170 additionally contains subroutines for sound card emulation, subroutines for modem signal processing, subroutines for general telephony functions, 2-D and 3-D graphics subroutine libraries, and subroutine libraries for audio and video encoding and decoding standards such as MPEG-1, MPEG-2, H.261, H.263, G.728, and G.723.

[0014] The application entitled "Multiprocessor Operation in a Multimedia Signal Processor," filed with the U.S. Patent Office on August 19, 1996, further describes the use of a multiprocessor in multimedia applications and is incorporated herein by reference in its entirety. Cache subsystem 130 connects general-purpose processor 110 and vector processor 120 to two system buses 140 and 150, and acts as a cache and switching station for general-purpose processor 110, vector processor 120, and the devices coupled to system buses 140 and 150. System bus 150 operates at a higher clock frequency than system bus 140. System bus 150 is connected to a memory controller 158, a local bus interface 156, a DMA (direct memory access) controller 154, and a device interface 152, which respectively provide interfaces to external local memory, to the local bus of a host computer, for direct memory access, and to high-speed devices such as A/D and D/A converters. Low-speed devices such as a system timer 142, a UART (universal asynchronous receiver-transmitter) 144, a bitstream processor 146, and an interrupt controller 148 are connected to system bus 140. The incorporated application "Multiprocessor Operation in a Multimedia Signal Processor" describes in detail the operation of cache subsystem 130 and example devices that general-purpose processor 110 and vector processor 120 access through cache subsystem 130 and system buses 140 and 150.

[0015] The application entitled "Apparatus and Method for Video Data Processing," filed in the United States on August 19, 1996, discloses bitstream processor 146, which encodes and decodes variable-length bitstreams conforming to the MPEG standards; that application is also incorporated herein by reference in its entirety. General-purpose processor 110 and vector processor 120 execute separate program threads and have different architectures so that each can perform its particular tasks more efficiently. General-purpose processor 110 primarily executes the real-time operating system, the exception routines for both general-purpose processor 110 and vector processor 120, and processes that do not require many repetitive calculations. General-purpose processor 110 also controls the initialization, starting, and stopping of vector processor 120. Vector processor 120 primarily performs number crunching, including the repetitive operations on data blocks that are common in multimedia processing.

[0016] FIG. 2 is a block diagram illustrating the interaction between general-purpose processor 110 and vector processor 120. General-purpose processor 110 includes an instruction decoder 260 with control logic, an execution data path 270, a write register 280, and a read register 290. General-purpose processor 110 operates on ordinary scalar data values. In execution data path 270 of FIG. 2, register file 272 contains a set of 32-bit data registers and a set of status registers, and processing unit 276 has a 32-bit bus for operating on operands up to 32 bits wide. In the embodiment, general-purpose processor 110 is a 32-bit RISC processor that operates at 40 MHz and conforms to the ARM7 architecture. The architecture and instruction set of the ARM7 RISC processor are described in detail in the "ARM7DM Data Sheet," document number ARM DDI 0010G, available from Advanced RISC Machines Ltd., which is incorporated herein by reference. Appendix A describes the extensions to the ARM7 instruction set used in the embodiment for interaction between general-purpose processor 110 and vector processor 120 or cache subsystem 130.

[0017] In the embodiment of FIG. 2, vector processor 120 has a SIMD (single instruction, multiple data) architecture and includes an instruction fetch unit 210, a decoder 220, a scheduler 230, an execution data path 240, and a load/store unit (LSU) 250. Instruction fetch unit 210 fetches instructions and processes flow-control instructions such as branches. Decoder 220 decodes one instruction per cycle, in the order in which instructions arrive from instruction fetch unit 210, and writes the field values decoded from each instruction into a FIFO 234 in scheduler 230. Issue control logic 232 in scheduler 230 selects field values from the FIFO and issues them to execution data path 240 and load/store unit 250 to complete the operations. Execution data path 240 executes logical/arithmetic instructions that operate on vector or scalar data. Load/store unit 250 executes load/store instructions that access the address space shared by general-purpose processor 110 and vector processor 120. Exception control logic 215 is coupled to instruction fetch unit 210, decoder 220, and scheduler 230, and sends an interrupt to general-purpose processor 110 when decoding or executing a vector processor instruction causes an exception.
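The in-order decode into a FIFO followed by issue to either the execution data path or the load/store unit can be sketched as a cycle-by-cycle loop. This is a simplified, hypothetical model: branches, exceptions, and multiple issue per cycle are ignored, and the mnemonics `VLD`, `VST`, and `VADD` and the FIFO depth are placeholders rather than the actual instruction set or hardware parameters.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

LOAD_STORE = {"VLD", "VST"}  # hypothetical mnemonics routed to the LSU


@dataclass
class Decoded:
    opcode: str
    unit: str  # "EXE" = execution data path, "LSU" = load/store unit


def decode(instr: str) -> Decoded:
    op = instr.split()[0]
    return Decoded(op, "LSU" if op in LOAD_STORE else "EXE")


def run(program: List[str], fifo_depth: int = 4) -> List[Tuple[int, str, str]]:
    fetch_q: Deque[str] = deque(program)  # output of the fetch unit (no branches)
    fifo: Deque[Decoded] = deque()        # FIFO inside the scheduler
    trace: List[Tuple[int, str, str]] = []
    cycle = 0
    while fetch_q or fifo:
        cycle += 1
        # Issue control: send the oldest decoded entry to its unit (1 per cycle here).
        if fifo:
            d = fifo.popleft()
            trace.append((cycle, d.unit, d.opcode))
        # Decoder: one instruction per cycle, in arrival order, if the FIFO has room.
        if fetch_q and len(fifo) < fifo_depth:
            fifo.append(decode(fetch_q.popleft()))
    return trace
```

Running `run(["VLD v0", "VADD v1", "VST v1"])` yields one issue per cycle starting in cycle 2, each tagged with the unit it was routed to. The FIFO decouples decoding from issue, which is what would let real issue logic stall one unit without stalling the decoder.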

[0018] In the embodiment, execution data path 240 has parallel processing units 246 comprising eight 32-bit floating-point units, eight 36-bit integer multipliers, and eight 36-bit arithmetic logic units (ALUs). Each 36-bit integer multiplier can perform one operation on a 36-bit data element, two operations simultaneously on 16-bit data elements, or four operations simultaneously on 8-bit or 9-bit data elements. Parallel processing units 246 process 288-bit vector operands and 32-bit scalar operands. Register file 242 for execution data path 240 contains vector registers 244. Most data paths within vector processor 120 are 288 or 576 bits wide to support simultaneous operation on thirty-two 8-bit or 9-bit data elements, sixteen 16-bit data elements, or eight 32-bit data elements. Because of the SIMD architecture, parallel processing units 246 in vector processor 120 execute and complete the same instruction simultaneously.
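The element counts follow from the multiplier arithmetic: eight 36-bit multipliers doing 4, 2, or 1 operations each give 32, 16, or 8 elements, so a 288-bit operand is split into 9-, 18-, or 36-bit slots. The sketch below computes that layout and applies one operation to every lane of a packed integer. Treating each element as occupying its full slot, and using wraparound (not saturating) addition, are assumptions made for illustration.

```python
from typing import List, Tuple

VECTOR_BITS = 288
MULTIPLIERS = 8
OPS_PER_MULTIPLIER = {8: 4, 9: 4, 16: 2, 32: 1}  # element width -> ops per 36-bit unit


def layout(elem_bits: int) -> Tuple[int, int]:
    """Return (number of lanes, bits per lane slot) for a 288-bit operand."""
    lanes = MULTIPLIERS * OPS_PER_MULTIPLIER[elem_bits]
    return lanes, VECTOR_BITS // lanes


def unpack(vec: int, elem_bits: int) -> List[int]:
    lanes, slot = layout(elem_bits)
    mask = (1 << slot) - 1
    return [(vec >> (i * slot)) & mask for i in range(lanes)]


def pack(values: List[int], elem_bits: int) -> int:
    lanes, slot = layout(elem_bits)
    if len(values) != lanes:
        raise ValueError(f"expected {lanes} elements, got {len(values)}")
    out = 0
    for i, v in enumerate(values):
        out |= (v & ((1 << slot) - 1)) << (i * slot)
    return out


def simd_add(a: int, b: int, elem_bits: int) -> int:
    """Same add in every lane at once; each lane wraps modulo 2**elem_bits."""
    m = (1 << elem_bits) - 1
    return pack([(x + y) & m for x, y in zip(unpack(a, elem_bits), unpack(b, elem_bits))],
                elem_bits)
```

For instance, `layout(8)` gives 32 lanes of 9 bits, and adding 1 to thirty-two lanes of 255 with 8-bit elements wraps every lane to 0 in a single call, mirroring one SIMD instruction.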

[0019] In the embodiment, vector processor 120 is a pipelined RISC engine operating at 80 MHz. The registers of vector processor 120 include special-purpose registers 245, a return address stack (not shown), 32-bit scalar registers 243, vector registers 244 organized in two banks, and two double-size (576-bit) vector accumulator registers (not shown). Register file 242 contains 32 scalar registers, identified in instructions by 5-bit register numbers in the range 0 to 31, and 64 288-bit vector registers organized as two banks of 32 vector registers.

[0020] Each vector register is identified by a 1-bit bank number (0 or 1) and a 5-bit vector register number in the range 0 to 31. Most instructions can access only the vector registers in the current bank, as indicated by a control bit CBANK stored in a special-purpose register VCSR of vector processor 120. A second control bit, VEC64, indicates whether register numbers by default denote double-size vector registers, each consisting of one register from each bank. The instruction syntax distinguishes register numbers that denote vector registers from those that denote scalar registers, and also from those that denote special-purpose registers.
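How a 5-bit register number might be resolved against CBANK and VEC64 can be sketched as follows. The physical numbering (bank × 32 + number) and the rule that a double-size register pairs the same number in both banks are assumptions made for illustration; the text says only that a double-size register contains one register from each bank.

```python
from typing import Tuple


def resolve_vector_reg(reg_num: int, cbank: int, vec64: bool) -> Tuple[int, ...]:
    """Map a 5-bit vector register number to physical registers 0..63.

    Assumed (hypothetical) numbering: physical index = bank * 32 + reg_num,
    and a double-size register pairs the same number in both banks."""
    if not 0 <= reg_num <= 31:
        raise ValueError("vector register numbers are 5 bits (0..31)")
    if cbank not in (0, 1):
        raise ValueError("CBANK is a single bit")
    if vec64:
        return (reg_num, 32 + reg_num)   # 2 x 288 = 576 bits
    return (cbank * 32 + reg_num,)       # one 288-bit register in the current bank
```

With VEC64 clear, the same instruction field selects a different physical register depending on CBANK; with VEC64 set, CBANK no longer matters because both banks contribute one half of the 576-bit register.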

[0021] Each vector register 244 can be divided into data elements of programmable size. For example, a 288-bit vector register can hold thirty-two 8-bit or 9-bit integer data elements, sixteen 16-bit integer data elements, eight 32-bit integer data elements, or eight 32-bit floating-point data elements. Two vector registers logically combined as a double-size register can store a vector with twice as many data elements. In an embodiment, setting control bit VEC64 places vector processor 120 in mode VEC64, in which the double size (576 bits) is the default vector register size. The application entitled "Single Instruction Multiple Data Processing in a Multimedia Signal Processor," filed with the U.S. Patent Office on August 19, 1996, further describes the architecture and instruction set of vector processor 120 in this embodiment and is incorporated herein by reference.

[0022] Multiprocessor 100 also includes a set of 32-bit extension registers 115 that are accessible to both general-purpose processor 110 and vector processor 120. Extension registers 115 include privileged extension registers and user extension registers. The privileged extension registers control and indicate the general operating modes of multiprocessor 100. The user extension registers include registers for synchronizing the program threads executed by general-purpose processor 110 and vector processor 120. In the embodiment, the user extension registers include a vector processor state flag VPSTATE and a synchronization flag VASYNC. State flag VPSTATE takes one of two values, VP_RUN and VP_IDLE, indicating whether vector processor 120 is running a program thread or is idle. In the embodiment, vector processor 120 treats extension register VASYNC as one of its own special-purpose registers, so vector processor instructions such as VMOV can read or write register VASYNC. Other instructions access the special extension registers implicitly.

[0023] For example, an instruction such as VCINT or VCJOIN, or any other instruction that raises an exception, changes the status flag VPSTATE to the VP_IDLE state when the program thread completes or is halted by the exception. The vector processor state flag VPSTATE and the synchronization flag VASYNC in the extension registers are dual-ported, so that the general-purpose processor 110 and the vector processor 120 can read them simultaneously. When the vector processor 120 is in the VP_IDLE state, the general-purpose processor 110 can read or write the scalar registers and special-purpose registers of the vector processor 120. However, if the general-purpose processor 110 reads or writes a register of the vector processor 120 while the vector processor 120 is in the VP_RUN state, the result is undefined. The extensions to the ARM7 instruction set for the general-purpose processor 110 include instructions that access the extension registers 115 (MFER and MTER) and instructions that access the scalar and special-purpose registers of the vector processor 120 (MFVP and MTVP). The conditional instruction TESTSET reads an extension register and, if its condition is satisfied, sets bit 30 of that extension register to 1. TESTSET facilitates producer/consumer synchronization: it reads a value that the vector processor 120 clears to mark a synchronization point, and sets bit 30 again so that the register is ready for the next synchronization point.
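
The TESTSET handshake on VASYNC can be modelled in a few lines of software. The Python sketch below is an illustrative assumption, not the hardware design: the class and method names are hypothetical, and a lock stands in for the atomic read-modify-write that the dual-ported register provides. The consumer (general-purpose processor 110) calls testset() to read VASYNC and re-arm bit 30; the producer (vector processor 120) clears bit 30 to mark a synchronization point.

```python
import threading

VASYNC_FLAG = 1 << 30  # bit 30 of VASYNC, the synchronization flag


class SyncRegister:
    """Hypothetical software model of the dual-ported extension register VASYNC."""

    def __init__(self, value: int = VASYNC_FLAG) -> None:
        self._value = value
        # The lock stands in for the atomic read-modify-write of the real register.
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self._value

    def testset(self) -> int:
        """Consumer side (GP 110): return the old value, then set bit 30 again."""
        with self._lock:
            old = self._value
            self._value |= VASYNC_FLAG
            return old

    def clear_flag(self) -> None:
        """Producer side (VP 120): clear bit 30 to mark a synchronization point."""
        with self._lock:
            self._value &= ~VASYNC_FLAG


reg = SyncRegister()
first = reg.testset()    # bit 30 still set: the producer has not signalled yet
reg.clear_flag()         # the producer reaches its synchronization point
second = reg.testset()   # the consumer sees bit 30 == 0 and re-arms the flag
print(bool(first & VASYNC_FLAG), bool(second & VASYNC_FLAG), bool(reg.read() & VASYNC_FLAG))
# prints: True False True
```

Because testset() both reports and re-arms the flag in one step, no separate "reset" write is needed before the next synchronization point.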

[0024] The general-purpose processor 110 can start and stop the vector processor 120 by executing the instructions STARTVP and INTVP and, as mentioned above, can access the scalar and special-purpose registers of the vector processor 120. In contrast, the vector processor 120 can neither start nor stop the general-purpose processor 110, nor access the registers of the general-purpose processor 110. This asymmetric division of control between the general-purpose processor 110 and the vector processor 120 simplifies their synchronization. According to one aspect of the present invention, the vector processor 120 issues an interrupt request to the general-purpose processor 110 every time it enters the idle state. For example, at the end of a program thread, the vector processor 120 executes an instruction (VCJOIN or VCINT) that generates an interrupt request and places the vector processor 120 in the VP_IDLE state. The general-purpose processor 110 can then use an interrupt handling routine to transfer the results and restart the vector processor 120. The interrupt handler thus synchronizes the general-purpose processor 110 and the vector processor 120.
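
The asymmetric division of control, with interrupt-driven synchronization, can be sketched as a small Python model. This is an illustrative assumption rather than the patented hardware: the class, the callback and the state names are hypothetical, a plain function call stands in for the interrupt request, and a vector-processor program thread is reduced to a function that returns a result. Only the general-purpose side can start the vector processor; the vector-processor side can do nothing except run to completion and request an interrupt.

```python
from enum import Enum
from typing import Callable, List, Optional


class VPState(Enum):
    VP_IDLE = 0
    VP_RUN = 1


class VectorProcessorModel:
    """Hypothetical model: the VP can only run its thread and request an interrupt."""

    def __init__(self, irq_handler: Callable[["VectorProcessorModel"], None]) -> None:
        self.vpstate = VPState.VP_IDLE
        self.irq_handler = irq_handler            # installed by the general-purpose side
        self.thread: Optional[Callable[[], int]] = None
        self.result: Optional[int] = None

    def startvp(self, thread: Callable[[], int]) -> None:
        """General-purpose side only (STARTVP); no effect while already running."""
        if self.vpstate is VPState.VP_RUN:
            return
        self.thread = thread
        self.vpstate = VPState.VP_RUN

    def step(self) -> None:
        """VP side: run the thread to its VCJOIN/VCINT, go idle, request an interrupt."""
        if self.vpstate is VPState.VP_RUN and self.thread is not None:
            self.result = self.thread()
            self.vpstate = VPState.VP_IDLE
            self.irq_handler(self)                # interrupt request to the GP


collected: List[int] = []
jobs: List[Callable[[], int]] = [lambda: 1 + 1, lambda: 3 * 3]


def on_vp_idle(vp: VectorProcessorModel) -> None:
    """GP interrupt handler: transfer the result, then restart the VP if work remains."""
    collected.append(vp.result)
    if jobs:
        vp.startvp(jobs.pop(0))


vp = VectorProcessorModel(on_vp_idle)
vp.startvp(lambda: 10)
while vp.vpstate is VPState.VP_RUN:
    vp.step()
print(collected)  # [10, 2, 9]
```

Since only the handler restarts the vector processor, each result is collected exactly once and in order, which is the property the asymmetric split is meant to make easy.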

[0025] The applications entitled "System and Method for Handling Software Interrupts with Argument Passing" and "System and Method for Handling Interrupts and Exception Events in an Asymmetric Multiprocessor Architecture," both filed with the United States Patent Office on August 19, 1996, describe the handling of exceptions and interrupts as used in embodiments of the present invention. The entire contents of those applications are incorporated herein by reference. Polling can be used for synchronization instead of interrupt handling. FIG. 3 is a flow diagram illustrating an example of a dual-thread process 300 according to an embodiment of the present invention. Before starting a program thread on the vector processor 120, the general-purpose processor 110 determines whether the vector processor 120 is running or idle by executing a wait loop 315, which consists of steps 310 and 320 that check the value of the register VPSTATE. The vector processor 120 is idle after the multiprocessor 100 is powered up or reset, or after the vector processor 120 executes an instruction that causes an exception.
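
Wait loop 315 can be sketched as a polling function. In this Python sketch the VPSTATE encodings, the poll interval and the timeout are hypothetical additions for illustration (the loop in the text simply spins), and a background thread stands in for the vector processor finishing an earlier program thread.

```python
import threading
import time

VP_IDLE, VP_RUN = 0, 1   # hypothetical encodings of the VPSTATE values


def wait_loop_315(read_vpstate, poll_interval=0.0005, timeout=1.0):
    """Steps 310/320: re-read VPSTATE until the VP reports VP_IDLE; return the poll count."""
    polls = 0
    deadline = time.monotonic() + timeout   # safety net for the sketch only
    while True:
        polls += 1
        if read_vpstate() == VP_IDLE:       # step 320: test the value read in step 310
            return polls
        if time.monotonic() > deadline:
            raise TimeoutError("vector processor never became idle")
        time.sleep(poll_interval)


vpstate = {"value": VP_RUN}


def vector_processor():
    time.sleep(0.01)              # still busy with an earlier program thread
    vpstate["value"] = VP_IDLE    # e.g. after VCJOIN/VCINT or an exception


t = threading.Thread(target=vector_processor)
t.start()
polls = wait_loop_315(lambda: vpstate["value"])
t.join()
print("vector processor idle after", polls, "poll(s)")
```

After power-up or reset the vector processor is already idle, so in that case the loop exits on its first poll.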

[0026] The application entitled "Efficient Saving and Restoring in Multiprocessors," filed in the United States on August 19, 1996, discloses context-switching methods usable with the processors 110 and 120, and is incorporated herein by reference in its entirety. In that context-saving method, the vector processor 120 periodically executes a conditional context-save instruction VCCS, which branches to a context-save subroutine when the flag bit CSE in the extension register VIMSK is set. When the context-save subroutine finishes, the vector processor 120 can execute an instruction (VCINT or VCJOIN) and enter the VP_IDLE state. Thus, in the preferred context-saving method, the general-purpose processor 110 sets the flag bit CSE and then waits, as in wait loop 315, for the vector processor 120 to become idle.
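
The CSE/VCCS protocol can be sketched as follows. This Python model is an assumption for illustration only: the class name, the dictionary used as "context", the per-iteration sleep and the lock are hypothetical, and the text does not say who clears CSE afterwards, so the sketch leaves it set. The general-purpose side sets CSE (as INTVP does in one embodiment) and keeps running; the vector side notices the flag at its next VCCS, saves its context, and goes idle.

```python
import itertools
import threading
import time


class ContextSaveModel:
    """Hypothetical model of the INTVP / CSE / VCCS protocol described above."""

    def __init__(self):
        self.cse = False          # flag bit CSE in extension register VIMSK
        self.idle = True          # VPSTATE == VP_IDLE after reset
        self.saved_context = None
        self._lock = threading.Lock()

    def startvp(self):            # general-purpose processor side
        with self._lock:
            self.idle = False

    def intvp(self):              # general-purpose processor side
        with self._lock:
            if not self.idle:     # no effect on a VP that is already idle
                self.cse = True

    def vccs(self, context):      # vector processor side, executed periodically
        with self._lock:
            if not self.cse:
                return False      # CSE clear: VCCS falls through
            self.saved_context = dict(context)  # context-save subroutine
            self.idle = True      # then VCINT/VCJOIN puts the VP in VP_IDLE
            return True


def vector_program(model):
    """A long-running VP thread that executes VCCS once per iteration."""
    context = {"pc": 0, "acc": 0}
    for i in itertools.count():
        context["pc"], context["acc"] = i, context["acc"] + i
        if model.vccs(context):
            return
        time.sleep(0.0005)


model = ContextSaveModel()
model.startvp()
vp = threading.Thread(target=vector_program, args=(model,))
vp.start()
time.sleep(0.005)
model.intvp()                      # GP sets CSE and keeps running
while not model.idle:              # then waits, as in wait loop 315
    time.sleep(0.0005)
vp.join()
print(model.saved_context)
```

The saved "pc" value depends on timing; what is guaranteed is that the vector side stops only at a VCCS boundary, so the saved context is always consistent.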

[0027] The general-purpose processor 110 waits in wait loop 315, polling the extension register VPSTATE until the vector processor 120 becomes idle. Once the vector processor 120 is idle, the general-purpose processor 110 executes step 330 to set up the vector processor 120 for a new program thread. In step 330, the general-purpose processor 110 writes the program address into the special register VPC, which is the program counter of the vector processor. The processor 110 can also write other scalar and special registers of the vector processor 120 to pass parameters to the vector processor 120. Once the vector processor 120 is initialized, the general-purpose processor 110 executes step 340, in which the instruction STARTVP starts the vector processor 120, which then executes the program thread in step 345. STARTVP sets the register VPSTATE to the VP_RUN state; on reading the VP_RUN value, the vector processor 120 fetches and executes instructions. Meanwhile, the general-purpose processor 110 continues executing its own program thread, so that the processors 110 and 120 operate in parallel.
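
Steps 330 and 340 can be sketched as a small setup routine. In this Python sketch the register model is hypothetical: VPC is written directly rather than through MTVP, passing parameters in SR1, SR2, and so on is an assumed calling convention, and the sketch raises an error where the text says the result is undefined. The STARTVP behaviour (clearing VISRC<vjp> and VISRC<vip>, no effect when already running) and SR0 ignoring writes are taken from the instruction descriptions in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence

VP_IDLE, VP_RUN = "VP_IDLE", "VP_RUN"


@dataclass
class VectorProcessorRegs:
    """Hypothetical view of the VP state that the GP can reach via MTVP and STARTVP."""
    vpstate: str = VP_IDLE
    vpc: int = 0                                          # VP program counter (VPC)
    scalar: Dict[int, int] = field(default_factory=dict)  # SR1..SR15; SR0 reads as 0
    visrc_vjp: bool = True                                # stale flags from a previous run
    visrc_vip: bool = True

    def mtvp(self, sr: int, value: int) -> None:
        if self.vpstate == VP_RUN:
            # The text says the result is undefined; the sketch refuses instead.
            raise RuntimeError("VP registers must not be written in VP_RUN")
        if sr != 0:                                       # writes to SR0 are ignored
            self.scalar[sr] = value & 0xFFFFFFFF

    def startvp(self) -> None:
        if self.vpstate == VP_RUN:
            return                                        # no effect if already running
        self.visrc_vjp = self.visrc_vip = False           # cleared automatically
        self.vpstate = VP_RUN


def launch_thread(vp: VectorProcessorRegs, entry: int, args: Sequence[int]) -> None:
    """Steps 330 and 340: program VPC and parameters, then issue STARTVP."""
    if vp.vpstate != VP_IDLE:
        raise RuntimeError("wait loop 315 must observe VP_IDLE first")
    vp.vpc = entry                                        # step 330: program address
    for sr, value in enumerate(args, start=1):            # hypothetical: params in SR1, SR2, ...
        vp.mtvp(sr, value)
    vp.startvp()                                          # step 340


vp = VectorProcessorRegs()
launch_thread(vp, 0x4000, [7, -1])
print(vp.vpstate, hex(vp.vpc), vp.scalar)  # VP_RUN 0x4000 {1: 7, 2: 4294967295}
```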

[0028] In process 300, the general-purpose processor 110 continues executing its program thread, but step 380 must be synchronized with the results from the vector processor 120. The vector processor 120, executing its program thread, completes the results in step 355 and then, in step 365, executes an instruction (VCINT or VCJOIN) that halts its execution in step 375. To ensure that step 380 occurs after step 355, the general-purpose processor 110 executes a wait loop comprising steps 360 and 370, waiting until the vector processor 120 becomes idle. One possible ordering is that the general-purpose processor 110 reaches step 360 before the vector processor 120 executes step 365. In that case, the general-purpose processor 110 repeatedly executes steps 360 and 370 until the vector processor 120 becomes idle. Another possible ordering is that the vector processor 120 executes step 365 before the general-purpose processor 110 reaches step 360. In that case, the general-purpose processor 110 passes through steps 360 and 370 only once.
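
Both orderings of process 300 can be exercised with a threaded sketch. The Python below is an illustrative assumption: a thread plays the vector processor, sleeps stand in for each side's own work, the result value is arbitrary, and the final join is housekeeping for the sketch, not part of the synchronization (which is done entirely by polling VPSTATE).

```python
import threading
import time

VP_IDLE, VP_RUN = 0, 1  # hypothetical VPSTATE encodings


def run_process_300(vp_work_s: float, gp_work_s: float):
    """Sketch of process 300. Returns (result used in step 380, polls of loop 360/370)."""
    shared = {"vpstate": VP_RUN, "result": None}   # VP_RUN right after STARTVP

    def vector_thread() -> None:           # steps 345 to 375
        time.sleep(vp_work_s)              # the VP's own program thread
        shared["result"] = sum(range(10))  # step 355: complete the result
        shared["vpstate"] = VP_IDLE        # step 365: VCINT/VCJOIN, execution halts (375)

    vp = threading.Thread(target=vector_thread)
    vp.start()                             # step 340: STARTVP forks the VP thread
    time.sleep(gp_work_s)                  # the GP keeps executing its own thread
    polls = 0
    while True:                            # wait loop: steps 360 and 370
        polls += 1
        if shared["vpstate"] == VP_IDLE:
            break
        time.sleep(0.0005)
    vp.join()                              # housekeeping for the sketch only
    return shared["result"], polls         # step 380 may now use the VP result


for label, vp_s, gp_s in (("GP reaches 360 first", 0.02, 0.0),
                          ("VP reaches 365 first", 0.0, 0.02)):
    result, polls = run_process_300(vp_s, gp_s)
    print(f"{label}: result={result}, polls={polls}")
```

Typically the first case reports several polls and the second a single poll; in both cases step 380 sees the completed result.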

[0029] In step 380, the general-purpose processor 110 processes the results and can simply reinitialize and restart the vector processor 120 with another STARTVP instruction. If a context switch occurred for an earlier program thread, the general-purpose processor 110 can instead initialize the vector processor 120 with the address of a subroutine that restores the previously saved context and restarts the interrupted program thread.

[0030] FIG. 4 illustrates another example of a dual-thread process 400 that uses polling for synchronization, according to an embodiment of the present invention. Process 400 begins as described for process 300 of FIG. 3. In particular, the general-purpose processor 110 waits in wait loop 315 until the vector processor 120 is idle, and then starts the vector processor 120 in step 340, whereupon the vector processor 120 begins executing a program thread in step 345. In process 400, the general-purpose processor 110 requires that the vector processor 120 complete the results of step 355 before the general-purpose processor 110 executes step 480. Process 400 differs from process 300 in that the vector processor 120 does not enter the idle state after executing step 455, and the register VPSTATE is not used for synchronization. Instead, loop 460 polls the extension register VASYNC to synchronize steps 480 and 355.

[0031] In loop 460, the first step 462 reads the extension register VASYNC. In a preferred embodiment, the general-purpose processor 110 reads the extension register with the instruction TESTSET, which reads an extension register and sets its flag bit (bit <30>). When register 15 is used as the destination of TESTSET, the flag bit is transferred to the Z (zero) bit in the status register of the general-purpose processor 110. Because process 400 relies on the vector processor to clear the flag bit to signal that step 355 is complete, instruction 468 determines whether step 355 has completed by branching back to step 462 if the status bit Z is not 0. If the flag bit is 0, the general-purpose processor 110 proceeds to step 480 and continues the process. Using TESTSET in step 462 has the advantage that the flag bit in VASYNC is automatically set again, ready for a future synchronization loop such as loop 460.
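
Loop 460 can be sketched with threads. In this Python model the Vasync class, the lock and the timing values are hypothetical; the lock stands in for TESTSET's atomic read-and-set, and testset_r15() returns the old bit 30 the way it would appear in the Z flag. Note that the vector side clears the flag and keeps running: VPSTATE is never used.

```python
import threading
import time

FLAG = 1 << 30  # VASYNC<30>


class Vasync:
    """Hypothetical model of VASYNC; the lock stands in for TESTSET's atomicity."""

    def __init__(self) -> None:
        self.value = FLAG            # armed: the result is not ready yet
        self.lock = threading.Lock()

    def testset_r15(self) -> int:
        """TESTSET with r15 as destination: old bit 30 becomes Z, then bit 30 is set."""
        with self.lock:
            z = (self.value >> 30) & 1
            self.value |= FLAG
            return z

    def clear(self) -> None:         # executed by the vector processor
        with self.lock:
            self.value &= ~FLAG


def loop_460(vasync: Vasync, poll_interval: float = 0.0005) -> int:
    """Steps 462 and 468: branch back while Z != 0. Returns the number of TESTSETs."""
    spins = 0
    while True:
        spins += 1
        if vasync.testset_r15() == 0:    # step 462 (TESTSET), step 468 (test Z)
            return spins
        time.sleep(poll_interval)


vasync = Vasync()
shared = {}


def vector_thread() -> None:
    time.sleep(0.005)
    shared["result"] = 42                # step 355: the result the GP needs
    vasync.clear()                       # signal completion without going idle
    shared["kept_running"] = True        # VPSTATE is never touched in process 400


t = threading.Thread(target=vector_thread)
t.start()
spins = loop_460(vasync)
value = shared["result"]                 # step 480 can now use the result
t.join()
print(value, bool(vasync.value & FLAG))  # 42 True
```

The second printed value shows the advantage named in the text: after the loop exits, bit 30 is already set again for the next synchronization point.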

[0032] Using the control processor instruction STARTVP and the vector processor instruction VCJOIN, the multiprocessor 100 of FIG. 1 supports typical parallel-execution and sequential-execution programming. FIGS. 5 and 6 show two cases of typical parallel execution. In FIG. 5, the general-purpose processor 110 executes an instruction stream 510 that forks an execution thread 520 for the vector processor 120. The instruction STARTVP 512 designates the target address at which the vector processor 120 begins fetching instructions. Accordingly, the first instruction 522 executed by the vector processor 120 follows instruction 512 and executes in parallel with the instructions executed by the general-purpose processor 110. The general-purpose processor 110 continues executing its instruction stream until it reaches wait loop 514, which is as described above with reference to FIGS. 3 and 4. The vector processor 120 executes its instruction stream until it reaches the instruction VCJOIN 524, which clears the register VASYNC and places the vector processor 120 in the idle state. (In another embodiment, the vector processor 120 requests an interrupt from the general-purpose processor 110, and the interrupt handler executed by the general-purpose processor 110 clears the register VPSTATE.) In FIG. 5, the general-purpose processor 110 reaches wait loop 514 before the vector processor 120 reaches VCJOIN 524, so the general-purpose processor 110 spin-waits until the vector processor 120 completes its assigned task.

[0033] Alternatively, as shown in FIG. 6, the vector processor 120 completes instruction stream 540, starting at instruction 542 and ending with the instruction VCJOIN, before the general-purpose processor 110 reaches wait loop 534. In this case, the general-purpose processor 110 does not spend time spin-waiting; instead it passes straight through wait loop 524. However, the vector processor 120 remains idle from the time it executes the instruction VCJOIN 564 until the general-purpose processor 110 restarts it after wait loop 524.

[0034] The parallel programming paradigm provides high performance because a multithreaded parallel program exploits the power of the vector processor 120 while the general-purpose processor 110 performs the scalar portion of the computation in parallel. Data exchange between the processors 110 and 120 occurs at synchronization points marked by the wait loops executed by the general-purpose processor 110. The vector processor 120 never needs to spin-wait. FIG. 7 shows typical sequential-execution programming for the multiprocessor 100 of FIG. 1. In typical sequential execution, the general-purpose processor 110 forks an execution thread with the instruction STARTVP 552 and immediately enters wait loop 554. The general-purpose processor 110 waits in loop 554 until the vector processor 120 completes program sequence 560, from instruction 562 through the instruction VCJOIN 564. When the vector processor 120 executes VCJOIN 564 and becomes idle, the general-purpose processor 110 exits wait loop 554 and begins executing the instructions that sequentially follow instructions 562 to 564. Typical sequential-execution programming is less efficient than typical parallel-execution programming, but it is logically simpler.
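
The efficiency trade-off can be made concrete with an idealized timing model. The Python sketch below assumes, hypothetically, that the scalar work on the general-purpose processor 110 is independent of the vector work, that STARTVP and the wait loops cost nothing, and that times are in arbitrary units; the function names are illustrative.

```python
def sequential_time(gp_scalar: float, vp_vector: float) -> float:
    """FIG. 7 style: STARTVP, spin in the wait loop, then run the scalar work."""
    return vp_vector + gp_scalar


def parallel_time(gp_scalar: float, vp_vector: float) -> float:
    """FIGS. 5 and 6 style: scalar work overlaps vector work, join at the wait loop."""
    return max(gp_scalar, vp_vector)


def gp_spin(gp_scalar: float, vp_vector: float) -> float:
    """Time the GP spends in its wait loop in the parallel form (the FIG. 5 case)."""
    return max(0.0, vp_vector - gp_scalar)


def vp_idle(gp_scalar: float, vp_vector: float) -> float:
    """Time the VP sits in VP_IDLE before the GP reaches its wait loop (the FIG. 6 case)."""
    return max(0.0, gp_scalar - vp_vector)


for gp, vp in ((3, 5), (5, 3)):
    print(gp, vp, sequential_time(gp, vp), parallel_time(gp, vp), gp_spin(gp, vp), vp_idle(gp, vp))
```

With 3 units of scalar work and 5 of vector work, the sequential form takes 8 units and the parallel form 5, with the general-purpose processor spinning for 2 units as in FIG. 5; swapping the numbers gives the FIG. 6 case, where the vector processor sits idle for 2 units instead.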

[0035] A variation on the typical sequential programming of FIG. 7 or the typical parallel execution of FIG. 6 lets the vector processor 120 execute the entire program, while the sole functions of the general-purpose processor 110 are to start the vector processor 120 and to handle interrupts and exceptions. Such a variation is useful in the exemplary embodiment, in which the vector processor 120 runs at twice the operating frequency of the general-purpose processor 110 and is much more powerful than the general-purpose processor 110. Although the invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art will readily appreciate that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention as defined by the following claims.

【００３６】[0036]

[Embodiment 1] In this embodiment, the processor 110 is a general-purpose processor conforming to the ARM7 processor standard. For details of the ARM7 processor and its instruction set, refer to the ARM7 data sheet (document number ARM7 DDI 0020C, issued December 1994). The extensions to the ARM7 instruction set let the general-purpose processor 110 interact with the vector processor 120 by starting and stopping the vector processor 120; testing vector processor state, such as the state used for synchronization; and transferring data between registers in the vector processor 120 and registers or extension registers in the general-purpose processor 110. Transfers between general-purpose registers and vector registers require intermediate storage such as local memory. Table 1 lists the ARM7 instruction set extensions that let the general-purpose processor 110 interact with the vector processor 120 and the cache subsystem 130.

【００３７】[0037]

【表１】 [Table 1]

[0038] Table 2 lists the ARM7 exceptions, which are detected and reported before the faulting instruction executes. Exception vector addresses are given in hexadecimal.

【００３９】[0039]

【表２】 [Table 2]

[0040] Next, the syntax of the extensions to the ARM7 instruction set is described. The ARM7 architecture provides three instruction formats for the coprocessor interface:
1. Coprocessor data operation (CDP) format
2. Coprocessor data transfer (LDC/STC) format
3. Coprocessor register transfer (MRC/MCR) format
CDP-format instructions are used for operations that do not communicate back to the ARM7 processor. Table 3 defines the fields of CDP-format instructions.
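
Because Table 3 is not reproduced here, the following Python sketch assumes that the CDP fields follow the standard ARM coprocessor data operation layout (cond in bits 31:28, 1110 in bits 27:24, opcode 1 in 23:20, CRn in 19:16, CRd in 15:12, coprocessor number in 11:8, opcode 2 in 7:5, 0 in bit 4, CRm in 3:0). Under that assumption it encodes STARTVP and INTVP from the assembler forms CDP{cond} p7, 0, c0, c0, c0 and CDP{cond} p7, 1, c0, c0, c0 given in the instruction descriptions. The function names are illustrative.

```python
COND_CODES = {name: code for code, name in enumerate(
    ("eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
     "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"))}


def encode_cdp(cond: str, cp_num: int, opc1: int, crd: int = 0, crn: int = 0,
               crm: int = 0, opc2: int = 0) -> int:
    """Encode an ARM coprocessor data operation (CDP) as a 32-bit word."""
    for name, value, width in (("cp_num", cp_num, 4), ("opc1", opc1, 4), ("crd", crd, 4),
                               ("crn", crn, 4), ("crm", crm, 4), ("opc2", opc2, 3)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((COND_CODES[cond] << 28) | (0b1110 << 24) | (opc1 << 20) | (crn << 16)
            | (crd << 12) | (cp_num << 8) | (opc2 << 5) | crm)   # bit 4 is 0 for CDP


def startvp(cond: str = "al") -> int:   # CDP{cond} p7, 0, c0, c0, c0
    return encode_cdp(cond, cp_num=7, opc1=0)


def intvp(cond: str = "al") -> int:     # CDP{cond} p7, 1, c0, c0, c0
    return encode_cdp(cond, cp_num=7, opc1=1)


print(hex(startvp()), hex(intvp()), hex(startvp("eq")))
# 0xee000700 0xee100700 0xe000700
```

The condition list in the assembler syntax (eq through nv) happens to be in ARM condition-code order, which is why the dictionary can be built with enumerate().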

【００４１】[0041]

【表３】 [Table 3]

[0042] The coprocessor data transfer format (LDC/STC) loads or stores a subset of the coprocessor registers directly from or to memory. The ARM7 processor supplies the word address, and the coprocessor supplies or accepts the data and controls the number of words transferred. Table 4 defines the fields of the LDC/STC format.

【００４３】[0043]

【表４】 [Table 4]

[0044] The coprocessor register transfer format (MRC/MCR) is used to communicate information directly between ARM7 registers and coprocessor registers. Table 5 defines the fields of instructions in the MRC/MCR format.

【００４５】[0045]

【表５】 [Table 5]

[0046] Extended ARM instructions
The extended ARM instructions are as follows.
CACHE (cache operation)
Format: LDC/STC, where L = 0; CRn = Opc; CP# = 1111.
Assembler syntax: STC{cond} p15, cOpc, <Address> or CACHE{cond} Opc, <Address>
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv} and Opc = {0, 1, 2, 3}. Because the CRn field of the LDC/STC format specifies the opcode (Opc) for the CACHE instruction, the decimal opcode must be prefixed with the letter 'c' in the first syntax (i.e., c0 instead of 0). Refer to the ARM7 data sheet for the address-mode syntax.
Description: The CACHE instruction is executed only if Cond is true. Field Opc<3:0> specifies the following operations.

【００４７】[0047]

【表６】 [Table 6]

[0048] Refer to the ARM7 data sheet for how EA is computed.
Exception: ARM7 protection violation.
INTVP (Interrupt Vector Processor)
Format: CDP, where Opc = 0001; CP# = 0111; CRn, CRd, CP, and CRm are not used.
Assembler syntax: CDP{cond} p7, 1, c0, c0, c0
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Bits 19:12, 7:5, and 3:0 are reserved.

[0049] Description: If Cond is true, INTVP sends a stop signal to the vector processor. In one embodiment, INTVP sets the bit CSE. Setting CSE indicates that the vector processor must save its current context and then halt when it next executes the conditional context-save instruction VCCS. The ARM7 processor does not wait for the vector processor to halt; it continues executing subsequent instructions. The ARM7 processor can determine whether the vector processor has halted after an INTVP instruction by executing a wait loop that uses MFER. INTVP has no effect if the vector processor is already in the VP_IDLE state.

[0050] Exception: vector processor unavailable.
MFER (Move From Extension Register)
Format: MRC, where Opc = 010; L = 1; CRn = cP; CP# = 0111; CP is not used; CRm = ER.
Assembler syntax: MRC{cond} p7, 2, Rd, cP, cER or MFER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME refers to an extension register mnemonic.

[0051] Description: MFER is executed only if Cond is true. The data in the extension register ER defined by P:ER<3:0>, as listed in Table 7 below, is moved to the ARM7 register Rd.

【００５２】[0052]

【表７】 [Table 7]

[0053] Exception: protection violation when an attempt is made to access PERx in user mode.
MFVP (Move From Vector Processor register)
Format: MRC/MCR, where Opc = 001; L = 1; CP# = 0111; CP is not used.
Assembler syntax: MRC{cond} p7, 1, Rd, CRn, cRm or MFVP{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME refers to a scalar or special-purpose register in the vector processor.

[0054] Description: MFVP is executed only if Cond is true. The data in the vector processor scalar or special-purpose register defined by CRn<1:0>:CRm<3:0> is moved to the ARM7 register Rd. Bits CRn<1:0>:CRm<3:0> are reserved. Table 8 shows the mapping from CRn<1:0>:CRm<3:0> to the scalar registers (SR0-SR15) and special-purpose registers (SP0-SP15) in the vector processor.

【００５５】[0055]

【表８】 [Table 8]

[0056] SR0 always reads as 32 bits of zero, and writes to SR0 are ignored.
Exception: vector processor unavailable.
MTER (Move To Extension Register)
Format: MRC/MCR, where Opc = 010; L = 0; CRn = cP; CP# = 0111; CP is not used; CRm = ER.
Assembler syntax: MCR{cond} p7, 2, Rd, cP, cER or MTER{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, P = {0, 1}, ER = {0, ..., 15}, and RNAME refers to the register mnemonic (e.g., PER0).

[0057] Description: MTER is executed only if Cond is true. The data in the ARM7 register Rd is moved to the extension register ER defined by P:ER<3:0>, as shown in Table A9 above.
Exception: protection violation when an attempt is made to access PERx in user mode.
MTVP (Move To Vector Processor register)
Format: MRC/MCR, where Opc = 1; L = 0; CP# = 0111; CP is not used.

[0058] Assembler syntax: MCR{cond} p7, 1, Rd, CRn, cRm or MTVP{cond} RNAME, Rd
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, CRn = {c0, ..., c15}, CRm = {c0, ..., c15}, and RNAME refers to the register mnemonic (e.g., SP0 or VCSR).

[0059] Description: MTVP is executed only if Cond is true. The data in the ARM7 register Rd is moved to the vector processor scalar/special-purpose register CRn<1:0>:CRm<3:0>. The mapping from CRn:CRm to the scalar and special-purpose registers of the vector processor is shown in Table A10 above.
Exception: vector processor unavailable.
PFTCH (Prefetch)
Format: LDC/STC, where N = 0; L = 1; CRn = 0010; CP# = 1111.
Assembler syntax: LDC{cond} p15, 2, <Address> or PFTCH{cond} <Address>
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the address-mode syntax.

[0060] Description: PFTCH is executed only if Cond is true. The cache line specified by EA is prefetched into the ARM7 data cache. Refer to the ARM7 data sheet for how EA is computed.
Exception: none.
STARTVP (Start Vector Processor)
Format: CDP, where Opc = 0000; CP# = 0111; CRn, CRd, CP, and CRm are not used.
Assembler syntax: CDP{cond} p7, 0, c0, c0, c0 or STARTVP{cond}
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}. Refer to the ARM7 data sheet for the address-mode syntax.

[0061] Description: The STARTVP instruction is executed only when Cond is true. It signals the vector processor to start execution and automatically clears VISRC<vjp> and VISRC<vip>. The ARM7 processor does not wait for the vector processor to begin execution; it continues directly with its next instruction. The vector processor must be initialized to the desired state before this instruction is executed. STARTVP has no effect if the vector processor is already in the VP_RUN state.
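The STARTVP behavior described above can be sketched as a small Python model. This is a software illustration under stated assumptions, not the hardware: the class `VectorProcessorModel`, the function `startvp`, and all field names are hypothetical, and the model tracks only the run state and the two VISRC bits that the description mentions.

```python
from dataclasses import dataclass
from enum import Enum


class VPState(Enum):
    VP_IDLE = "VP_IDLE"
    VP_RUN = "VP_RUN"


@dataclass
class VectorProcessorModel:
    """Hypothetical software model of the vector processor's run state."""
    state: VPState = VPState.VP_IDLE
    visrc_vjp: int = 0  # VISRC<vjp>: set when VCJOIN stops the vector processor
    visrc_vip: int = 0  # VISRC<vip>: set when VCINT stops the vector processor
    starts: int = 0     # number of start signals that actually took effect


def startvp(vp: VectorProcessorModel, cond: bool = True) -> None:
    """Model of STARTVP{cond} as executed by the ARM7 control processor."""
    if not cond:
        return  # Cond is false: the instruction is not executed
    if vp.state is VPState.VP_RUN:
        return  # already in VP_RUN: STARTVP has no effect
    vp.visrc_vjp = 0  # both pending bits are cleared automatically
    vp.visrc_vip = 0
    vp.state = VPState.VP_RUN  # the start signal to the vector processor
    vp.starts += 1
    # The ARM7 does not wait for the vector processor to start; the caller
    # simply continues with its next "instruction" once this returns.


# Usage: start (fork) the vector processor, then try to start it again.
vp = VectorProcessorModel(visrc_vjp=1, visrc_vip=1)
startvp(vp)
startvp(vp)  # ignored, because the vector processor is already running
```

Because the second call finds the model in VP_RUN, it changes nothing, which mirrors the rule that STARTVP has no effect on a running vector processor.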

[0062] Exception: Vector processor unavailable.

TESTSET (Test and Set)
Format: MRC/MCR, where Opc = 000; L = 1; CRn = 0; CP# = 0111; CRm = ER; CP is not used.
Assembler syntax: MRC{cond} p7, 0, Rd, c0, cER, 0
TESTSET{cond} Rd, RNAME
where cond = {eq, ne, cs, cc, mi, pl, vs, vc, hi, ls, ge, lt, gt, le, al, nv}, Rd = {r0, ..., r15}, ER = {0, ..., 15}, and RNAME is the register mnemonic (i.e., UER1 or VASYNC).

[0063] Description: The TESTSET instruction is executed only when Cond is true. It returns the contents of the extension register UERx in the general-purpose register Rd and sets UERx<30> to 1. If ARM7 register 15 is designated as the destination register, UERx<30> is returned in the Z bit of the CPSR so that a short busy-wait loop can be implemented.
Exception: None.

[0064]

[Embodiment 2] This embodiment describes the instructions VCINT, VCJOIN, and VMOV. In this embodiment of the invention, the vector processor uses these instructions to support synchronization with the control processor. The related application referred to above, "Single Instruction Multiple Data Processing in a Multimedia Signal Processor," filed in the United States on August 19, 1996, describes the complete instruction set of the vector processor. The operation of each instruction is defined using constructs of the C programming language.

VCINT (Conditional Interrupt ARM7)

[0065]

[Table 9]

[0066] Assembler syntax: VCINT.cond #ICODE
where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Description: If Cond is true, stop execution and, if the interrupt is enabled, interrupt the ARM7 processor.
Operation:
    if ((Cond == VCSR[SO, GT, EQ, LT]) | (Cond == un)) {
        VISRC<vip> = 1;
        VIINS = [VCINT.cond #ICODE instruction];
        VEPC = VPC;
        if (VIMSK<vie> == 1) signal ARM7 interrupt;
        VP_STATE = VP_IDLE;
    }
    else VPC = VPC + 4;
Exception: VCINT interrupt.

VCJOIN (Conditional Join with ARM7 Task)

[0067]

[Table 10]

[0068] Assembler syntax: VCJOIN.cond #Offset
where cond = {un, lt, eq, le, gt, ne, ge, ov}.
Description: If Cond is true, stop execution and, if the interrupt is enabled, interrupt the ARM7 processor.
Operation:

[0069]
    if ((Cond == VCSR[SO, GT, EQ, LT]) | (Cond == un)) {
        VISRC<vjp> = 1;
        VIINS = [VCJOIN.cond #Offset instruction];
        VEPC = VPC;
        if (VIMSK<vje> == 1) signal ARM7 interrupt;
        VP_STATE = VP_IDLE;
    }
    else VPC = VPC + 4;
Exception: VCJOIN interrupt.

VMOV (Move)

[0070]

[Table 11]

[0071] Assembler syntax: VMOV.dt Rb, Rd
where dt = {b, b9, h, w, f}, and Rd and Rb denote register names. The suffixes .w and .f designate the same operation.
Supported modes: int8 (b), int9 (b9), int16 (h), and int32 (w).
Description: The contents of register Rb are moved to register Rd. The Group field specifies the source and destination registers as defined in Table 12.

[0072]

[Table 12]

[0073] In Table 12, the register groups are denoted as follows:
VR: current-bank vector register
VRA: alternate-bank vector register
SR: scalar register
SP: special-purpose register
RASR: return address stack register
VAC: vector accumulator register (see Table B.5)
A vector register cannot be moved to a scalar register with the VMOV instruction; the VEXTRT instruction is used for that purpose instead. Table 13 defines the number encoding of the VAC registers.

[0074]

[Table 13]

[0075] Operation: Rd = Rb
Exception: Setting an exception status bit in VCSR or VISRC can cause the corresponding exception.
Programming note: VMOV is not affected by the element mask. Because the alternate-bank concept does not exist in VEC64 mode, VMOV cannot be used to move to or from an alternate-bank register in VEC64 mode.
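The two VMOV restrictions stated above can be expressed as a small Python check. This is a sketch under assumptions: the full list of legal group combinations in Table 12 is not reproduced in this text, so the check rejects only the forms the prose rules out, and `Group`, `DT_WIDTH`, and `check_vmov` are hypothetical names. The data widths follow the supported modes listed for VMOV, with .f treated as .w.

```python
from enum import Enum


class Group(Enum):
    VR = "current-bank vector register"
    VRA = "alternate-bank vector register"
    SR = "scalar register"
    SP = "special-purpose register"
    RASR = "return address stack register"
    VAC = "vector accumulator register"


# Data type suffix -> width in bits; .w and .f designate the same operation.
DT_WIDTH = {"b": 8, "b9": 9, "h": 16, "w": 32, "f": 32}
VECTOR_GROUPS = {Group.VR, Group.VRA}


def check_vmov(dt: str, src: Group, dst: Group, vec64: bool) -> int:
    """Reject the VMOV forms the text rules out; return the data width in bits."""
    if dt not in DT_WIDTH:
        raise ValueError(f"unknown data type suffix .{dt}")
    if src in VECTOR_GROUPS and dst is Group.SR:
        raise ValueError("vector -> scalar needs VEXTRT, not VMOV")
    if vec64 and Group.VRA in (src, dst):
        raise ValueError("no alternate bank exists in VEC64 mode")
    return DT_WIDTH[dt]
```

The element mask is deliberately absent from the check, since the programming note states that VMOV is not affected by it.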

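VCINT and VCJOIN, described earlier in this embodiment, share one operation pattern and differ only in which VISRC pending bit and which VIMSK enable bit they use. The Python sketch below models that shared pattern. The mapping from each cond suffix to the VCSR SO/GT/EQ/LT flags is an assumption (the text gives only `Cond == VCSR[SO, GT, EQ, LT]`), and `VectorUnit`, `cond_true`, `vcint`, and `vcjoin` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VectorUnit:
    """Hypothetical model of the vector-processor state touched by VCINT/VCJOIN."""
    vpc: int = 0
    vepc: int = 0
    viins: str = ""
    vip: int = 0  # VISRC<vip>, set by VCINT
    vjp: int = 0  # VISRC<vjp>, set by VCJOIN
    vie: int = 0  # VIMSK<vie>, enables the VCINT interrupt to the ARM7
    vje: int = 0  # VIMSK<vje>, enables the VCJOIN interrupt to the ARM7
    state: str = "VP_RUN"
    flags: Dict[str, bool] = field(
        default_factory=lambda: {"SO": False, "GT": False, "EQ": False, "LT": False})
    arm7_interrupts: List[str] = field(default_factory=list)


def cond_true(cond: str, f: Dict[str, bool]) -> bool:
    """Assumed mapping from the cond suffix to the VCSR SO/GT/EQ/LT flags."""
    table = {
        "un": True,
        "lt": f["LT"], "eq": f["EQ"], "gt": f["GT"],
        "le": f["LT"] or f["EQ"], "ge": f["GT"] or f["EQ"],
        "ne": not f["EQ"], "ov": f["SO"],
    }
    return table[cond]


def _stop_if(vu: VectorUnit, cond: str, text: str, pending: str, enable: str) -> None:
    if cond_true(cond, vu.flags):
        setattr(vu, pending, 1)  # record which instruction stopped the unit
        vu.viins = text          # VIINS = the stopping instruction
        vu.vepc = vu.vpc         # VEPC = VPC
        if getattr(vu, enable) == 1:
            vu.arm7_interrupts.append(text.split(".")[0])  # signal ARM7 interrupt
        vu.state = "VP_IDLE"
    else:
        vu.vpc += 4              # condition false: continue with the next instruction


def vcint(vu: VectorUnit, cond: str, icode: int) -> None:
    _stop_if(vu, cond, f"VCINT.{cond} #{icode}", "vip", "vie")


def vcjoin(vu: VectorUnit, cond: str, offset: int) -> None:
    _stop_if(vu, cond, f"VCJOIN.{cond} #{offset}", "vjp", "vje")
```

Even with the interrupt masked, the pending bit and VP_IDLE state are still recorded, so the ARM7 can discover the stop by reading registers instead of taking an interrupt.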
[0076]

[Effect of the Invention] As described above, the present invention provides a multiprocessor system that delivers a high degree of processing power, runs multiple separate program threads compatibly, and synchronizes or coordinates those threads in a simple manner.

[Brief description of the drawings]

FIG. 1 is a block diagram of a multiprocessor according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the interface between the control processor and the vector processor according to an embodiment of the present invention.

FIG. 3 is a flow diagram illustrating a method for synchronizing parallel program threads according to an embodiment of the present invention.

FIG. 4 is a flow diagram illustrating another method for synchronizing parallel program threads according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating parallel and sequential software schemes for the multiprocessor of FIG. 1.

FIG. 6 is a diagram illustrating parallel and sequential software schemes for the multiprocessor of FIG. 1.

FIG. 7 is a diagram illustrating parallel and sequential software schemes for the multiprocessor of FIG. 1.

[Explanation of symbols]

100: Multiprocessor
110: General-purpose processor
115: Extension register
120: Vector processor
130: Cache subsystem
140: System bus
142: System timer
146: Bitstream processor
148: Interrupt controller
150: System bus
154: DMA controller
156: Local bus interface
180: Cache control
244: Vector register
245: Special-purpose register
246: Parallel processing unit
260: Instruction decoder
270: Execution data path
272: Register file
276: Processing unit
280: Write register
290: Read register

Continuation of the front page. (72) Inventor: Le Tron Nguyen, 15095 Daniel Place, Monte Sereno, California 95030, United States of America

Claims


1. An integrated multiprocessor comprising: a first processor; a second processor operable in a first state, in which it executes a sequence of program instructions, and in a second state, in which it is idle; and a register that is coupled to and accessible by the first and second processors and that stores a value indicating whether the second processor is in the first state or the second state.

2. The integrated multiprocessor of claim 1, wherein the first processor includes a first execution data path for operating on operands having a width not exceeding a first maximum width, and the second processor includes a second execution data path for operating on operands having a width not exceeding a second maximum width, the second maximum width being greater than the first maximum width.

3. The integrated multiprocessor according to claim 2, wherein the second processor includes a plurality of processing units operated in parallel.

4. The integrated multiprocessor according to claim 3, wherein said second processor has a single instruction multiple data system.

5. A method of operating a multiprocessor, the method comprising: executing a first program thread on a first processor; causing a second processor to start a second program thread in response to an instruction executed by the first processor; executing a loop in the first program thread, wherein in the loop the first processor reads a register accessible by the first and second processors; the second processor recording in the register a value indicating completion of at least a first portion of the second program thread; and exiting the loop in response to the first processor reading the value indicating that the first portion has been completed.

6. The method of claim 5, further comprising, after exiting the loop, executing a portion of the first program thread that is to be executed once the first portion of the second program thread has been completed.

7. The method of claim 5, wherein the register stores a value indicating whether the second processor is active or idle.

8. The method of claim 7, wherein the second processor records the value in the register in the course of executing an instruction that stops execution of the second program thread and places the second processor in the idle state.

9. The method of claim 5, wherein the second processor records the value in the register and continuously executes the second program thread.

10. The method of claim 5, wherein said first processor has a narrower data path than said second processor.

11. The method of claim 10, wherein said second processor has a single instruction multiple data scheme.

12. The method of claim 11, wherein the first processor has a general purpose structure.

13. The method of claim 5, wherein the value is recorded before the loop is executed.

14. The method of claim 5, wherein execution of the loop begins before the value is recorded and is repeated until after the value is recorded.
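The method of claim 5 can be illustrated by analogy with two Python threads standing in for the two processors. This is a sketch, not the claimed hardware: `SharedRegister`, `run_method`, and the `DONE` value are hypothetical, a lock stands in for the register's hardware access path, and `threading.Thread.start` plays the role of the start instruction executed by the first processor.

```python
import threading

DONE = 1  # hypothetical value meaning "first portion of the second thread is complete"


class SharedRegister:
    """A register readable and writable by both 'processors' (threads)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def read(self) -> int:
        with self._lock:
            return self._value

    def write(self, value: int) -> None:
        with self._lock:
            self._value = value


def run_method(data):
    reg = SharedRegister()
    partial = []

    def second_thread():
        partial.append(sum(data))  # first portion of the second program thread
        reg.write(DONE)            # record the completion value in the register
        # The second thread could keep running past this point (cf. claim 9).

    # First program thread: start the second thread (analogous to STARTVP).
    worker = threading.Thread(target=second_thread)
    worker.start()

    polls = 0
    while reg.read() != DONE:  # loop in the first thread that reads the register
        polls += 1
    # Loop exited: run the part of the first thread that needs the result.
    result = 2 * partial[0]
    worker.join()
    return result, polls


result, polls = run_method([1, 2, 3])
```

Depending on scheduling, the second thread may record `DONE` before the loop first tests it (`polls == 0`, the ordering of claim 13) or after the loop has started (`polls > 0`, the ordering of claim 14); the result is the same either way.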