DE4216905C2

DE4216905C2 - Super scalar processor

Info

Publication number: DE4216905C2
Application number: DE4216905A
Authority: DE
Inventors: Hideki Ando
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1991-06-24
Filing date: 1992-05-21
Publication date: 1996-07-25
Anticipated expiration: 2012-05-22
Also published as: US5497496A; DE4216905A1; JPH052484A

Description

The invention relates to a superscalar processor according to the preamble of claim 1.

A superscalar processor is a high-performance microprocessor with a parallel processing arrangement of the so-called "superscalar type", as described, for example, in S. McGeady, "The i960CA Superscalar Implementation of the 80960 Architecture", COMPCON 1990, IEEE, pages 232-240, or Randy D. Groves, "An IBM Second Generation RISC Processor Architecture", COMPCON 1990, IEEE, pages 166-172. In the superscalar type, a plurality of processing units (processor units) provided in parallel execute a plurality of instructions in parallel. The superscalar processor simultaneously fetches a plurality of instructions from an instruction memory and decodes them. From the decoded instructions it selects those that can be processed in parallel and applies them to the processing units.

Such a superscalar processor is expected to be applicable to a wide variety of purposes, since its processing performance is markedly higher than that of a conventional microprocessor.

Fig. 4 shows the general structure of a conventional superscalar processor. In the figure, a plurality of instructions to be processed is stored in an instruction memory 1. An instruction fetch circuit 2 reads a plurality of instructions (for example, four) from the instruction memory 1 simultaneously and fetches them. An instruction decoder 3 decodes the instructions supplied by the instruction fetch circuit 2, selects those that can be processed in parallel, and applies them to processing units 4 to 7. The processing units 4 to 7 have, for example, a pipeline structure, and each of them executes an applied instruction independently. Although the function of each of the processing units 4 to 7 is not fixed, in Fig. 4 the processing units 4 and 5 are integer arithmetic units, the processing unit 6 is a unit for writing to or reading from a data memory 8, and the processing unit 7 is a floating-point arithmetic unit. The data memory 8 is a memory for storing data.

Since the superscalar processor shown in Fig. 4 can execute a plurality of instructions simultaneously in parallel, its processing speed can be increased compared with that of a normal microprocessor. The superscalar processor shown in Fig. 4 operates in each cycle of a clock signal (not shown), in synchronism with that clock signal. Fig. 5 is a diagram showing an example of instruction fetching and instruction issue by the superscalar processor of Fig. 4 over four successive cycles. An example of the operation of the superscalar processor shown in Fig. 4 is described below with reference to Fig. 5.

1) Cycle 1

In the first cycle, the instruction fetch circuit 2 reads four instructions 1 to 4 from the instruction memory 1 in this order. The four instructions 1 to 4 fetched by the instruction fetch circuit 2 are decoded by the instruction decoder 3. If no other instruction can be processed in parallel with instruction 1, the instruction decoder 3 takes only instruction 1, which was fetched first by the instruction fetch circuit 2, and applies it to one of the processing units 4 to 7. In Fig. 5, the numbers of the instructions issued by the instruction decoder 3 are underlined.

2) Cycle 2

In the second cycle, the instruction decoder 3 determines that instructions 2 and 3 can be processed in parallel. The instruction decoder 3 then takes instructions 2 and 3 from the instruction fetch circuit 2 and applies each of them to any of the processing units 4 to 7.

3) Cycle 3

Since instruction 4 is the only instruction remaining in the instruction fetch circuit 2, in the third cycle the instruction decoder 3 takes instruction 4 from the instruction fetch circuit 2 and applies it to the processing unit 7.

4) Cycle 4

In the fourth cycle, the instruction fetch circuit 2 sequentially reads four new instructions 5 to 8 from the instruction memory 1. At this point the instruction decoder 3 determines that, of the instructions fetched by the instruction fetch circuit 2, instructions 5 and 6 can be processed in parallel, and applies instructions 5 and 6 to any of the processing units 4 to 7.

In the superscalar processor shown in Fig. 4, the instruction fetch circuit 2 cannot fetch a new instruction from the instruction memory 1 until all of the instructions it holds have been forwarded to the processing units by the instruction decoder 3. Consequently, the instruction decoder 3 evaluates the relationship between instructions only within each group of four instructions. For example, even if instruction 4 and instructions 5 and 6 could be executed in parallel, instruction 4 and instructions 5 and 6 are applied to the processing units in different cycles. The parallel processing capability of the processing units 4 to 7 therefore cannot be fully exploited, and a potential increase in processing speed remains unused.
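The limitation described above can be illustrated with a minimal Python sketch. It is not part of the patent: the function name `conventional_issue` and the `groups` input (lists of instruction numbers that are assumed to be issuable together, standing in for the dependency check of the instruction decoder 3) are hypothetical. Only instructions 1 to 6 from the Fig. 5 example are modelled.

```python
def conventional_issue(groups, fetch_width=4):
    """Model the conventional scheme: fetch a block, drain it fully, then fetch again.

    groups: hypothetical input; consecutive instruction numbers that may issue together.
    Returns one list of issued instruction numbers per cycle.
    """
    # Map every instruction number to the index of its parallel group.
    group_of = {ins: g for g, members in enumerate(groups) for ins in members}
    program = sorted(group_of)
    cycles = []
    for start in range(0, len(program), fetch_width):
        # One fetch delivers up to fetch_width instructions.
        block = program[start:start + fetch_width]
        # No new fetch happens until this block has been issued completely.
        while block:
            head = group_of[block[0]]
            count = 0
            # Issue the leading run of instructions that belong to the same group.
            while count < len(block) and group_of[block[count]] == head:
                count += 1
            cycles.append(block[:count])
            block = block[count:]
    return cycles


# Instruction 4 could run with 5 and 6, but the block boundary separates them.
schedule = conventional_issue([[1], [2, 3], [4, 5, 6]])
for cycle, issued in enumerate(schedule, start=1):
    print(f"cycle {cycle}: issue {issued}")
```

Under these assumptions the sketch reproduces the four cycles described for Fig. 5: instruction 4 issues alone in cycle 3 and instructions 5 and 6 follow in cycle 4, although the assumed grouping would allow all three to issue together.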

A superscalar processor according to the preamble of claim 1 is known from IBM Journal of Research and Development, Vol. 34, No. 1, January 1990, pp. 37 to 58.

The object of the invention is to provide a superscalar processor according to the preamble of claim 1 that enables an increased processing speed.

This object is achieved by the superscalar processor according to claim 1.

Advantageous further developments are described in the dependent claims.

The number of empty registers in the fetch device is detected in every cycle, and the number of instructions that the fetch device fetches from the instruction storage device is controlled in each cycle according to the detected number. Consequently, the fetch device can fetch the next instructions from the instruction storage device without having to wait until all previously fetched instructions have been applied (forwarded) to the processing units. As a result, the number of instructions decoded by the decoder device in each cycle increases, and the number of instructions that can be issued in parallel can increase. The plurality of processing units can therefore operate more efficiently, and an increase in processing speed is achieved.
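The effect of refilling empty registers in every cycle can be sketched as follows. This is an illustrative model only, not the circuit of the embodiment: the function name `refill_issue` and the `groups` input (instruction numbers assumed to be issuable together) are hypothetical, the instruction memory is assumed to deliver instructions in every cycle, and refilled instructions are assumed to be available to the decoder in the following cycle.

```python
from collections import deque


def refill_issue(groups, num_registers=4):
    """Model a fetch stage that tops up its empty registers in every cycle.

    groups: hypothetical input; consecutive instruction numbers that may issue together.
    Returns one list of issued instruction numbers per cycle.
    """
    # Map every instruction number to the index of its parallel group.
    group_of = {ins: g for g, members in enumerate(groups) for ins in members}
    pending = deque(sorted(group_of))   # instructions not yet fetched
    window = []                         # contents of the instruction registers
    cycles = []
    while pending or window:
        # Fetch exactly as many instructions as there are empty registers.
        while pending and len(window) < num_registers:
            window.append(pending.popleft())
        head = group_of[window[0]]
        count = 0
        # Issue the leading run of instructions that belong to the same group.
        while count < len(window) and group_of[window[count]] == head:
            count += 1
        cycles.append(window[:count])
        # Unissued instructions stay in order at the front of the window.
        window = window[count:]
    return cycles


schedule = refill_issue([[1], [2, 3], [4, 5, 6]])
for cycle, issued in enumerate(schedule, start=1):
    print(f"cycle {cycle}: issue {issued}")
```

With the grouping assumed here (instruction 1 alone, instructions 2 and 3 together, instructions 4 to 6 together), the model issues all six instructions in three cycles, because instructions 5 and 6 are already in the window when instruction 4 becomes the oldest pending instruction.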

Exemplary embodiments are described below with reference to the figures.

In the figures:

Fig. 1 is a block diagram showing the structure of an embodiment;

Fig. 2 is a diagram showing an example of the operation of the embodiment shown in Fig. 1;

Fig. 3 is a diagram showing the relationship between the values of the NUM signal of Fig. 1 and the selection states of each of the selectors 100 to 103 and 200 to 203;

Fig. 4 is a block diagram showing the general structure of a conventional superscalar processor; and

Fig. 5 is a diagram showing the operation of the superscalar processor shown in Fig. 4.

Fig. 1 is a block diagram showing the structure of an embodiment. As shown in the figure, the superscalar processor of this embodiment comprises an instruction cache 1, a program counter 9, a first shift register SR1, a second shift register SR2, selectors 100 to 103 for selecting instructions, selectors 200 to 203 for selecting flags, instruction registers IR0 to IR3, flag registers FR0 to FR3, an instruction decoder 3, processing units 4 to 7, a selector 300, an adder 10, an empty-count register 11 and a selector control circuit 12.

The instruction cache 1, serving as the instruction memory, stores a plurality of instructions. The program counter 9 holds the first address of the four instructions that are fetched from the instruction cache 1 during one cycle. The program counter 9 applies the addresses of four instructions to the instruction cache 1 by counting four times during one cycle. As a result, four instructions are read from the instruction cache 1 in each cycle. The address held in the program counter 9 is changed as required by the superscalar processor system. If no instruction corresponding to the address supplied by the program counter 9 is present in the instruction cache 1, an external memory (not shown) is accessed. The instruction read from the external memory is transferred to the instruction cache 1 and stored there. Normally, several cycles are needed to transfer the instruction from the external memory to the instruction cache 1. The instruction cache 1 does not supply any instruction during the cycles in which the instruction is being transferred, so no instruction can be fetched from the instruction cache 1 during that time. The instruction cache 1 sets an ICR signal to logic 1 in a cycle in which an instruction is read out and to logic 0 in a cycle in which no instruction is read out. The ICR signal is applied to terminal b of each of the selectors 200 to 203 and, at the same time, to the selector 300 as a control signal.

The four instructions read from the instruction cache 1 are first stored in the second shift register SR2. The second shift register SR2 comprises four unit registers that are connected in cascade, each of which can store one instruction. The second shift register SR2 shifts the stored instructions to the right by a predetermined amount and then outputs the contents of the unit registers in parallel. The shift amount (shift width) of the second shift register SR2 is controlled by a NUM signal supplied by the selector 300. This shift operation removes unnecessary instructions, leaving as many instructions as there are empty registers among the instruction registers IR0 to IR3. The parallel output of each unit register of the second shift register SR2 is applied to terminal b of the corresponding one of the selectors 100 to 103.

The instructions stored in the first shift register SR1 are applied to terminal a of each of the selectors 100 to 103. The flags stored in the first shift register SR1 are applied to terminal a of each of the selectors 200 to 203. The selection made by each of the selectors 100 to 103 and 200 to 203 is controlled by the selector control circuit 12. The selector control circuit 12 controls the selection state of each selector on the basis of the NUM signal from the selector 300.

The output signals of the selectors 100 to 103 are applied to the instruction registers IR0 to IR3. The output signals of the selectors 200 to 203 are applied to the flag registers FR0 to FR3. The instruction decoder 3 decodes the instructions stored in the instruction registers IR0 to IR3, selects those that can be processed in parallel, and applies them to the processing units 4 to 7. In doing so, the instruction decoder 3 refers to the flags stored in the flag registers FR0 to FR3 and decodes only the instructions that the flags designate. The instruction decoder 3 then transfers the data stored in the instruction registers IR0 to IR3 and the flag registers FR0 to FR3 to the first shift register SR1 and, at the same time, outputs a CNT signal. The CNT signal represents the number of instructions that the instruction decoder 3 applies to the processing units 4 to 7. The instruction decoder 3 stops decoding in response to a BUSY signal from the processing units 4 to 7 when any of the processing units 4 to 7 is stalled. A processing unit may stall, for example, because data from the data memory is not yet available to the processing unit 6 in time, so that the unit cannot proceed to its next processing step.

The CNT signal supplied by the instruction decoder 3 is applied to the first shift register SR1, the selector 300 and the adder 10. The first shift register SR1 shifts the instructions and flags transferred from the instruction decoder 3 to the left by the number of positions indicated by the CNT signal. This shift operation moves the instructions that were not applied to the processing units in the preceding cycle to the left. After the shift operation, the first shift register SR1 applies the instructions and flags stored in it to the selectors 100 to 103 and 200 to 203.

In response to the ICR signal from the instruction cache 1, the selector 300 selects either the CNT signal from the instruction decoder 3 or the output of the adder 10 and outputs the selected value as the NUM signal. The NUM signal represents the number of empty registers among the instruction registers IR0 to IR3. The NUM signal is applied to the empty-count register 11, the second shift register SR2 and the selector control circuit 12. The empty-count register 11 temporarily stores the NUM signal. The output signal REG of the empty-count register 11 is applied to the adder 10. The adder 10 adds the CNT signal from the instruction decoder 3 to the output signal REG of the empty-count register 11.

The operation of the embodiment shown in Fig. 1 is described in detail below.

First, the method of detecting the empty (unoccupied) registers among the instruction registers IR0 to IR3 is described. If new instructions can be fetched from the instruction cache 1 into the empty instruction registers during a given cycle, the number NUM of empty instruction registers in that cycle equals the number CNT of instructions that the instruction decoder 3 applies from the instruction registers to the processing units during the cycle, that is, NUM = CNT. When instructions can be fetched from the instruction cache 1, the ICR signal is at logic 1, so the selector 300 selects the CNT signal supplied by the instruction decoder 3 (which represents the number of instructions transferred from the instruction registers to the processing units) and outputs it as the NUM signal. The NUM signal therefore corresponds to the number of empty instruction registers.

If, in another cycle, no instruction can be read from the instruction cache 1, no new instruction is fetched into the instruction registers IR0 to IR3. As a result, the number NUM of empty instruction registers becomes the sum of the number REG of registers that were empty in the preceding cycle (stored in the empty-count register 11) and the number CNT of instructions that the instruction decoder 3 applies from the instruction registers to the processing units in the current cycle, that is, NUM = REG + CNT. When no instruction can be fetched from the instruction cache 1, the ICR signal is at logic 0, so the selector 300 selects the output signal of the adder 10 and outputs it as the NUM signal. The adder 10 adds the CNT signal from the instruction decoder 3 to the REG signal from the empty-count register 11. The NUM signal from the selector 300 therefore again corresponds to the number of empty instruction registers.
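The two cases above (NUM = CNT when ICR is logic 1, NUM = REG + CNT when ICR is logic 0) can be written down as a short Python sketch. The function names `next_num` and `trace_num` and the input values of the trace are illustrative assumptions; the sketch only applies the two stated rules and the latching of NUM in the empty-count register 11, and it does not model the exact timing of the circuit.

```python
def next_num(icr, cnt, reg):
    """Value of the NUM signal for one cycle.

    icr: 1 if the instruction cache supplies instructions in this cycle, else 0.
    cnt: number of instructions issued by the decoder in this cycle (CNT signal).
    reg: value held in the empty-count register (REG signal), i.e. the previous NUM.
    """
    if icr == 1:
        # Selector 300 passes CNT through.
        return cnt
    # Selector 300 passes the adder output REG + CNT through.
    return reg + cnt


def trace_num(cycles, reg=0):
    """Apply next_num over a sequence of (icr, cnt) pairs, latching NUM into REG."""
    history = []
    for icr, cnt in cycles:
        num = next_num(icr, cnt, reg)
        history.append(num)
        reg = num  # the empty-count register stores NUM for the next cycle
    return history


# Hypothetical inputs: a hit, two cycles without cache output, then a hit again.
print(trace_num([(1, 2), (0, 1), (0, 0), (1, 1)]))  # prints [2, 3, 3, 1]
```

In the trace, the count of empty registers accumulates while the cache supplies nothing (2, then 3, then 3 again because nothing was issued) and falls back to CNT once the cache supplies instructions again.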

Of the instructions read from the instruction cache 1, the number to be stored in the instruction registers IR0 to IR3 equals the number NUM of empty instruction registers. In other words, the empty instruction registers are filled with instructions. The four instructions read from the instruction cache 1 in one cycle are reduced to the number of empty instruction registers and then stored in the instruction registers IR0 to IR3 via the selectors 100 to 103. The reduction is performed in the second shift register SR2: it shifts the four instructions read from the instruction cache 1 by the amount corresponding to the number of empty instruction registers (indicated by the NUM signal from the selector 300) and thereby removes the unnecessary instructions. The operation of the second shift register SR2 is described in more detail later.
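The refill step can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name `refill_registers` and the instruction labels are hypothetical, and the right shift in the second shift register SR2 together with the selectors 100 to 103 is abstracted as keeping the first NUM fetched instructions and appending them behind the instructions that are still waiting. The selector mapping of Fig. 3 is not reproduced.

```python
def refill_registers(waiting, fetched, num, num_registers=4):
    """Fill the empty instruction registers with newly fetched instructions.

    waiting: instructions not yet issued, oldest first (left-aligned in IR0..IR3).
    fetched: the instructions read from the instruction cache in this cycle.
    num:     number of empty instruction registers (NUM signal).
    """
    if len(waiting) + num > num_registers:
        raise ValueError("waiting instructions and empty registers exceed register count")
    # Keep only as many fetched instructions as there are empty registers;
    # the remaining fetched instructions are the 'unnecessary' ones and are dropped.
    kept = fetched[:num]
    # Fetch order is preserved: older waiting instructions stay in front.
    return waiting + kept


# Two instructions are still waiting, so only two of the four fetched ones are kept.
print(refill_registers(["i3", "i4"], ["i5", "i6", "i7", "i8"], num=2))
# prints ['i3', 'i4', 'i5', 'i6']
```

The dropped instructions ("i7" and "i8" in the example) would have to be fetched again in a later cycle; how the program counter 9 is adjusted for this is not covered by the sketch.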

The instructions fetched from the instruction cache 1 must be processed in the processing units in the order in which they were fetched. To preserve this processing order, the instructions stored in the instruction registers IR0 to IR3 are preferably arranged in the order in which they were fetched, because the instruction decoder 3 can then easily determine which instruction register holds the instruction to be processed first. The first shift register SR1 rearranges the instructions stored in the instruction registers in every cycle so that the order of the instructions in the instruction registers IR0 to IR3 is maintained. That is, the first shift register SR1 shifts the instructions and flags received from the instruction decoder 3 to the left by the amount given by the CNT signal from the instruction decoder 3. As a result, the earliest-fetched instruction and its flag are moved to the left end. When the output of the first shift register SR1 is subsequently transferred to the instruction registers and flag registers, the earliest-fetched instruction and its flag are stored in the instruction register IR0 and the flag register FR0, respectively. The next-oldest instructions and their flags are stored, in order, in
(instruction register IR1, flag register FR1),
(instruction register IR2, flag register FR2),
(instruction register IR3, flag register FR3).

Accordingly, the instruction decoder 3 determines whether the instructions stored in the instruction registers can be executed in parallel, always taking instruction register IR0 at the left end as the starting point. The operation of the first shift register SR1 is described in detail later.

Fig. 2 is a diagram showing an example of the operation of the embodiment shown in Fig. 1. Specific examples of this operation are described below, cycle by cycle, with reference to Fig. 2.

1) Cycle 1

Since instructions are read from instruction cache 1 in cycle 1, the ICR signal is at logic 1. At this time, only instruction 1, stored in instruction register IR0, is applied by the instruction decoder 3 to one of the processing units 4 to 7. That is, it is determined that instructions 2 to 4 cannot be processed in parallel with instruction 1, so only instruction 1 is issued. Consequently, the value of the CNT signal provided by the instruction decoder 3 is 1. Since the ICR signal is at logic 1, selector 300 selects the CNT signal and outputs it as the NUM signal. The value of the NUM signal therefore becomes 1. The NUM signal is stored in the empty number register 11.

2) Cycle 2

The number (1) of empty instruction registers from the previous cycle (cycle 1) is stored in the empty number register 11. In cycle 2, as shown in Fig. 2, instructions 2, 3, 4, which were not applied to the processing units in the previous cycle, are applied to instruction registers IR0, IR1, IR2 by the first shift register SR1. At this time, instructions 5, 6, 7, 8 read from instruction cache 1 are stored in the second shift register SR2. The second shift register SR2 discards instructions 6, 7, 8 and retains only instruction 5 by performing the shift operation the number of times determined by the NUM signal (three times in this case). Instruction 5 is transferred to instruction register IR3 via selector 103. Since instructions are read from instruction cache 1, the ICR signal is at logic 1 in cycle 2. The instruction decoder 3 determines that instructions 2, 3, stored in instruction registers IR0, IR1, can be processed in parallel and applies them to any of the processing units 4 to 7. The value of the CNT signal is therefore 2. Since the ICR signal is at logic 1, selector 300 selects the CNT signal and outputs it as the NUM signal. The value of the NUM signal therefore becomes 2. The NUM signal is stored in the empty number register 11.

3) Cycle 3

The number (2) of empty instruction registers from the previous cycle (cycle 2) is stored in the empty number register 11. As shown in Fig. 2, in cycle 3 instructions 4, 5, which were not applied to the processing units during the previous cycle, are applied to instruction registers IR0, IR1 by the first shift register SR1. Of the four instructions 6, 7, 8, 9 read from instruction cache 1, only instructions 6, 7 remain after the shift operation in the second shift register SR2; they are transferred to instruction registers IR2, IR3 via selectors 102, 103. Since instructions were read from instruction cache 1 during cycle 3, the ICR signal is at logic 1. The instruction decoder 3 determines that instructions 4, 5, stored in instruction registers IR0, IR1, can be processed in parallel and transfers them to any of the processing units 4 to 7. The value of the CNT signal is therefore 2. Since the ICR signal is at logic 1, selector 300 selects the CNT signal and outputs it as the NUM signal. The value of the NUM signal therefore becomes 2. The NUM signal is stored in the empty number register 11.

4) Cycle 4

The number (2) of empty instruction registers from the previous cycle (cycle 3) is stored in the empty number register 11. As shown in Fig. 2, in cycle 4 instructions 6, 7, which were not applied to the processing units in the previous cycle, are applied to instruction registers IR0 and IR1, respectively, by the first shift register SR1. Since no instruction was read from instruction cache 1 in cycle 4 (for example, because the required instruction is not in instruction cache 1 and must be transferred from an external memory), the ICR signal is at logic 0. In addition, no instruction is transferred from the second shift register SR2 to the instruction registers. Instruction registers IR2, IR3 are therefore empty, that is, they hold undefined data. The instruction decoder 3 applies only instruction 6, stored in instruction register IR0, to one of the processing units 4 to 7. That is, the instruction decoder 3 determines that instruction 7, stored in instruction register IR1, cannot be processed in parallel with instruction 6 and issues only instruction 6 to a processing unit. At this time, the value of the CNT signal is 1. Since the ICR signal is at logic 0, selector 300 selects the output of adder 10 and outputs it as the NUM signal. Because adder 10 adds the value (1) of the CNT signal to the value (2) of the output signal REG of the empty number register 11, the value of the NUM signal becomes 3. The NUM signal is stored in the empty number register 11.

5) Cycle 5

The number (3) of empty instruction registers from the previous cycle (cycle 4) is stored in the empty number register 11. As shown in Fig. 2, in cycle 5 instruction 7, which was not applied to a processing unit in the previous cycle, is shifted into instruction register IR0 by the first shift register SR1. No instruction was read from instruction cache 1 in cycle 5, so the ICR signal is at logic 0. In addition, no instruction was transferred to the instruction registers by the second shift register SR2. Instruction registers IR1 to IR3 are therefore empty. From a BUSY signal supplied by a processing unit, the instruction decoder 3 determines that operation in one of the processing units is stalled, and it stops decoding instructions. Consequently, no instruction is transferred from the instruction registers to the processing units.

The value of the CNT signal is therefore 0. Since the ICR signal is at logic 0, selector 300 selects the output of adder 10 and outputs it as the NUM signal. Because adder 10 adds the value (0) of the CNT signal to the value (3) of the output signal REG of the empty number register 11, the value of the NUM signal is 3 at this time. The NUM signal is stored in the empty number register 11.

6) Cycle 6

The number (3) of empty instruction registers from the previous cycle (cycle 5) is stored in the empty number register 11. In cycle 6, instructions 8, 9, 10, 11 are read from instruction cache 1 and stored in the second shift register SR2. Of these, instructions 8, 9, 10 are retained by the shift operation and transferred to instruction registers IR1, IR2, IR3 via selectors 101, 102, 103. Since instructions are read from instruction cache 1, the ICR signal is at logic 1 in cycle 6. The instruction decoder 3 determines that, of the instructions stored in instruction registers IR0 to IR3, instructions 7, 8, 9 can be processed in parallel and applies them to any of the processing units 4 to 7. The value of the CNT signal therefore becomes 3. Since the ICR signal is at logic 1, selector 300 selects the CNT signal and outputs it as the NUM signal. The value of the NUM signal therefore becomes 3. The NUM signal is stored in the empty number register 11.
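The NUM values in the six cycles above follow from a single rule: selector 300 passes CNT through when ICR is at logic 1, and otherwise passes the output of adder 10, that is, CNT plus the stored value REG. The following Python sketch replays the (CNT, ICR) pairs of the walk-through. The function and variable names are illustrative and do not appear in the patent, and the initial REG value of 0 is an assumption (it has no effect in cycle 1, where ICR is 1).

```python
def next_num(cnt, icr, reg):
    """Model of selector 300: CNT if ICR is 1, else adder 10 output (CNT + REG)."""
    return cnt if icr == 1 else cnt + reg


# (CNT, ICR) for cycles 1 to 6, taken from the walk-through of Fig. 2.
CYCLES = [(1, 1), (2, 1), (2, 1), (1, 0), (0, 0), (3, 1)]


def trace_num(cycles, reg=0):
    """Return the NUM value produced in each cycle.

    reg models the empty number register 11; its start value is an assumption.
    """
    history = []
    for cnt, icr in cycles:
        reg = next_num(cnt, icr, reg)  # NUM is latched into register 11
        history.append(reg)
    return history


print(trace_num(CYCLES))  # [1, 2, 2, 3, 3, 3]
```

The printed sequence matches the NUM values stated for cycles 1 to 6; the adder path is only exercised in cycles 4 and 5, where no instruction is read from the cache.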

As described above, in the embodiment shown in Fig. 1 a new instruction can be read from instruction cache 1 and fetched into an empty instruction register without waiting until all instructions held in instruction registers IR0 to IR3 have been applied to the processing units. Consequently, the instructions examined for parallel execution are not partitioned into groups of a predetermined number of instructions, as they are in Fig. 5, and each processing unit can be used efficiently.

The operation of each part of the embodiment shown in Fig. 1 is described in detail below.

First, the operation of the first shift register SR1 is described. The first shift register SR1 shifts the instructions and flags supplied by the instruction decoder 3 to the left. Instructions and flags are shifted in pairs, each instruction together with its corresponding flag. The shift amount of the first shift register SR1 is determined by the CNT signal of the instruction decoder 3. That is, the first shift register SR1 shifts by as many positions as there are instructions applied from the instruction registers to the processing units by the instruction decoder 3. The shift amount of the first shift register SR1 is thus controlled as follows:

CNT = 0 (number of issued instructions is 0): no shift.

CNT = 1 (number of issued instructions is 1): shift left by 1.

CNT = 2 (number of issued instructions is 2): shift left by 2.

CNT = 3 (number of issued instructions is 3): shift left by 3.

CNT = 4 (number of issued instructions is 4): no shift.

For example, if instructions 1, 2, 3, 4 are stored in instruction registers IR0, IR1, IR2, IR3 and only instruction 1 is applied to a processing unit, then CNT = 1. The first shift register SR1 then shifts left once. Consequently, the order of the instructions after the shift operation is 2, 3, 4, X, where X represents an empty position. Each flag is shifted along with its corresponding instruction. After the shift operation, the first shift register SR1 outputs all instructions and flags in parallel. The instructions output in parallel are selected by selectors 100 to 103 and written into instruction registers IR0 to IR3. Likewise, the flags output in parallel are selected by selectors 200 to 203 and written into flag registers FR0 to FR3.
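The left shift of paired instructions and flags can be sketched as follows. This is a minimal Python model, not the circuit itself: `EMPTY` stands for the X state, the function name is hypothetical, and treating vacated flag positions as 0 is an assumption made for illustration (the text only states that flags move together with their instructions).

```python
EMPTY = None  # stands for the empty state "X" in the text


def sr1_shift_left(instrs, flags, cnt):
    """Shift paired instructions and flags left by the amount CNT selects."""
    if cnt == 4:
        cnt = 0  # per the table in the text, CNT = 4 means no shift
    shifted_instrs = list(instrs[cnt:]) + [EMPTY] * cnt
    # Assumption: vacated flag positions read as 0 (invalid).
    shifted_flags = list(flags[cnt:]) + [0] * cnt
    return shifted_instrs, shifted_flags


# Example from the text: instructions 1..4 held, only instruction 1 issued.
print(sr1_shift_left([1, 2, 3, 4], [1, 1, 1, 1], 1))
# ([2, 3, 4, None], [1, 1, 1, 0])
```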

The operation of the second shift register SR2 is described next. The second shift register SR2 shifts the four instructions read from instruction cache 1 to the right. The shift amount of the second shift register SR2 is determined by the NUM signal of selector 300. That is, the shift amount of the second shift register SR2 is controlled as follows:
NUM = 0 (the number of empty instruction registers is 0): no shift
NUM = 1 (the number of empty instruction registers is 1): shift right by 3
NUM = 2 (the number of empty instruction registers is 2): shift right by 2
NUM = 3 (the number of empty instruction registers is 3): shift right by 1
NUM = 4 (the number of empty instruction registers is 4): no shift.

For example, if instructions 1, 2, 3, 4 were stored in instruction registers IR0, IR1, IR2, IR3 in the previous cycle and only instruction 1 was applied to a processing unit, then NUM = 1, so the second shift register SR2 shifts instructions 5, 6, 7, 8 read from instruction cache 1 to the right by 3. Consequently, the contents of the second shift register SR2 after the shift are X, X, X, 5. Instruction 5 is written into instruction register IR3 via selector 103. Meanwhile, instructions 2, 3, 4 have been written into instruction registers IR0, IR1, IR2 by the first shift register SR1.
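The NUM-to-shift mapping for SR2 can be written compactly as a shift of (4 - NUM) mod 4 positions, which reproduces every row of the table above. The Python sketch below is an illustration under that reading; the function name and the use of `None` for X are assumptions, not terms from the patent.

```python
EMPTY = None  # stands for the empty state "X" in the text


def sr2_shift_right(fetched, num):
    """Shift the four fetched instructions right as selected by NUM (0..4)."""
    if not 0 <= num <= 4:
        raise ValueError("NUM must be between 0 and 4")
    shift = (4 - num) % 4  # NUM 0 -> 0, 1 -> 3, 2 -> 2, 3 -> 1, 4 -> 0
    # Instructions pushed past the right end are discarded.
    return [EMPTY] * shift + list(fetched[:4 - shift])


# Example from the text: NUM = 1, fetched instructions 5, 6, 7, 8.
print(sr2_shift_right([5, 6, 7, 8], 1))  # [None, None, None, 5]
```

After the shift, the surviving instructions sit in exactly the positions of the empty instruction registers, so each selector only has to choose between its SR1 input and its SR2 input.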

The operation of the selector control circuit 12 is described next. Assume that instructions 1, 2, 3, 4 were stored in instruction registers IR0, IR1, IR2, IR3 in the previous cycle and that, for example, only instruction 1 was transferred to a processing unit. This case results in CNT = 1. If instructions are read from instruction cache 1 in the current cycle, the ICR signal goes to logic 1 and selector 300 selects the CNT signal as the NUM signal. As a result, the value of the NUM signal becomes 1. Since instructions 1, 2, 3, 4, which the first shift register SR1 received from the instruction decoder 3, have been shifted left by 1, the instruction outputs of the first shift register SR1 after the shift operation are 2, 3, 4, X. Since the four instructions 5, 6, 7, 8 read from instruction cache 1 have been shifted right by 3 in the second shift register SR2, the instruction outputs of the second shift register SR2 are X, X, X, 5. Selectors 100 to 103 are controlled by the selector control circuit 12, which responds to the NUM signal and makes the following selections:
Selector 100: a
Selector 101: a
Selector 102: a
Selector 103: b.

As a result, instructions 2, 3, 4, 5 are stored in instruction registers IR0, IR1, IR2, IR3.
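The selection step can be illustrated with a short Python sketch. It assumes that input a of each selector carries the SR1 output and input b the SR2 output, and that the rightmost NUM selectors choose b; this generalization is inferred from the NUM = 1, 2 and 3 cases in the cycle descriptions and is not a reproduction of Fig. 3. All names are hypothetical.

```python
def select_instructions(sr1_out, sr2_out, num):
    """Merge SR1 and SR2 outputs into the instruction registers.

    Assumption: the rightmost `num` selectors pick input b (SR2 output);
    the remaining selectors pick input a (SR1 output).
    """
    width = len(sr1_out)
    return [
        sr2_out[i] if i >= width - num else sr1_out[i]
        for i in range(width)
    ]


# Example from the text: SR1 outputs 2, 3, 4, X; SR2 outputs X, X, X, 5; NUM = 1.
print(select_instructions([2, 3, 4, None], [None, None, None, 5], 1))
# [2, 3, 4, 5]
```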

The flags stored in flag registers FR0 to FR3 are applied to terminal a of each of selectors 200 to 203 after being shifted by the first shift register SR1. The ICR signal of instruction cache 1 is applied to terminal b of each of selectors 200 to 203. In this example, selectors 200 to 203 are controlled by the selector control circuit 12, which responds to the NUM signal and makes the following selections:
Selector 200: a
Selector 201: a
Selector 202: a
Selector 203: b.

As a result, the flag of instruction 2, the flag of instruction 3, the flag of instruction 4 and the logic value of the ICR signal are stored in flag registers FR0, FR1, FR2 and FR3, respectively.

The general relationship between the NUM signal and the selection states of selectors 100 to 103 and 200 to 203 is shown in Fig. 3.

The effect of the flags stored in the flag registers FR0 to FR3 is described below. Each flag indicates whether the instruction stored in the corresponding instruction register is valid. For example, an instruction whose flag is at logic 1 is valid, and an instruction whose flag is at logic 0 is invalid. Each flag is checked when the instruction decoder 3 decodes an instruction. That is, the instruction decoder 3 treats only instructions whose flag is set to logic 1 as valid and decodes only those instructions. This prevents undefined data from reaching the processing units. The flags stored in the flag registers FR0 to FR3 are transferred, together with the corresponding instructions, to the first shift register SR1 via the instruction decoder 3. The first shift register SR1 pairs each instruction with its flag and performs the shift operation as described above. The instructions are therefore shifted through the first shift register SR1 while the valid/invalid state of the instruction stored in each instruction register is preserved. When an instruction newly read from the instruction cache 1 is fetched into an instruction register, the ICR signal is written into the flag register corresponding to that instruction register. Because the ICR signal is at logic 1 in a cycle in which an instruction is read from the instruction cache 1, an instruction fetched into the instruction register in that cycle is subsequently treated as valid data. Conversely, in a cycle in which an instruction is applied from an instruction register to a processing unit but no instruction can be fetched from the instruction cache 1, the ICR signal at logic 0 is written into the flag register corresponding to the emptied instruction register. The data held in the empty instruction registers is therefore subsequently treated as invalid.
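The flag handling described above can be illustrated with a minimal sketch. It is not taken from the patent: the names `Slot`, `write_slot` and `decode_valid` are hypothetical, instructions are modelled as plain strings, and the ICR signal is modelled as an integer 0 or 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical model of one instruction register (IR0..IR3) with its flag register (FR0..FR3)."""
    instruction: Optional[str] = None
    valid: int = 0  # flag: 1 = valid, 0 = invalid


def write_slot(slot: Slot, data: Optional[str], icr: int) -> None:
    """Store whatever is on the fetch path and latch the ICR signal as the flag."""
    slot.instruction = data
    slot.valid = icr


def decode_valid(slots: List[Slot]) -> List[Optional[str]]:
    """Return only the instructions whose flag is 1; all other contents are ignored."""
    return [s.instruction for s in slots if s.valid == 1]


slots = [Slot() for _ in range(4)]
write_slot(slots[0], "add", icr=1)         # cache delivered an instruction
write_slot(slots[1], "stale bits", icr=0)  # no instruction could be fetched
print(decode_valid(slots))  # ['add']
```

In this model the content of a register with flag 0 never reaches the decoder output, which corresponds to the statement that undefined data is kept away from the processing units.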

As described above, in the embodiment shown in Fig. 1 a new instruction can be read from the instruction cache and supplied to an empty instruction register as a replacement, without waiting until all instructions stored in the instruction registers IR0 to IR3 have been applied to the processing units. Furthermore, in the embodiment shown in Fig. 1 an instruction can be read from the instruction cache 1 and fetched into an empty instruction register even while the instruction decoder 3 is not decoding because of a BUSY signal from a processing unit. The embodiment shown in Fig. 1 therefore minimizes the occurrence of empty instruction registers among IR0 to IR3. As a result, the number of instructions that the instruction decoder 3 supplies in parallel increases, which raises the efficiency of each processing unit and significantly increases the processing speed.
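How many registers can be refilled in a cycle depends on the number of empty registers. The rule recited in claim 2 can be sketched as follows; the function names are hypothetical, and the sketch only mirrors the counting rule (storage 11, adder 10 and selector 300 in claim 3), not the actual circuit.

```python
from typing import Iterable, List


def next_empty_count(prev_empty: int, issued: int, icr: int) -> int:
    """Counting rule of claim 2.

    icr == 1: instructions can be fetched, so the count is the number issued.
    icr == 0: nothing can be fetched, so the issued count is added to the
              count stored from the previous cycle.
    """
    if icr == 1:
        return issued
    return prev_empty + issued


def trace_empty_counts(issued_per_cycle: Iterable[int],
                       icr_per_cycle: Iterable[int]) -> List[int]:
    """Apply the rule cycle by cycle, starting with no empty registers."""
    empty = 0
    trace = []
    for issued, icr in zip(issued_per_cycle, icr_per_cycle):
        empty = next_empty_count(empty, issued, icr)
        trace.append(empty)
    return trace


# Two cycles without a successful fetch let the empty count accumulate.
print(trace_empty_counts([2, 1, 1, 0], [1, 0, 0, 1]))  # [2, 3, 4, 0]
```

The `if` statement plays the role of the selector controlled by the ICR signal: it chooses between the issued count alone and the output of the addition.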

As described above, the processing units provided in parallel can be used efficiently and the processing speed can be increased significantly.
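For illustration only, the compaction and refill of the instruction registers recited in claims 5 to 10 might be modelled as below. Several points are assumptions of this sketch and are not stated in the text: instructions are issued in order starting from IR0, both shifters move instructions toward IR0, an empty register is modelled as `None`, and the function name `compact_and_refill` is hypothetical.

```python
from typing import List, Optional, Sequence


def compact_and_refill(registers: Sequence[Optional[str]],
                       issued: int,
                       fetched: Sequence[str],
                       empty: int) -> List[Optional[str]]:
    """Return the register contents for the next cycle.

    registers: current contents of IR0..IR3 (None = empty), oldest first
    issued:    number of instructions applied to the processing units (assumed from IR0 upward)
    fetched:   instructions read from the instruction cache in this cycle
    empty:     number of empty registers after issuing
    """
    n = len(registers)
    keep = n - empty  # registers that still hold unissued instructions

    # First shifter (SR1): move the remaining instructions by the issued count.
    sr1 = list(registers[issued:]) + [None] * issued

    # Second shifter (SR2): align the fetched instructions with the empty registers.
    new = list(fetched[:empty])
    sr2 = [None] * keep + new + [None] * (empty - len(new))

    # Selectors (100 to 103): each register takes its SR1 or SR2 output.
    return [sr1[i] if i < keep else sr2[i] for i in range(n)]


print(compact_and_refill(["i0", "i1", "i2", "i3"], 2, ["i4", "i5", "i6", "i7"], 2))
# ['i2', 'i3', 'i4', 'i5']
```

If fewer instructions are fetched than there are empty registers, the remaining positions stay `None` in this model, which corresponds to registers whose flag would be 0.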

Claims

1. Superscalar processor which operates cyclically, with a plurality of processing units (4 to 7) which are capable of executing simultaneously applied instructions in parallel,
an instruction storage device (1) for storing a plurality of instructions to be processed,
a fetching device (IR0 to IR3, 9) with a plurality of registers for fetching at least one instruction from the instruction storage device (1) and for storing this at least one instruction in a register, and
a decoder device (3) for decoding, during each cycle, the instruction stored in each register of the fetching device (IR0 to IR3, 9), for selecting instructions which can be executed in parallel, and for applying them to the processing units (4 to 7) at the same time, characterized by
an empty number detection device (10, 11, 300) for detecting the number of empty registers in the fetching device (IR0 to IR3, 9) during each cycle, and
a control device (12, 100 to 103, 200 to 203), responsive to the detection result of the empty number detection device, for controlling the number of instructions which the fetching device (IR0 to IR3, 9) fetches from the instruction storage device (1) during each cycle.

2. Superscalar processor according to claim 1, characterized in that
the empty number detection device (10, 11, 300) recognizes, in a cycle in which instructions can be fetched from the instruction storage device (1), the number of instructions applied to the processing units (4 to 7) from the fetching device (IR0 to IR3, 9) as the number of empty registers, and
the empty number detection device (10, 11, 300) recognizes, in a cycle in which instructions cannot be fetched from the instruction storage device (1), the sum of the number of empty registers in the previous cycle and the number of instructions applied to the processing units (4 to 7) from the fetching device (IR0 to IR3, 9) in the current cycle as the number of empty registers.

3. Superscalar processor according to claim 2, characterized in that the empty number detection device comprises
an empty number storage device (11) for storing the number of empty registers recognized in the previous cycle,
an adding device (10) for adding the number of instructions applied to the processing units (4 to 7) from the fetching device (IR0 to IR3, 9) via the decoder device (3) to the number of empty registers stored in the empty number storage device (11), and
an empty number selector (300) for selecting either the number of instructions applied to the processing units (4 to 7) from the fetching device (IR0 to IR3) or the addition result of the adding device (10), and for applying the selected value as the number of empty registers in the fetching device (IR0 to IR3, 9).

4. Superscalar processor according to claim 3, characterized in that
the instruction storage device (1) outputs in each cycle an identification signal (ICR) indicating whether an instruction has been fetched from it, and
the empty number selector (300) performs its selection operation in response to the identification signal.

5. Superscalar processor according to one of claims 1 to 4, characterized by a storage position shifting device (SR1) for shifting the storage position of each instruction within the fetching device (IR0 to IR3, 9) after completion of the decoding operation of the decoder device (3).

6. Superscalar processor according to claim 5, characterized in that
the storage position shifting device comprises a first shifting device (SR1) for temporarily storing and holding the instructions read from the fetching device (IR0 to IR3, 9) and for shifting them, and
the instructions shifted in the first shifting device are transferred back to the fetching device and written there.

7. Superscalar processor according to claim 6, characterized in that the shift amount of the first shifting device is controlled on the basis of the number of instructions applied to the processing units (4 to 7) from the fetching device (IR0 to IR3, 9) by the decoder device (3).

8. Superscalar processor according to claim 6 or 7, characterized in that the control device comprises a second shifting device (SR2) for temporarily storing and holding a plurality of instructions read from the instruction storage device (1) and for shifting them during each cycle.

9. Superscalar processor according to claim 8, characterized in that
the shift amount of the second shifting device is controlled on the basis of the detection result of the empty number detection device, and
the instructions shifted in the second shifting device (SR2) are transferred to empty registers of the fetching device (IR0 to IR3, 9) and written there.

10. Superscalar processor according to claim 8 or 9, characterized in that the control device comprises
a plurality of instruction selection devices (100 to 103), one for each register of the fetching device (IR0 to IR3, 9), each for selecting either a corresponding output signal of the first shifting device or a corresponding output signal of the second shifting device and for applying the selected signal to the corresponding register, and
a selection control device (12) for controlling the selection state of each of the instruction selection devices on the basis of the detection result of the empty number detection device.

11. Superscalar processor according to one of claims 1 to 10, characterized in that the decoder device (3) stops decoding instructions if one of the processing units (4 to 7) does not execute instructions.

12. Superscalar processor according to claim 11, characterized in that the fetching device (IR0 to IR3, 9), the empty number detection device and the control device continue their respective operations in a cycle in which the decoder device (3) is not operating.