DE4447706B4

DE4447706B4 - Data flow processor - uses charging logic for individual and group-wise programming of mutually orthogonal homogeneously structured cells in integrated circuit chip

Info

Publication number: DE4447706B4
Application number: DE4447706A
Authority: DE
Inventors: Martin Vorbach; Robert MÜNCH
Original assignee: PACT XPP Technologies AG
Current assignee: Krass Maren Zuerich Ch; Richter Thomas 04703 Bockelwitz De
Priority date: 1993-05-13
Filing date: 1994-05-13
Publication date: 2006-11-23
Anticipated expiration: 2014-05-14

Abstract

An integrated circuit chip carries a number of mutually orthogonal, homogeneously structured cells, each with a number of logically and structurally identical components. The cells are combined in rows and columns, possibly also in groups, for connection to the chip's input/output connections. A loading logic is associated with the cells, via which they can be programmed individually and in groups so that arbitrary logic functions and/or interconnections can be realized. The configuration of the data flow processor chip, i.e. its functional parts or macros, can be modified during operation without stopping other functional parts or adversely affecting their functioning.

Description

The present invention relates to a data flow processor that is partially reconfigurable at runtime, i.e. a hardware unit for the logical manipulation (combination) of data (information) present in binary form.

Data processing devices of this kind have long been known and have already found wide application and acceptance. The basic structure and mode of operation of the known data processing devices can be described roughly as follows: an arithmetic-logic unit is provided in which the data to be combined are processed according to program instructions (software). The data are fetched via a control unit in more or less complex addressing operations and first made available in working registers; after the logic operation, the new data are stored again in a predetermined memory location. The arithmetic-logic unit consists of logic elements (gates) that are coupled to one another in such a way that the data to be manipulated are processed logically in accordance with the underlying software, using the four basic arithmetic operations.

It is easy to see that, with the known structures, a relatively large amount of computing time is needed to read out the data to be manipulated, transfer them to the working registers, route them to the specific logic elements in the arithmetic-logic unit, and finally store them again. It is also evident that the hardware structure of the arithmetic-logic unit cannot be regarded as optimal, since the integrated logic elements present in hardware are always actively used in one and the same way within the overall system. Likewise, the strict hardware specification makes it very difficult to chain functions in so-called pipelines, or restricts the ability to do so, which inevitably means frequent register reloading between the working registers and the arithmetic unit. Moreover, such components are difficult to cascade and then require a great deal of programming work.

An additional advantage of the present invention is that it offers parallelism that can be scaled over a wide range. This creates a basis for the fast and flexible construction of neural structures, which to date could only be simulated with considerable effort.

The paper "A self-contained dynamically reconfigurable processor architecture" by Michael Saleeba, Department of Computer Science, Monash University, published before February 1993, already examines the advantages of reconfiguring regions of logic arrays "on the fly" in order to maximize the usability of the available logic. Among other things, Saleeba proposes reconfiguring part of an FPGA by having a program return control to the operating system after it has executed on the logic array. Saleeba states that it is best to apply a reset signal in such a situation. Saleeba also describes that, after reconfiguration, parts of an array may have to be returned to their original state if necessary, which is time-consuming and limits reconfiguration.

The paper "The Function Processor: A Data-Driven Processor Array for Irregular Computations" by J. Vasell and J. Vasell, likewise published before the filing date of the present invention, proposes a cell architecture with cells interconnected via nearest-neighbor links. Either data or configuration commands for the cells can flow over these links. The object underlying the present invention is to provide a data processing device with a programmable and configurable cell structure that ensures better processing.

In addition to its use as a pure data flow processor, the DFP should be able to perform the following further tasks:

- Use as a universal building block for constructing conventional computers, making their construction simpler and cheaper.
- Use in neural networks.

This object is achieved by the features or steps specified in claim 1. In essence, an integrated circuit (chip) is provided with a plurality of cells, arranged in particular orthogonally to one another, each with a plurality of logically identical and structurally identically arranged cells; their arrangement and the internal bus structure are extremely homogeneous in order to facilitate programming. Nevertheless, it is conceivable to accommodate cells with different cell logic and cell structures within one data flow processor in order to increase performance, for example by providing different cells for memory control than for arithmetic operations. A certain degree of specialization can be advantageous for neural networks in particular. A loading logic is assigned to the cells, via which the cells can be programmed individually and, where appropriate, combined in groups into so-called MACROs, so that arbitrary logic functions on the one hand, and the interconnection of the cells on the other, can be realized within wide limits. This is achieved by giving each individual cell a certain amount of memory in which the configuration data are stored. Based on these data, multiplexers or transistors in the cell are switched so as to provide the respective cell function (see Fig. 12).
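The idea that stored configuration data select a cell's function by switching multiplexers can be illustrated with a minimal Python sketch. The `Cell` class, the two-bit configuration word, and the four selectable gate functions are hypothetical and are not taken from the patent; the sketch only models a cell whose behavior changes when its configuration memory is rewritten.

```python
from dataclasses import dataclass

# Hypothetical function table: the configuration word acts as the select
# input of a multiplexer that picks one of these gate functions.
FUNCTIONS = {
    0b00: lambda a, b: a & b,        # AND
    0b01: lambda a, b: a | b,        # OR
    0b10: lambda a, b: a ^ b,        # XOR
    0b11: lambda a, b: 1 - (a & b),  # NAND
}


@dataclass
class Cell:
    """One cell with its own small configuration memory (assumed 2 bits)."""
    config: int = 0b00

    def load(self, config: int) -> None:
        """Model the loading logic writing new configuration data."""
        if config not in FUNCTIONS:
            raise ValueError(f"unknown configuration word: {config}")
        self.config = config

    def evaluate(self, a: int, b: int) -> int:
        """Apply the function currently selected by the configuration."""
        return FUNCTIONS[self.config](a, b)


cell = Cell()
cell.load(0b10)                   # configure the cell as XOR
xor_result = cell.evaluate(1, 1)
cell.load(0b01)                   # reconfigure the same cell as OR
or_result = cell.evaluate(1, 1)
```

The same cell object yields different results for the same inputs once its configuration word changes, which is the property the paragraph above relies on.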

In words other than those used in claim 1, the core of the present invention is to propose a data flow processor that has a cellular structure and whose cells can be reconfigured almost arbitrarily, in the arithmetic-logic sense, via an external loading logic. It is absolutely essential that the cells concerned can be reconfigured individually, without affecting the remaining cells and without shutting down the entire device. The data flow processor according to the present invention can thus be "programmed" as an adder during a first operating cycle and as a multiplier during a later operating cycle, and the number of cells required for the addition and for the multiplication may well differ. The placement of the MACROs already loaded is preserved; it is up to the loading logic or the compiler to partition the MACRO to be loaded within the free cells (i.e. to break it up so that it fits in optimally). Program flow control is handled by the loading logic, which loads the MACROs corresponding to the program section currently being executed into the device; the loading process is also controlled by the synchronization logic described later, which determines the time of reloading. The DFP therefore does not correspond to the familiar von Neumann architecture, since data memory and program memory are separate. At the same time, this means greater security, since faulty programs cannot destroy CODE but only DATA.
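The requirement that a new MACRO is placed only into free cells, while MACROs already loaded keep their placement, can be sketched as follows. This is an illustrative model under stated assumptions: the `CellArray` class, the 4 x 4 array size, the MACRO names, and the naive first-fit placement are all hypothetical. The patent leaves the actual partitioning strategy to the loading logic (Ladelogik) or the compiler.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


class CellArray:
    """Grid of cells; each cell is free (None) or owned by a named MACRO."""

    def __init__(self, rows: int, cols: int) -> None:
        self.owner: Dict[Coord, Optional[str]] = {
            (r, c): None for r in range(rows) for c in range(cols)
        }

    def free_cells(self) -> List[Coord]:
        return sorted(pos for pos, name in self.owner.items() if name is None)

    def load_macro(self, name: str, cells_needed: int) -> List[Coord]:
        """Place a MACRO into free cells only; loaded MACROs are untouched."""
        free = self.free_cells()
        if len(free) < cells_needed:
            raise RuntimeError(f"not enough free cells for {name}")
        chosen = free[:cells_needed]  # naive first-fit, not an optimal fit
        for pos in chosen:
            self.owner[pos] = name
        return chosen

    def unload_macro(self, name: str) -> None:
        """Free the cells of one MACRO, e.g. after it reports completion."""
        for pos, owner in self.owner.items():
            if owner == name:
                self.owner[pos] = None


array = CellArray(rows=4, cols=4)
io_cells = array.load_macro("IO", 4)            # stays loaded throughout
adder_cells = array.load_macro("ADDER", 4)      # first operating cycle
array.unload_macro("ADDER")                     # adder has finished
multiplier_cells = array.load_macro("MULTIPLIER", 8)  # later cycle
```

In this run the multiplier needs twice as many cells as the adder and is placed without touching the cells held by the `IO` MACRO.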

To give the data flow processor a workable structure, some cells, including the input/output (I/O) functions and memory management functions, are loaded before the programs are loaded and usually remain constant for the entire runtime. This is necessary to adapt the data flow processor to its hardware environment. The remaining cells are combined into so-called MACROs and can be reconfigured almost arbitrarily during runtime without affecting neighboring cells. For this purpose, the cells are individually and directly addressable.

To synchronize the restructuring (reloading) of the cells or MACROs with the loading logic, a synchronization circuit that sends the corresponding signals to the loading logic can be accommodated as a MACRO on the data flow processor. This is done where necessary, since reloading is only permitted once the MACROs have finished their previous task. It may require a modification of the ordinary MACROs, since they then have to provide status information to the synchronization circuit.

This status information usually signals to the synchronization logic that individual MACROs have completed their task, which from a programming point of view can mean, for example, the termination of a procedure or reaching the termination condition of a loop. In other words, the program continues at another point and the MACROs sending the status information can be reloaded. It may also be of interest for the MACROs to be reloaded in a particular order. For this purpose, the individual synchronization signals can be ranked by a priority decoder. A simple logic of this kind is shown in Fig. 13. The logic has seven input signals through which the seven MACROs deliver their status information. In this case, 0 stands for "in progress" and 1 for "finished". The logic has three output signals that are routed to the loading logic, with the state 000 regarded as the idle state. If a signal is present at one of the seven inputs, a decimal-to-binary conversion takes place; for example, Sync6 is represented as 110, which indicates to the loading logic that the MACRO serving Sync6 has finished its task. If several synchronization signals are present at the input at the same time, the synchronization circuit passes the signal with the highest priority on to the loading logic; if, for example, Sync0, Sync4 and Sync6 are present, the synchronization circuit first passes Sync6 on to the loading logic. Once the corresponding MACROs have been reloaded and Sync6 is therefore no longer present, Sync4 is passed on, and so on. The standard TTL device 74148 can be considered as an illustration of this principle.
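The behavior of the priority logic described above can be sketched in Python. This is a behavioral model only, with several assumptions that are not stated in the patent: the inputs are numbered 1 to 7 so that the output 000 stays reserved for the idle state, a higher number means a higher priority, and signals are active-high. It does not model the active-low pin conventions of the 74148. The function names are hypothetical.

```python
from typing import Sequence

NUM_INPUTS = 7  # seven sync inputs, as in the text


def priority_encode(sync: Sequence[int]) -> int:
    """Return the number (1..7) of the highest-priority active input.

    sync[i] is the state of input i + 1: 0 = "in progress", 1 = "finished".
    Returns 0 (binary 000, the idle state) when no input is active.
    Assumption: a higher input number means a higher priority.
    """
    if len(sync) != NUM_INPUTS:
        raise ValueError("expected exactly seven sync inputs")
    for number in range(NUM_INPUTS, 0, -1):
        if sync[number - 1]:
            return number
    return 0


def as_bits(code: int) -> str:
    """Format the encoder output as the three output lines."""
    return format(code, "03b")


# Inputs 1, 4 and 6 report "finished" at the same time.
inputs = [1, 0, 0, 1, 0, 1, 0]
first = priority_encode(inputs)    # input 6 wins
inputs[first - 1] = 0              # its MACRO has been reloaded
second = priority_encode(inputs)   # input 4 is forwarded next
```

Clearing an input after its MACRO has been reloaded reproduces the sequence in the text: the highest pending signal is forwarded first, then the next highest.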

Via the loading logic, the data flow processor can be set optimally, and if necessary dynamically, to the task to be solved. This has the great advantage, for example, that new standards or the like can be implemented solely by reprogramming the data flow processor and do not, as before, require a replacement with the corresponding accumulation of electronic waste.

The data flow processors can be cascaded with one another, which allows an almost arbitrary increase in the degree of parallelization, the computing power, and the network size in neural networks. Particularly important here is a clear, homogeneous connection of the cells to the input/output pins (IO pins) of the data flow processors, so that as few restrictions as possible are placed on the programs.

Fig. 14, for example, shows the cascading of four DFPs. To their surroundings they appear as one large homogeneous device (Fig. 15). In principle, two cascading methods are conceivable:

a) Only the local connections between the cells are brought out, which in the present example means two IO pins per edge cell and four IO pins per corner cell. However, the compiler/programmer must bear in mind that the global connections are not brought out, so the cascading is not completely homogeneous. (Global connections run between several cells, usually along a complete row or column of cells, see Fig. 6; local connections exist only between two cells.) Fig. 16a shows the structure within one DFP; Fig. 17a shows the resulting cascading of several DFPs (three are drawn).
b) The local and global connections are brought out, which drastically increases the number of drivers/IO pins and lines required, in our example to six IO pins per edge cell and twelve IO pins per corner cell. This gives complete homogeneity in cascading.
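The pin cost of the two cascading methods can be compared with a short calculation. The per-cell figures (two and four pins for method a, six and twelve for method b) come from the text; the 4 x 4 array size, the function name, and the assumption that "edge cell" means a border cell that is not a corner are illustrative choices, not values from the patent.

```python
def io_pin_count(rows: int, cols: int,
                 pins_per_edge_cell: int,
                 pins_per_corner_cell: int) -> int:
    """Total IO pins of one DFP with a rows x cols cell array.

    Assumption: an "edge cell" is a border cell that is not a corner,
    so the array has 4 corner cells and 2*(rows-2) + 2*(cols-2) edge cells.
    """
    if rows < 2 or cols < 2:
        raise ValueError("array must be at least 2 x 2")
    edge_cells = 2 * (rows - 2) + 2 * (cols - 2)
    return edge_cells * pins_per_edge_cell + 4 * pins_per_corner_cell


# Hypothetical 4 x 4 array: 8 edge cells and 4 corner cells.
pins_local_only = io_pin_count(4, 4, pins_per_edge_cell=2,
                               pins_per_corner_cell=4)      # method a)
pins_local_global = io_pin_count(4, 4, pins_per_edge_cell=6,
                                 pins_per_corner_cell=12)   # method b)
```

Under these assumptions, method b) needs three times as many pins as method a) for any array size, because both per-cell figures grow by the same factor.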

Since the global connections can become very long, particularly when cascading technique b) is used, the unpleasant effect can arise that the number of global connections is not sufficient, since each connection can only be used by one signal. To minimize this effect, a driver can be inserted after a certain length of the global connections. On the one hand, the driver amplifies the signal, which is essential over long distances and with correspondingly high loads; on the other hand, the driver can go into tristate and thereby interrupt the signal. The sections to the left and right of the driver, or above and below it, can then be used by different signals as long as the driver is in tristate; otherwise a signal is passed through. It is important that the drivers of the individual global lines can also be controlled individually, i.e. one global signal can be interrupted while the neighboring signal is passed through. Different signals can thus be present section by section on one global connection, while the neighboring global connection is in fact used globally by one and the same signal (compare Fig. 22).
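How individually controlled tristate drivers split a global line into independently usable sections can be modeled with a small sketch. The `GlobalLine` class, the number of sections, and the method names are hypothetical; the model also ignores signal direction and amplification and only tracks which sections are electrically joined.

```python
from typing import List


class GlobalLine:
    """A global line split into sections by individually controlled drivers.

    drivers_tristate[i] sits between section i and section i + 1.
    True means the driver is in tristate (the sections are separated);
    False means the driver passes the signal through.
    """

    def __init__(self, num_sections: int) -> None:
        self.num_sections = num_sections
        self.drivers_tristate: List[bool] = [False] * (num_sections - 1)

    def set_tristate(self, driver: int, tristate: bool) -> None:
        self.drivers_tristate[driver] = tristate

    def segments(self) -> List[List[int]]:
        """Group the sections that are joined by pass-through drivers."""
        groups = [[0]]
        for i, tristate in enumerate(self.drivers_tristate):
            if tristate:
                groups.append([i + 1])    # start a new independent segment
            else:
                groups[-1].append(i + 1)  # signal is passed through
        return groups

    def usable_signals(self) -> int:
        """Number of different signals the line can carry at once."""
        return len(self.segments())


line_a = GlobalLine(num_sections=4)
line_a.set_tristate(1, True)           # interrupt line A in the middle
line_b = GlobalLine(num_sections=4)    # neighboring line stays fully global
```

Here `line_a` carries two different signals on its two halves, while the neighboring `line_b` is still used end to end by a single signal, matching the situation described for Fig. 22.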

So-called shared memories can be used for better communication between the data flow processors and the loading logic. For example, programs can be passed from a hard disk located in the IO area of a data flow processor to the loading logic by having the data flow processors write the data from the disk into the shared memory, from where the loading logic fetches them. This is particularly important because, as already mentioned, the architecture here is not a von Neumann architecture but a Harvard architecture. Shared memories are also advantageous when constants defined in the program, which lies in the memory area of the loading logic, are to be combined with data that lie in the memory area of the data flow processors.

Further developments of the invention defined and described above are the subject of the dependent claims.

A particular use of the data flow processor according to the invention is that, in conjunction with suitable input/output units on the one hand and a memory on the other, it can form the basis of a complete (complex) computer. A large part of the IO functions can be implemented as MACROs on the data flow processor, and at present only special components (Ethernet drivers, VRAMs, ...) need to be added externally. In the event of a change in a standard or an improvement, only the MACRO then needs to be replaced in software, as already indicated; no intervention in the hardware is necessary. It makes sense here to define an IO (input/output) connector via which the additional components can be connected.

Fig. 20 shows the greatly simplified structure of a computer that is customary today. By using a DFP device, considerable parts can be saved (Fig. 21), with the corresponding conventional assemblies (CPU, memory management, SCSI, keyboard and video interfaces, and the parallel and serial interfaces) being stored as MACROs in the cascaded DFPs. Only the parts that cannot be emulated by a DFP, such as memory and line drivers with non-TTL levels or for high loads, have to be connected externally. Using the DFP makes production economical, since one and the same device is used very frequently, and the board layout is correspondingly simple owing to the homogeneous interconnection. In addition, the structure of the computer is determined by the loading logic, which here usually loads the DFP array only at the start of processing (after a reset); this provides a convenient means of correcting errors and making extensions. Such a computer can in particular simulate several different computer structures simply by loading the structure of the computer to be simulated into the DFP array. It should be noted that the DFP does not operate in its function as a DFP here but merely provides a highly complex and freely programmable logic array, although it differs from conventional devices in its particularly good cascadability.

A further field of application of the device is the construction of large neural networks. Its particular advantages here are its high gate density, its excellent cascadability and its homogeneity. A learning process that involves changing individual axiomatic connections or individual cell functions is as difficult to carry out on conventional devices as the construction of large, homogeneous and at the same time flexible cell structures. Dynamic reconfigurability makes the optimal simulation of learning processes possible for the first time.

The present invention is explained in more detail below with reference to the further figures. The figures show:

Fig. 1 a circuit symbol for an 8-bit adder;

Fig. 2 a circuit symbol for an 8-bit adder according to Fig. 1 made up of eight 1-bit adders;

Fig. 3 a logic structure of a 1-bit adder according to Fig. 2;

Fig. 4 a cell structure of the 1-bit adder according to Fig. 3;

Fig. 5 an 8-bit adder built up in accordance with the cell structure of Fig. 1;

Fig. 6 an unprogrammed SUBMACRO X consisting of four cells (analogous to a 1-bit adder according to Fig. 4 or Fig. 5) with the required line connections;

Fig. 7 a partial section of an integrated circuit (chip) with a plurality of cells and a separated SUBMACRO X according to Fig. 6;

Fig. 8 an integrated circuit (chip) with an orthogonal structure of a virtually arbitrary number of cells and an externally assigned load logic;

Fig. 9 a first embodiment of a plurality of integrated circuits (data flow processor) according to Fig. 8 coupled together to form an arithmetic unit;

Fig. 10 a second embodiment of a plurality of integrated circuits (data flow processor) according to Fig. 8 coupled together to form an arithmetic unit;

Fig. 11 an embodiment of a MACRO for adding two series of numbers;

Fig. 12 an exemplary structure of a cell with multiplexers for selecting the respective logic components;

Fig. 13 a synchronization logic implemented, for example, with a standard TTL device 74148;

Fig. 14 the cascading of four DFPs, the connections between the I/O pins being shown only schematically (each drawn connection actually stands for a plurality of lines);

Fig. 15 the homogeneity achieved by cascading;

Fig. 16a the structure of the I/O cells, where the global connections are not led out;

Fig. 16b the structure of the I/O cells, but with the global connections led out;

Fig. 17a the cascading resulting from Fig. 16a, showing a corner cell and the two driver cells of the cascaded devices that communicate with it (compare Fig. 14);

Fig. 17b the cascading resulting from Fig. 16b, showing a corner cell and the two driver cells of the cascaded devices that communicate with it (compare Fig. 14);

Fig. 18a a multiplication circuit (compare Fig. 11a);

Fig. 18b the internal structure of the DFP after loading (compare Fig. 11b);

Fig. 19c the operation of the DFP in memory and the states of the counters 47, 49;

Fig. 19 a cascade circuit in which the adder from Fig. 11 and the multiplier from Fig. 18 are connected in series to increase computing power;

Fig. 20 the highly schematic structure of a conventional computer;

Fig. 21 the possible structure of the same computer using an array of cascaded DFPs;

Fig. 22 a section of a DFP with the (line) drivers drawn in.

Fig. 1 shows a circuit symbol of an 8-bit adder. The circuit symbol consists of a square block 1 with eight inputs A0...7 for a first data word A and eight inputs B0...7 for a second data word B (to be added). The eight inputs Ai and Bi are supplemented by a further input Üein, via which a carry can be fed to the block 1 if required. In accordance with its function, the block 1 has eight outputs S0...7 for the bits of the binary sum and a further output Üaus for any carry that arises.

The circuit symbol shown in Fig. 1 is shown in Fig. 2 as an arrangement of so-called SUBMACROS. These SUBMACROS 2 each consist of a 1-bit adder 3 with one input for each of the corresponding bits of the data words and a further input for a carry bit. The 1-bit adders 3 also have an output for the sum bit and an output for the carry Üaus.

Fig. 3 shows the binary logic of a 1-bit adder, i.e. of a SUBMACRO 2 according to Fig. 2. As in Fig. 2, this logic circuit has one input each, Ai and Bi, for the corresponding bits of the data to be combined; an input Üein is also provided for the carry. These bits are combined, in accordance with the connections shown, in two OR gates 5 and three NAND gates 6, so that the results corresponding to a full adder (Si, Üaus) are present at the output terminal Si and at the carry output Üaus.
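The input/output behaviour of such a 1-bit full adder can be sketched in Python as follows. This is a behavioural model only: it does not reproduce the gate network of Fig. 3, and the function name `full_adder` is a hypothetical name chosen for illustration.

```python
from typing import Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Behavioural 1-bit full adder (not the gate network of Fig. 3).

    a, b      -- the bits Ai and Bi
    carry_in  -- the incoming carry (Üein)
    Returns (sum_bit, carry_out), i.e. (Si, Üaus).
    """
    total = a + b + carry_in      # 0..3
    return total & 1, total >> 1  # low bit is the sum, high bit is the carry


# Print the complete truth table of the adder.
for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            s, carry_out = full_adder(a, b, carry_in)
            print(a, b, carry_in, "->", s, carry_out)
```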

The invention comes in where, as shown in Fig. 4, the task is to implement the SUBMACRO 2 shown in Fig. 3, or any one or more functions, in a suitable manner in a cell structure. This is done on the basis of logically and structurally identical cells 10, whose individual logic components are coupled to one another in accordance with the logic function to be executed, by means of the load logic still to be described. According to the logic for a 1-bit adder shown in Fig. 4, which is derived from the logic circuit of Fig. 3, two cells 10.1, 10.2 are alike with respect to their logic components in that one OR gate 5 and one NAND gate 6 are activated in each. A third cell 10.3 is used only as a line cell (conductor-track cell), and in the fourth cell 10.4 the third NAND gate 6 is activated. The SUBMACRO 2 consisting of the four cells 10.1 ... 10.4 thus stands for a 1-bit adder; that is, a 1-bit adder of a data processing device according to the present invention is realized by four correspondingly programmed (configured) cells 10.1 ... 10.4. (For the sake of completeness, it should be noted that the individual cells have a considerably more extensive network of logic components, i.e. gates and inverters, each of which can be activated according to the current command of the load logic. In addition to the logic components, a dense network of connecting lines is provided between adjacent components and for building row-wise and column-wise bus structures for data transfer, so that virtually any logic structure can be implemented by appropriate programming on the part of the load logic.)

For the sake of completeness, Fig. 5 shows the cell structure of an 8-bit adder in its entirety. The structure shown in Fig. 5 corresponds to that of Fig. 2, with each of the 1-bit adders shown symbolically in Fig. 2 as SUBMACROS 2 replaced by a four-cell unit 10.1 ... 10.4. For the data flow processor according to the invention, this means that thirty-two cells of the total set of cells available on a circuit board fabricated as a cellular array with logically identical layout are addressed and configured (programmed) by the load logic in such a way that these thirty-two cells form an 8-bit adder.
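The composition of eight 1-bit adders into an 8-bit adder, and the resulting count of thirty-two cells, can be illustrated with a short Python sketch. It assumes a simple ripple-carry chain in which each carry output feeds the next carry input; the helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical and not taken from the patent.

```python
from typing import List, Tuple


def full_adder(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Behavioural 1-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def ripple_carry_add(a_bits: List[int], b_bits: List[int],
                     carry_in: int = 0) -> Tuple[List[int], int]:
    """Chain one 1-bit adder per bit position (bit lists are LSB first)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int = 8) -> List[int]:
    """Split an integer into `width` bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Reassemble an integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


CELLS_PER_SUBMACRO = 4  # four cells 10.1 ... 10.4 per 1-bit adder
ADDER_WIDTH = 8         # eight SUBMACROS form the 8-bit adder
cells_used = CELLS_PER_SUBMACRO * ADDER_WIDTH

sum_bits, carry_out = ripple_carry_add(to_bits(200), to_bits(100))
print(from_bits(sum_bits), carry_out, cells_used)
```

With the inputs 200 and 100 the 8-bit result wraps around and the overflow appears on the carry output, which corresponds to the role of Üaus in Fig. 1.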

In Fig. 5, a dash-dotted frame sets apart a SUBMACRO "X", which is ultimately to be regarded as a subunit consisting of four cells (10 according to Fig. 4) programmed to form a 1-bit adder.

The SUBMACRO "X" set apart in Fig. 5 is shown in Fig. 6 as part of an integrated circuit (chip) 20 together with line and data connections. The SUBMACRO "X" consists of the four cells 10, each of which, in accordance with the orthogonal structure, has four data connections per side (i.e. sixteen data connections per cell in total). The data connections link adjacent cells, which makes it apparent how, for example, a data unit is passed from cell to cell. The cells 10 are controlled on the one hand via so-called local controls, i.e. local lines that are connected to all cells, and on the other hand via so-called global lines, i.e. lines that are routed across the entire integrated circuit (chip) 20.

Fig. 7 shows an enlarged section of an integrated circuit 20 that is populated with an orthogonal grid of cells 10. As indicated in Fig. 7, a group of four cells 10 can, for example, be selected as SUBMACRO "X" and programmed (configured) as the 1-bit adder according to Fig. 4.

A complete integrated circuit (chip) 20 is shown in Fig. 8. This integrated circuit 20 consists of a large number of cells 10 arranged in an orthogonal grid and has on its outer edges a corresponding number of line connections (pins), via which signals, in particular control signals and data, can be supplied and passed on. In Fig. 8 the SUBMACRO "X" according to Fig. 5/6 is again marked off; further SUBMACROS, grouped into subunits according to their specific functions and interconnections, are also set apart. A load logic 30 is assigned to, or placed above, the integrated circuit (chip) 20; the integrated circuit 20 is programmed and configured via this load logic. The load logic 30 ultimately tells the integrated circuit 20 how it is to operate in arithmetic-logic terms. With reference to Figs. 1 to 5, Fig. 8 highlights on the one hand the SUBMACRO "X" according to Figs. 4 and 5; on the other hand, a MACRO "Y" according to Figs. 1 and 2 is also marked, which as a unit corresponds to an 8-bit adder.
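The idea of a load logic that programs individual cells or whole groups of cells on an orthogonal grid can be sketched as a toy model. Everything in this sketch is an assumption made for illustration: the class names `Chip` and `LoadLogic`, the string labels for cell functions, the 8 x 8 grid size and the 2 x 2 placement of the four cells of SUBMACRO "X" are not specified in the text.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) of a cell on the chip


class Chip:
    """Orthogonal grid of identical cells, all unprogrammed at first."""

    def __init__(self, rows: int, cols: int) -> None:
        self.config: Dict[Coord, str] = {
            (r, c): "unprogrammed" for r in range(rows) for c in range(cols)
        }


class LoadLogic:
    """Hypothetical load logic: writes a function label into cells."""

    def __init__(self, chip: Chip) -> None:
        self.chip = chip

    def program_cell(self, coord: Coord, function: str) -> None:
        """Program one individual cell."""
        if coord not in self.chip.config:
            raise ValueError(f"no cell at {coord}")
        self.chip.config[coord] = function

    def program_group(self, origin: Coord, layout: List[List[str]]) -> None:
        """Program a rectangular group of cells; other cells stay untouched."""
        row0, col0 = origin
        for dr, row in enumerate(layout):
            for dc, function in enumerate(row):
                self.program_cell((row0 + dr, col0 + dc), function)


# Assumed 2 x 2 arrangement of the four cells 10.1 ... 10.4 of SUBMACRO "X":
# two cells with OR and NAND active, one line cell, one cell with NAND active.
SUBMACRO_X = [["OR+NAND", "OR+NAND"],
              ["WIRE", "NAND"]]

chip = Chip(rows=8, cols=8)
load_logic = LoadLogic(chip)
load_logic.program_group((2, 3), SUBMACRO_X)

programmed = sorted(c for c, f in chip.config.items() if f != "unprogrammed")
print(programmed)
```

Because `program_group` only touches the cells inside the given rectangle, the sketch also reflects the property that one functional part can be reprogrammed while the rest of the chip keeps its configuration.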

With reference to Fig. 9 and Fig. 10, a computer structure based on the integrated circuit 20 defined and explained above is described in the following.

According to the first embodiment shown in Fig. 9, a plurality of integrated circuits 20 is arranged in an orthogonal grid, analogous to the arrangement of the cells, with adjacent circuits coupled (networked) to one another via local BUS lines 21. The computer structure, consisting for example of sixteen integrated circuits 20, has input/output lines IO via which the computer communicates with the outside world. The computer according to Fig. 9 also has a memory 22 which, in the embodiment shown, consists of two separate memories, each composed of RAM, ROM and a dual-ported RAM connected as shared memory to the load logic; these can equally be implemented as read-write memories or as read-only memories. The load logic 30 is assigned to, or placed above, the computer structure described so far; by means of it the integrated circuits (data flow processor) 20 are programmed, configured and networked.

The load logic 30 is built on a transputer 31, i.e. a processor with a microcoded instruction set, to which in turn a memory 32 is assigned. The connection between the transputer 31 and the data flow processor is based on an interface 33 for the so-called load data, i.e. the data that program and configure the data flow processor for a specific task, and an interface 34 for the computer memory 22 already mentioned, i.e. the shared memory.

The structure shown in Fig. 9 thus represents a complete computer that can be programmed and configured via the load logic 30 for each case or task. For the sake of completeness, it should be noted that, as indicated by arrows at the load logic 30, several of these computers can be networked, i.e. coupled to one another.

A further embodiment of a computer structure is shown in Fig. 10. In contrast to Fig. 9, higher-level central BUS lines 23 are provided in addition to the local BUS lines between adjacent integrated circuits 20, for example to solve specific input or output problems. The memory 22 (shared memory) is also connected to the integrated circuits 20 via central BUS lines 23, in each case, as shown, to groups of these integrated circuits. The computer structure shown in Fig. 10 has the same load logic 30 as explained with reference to Fig. 9.

An addition circuit built on data flow processors according to the invention is explained in connection with Fig. 11a. The starting point is two series of numbers An and Bn for all n between 0 and 9; the task is to form the sums Ci = Ai + Bi, where the index i can take the values 0 <= i <= 9.

Referring to Fig. 11a, the number series An is stored in a first memory RAM1, for example starting at memory address 1000h; the number series Bn is stored in a memory RAM2 at memory address 0dfa0h; the sums Cn are written into RAM1 at address 100ah.
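The address layout can be checked with a little arithmetic. The sketch below assumes that each number occupies exactly one memory address and that the ten elements n = 0 ... 9 are stored contiguously; under these assumptions the series An fills 1000h to 1009h, so the result series Cn at 100ah begins directly behind it. The constant names are hypothetical.

```python
A_BASE = 0x1000  # start of An in RAM1
B_BASE = 0xDFA0  # start of Bn in RAM2 (written 0dfa0h in the text)
C_BASE = 0x100A  # start of Cn in RAM1
N_ELEMENTS = 10  # n = 0 ... 9

# Assumption: one number per address, stored contiguously.
a_addresses = [A_BASE + n for n in range(N_ELEMENTS)]
b_addresses = [B_BASE + n for n in range(N_ELEMENTS)]
c_addresses = [C_BASE + n for n in range(N_ELEMENTS)]

print(hex(a_addresses[0]), hex(a_addresses[-1]))
print(hex(b_addresses[0]), hex(b_addresses[-1]))
print(hex(c_addresses[0]), hex(c_addresses[-1]))

# The result block starts at the first address after the last element of An.
starts_right_after_a = (a_addresses[-1] + 1 == C_BASE)
print(starts_right_after_a)
```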

A further counter 49 is connected, which merely counts up the individual clock cycles enabled by the control circuit. It serves later on to illustrate that individual MACROs can be reconfigured without affecting the MACROs not involved in the reconfiguration.

Fig. 11a first shows the actual addition circuit 40, which consists of a first register 41 for holding the number series An and a second register 42 for holding the number series Bn. The two registers 41/42 are followed by an 8-bit adder corresponding to the MACRO 1 shown in Fig. 1. The output of the MACRO 1 leads via a driver circuit 43 back to the memory RAM1. The clocking and timing of the addition circuit 40 is handled by a timing controller (STATEMACHINE) 45, which is driven by a clock generator T and connected to the registers 41, 42 and the driver circuit 43.

The addition circuit 40 is functionally supplemented by an address circuit 46 that generates the address data for the addition results to be stored. The address circuit 46 in turn consists of three MACROs 1 (according to Fig. 1) for forming the address data, these MACROs 1 being connected as follows: the addresses to be combined for An, Bn, Cn are each supplied via one input. These addresses are added to the output signals of a counter 47 and combined with the state machine 45 in such a way that the new target address is present at the output. The counter 47 and the comparator 48 have the task of ensuring that the correct summands are combined in each case and that processing stops at the end of the number series, i.e. at n = 9. When the addition is complete, a STOP signal is generated in the timing controller 45 and the circuit is switched to passive. The STOP signal can likewise be used as an input signal for a synchronization circuit: from this signal the synchronization logic can recognize that the overall function "add" according to the ML1 program described below has finished and that the MACROs can therefore be replaced by new ones (for example, STOP could be the signal Sync5).
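The interplay of counter, comparator and the three address adders can be sketched as a Python generator. This is an illustrative model under stated assumptions: the function name `address_sequence` is hypothetical, the counter is assumed to start at 0, and the order of the comparison and the increment within a pass is chosen for simplicity, since the text does not fix it.

```python
from typing import Iterator, Tuple


def address_sequence(base_a: int, base_b: int, base_c: int,
                     last_index: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (address of An, address of Bn, address of Cn) for each n.

    Models counter 47 (index n), comparator 48 (end test) and the three
    adders of the address circuit 46 (base address + counter value).
    """
    counter = 0
    while True:
        if counter > last_index:  # comparator: end of the number series
            return                # corresponds to the STOP signal
        # Three additions: each base address plus the counter value.
        yield base_a + counter, base_b + counter, base_c + counter
        counter += 1


addresses = list(address_sequence(0x1000, 0xDFA0, 0x100A, last_index=9))
for addr_a, addr_b, addr_c in addresses:
    print(hex(addr_a), hex(addr_b), hex(addr_c))
```

When the generator returns, the loop ends; in the hardware description this is the point at which the STOP signal would tell a synchronization logic that the MACROs may be replaced.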

The timing sequence in the timing controller 45 (STATEMACHINE) can be represented as follows; note that a delay time T (in clock cycles) between address generation and receipt of the data is implemented in the timing controller 45:

- in cycle 1, the counter 47 is incremented by 1 and the comparator 48 checks whether n > 9 has been reached; in step with these operations, the addresses for A, B, C are calculated;
- in cycle (T + 1), the summands A, B are read out and added;
- in cycle (T + 2), the sum C is stored.

In other words, the operation loop and the actual addition together require just (T + 2) clock cycles. In general, 2 to 3 clock cycles are required for T, so that a very substantial reduction in computing time becomes possible compared with conventional processors (CPUs), which generally need 50 to several hundred clock cycles.
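As a worked example of these figures, the sketch below computes the cycles per loop pass for T = 2 and T = 3 and compares them with the lower bound of 50 cycles quoted for a conventional processor. The names are hypothetical, and the resulting ratio is only a rough indication derived from the numbers in the text, not a measured value.

```python
def cycles_per_element(delay_t: int) -> int:
    """Clock cycles for one loop pass: address generation, delay T, store."""
    return delay_t + 2


CPU_CYCLES_LOW = 50  # lower bound quoted for a conventional CPU

for delay_t in (2, 3):
    dfp_cycles = cycles_per_element(delay_t)
    ratio = CPU_CYCLES_LOW / dfp_cycles
    print(f"T = {delay_t}: {dfp_cycles} cycles, "
          f"at least {ratio:.1f} times fewer than {CPU_CYCLES_LOW}")
```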

The configuration shown in Fig. 11 is explained once more below using a hypothetical MACRO language ML1:
The number series An and Bn exist,
∀ n: 0 <= n <= 9
The sums Ci = Ai + Bi with i ∈ N are to be formed.

The timing of the state machine accordingly looks as follows:
Cycle   Activity

1:      Increment counter, compare with > 9 (yes => abort) and calculate addresses for A, B, C
T + 1:  Fetch A, B and add
T + 2:  Store to C

That is, as already mentioned, the loop and the addition require just T + 2 clock cycles.
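The schedule above can be written down as a small sketch. It assumes that T + 2 is the length of one pass through the loop; the function names are hypothetical and serve only as illustration.

```python
def statemachine_schedule(t_delay):
    """Return (cycle, activity) pairs for one pass of the loop.

    t_delay is the delay T, in clock cycles, between address generation
    and data access (2 to 3 cycles according to the description).
    """
    if t_delay < 1:
        raise ValueError("T must be at least one clock cycle")
    return [
        (1, "increment counter, compare with > 9, calculate addresses for A, B, C"),
        (t_delay + 1, "fetch A and B and add them"),
        (t_delay + 2, "store to C"),
    ]


def cycles_per_pass(t_delay):
    # Assumption: the last scheduled cycle marks the end of one pass.
    return statemachine_schedule(t_delay)[-1][0]


for t in (2, 3):
    print(f"T = {t}: one pass takes {cycles_per_pass(t)} clock cycles")
```

For T = 2 and T = 3 this gives 4 and 5 clock cycles respectively.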

Fig. 11b shows the rough layout of the individual functions (MACROs) in a DFP. The MACROs are drawn in their approximate position and size and are labelled with the corresponding numbers explained with reference to Fig. 11a.

Fig. 11c shows roughly how the individual functions act on RAM blocks 1 and 2: the summands are read one after another in ascending order from RAM blocks 1 and 2, starting at address 1000h and 0dfa0h respectively, and the sums are stored in RAM block 1 starting at address 100ah. The counters 47 and 49 are also shown; both count from 0 to 9 while the circuit runs.
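The memory access pattern described here can be sketched as follows. The sketch assumes word-addressed RAM with an address increment of 1 per element and models each RAM block as a Python dictionary; the test data are hypothetical.

```python
BASE_A = 0x1000   # RAM block 1: summands A start here (1000h)
BASE_B = 0xDFA0   # RAM block 2: summands B start here (0dfa0h)
BASE_C = 0x100A   # RAM block 1: sums C are stored from here (100ah)
N = 10            # the counter runs over n = 0..9


def run_addition(ram1, ram2):
    """Read A and B in ascending order, add them, store C in RAM block 1."""
    for n in range(N):
        a = ram1[BASE_A + n]
        b = ram2[BASE_B + n]
        ram1[BASE_C + n] = a + b
    return ram1


# Hypothetical test data: A_n = n, B_n = 10 * n.
ram1 = {BASE_A + n: n for n in range(N)}
ram2 = {BASE_B + n: 10 * n for n in range(N)}
run_addition(ram1, ram2)
print([ram1[BASE_C + n] for n in range(N)])
```

Under the assumed increment of 1, the ten summands A occupy 1000h to 1009h, so the results starting at 100ah follow them directly.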

After the program described has finished, a new program that processes the results further is to be loaded. The reloading is to take place at runtime. The program is given below:
There exist the number series A_n and B_n, where A_n is given by the result C_n of the previously executed program:
∀ n: 0 <= n <= 9
The products C_i = A_i * B_i with i ∈ N are to be formed.

The individual commands have already been described; * denotes multiplication.

The MACRO structure is described in Fig. 18a. Fig. 18b shows, in the familiar manner, the position and size of the individual MACROs on the chip; note in particular the size of the multiplier 2 compared with the adder 1 from Fig. 11b. Fig. 18c again shows the effect of the function on the memory. Counter 47 again counts from 0 to 9, i.e. it is reset when the MACROs are reloaded.

Counter 49 deserves particular attention. Assume that reloading the MACROs takes 10 clock cycles. Counter 49 then runs from 9 to 19, because the device is reloaded dynamically, i.e. only the parts to be reloaded are stopped while the rest continues to work. As a result, the counter runs from 19 to 29 during the program. (This is intended to demonstrate dynamic, independent reloading; in any previously known device the counter would again run from 0 to 9, because it is reset.)
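The behaviour of counter 49 can be illustrated with a minimal sketch. The figures 9, 19 and 29 are taken from the text as stated; the function name and the flag are hypothetical, and the model only tracks the counter value at the three points named above.

```python
RELOAD_CYCLES = 10          # reload duration assumed in the description
SECOND_PROGRAM_TICKS = 10   # the text gives 19 -> 29 for the second program


def counter_49_trace(partial_reconfiguration):
    """Counter 49 after program 1, after the reload, and after program 2."""
    value = 9                          # end of the first program (from the text)
    trace = [value]
    if partial_reconfiguration:
        # Counter 49 is not among the reloaded MACROs, so it keeps running.
        value += RELOAD_CYCLES
        trace.append(value)
        value += SECOND_PROGRAM_TICKS
    else:
        # Previously known device: everything is reset on reload,
        # and the counter runs from 0 to 9 again.
        value = 0
        trace.append(value)
        value = 9
    trace.append(value)
    return trace


print(counter_49_trace(partial_reconfiguration=True))
print(counter_49_trace(partial_reconfiguration=False))
```

The first trace ends at 29, the second at 9, which is the contrast the passage draws between dynamic reloading and a full reset.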

On closer examination of the problem, the question arises why the two operations, the addition and the multiplication, are not performed in a single cycle, i.e. the operation:
There exist the number series A_n and B_n, where A_n is given by the result C_n of the previously executed program:
∀ n: 0 <= n <= 9
The products C_i = (A_i + B_i) * B_i with i ∈ N are to be formed.

path D defines an internal double path that is not brought out of the DFP. Because of an additional 1, the operation requires one clock cycle more than before, but overall it is faster than the two programs above executed in succession: firstly, the loop is run through only once, and secondly, no reloading takes place.
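A short sketch can make the comparison concrete. It assumes that each pass through the loop costs T + 2 cycles for a single operation and T + 3 cycles for the combined one, that the passes are not overlapped, and that reloading takes 10 cycles; the function names and input data are hypothetical.

```python
def add_then_multiply(a, b):
    """Two programs in succession: C = A + B, then C = A * B with A := C."""
    c = [x + y for x, y in zip(a, b)]
    return [x * y for x, y in zip(c, b)]


def fused(a, b):
    """Single program: C = (A + B) * B."""
    return [(x + y) * y for x, y in zip(a, b)]


def cycles_sequential(n, t, reload_cycles):
    # Two loops of n passes each, plus the reload in between.
    return n * (t + 2) + reload_cycles + n * (t + 2)


def cycles_fused(n, t):
    # One loop of n passes, each one clock cycle longer than before.
    return n * (t + 3)


a = list(range(10))
b = list(range(1, 11))
print(add_then_multiply(a, b) == fused(a, b))
print(cycles_sequential(10, 2, 10), cycles_fused(10, 2))
```

With n = 10, T = 2 and a 10-cycle reload, the assumed model gives 90 cycles for the two programs in succession and 50 cycles for the combined program.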

In principle, the program could also be formulated as follows:

If the gate delays of the adder and the multiplier together are less than one clock cycle, the operation (A + B) * B can also be performed in a single clock cycle, which yields a further considerable increase in speed:
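As a rough illustration of this timing condition only, and not a reproduction of the ML1 program, the check can be sketched as below. The delay and clock values are hypothetical; the text gives no figures.

```python
def fits_in_one_cycle(adder_delay_ns, multiplier_delay_ns, clock_period_ns):
    """True if adder and multiplier delays together stay below one clock period."""
    return adder_delay_ns + multiplier_delay_ns < clock_period_ns


# Hypothetical delays: 4 ns adder, 12 ns or 18 ns multiplier, 20 ns clock period.
print(fits_in_one_cycle(4, 12, 20))
print(fits_in_one_cycle(4, 18, 20))
```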

A simple example of a cell structure will be explained with reference to Fig. 12. The cell 10 comprises, for example, an AND gate 51, an OR gate 52, an XOR gate 53, an inverter 54 and a register cell 55. On the input side, the cell 10 additionally has two multiplexers 56, 57, each with, for example, sixteen input terminals IN 1, IN 2 (corresponding to the sixteen inputs of the cell in Fig. 6). These (16:1) multiplexers 56/57 each select the data to be fed to the logic gates AND, OR, XOR 51...53. On the output side, these logic gates are coupled to a (3:1) multiplexer 58, which in turn is coupled to the input of the inverter 54, to an input of the register cell 55 and to a further (3:16) multiplexer 59. The latter multiplexer 59 is additionally connected to the output of the inverter 54 and to an output of the register cell 55 and delivers the output signal OUT.

For completeness, it should be noted that the register cell 55 is coupled to a reset input R and a clock input.

A charging logic 30 is superordinate to the cell structure explained above, i.e. to the cell 10. It is connected to the multiplexers 56, 57, 58 and 59 and drives them according to the desired functions.

If, for example, the signals A2 and B5 are to be ANDed, the multiplexers 56, 57 are switched active for the lines "TWO" and "FIVE" respectively; the operands then reach the AND gate 51 and, with the multiplexers 58, 59 activated accordingly, are delivered at the output OUT. If, for example, a NAND operation is to be performed, the multiplexer 58 switches to the inverter 54, and the negated AND result is then present at the output OUT.
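The cell described above can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the circuit itself: the class, its field names and the selector values are hypothetical, the lines "TWO" and "FIVE" are taken to mean list indices 2 and 5, and the output multiplexer 59 is modelled as choosing between the direct, inverted and registered gate result.

```python
from dataclasses import dataclass

# Gates 51, 52, 53 operating on single bits.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


@dataclass
class Cell:
    sel_in1: int = 0          # multiplexer 56: one of sixteen inputs IN 1
    sel_in2: int = 0          # multiplexer 57: one of sixteen inputs IN 2
    sel_gate: str = "AND"     # multiplexer 58: which gate result is passed on
    sel_out: str = "direct"   # multiplexer 59: "direct", "inverted" or "registered"
    register: int = 0         # register cell 55

    def _gate(self, in1, in2):
        return GATES[self.sel_gate](in1[self.sel_in1], in2[self.sel_in2])

    def reset(self):
        self.register = 0     # reset input R

    def clock(self, in1, in2):
        self.register = self._gate(in1, in2)   # latch on a clock edge

    def out(self, in1, in2):
        value = self._gate(in1, in2)
        if self.sel_out == "direct":
            return value
        if self.sel_out == "inverted":
            return value ^ 1                   # inverter 54
        if self.sel_out == "registered":
            return self.register
        raise ValueError(f"unknown output selection: {self.sel_out}")


# Sixteen A lines and sixteen B lines; A2 and B5 are both 1.
a_lines = [0] * 16
b_lines = [0] * 16
a_lines[2] = 1
b_lines[5] = 1

and_cell = Cell(sel_in1=2, sel_in2=5, sel_gate="AND", sel_out="direct")
nand_cell = Cell(sel_in1=2, sel_in2=5, sel_gate="AND", sel_out="inverted")
print(and_cell.out(a_lines, b_lines), nand_cell.out(a_lines, b_lines))
```

In this model the selector fields stand for the settings that the charging logic 30 writes to the multiplexers; changing them reconfigures the cell without touching its inputs.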

Claims

Integrated circuit operating as a data flow processor, having a plurality of partially reconfigurable cells with switching logic that is reconfigurable at runtime for executing a program, wherein a) the cells are individually and directly addressable for reconfiguration, b) during reconfiguration at runtime the cells to be reconfigured are stopped, while the others retain their entire state and continue to operate, c) a charging logic is assigned to the cells, via which the cells, individually or in cell groups, and also the logical connections between the cells can be reconfigured as MACROs, d) the charging logic takes over the sequence control of the program and loads the MACROs corresponding to a currently running program section into cells to be reconfigured, e) the MACROs are configurable to provide status information that signals to the charging logic that individual MACROs have completed their task and can be reloaded.

Integrated circuit operating as a data flow processor according to the preceding claim, characterized in that the cells are arranged two-dimensionally.

Integrated circuit operating as a data flow processor according to one of the preceding claims, characterized in that the cells are arranged orthogonally to one another.