Related U.S. Application
This application is a continuation-in-part of U.S. patent application Ser. No. 09/144,222, filed with the United States Patent and Trademark Office (USPTO) on August 31, 1998.
Embodiments
This specification describes various embodiments of the invention in the context of a system called the "simulation/emulator" ("SEmulator") or "simulation/emulation" ("SEmulation") system. Throughout the specification, the terms "SEmulation system," "SEmulator system," "SEmulator," or simply "system" are used. These terms refer to the various apparatus and method embodiments in accordance with the present invention for any combination of the following four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective set-up or pre-processing stages. At other times, the term "SEmulation" is used; this term refers to the novel processes described herein.
Similarly, terms such as "Reconfigurable Computing (RCC) Array System" or "RCC computing system" refer to that portion of the simulation/co-verification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as "reconfigurable hardware array" or "RCC hardware array" refer to that portion of the simulation/co-verification system that contains the hardware model of the user design and, in one embodiment, the array of reconfigurable logic elements.
The specification also refers to the "user" and the user's "circuit design" or "electronic design." The "user" is a person who uses the SEmulation system through its interfaces; this person may be the designer of the circuit or a test/debugging person who played little or no part in the design process. The "circuit design" or "electronic design" is a custom-designed system or component, whether software or hardware, that the SEmulation system can model for test/debugging purposes. In many cases, the "user" also designed the "circuit design" or "electronic design."
The specification also uses the terms "wire," "wire line," "wire/bus line," and "bus." These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. The terms are interchangeable in that a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.
This specification is presented in outline form. First, it provides a general overview of the SEmulation system, including an overview of the four operating modes and the hardware implementation schemes. Second, it discusses the SEmulation system in detail. In some cases, one figure shows a variation of an embodiment shown in a preceding figure; in these cases, like reference numerals denote like components, units, and processes. The outline of the specification is as follows:
I. Overview
A. Simulation/Hardware Acceleration Mode
B. Emulation with Target System Mode
C. Post-Simulation Analysis Mode
D. Hardware Implementation Schemes
E. Simulation Server
F. Memory Simulation
G. Co-Verification System
II. System Description
III. Simulation/Hardware Acceleration Mode
IV. Emulation with Target System Mode
V. Post-Simulation Analysis Mode
VI. Hardware Implementation Schemes
A. Overview
B. Address Pointer
C. Gated Data/Clock Network Analysis
D. FPGA Array and Control
E. Alternative Embodiment Using Denser FPGA Chips
F. TIGF Logic Device
VII. Simulation Server
VIII. Memory Simulation
IX. Co-Verification System
X. Examples
I. Overview
The various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. The various embodiments of systems and methods incorporating these modes have at least some of the following features:
(1) software and hardware models in a single, tightly coupled simulation engine, with a software kernel controlling the software and hardware models cycle by cycle; (2) automatic component type analysis during compilation to generate and partition the software and hardware models; (3) the ability to switch, cycle by cycle, among the software simulation mode, the simulation through hardware acceleration mode, the in-circuit emulation mode, and the post-simulation analysis mode; (4) full hardware model visibility through regeneration of combinational components in software; (5) a double-buffered clock model with software clocks and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate, or hardware-accelerate, the user's circuit design from any selected point in a past simulation session. The end result is a flexible, fast simulator/emulator system and method with full HDL functionality and emulator execution performance.
A. Simulation/Hardware Acceleration Mode
Through automatic component type analysis, the SEmulator system can model the user's custom circuit design in software and hardware. The entire user circuit design is modeled in software, whereas evaluation components (i.e., register components and combinational components) are also modeled in hardware. Component type analysis facilitates this hardware modeling.
The software kernel, which resides in the main memory of the general-purpose processor system, serves as the main program of the SEmulator system and controls the overall operation and execution of its various modes and features. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers and memories and propagate combinational logic data, and advances the simulation time. The software kernel is what tightly couples the simulator engine to the hardware acceleration engine. At the software/hardware boundary, the SEmulator system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).
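As an illustration only, the evaluation order described above (test bench, clock, edge detection, register update, combinational propagation, time advance) can be sketched as a toy software-only loop. The `ToyDesign` counter, `run_kernel`, and the fixed clock half-period are hypothetical; the actual kernel also drives the hardware model through the REG/CLK/S2H/H2S address spaces, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyDesign:
    """Hypothetical user design: a 4-bit counter (one register plus combinational logic)."""
    regs: Dict[str, int] = field(default_factory=lambda: {"count": 0})
    comb: Dict[str, int] = field(default_factory=dict)
    width: int = 4

    def propagate(self) -> None:
        # Combinational logic: next counter value, wrapped to `width` bits.
        self.comb["next"] = (self.regs["count"] + 1) % (1 << self.width)

    def update_registers(self) -> None:
        # Register update performed only on an active clock edge.
        self.regs["count"] = self.comb["next"]


def run_kernel(design: ToyDesign,
               testbench: List[Callable[[int], bool]],
               half_period: int,
               max_time: int) -> Tuple[int, int]:
    """Hypothetical kernel loop; returns (final simulation time, rising edges seen)."""
    time, prev_clk, edges = 0, 0, 0
    design.propagate()
    while time < max_time:
        # Evaluate every test-bench process; each reports whether it is still active.
        active = [process(time) for process in testbench]
        if not any(active):
            break
        # Evaluate the clock component and detect a rising edge.
        clk = (time // half_period) % 2
        if prev_clk == 0 and clk == 1:
            design.update_registers()  # update registers on the edge
            design.propagate()         # then propagate combinational data
            edges += 1
        prev_clk = clk
        time += 1                      # advance simulation time
    return time, edges


design = ToyDesign()
final_time, edges = run_kernel(design, [lambda t: t < 20], half_period=2, max_time=100)
```

With one test-bench process active for 20 time units and a half-period of 2, the loop sees rising edges at times 2, 6, 10, 14, and 18, so the counter ends at 5.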
The SEmulator can selectively switch among the four modes of operation. The user of the system can start the simulation, stop the simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to the software simulation mode.
Generally, the SEmulation system gives the user the ability to "see" every modeled component, regardless of whether it is modeled in software or in hardware. For a variety of reasons, combinational components are not as "visible" as registers, so obtaining combinational component data is difficult. One reason is that the FPGAs (field programmable gate arrays) used on the reconfigurable board to model the hardware portion of the user's circuit design typically implement combinational components as look-up tables (LUTs) rather than as actual combinational components. Accordingly, the SEmulation system reads the register values and then regenerates the combinational components. Because regeneration carries some overhead, it is not performed all the time; it is performed only when the user requests it.
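A minimal sketch of on-demand visibility, assuming each combinational net can be written as a function of register values read back from hardware. `VisibilityView`, `CombModel`, and the example nets are hypothetical names; the point is that register reads are direct while combinational values are recomputed in software only when the user asks for them.

```python
from typing import Callable, Dict

# Hypothetical software model of combinational nets: each net is a function of register values.
CombModel = Dict[str, Callable[[Dict[str, int]], int]]


class VisibilityView:
    """Registers are read back from the hardware model; combinational values are
    rebuilt in software only when a user inspects them."""

    def __init__(self, read_registers: Callable[[], Dict[str, int]], comb_model: CombModel):
        self._read_registers = read_registers  # stand-in for a read from the hardware model
        self._comb_model = comb_model
        self.regenerations = 0                 # how often the (costly) regeneration ran

    def inspect(self, net: str) -> int:
        regs = self._read_registers()
        if net in regs:
            return regs[net]                   # registers are directly visible
        # Combinational values live in FPGA LUTs and are not stored; regenerate them here.
        self.regenerations += 1
        return self._comb_model[net](regs)


hardware_registers = {"a": 3, "b": 5}
view = VisibilityView(lambda: dict(hardware_registers),
                      {"sum": lambda r: r["a"] + r["b"],
                       "and": lambda r: r["a"] & r["b"]})
```

Inspecting `"a"` costs no regeneration; inspecting `"sum"` or `"and"` triggers one regeneration each.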
Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock, which drives the enable input of the various registers in the hardware model. The timing is strictly controlled through a double-buffered circuit implementation so that the software clock enable signal reaches the register models before the data does. Once the data inputs to these register models have stabilized, the software clock gates the data synchronously, ensuring that all data values are gated together without any risk of a hold-time violation.
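The race that double buffering avoids can be illustrated with a software analogy (not the hardware circuit itself), assuming a three-stage shift register. `naive_update` and `double_buffered_update` are hypothetical names: the first updates stages one after another so later stages see values already changed on the same edge; the second samples every input first and then commits all stages together under the clock enable.

```python
from typing import List


def naive_update(regs: List[int], din: int) -> List[int]:
    """Updates a shift register one stage at a time, in place. Later stages see values
    already changed on this same edge, the software analogue of a hold-time violation."""
    regs = list(regs)
    for i in range(len(regs)):
        regs[i] = din if i == 0 else regs[i - 1]
    return regs


def double_buffered_update(regs: List[int], din: int, clock_enable: bool) -> List[int]:
    """Samples every stage's input first, then commits all stages together."""
    if not regs:
        return []
    # Buffer 1: capture each stage's data input while the old register values are stable.
    sampled = [din] + regs[:-1]
    # The software clock enable is known before the data is committed.
    if not clock_enable:
        return list(regs)
    # Buffer 2: gate all sampled values into the registers at once.
    return sampled
```

Shifting a 0 into `[1, 0, 0]` should give `[0, 1, 0]`; the naive version wrongly yields `[0, 0, 0]`.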
Software simulation is also fast because the system logs all input values but only selected register values/states, which minimizes overhead by reducing the number of I/O operations. The user can selectively choose the logging frequency.
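A minimal sketch of selective logging, assuming the design state can be summarized as one value and advanced by a `step` function. `simulate_with_logging` and `log_every` are hypothetical names: every input is recorded, but a state snapshot is taken only once every `log_every` cycles.

```python
from typing import Callable, Dict, List, Tuple


def simulate_with_logging(inputs: List[int],
                          step: Callable[[int, int], int],
                          initial_state: int,
                          log_every: int) -> Tuple[int, List[int], Dict[int, int]]:
    """Logs every input value but only one state snapshot every `log_every` cycles."""
    input_log: List[int] = []
    state_log: Dict[int, int] = {}
    state = initial_state
    for cycle, value in enumerate(inputs):
        if cycle % log_every == 0:
            state_log[cycle] = state  # selected state, taken before this cycle's input
        input_log.append(value)       # all inputs are logged
        state = step(state, value)
    return state, input_log, state_log


final, input_log, state_log = simulate_with_logging([1] * 10, lambda s, v: s + v, 0, log_every=4)
```

Ten cycles produce ten input records but only three state snapshots (cycles 0, 4, and 8).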
B. Emulation with Target System Mode
The SEmulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model also outputs data to the target system. Additionally, the software kernel controls the operation of this mode, so the user can still choose to start, stop, assert values, inspect values, single-step, and switch from one mode to another.
C. Post-Simulation Analysis Mode
Logging gives the user a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. It logs only selected values and states, based on a logging frequency (i.e., one record every N cycles). During the post-simulation stage, if the user wants to examine various data around point X in the just-completed simulation session, the user first goes to the logged point, say logged point Y, that is closest to and temporally located before point X. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.
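The procedure above can be sketched as follows, assuming a log of per-cycle inputs and a dictionary of state snapshots keyed by cycle. `replay_to` is a hypothetical name, and for simplicity it also accepts a snapshot taken exactly at the target cycle.

```python
import bisect
from typing import Callable, Dict, List, Tuple


def replay_to(target: int,
              state_log: Dict[int, int],
              input_log: List[int],
              step: Callable[[int, int], int]) -> Tuple[int, int]:
    """Returns (checkpoint Y used, state at the start of cycle `target`)."""
    checkpoints = sorted(state_log)
    # Closest logged cycle at or before the target.
    idx = bisect.bisect_right(checkpoints, target) - 1
    if idx < 0:
        raise ValueError("no logged point at or before the target cycle")
    y = checkpoints[idx]
    state = state_log[y]
    # Re-simulate only the cycles between Y and X using the logged inputs.
    for cycle in range(y, target):
        state = step(state, input_log[cycle])
    return y, state


inputs = list(range(1, 11))        # inputs 1..10 logged for cycles 0..9
snapshots = {0: 0, 4: 10, 8: 36}   # running-sum state logged every 4 cycles
y, state = replay_to(6, snapshots, inputs, lambda s, v: s + v)
```

To reach cycle 6, the sketch restores the snapshot at cycle 4 and re-simulates two cycles, rather than replaying from cycle 0.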
A VCD on-demand system is also described. This VCD on-demand system allows the user to view any simulation target range (i.e., simulation times) on demand without re-running the simulation.
D. Hardware Implementation Schemes
The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4 × 4 array of 16 chips can model a large circuit spread across those 16 chips. The interconnect scheme allows each chip to access any other chip within two "jumps" or links.
Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). All address pointers associated with a particular address space are chained together. During a data transfer, the word data in each chip are therefore selected sequentially from/to the main FPGA bus and the PCI bus, one word at a time for the selected address space within each chip and one chip at a time, until all the desired word data for that address space have been accessed. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in one chip and then propagates to the address pointer in the next chip, and the process continues until the last chip is reached or the system initializes the address pointers.
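A minimal software model of the chained address pointers, assuming each chip holds a list of words for the selected address space. `AddressPointer` and `transfer` are hypothetical names; the word-select signal is modeled as moving to the next chip once a chip's last word has been selected, so the bus sees words in chip order and word order.

```python
from typing import List, Tuple


class AddressPointer:
    """Hypothetical per-chip address pointer for one I/O address space (e.g. S2H)."""

    def __init__(self, words: List[int]):
        self.words = words
        self.position = 0

    def initialize(self) -> None:
        self.position = 0

    def select(self) -> Tuple[int, bool]:
        """Selects the current word and advances. The flag is True when this chip's
        last word has been selected, i.e. the word-select signal passes to the next chip."""
        word = self.words[self.position]
        self.position += 1
        return word, self.position == len(self.words)


def transfer(chain: List[AddressPointer]) -> List[Tuple[int, int]]:
    """Returns (chip index, word) pairs in the order they appear on the bus."""
    for pointer in chain:
        pointer.initialize()  # the system (re)initializes every pointer in the chain
    bus: List[Tuple[int, int]] = []
    for chip, pointer in enumerate(chain):
        if not pointer.words:
            continue          # nothing mapped in this space: pass the signal straight on
        passed_on = False
        while not passed_on:
            word, passed_on = pointer.select()
            bus.append((chip, word))
    return bus


chain = [AddressPointer([10, 11]), AddressPointer([20]), AddressPointer([30, 31, 32])]
```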
The FPGA bus system on the reconfigurable board operates at twice the bandwidth of the PCI bus but at only half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the wider bus. The throughput of this FPGA bus system can match the throughput of the PCI bus system, so no performance is lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.
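The throughput claim is simple arithmetic. As a sketch, assume "bandwidth" here means bus width, the PCI bus is 32 bits wide at the 33 MHz rate of PCI revision 2.0, and one word moves per clock. Doubling the width while halving the clock leaves peak throughput unchanged.

```python
def throughput_bytes_per_s(width_bits: int, clock_hz: float) -> float:
    """Peak throughput of a bus that transfers one word per clock cycle."""
    return width_bits / 8 * clock_hz


PCI_WIDTH_BITS = 32                  # assumption: 32-bit PCI
PCI_CLOCK_HZ = 33e6                  # 33 MHz (PCI revision 2.0)
FPGA_WIDTH_BITS = 2 * PCI_WIDTH_BITS # twice the PCI width
FPGA_CLOCK_HZ = PCI_CLOCK_HZ / 2     # half the PCI speed

pci = throughput_bytes_per_s(PCI_WIDTH_BITS, PCI_CLOCK_HZ)
fpga = throughput_bytes_per_s(FPGA_WIDTH_BITS, FPGA_CLOCK_HZ)
```

Both buses come out at 132 × 10^6 bytes per second under these assumptions.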
In another embodiment of the present invention, denser FPGA chips are used. Such denser chips include the Altera 10K130V and 10K250V. Using these chips changes the board design so that only four FPGA chips, instead of eight less dense FPGA chips (such as the Altera 10K100), are used per board.
The FPGA array in the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip can have up to eight sets of interconnections, arranged within a single board and across different boards as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and "one-hop" neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), not counting the local bus connections. Each chip can be connected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right of it. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
The interconnects alone can couple logic devices and other components within a single board. However, inter-board connectors couple these boards and their interconnects together across different boards, so that signals can be carried (1) between the PCI bus, via the motherboard, and the array boards, and (2) between any two array boards.
A motherboard connector connects a board to the motherboard, and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for a direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while boards 2, 4, and 6 rely on their neighboring boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together through inter-board connectors arranged solder side to component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. Placed solder side to component side, the inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
E. Simulation Server
In another embodiment of the present invention, a simulation server is provided to allow multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit designs. Access is accomplished through time-sharing, in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user accesses the server to map his/her separate user design onto the reconfigurable hardware model for the first time; in this case, the system compiles the design to generate the software and hardware models, performs the clustering operation, performs place-and-route operations, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. When one user has accelerated his design using the hardware model and has downloaded the hardware state into his own memory for software simulation, the hardware unit can be released for access by another user.
The server gives multiple users or processes access to the reconfigurable hardware unit for acceleration and hardware state swapping. The simulation server includes the scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the simulation server is based on a priority-based round-robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates both the non-network multiprocessing environment and the network multi-user environment: the state data of a previous checkpoint can be downloaded, and the entire simulation state associated with that checkpoint can be restored for playback debugging or for cycle-by-cycle stepping.
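A minimal sketch of a priority-based round-robin scheduler built from the three named parts, under stated assumptions: larger numbers mean higher priority, each job runs for a fixed time slice of cycles, and unfinished jobs rejoin the tail of their priority level. `SimJob`, `ServerScheduler`, and `time_slice` are hypothetical; a real job swapper would also save and restore hardware state, which this sketch only notes in a comment.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class SimJob:
    name: str
    priority: int      # hypothetical convention: larger number = higher priority
    cycles_left: int


class ServerScheduler:
    def __init__(self, time_slice: int):
        self.time_slice = time_slice
        # Simulation job queue table: one FIFO of jobs per priority level.
        self.queue_table: Dict[int, Deque[SimJob]] = {}

    def submit(self, job: SimJob) -> None:
        self.queue_table.setdefault(job.priority, deque()).append(job)

    def _priority_sorter(self) -> Optional[SimJob]:
        # Pick the job at the head of the highest non-empty priority level.
        for level in sorted(self.queue_table, reverse=True):
            if self.queue_table[level]:
                return self.queue_table[level].popleft()
        return None

    def run(self) -> List[Tuple[str, int]]:
        trace: List[Tuple[str, int]] = []
        while (job := self._priority_sorter()) is not None:
            # Job swapper: the job holds (locks) the hardware unit for one time slice.
            # A real server would swap hardware state in and out here.
            ran = min(self.time_slice, job.cycles_left)
            job.cycles_left -= ran
            trace.append((job.name, ran))
            if job.cycles_left > 0:
                self.submit(job)  # round robin: requeue at the tail of its level
        return trace


scheduler = ServerScheduler(time_slice=2)
for job in (SimJob("A", 1, 3), SimJob("B", 1, 2), SimJob("C", 2, 3)):
    scheduler.submit(job)
trace = scheduler.run()
```

The higher-priority job C finishes first; A and B then alternate in round-robin order within their shared level.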
F. Memory Simulation
The memory simulation, or memory mapping, aspect of the present invention provides an effective way for the simulation system to manage the various memory blocks associated with the configured hardware model of the user's design, which is programmed into the array of FPGA chips on the reconfigurable hardware unit. The memory simulation scheme of the present invention provides a structure and scheme in which the many memory blocks associated with the user's design are mapped into the SRAM memory devices of the simulation system, rather than into the logic devices that are used to configure and model the user's design. The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices, which contain the configured and programmed user design being debugged. In accordance with one embodiment of the present invention, the memory simulation system generally operates as follows. A simulation write/read cycle is divided into three periods: DMA (direct memory access) data transfer, evaluation, and memory access.
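The three-period write/read cycle can be sketched in software as follows. `Period`, `simulation_cycle`, the dictionary standing in for the SRAM devices, and the request tuple format are all hypothetical; the sketch only shows the ordering: inputs arrive by DMA, the user design evaluates and emits memory requests, and the requests are then serviced against SRAM.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional, Tuple

Request = Tuple[str, int, Optional[int]]  # ("read" | "write", address, data)


class Period(Enum):
    DMA_TRANSFER = auto()
    EVALUATION = auto()
    MEMORY_ACCESS = auto()


def simulation_cycle(host_inputs: List[int],
                     sram: Dict[int, int],
                     evaluate: Callable[[List[int]], List[Request]]
                     ) -> Tuple[List[Period], List[int]]:
    trace: List[Period] = []
    # Period 1: DMA data transfer from the host into the FPGA side.
    trace.append(Period.DMA_TRANSFER)
    fpga_inputs = list(host_inputs)
    # Period 2: evaluation in the FPGA logic; the user design emits memory requests.
    trace.append(Period.EVALUATION)
    requests = evaluate(fpga_inputs)
    # Period 3: memory access between the FPGA logic and the SRAM devices.
    trace.append(Period.MEMORY_ACCESS)
    read_data: List[int] = []
    for op, addr, data in requests:
        if op == "write":
            sram[addr] = data
        else:
            read_data.append(sram.get(addr, 0))  # unwritten addresses read as 0 (assumption)
    return trace, read_data


sram = {0: 7}
trace, read_data = simulation_cycle(
    [42], sram,
    lambda ins: [("write", 1, ins[0]), ("read", 0, None), ("read", 1, None)])
```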
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N that interfaces with the user's own memory interface in the user design, in order to handle: (1) data evaluations among the FPGA logic devices, and (2) write/read memory accesses between the FPGA logic devices and the SRAM memory devices. Working together with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
G. Co-Verification System
One embodiment of the present invention is a co-verification system that includes a reconfigurable computing system (hereinafter "RCC computing system") and a reconfigurable computing hardware array (hereinafter "RCC hardware array"). In some embodiments, a target system and external I/O devices are not needed because they can be modeled in software. In other embodiments, the target system and external I/O devices are actually coupled to the co-verification system to gain speed and to use real data rather than simulated test-bench data. Thus, the co-verification system can combine the RCC computing system and the RCC hardware array with other functionality to debug the software and hardware portions of the user's design while using the actual target system and/or I/O devices.
The RCC computing system also contains clock logic (for clock edge detection and software clock generation), a test-bench process for testing the user design, and device models for any I/O device that the user decides to model in software instead of using a real physical I/O device. The user may, of course, decide to use both real and modeled I/O devices in one debug session. The software clock is provided to the external interface to serve as the external clock source for the target system and the external I/O devices. Using this software clock provides the synchronization needed to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base for the debug session, simulated and hardware-accelerated data are synchronized with any data delivered between the co-verification system and the external interface.
When the target system and the external I/O devices are coupled to the co-verification system, pin-out data must be provided between the co-verification system and its external interface. The co-verification system contains control logic that provides traffic control between: (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system holds a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array, it must also have access to all data passing between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to these data.
II. System Description
FIG. 1 shows a high-level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and an emulation interface 30 through a PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 through the PCI bus 50 and through a cable 61. A target system 40 is coupled to the emulation interface 30 through a cable 60. In other embodiments, when emulating the user's circuit design within the target system environment is not desired for a particular test/debug session, the in-circuit emulation set-up 70 (shown in the dotted-line box), which comprises the emulation interface 30 and the target system 40, is not provided in the set-up. Without the in-circuit emulation set-up 70, the reconfigurable hardware model 20 communicates with the workstation 10 through the PCI bus 50.
In combination with the in-circuit emulation set-up 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in the target system. To ensure correct operation of the user's circuit design of that electronic subsystem within the target system environment, the input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals of the target system 40 to and from the reconfigurable hardware model 20 are delivered through the cable 60, the emulation interface 30, and the PCI bus 50. Alternatively, the input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 through the emulation interface 30 and the cable 61.
Control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 over the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system, and it must have read/write access to the reconfigurable hardware model 20.
The workstation 10 includes a computer, a keyboard, a mouse, a monitor, and an appropriate bus/network interface that allow a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those skilled in the art, the workstation 10 comprises a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithm embodied in software is stored in the main memory 15 during a test/debug session and is executed by the CPU 11 through the workstation's operating system.
As known to those skilled in the art, after the start-up firmware loads the operating system into the memory of the workstation 10, control passes to the operating system's initialization code, which sets up the necessary data structures and loads and initializes the device drivers. Control then passes to the command line interpreter (CLI), which prompts the user to indicate the program to be run. The operating system then determines the amount of memory needed to run the program, locates or allocates a block of memory, and accesses that memory either directly or through the BIOS (Basic Input/Output System). After the memory loading process is complete, the application program begins execution.
One embodiment of the present of invention are a kind of specific analog simulation application programs.In its implementation, application program needs operating system that multiple service is provided, and includes but not limited to reading and writing, execution data communication, and connection display/keyboard/mouse disk file.
The workstation 10 has an appropriate user interface that allows the user to enter circuit design data, edit circuit design data, monitor the progress of simulations and emulations while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible, menu-driven options and command sets that can be entered with the keyboard and mouse and viewed on the monitor. Typically, the user uses a computing station 80 with a keyboard 90.
The user typically creates a particular circuit design of an electronic system and enters the designed system into the workstation 10 as HDL (hardware description language) code, usually structural RTL. Among other operations, the SEmulation system of the present invention performs component type analysis to partition the modeling between hardware and software. The SEmulation system models behavioral-level, RTL-level, and gate-level code in software. For hardware modeling, the system can model RTL-level and gate-level code; however, RTL-level code must be synthesized into gate-level code before hardware modeling. Gate-level code can be processed directly into a usable source design database format for hardware modeling. Using the RTL-level and gate-level code, the system automatically performs component type analysis to complete the partitioning step. Based on the partitioning analysis performed at software compile time, the system maps some portions of the circuit design into hardware for fast simulation through hardware acceleration. The user can also couple the modeled circuit design to the target system for in-circuit emulation in a real environment. Because the software simulation engine and the hardware acceleration engine are tightly coupled through the software kernel, the user can simulate the entire circuit design in software, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to software simulation, and return again to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle by cycle, at the user's will, is one of the valuable features of this embodiment. This feature is particularly useful during debugging, because it lets the user reach a particular point or cycle quickly in hardware acceleration mode and then use software simulation to examine various points thereafter to debug the circuit design. Moreover, the SEmulation system makes all components visible to the user, regardless of whether a component is implemented internally in hardware or in software. The SEmulation system accomplishes this by reading the register values from the hardware model and then, when the user requests such a read, rebuilding the combinational components using the software model. These and other features are discussed more fully below.
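The partitioning rule described above can be sketched as follows, under simplifying assumptions: each component is tagged with a coding level and a kind, the whole design is kept in the software model, and only register and combinational components coded at RTL or gate level also go to the hardware model, with RTL components flagged for synthesis first. `Level`, `partition`, and the component tuples are hypothetical and do not reproduce the actual component type analysis.

```python
from enum import Enum
from typing import List, Tuple


class Level(Enum):
    BEHAVIORAL = "behavioral"
    RTL = "rtl"
    GATE = "gate"


Component = Tuple[str, Level, str]  # (name, coding level, kind)


def partition(components: List[Component]) -> Tuple[List[str], List[str], List[str]]:
    """Returns (software-modeled, hardware-modeled, needs synthesis before hardware)."""
    software = [name for name, _, _ in components]  # the whole design is modeled in software
    hardware: List[str] = []
    needs_synthesis: List[str] = []
    for name, level, kind in components:
        # Only evaluation components (registers, combinational logic) coded at RTL or
        # gate level are candidates for the hardware model.
        if kind in ("register", "combinational") and level in (Level.RTL, Level.GATE):
            hardware.append(name)
            if level is Level.RTL:
                needs_synthesis.append(name)  # RTL must be synthesized to gates first
    return software, hardware, needs_synthesis


design = [("tb", Level.BEHAVIORAL, "testbench"),
          ("alu", Level.RTL, "combinational"),
          ("r0", Level.GATE, "register"),
          ("model", Level.BEHAVIORAL, "combinational")]
sw, hw, synth = partition(design)
```

In this example, the behavioral test bench and behavioral model stay software-only, while `alu` and `r0` are also mapped to hardware, and `alu` must first be synthesized.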
The workstation 10 is coupled to a bus system 50. The bus system can be any available bus system that allows various agents, such as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, to be operably coupled together. Preferably, the bus system is fast enough to provide real-time or near-real-time results to the user. One such bus system is the one described in the Peripheral Component Interconnect (PCI) standard, which is incorporated herein by reference. Currently, revision 2.0 of the PCI standard provides a 33 MHz bus speed, and revision 2.1 adds support for a 66 MHz bus speed. Accordingly, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30 comply with the PCI standard.
In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may also be found in this bus system. These devices may be coupled to the PCI bus at the same level as, or at a different level from, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. Each PCI bus at a different level, such as PCI bus 52, is coupled to a PCI bus at another level, such as PCI bus 50 (if present), through a PCI-to-PCI bridge 51. Two PCI devices 53 and 54 may be coupled to the PCI bus 52.
The reconfigurable hardware model 20 comprises an array of field programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment, the hardware model is reconfigurable; that is, it can reconfigure its hardware to suit the particular computation or user circuit design at hand. For example, if many adders or multipliers are needed, the system is configured to include many adders and multipliers. As other computing elements or functions are needed, they can also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. Reconfigurable systems are also flexible, so users can work around minor hardware defects encountered during manufacture, testing, or use. In one embodiment, the reconfigurable hardware model 20 comprises a two-dimensional array of computing elements made up of FPGA chips, which provides the computational resources for various user circuit designs and applications. The hardware configuration process is discussed in more detail below.
Examples of such FPGA chips include those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is made reconfigurable through the use of field programmable devices. Other embodiments of the present invention, however, may be implemented using application-specific integrated circuit (ASIC) technology. Still other embodiments may take the form of a custom integrated circuit.
In a typical test/debug environment, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made before the actual prototype is manufactured. In some other instances, however, an actual ASIC or custom integrated circuit can be used, although doing so deprives the user of the ability to change a possibly non-functional circuit design quickly and cost-effectively for re-simulation and re-emulation. At times, though, such an ASIC or custom IC has already been manufactured and is readily available, so emulation with an actual non-reconfigurable chip may be preferable.
According to the present invention, the software in the workstation, together with its external hardware model, gives the end user more flexibility, control, and performance than existing systems. To run simulation and emulation, the model of the circuit design and the relevant parameters (e.g., input test-bench stimuli, overall system outputs, intermediate results) are determined and provided to the simulation software system. The user can define the system circuit design with a schematic capture tool or a synthesis tool. The user starts with a circuit design of the electronic system, usually in the form of a draft schematic, and then converts it into HDL (hardware description language) form with a synthesis tool. The user can also write the HDL directly. Example HDL languages include Verilog and VHDL (VHSIC Hardware Description Language); other languages can also be used. A circuit design expressed in HDL comprises many concurrent components. Each component is a sequence of code that both defines the behavior of a circuit element and controls the execution of the simulation.
The SEmulation system analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. The user can then use the SEmulation system of the present invention. The designer can verify the accuracy of the circuit by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If the circuit does not behave as intended during simulation, the user redefines the circuit by modifying the schematic or the HDL file.
The flowchart in Fig. 2 illustrates the use of this embodiment of the invention. The algorithm starts at step 100. After the HDL file is loaded into the system, the system compiles the circuit design, partitions it, and maps it onto the appropriate hardware model. The compilation, partitioning, and mapping steps are discussed in detail below.
Before the simulation runs, the system must run a reset sequence that removes all unknown "x" values in the software before the hardware acceleration model can work. One embodiment of the present invention uses a 2-bit-wide data path to represent four state values for bus signals: "00" is logic low, "01" is logic high, "10" is "z", and "11" is "x". As is known to those skilled in the art, the software model can handle "0", "1", "x" (bus conflict or unknown value), and "z" (no driver or high impedance). In contrast, the hardware cannot handle the unknown value "x", so the reset sequence, which varies with the particular encoding used, resets register values to all "0" or all "1".
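The four-state encoding above can be written as a small lookup table. The following Python fragment is a minimal sketch, not the system's actual reset logic: `reset_for_hardware` and `pack` are hypothetical helpers, and only the effect of the reset sequence (every "x" register value forced to all 0s or all 1s) is modeled.

```python
# Two-bit encoding of the four logic states, as listed in the text above.
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: state for state, code in ENCODE.items()}


def reset_for_hardware(registers, fill="0"):
    """Force every unknown 'x' register value to a known fill value.

    Hypothetical helper: models only the effect of a reset sequence that
    clears 'x' (which the hardware cannot represent) to all 0s or all 1s.
    """
    if fill not in ("0", "1"):
        raise ValueError("fill must be '0' or '1'")
    return {name: fill if value == "x" else value for name, value in registers.items()}


def pack(states):
    """Encode four-state values as (high bit, low bit) pairs for a 2-bit-wide data path."""
    return [(ENCODE[s] >> 1, ENCODE[s] & 1) for s in states]


regs = {"q0": "x", "q1": "1", "q2": "x"}
print(reset_for_hardware(regs))          # {'q0': '0', 'q1': '1', 'q2': '0'}
print(pack(["0", "1", "z", "x"]))        # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Choosing `fill="1"` models the alternative all-ones reset; which fill value a real reset sequence uses depends on the encoding, as the text notes.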
At step 105, the user decides whether to simulate the circuit design. Typically, the user first starts the system with software simulation. Therefore, if the decision at step 105 is "YES", software simulation begins at step 110.
The user can stop the simulation and inspect values, as shown at step 115. In fact, the user can stop the simulation at any time during the test/debug process, as shown by the dotted lines running from step 115 to the various nodes in the hardware acceleration mode, the ICE mode, and the post-simulation mode. Executing step 115 takes the user to step 160.
After a stop, the system kernel reads back the state of the hardware register components to regenerate the entire software model, including the combinational components if the user wants to inspect combinational component values. After the entire software model has been restored, the user can inspect any signal value in the system. After stopping and inspecting, the user can continue running in simulation-only mode or in hardware acceleration mode. As shown in the flowchart, step 115 branches to the stop/value-inspection routine, which starts at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If the result of step 165 is "YES", step 170 stops the simulation that may currently be in progress and inspects the values to verify the correctness of the circuit design. At step 175, the algorithm returns to the branch point, step 115. There the user can continue simulating and stopping/inspecting values for the rest of the test/debug process, or proceed to the in-circuit emulation step.
Similarly, if the result of step 105 is "NO", the algorithm proceeds to hardware acceleration decision step 120. At step 120, the user decides whether to speed up the test/debug process by accelerating the simulation with the hardware portion of the modeled circuit design. If the result of step 120 is "YES", hardware model acceleration begins at step 125. During compilation, the SEmulation system mapped some components into the hardware model. When hardware acceleration is requested, the system moves the register and combinational components, together with their inputs and evaluation values, into the hardware model. Thus, during hardware acceleration, evaluation takes place in the hardware model at increased speed over long periods. The kernel writes test-bench outputs to the hardware model, updates the software clock, and then reads the hardware model output values on a cycle-by-cycle basis. If the user needs them, the values of the entire software model of the user circuit design (the whole circuit design) can be made available by outputting the register values and regenerating the combinational components from those register values. Because regenerating the combinational components requires software intervention, the values of the entire software model are not provided at every cycle; they are provided only when the user requests them. The combinational component regeneration process is discussed later in this specification.
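Regenerating combinational values from register values read back from the hardware amounts to evaluating the combinational network in dependency order. The Python sketch below assumes a hypothetical three-net network (`COMB`) in which each net is a function of register outputs or of other combinational nets; the standard library's `graphlib.TopologicalSorter` (Python 3.9+) supplies the evaluation order. It illustrates the idea only and is not the system's regeneration routine.

```python
from graphlib import TopologicalSorter

# Hypothetical combinational network: net -> (function, input nets).
# Nets named r_* are register outputs read back from the hardware model.
COMB = {
    "sum": (lambda a, b: (a + b) & 0xFF, ("r_a", "r_b")),
    "carry": (lambda a, b: int(a + b > 0xFF), ("r_a", "r_b")),
    "zero": (lambda s: int(s == 0), ("sum",)),
}


def regenerate(register_values, comb=COMB):
    """Recompute every combinational net from register values."""
    values = dict(register_values)
    # Dependency graph among combinational nets only; registers are the sources.
    graph = {net: [src for src in ins if src in comb] for net, (_, ins) in comb.items()}
    for net in TopologicalSorter(graph).static_order():
        func, ins = comb[net]
        values[net] = func(*(values[src] for src in ins))
    return values


v = regenerate({"r_a": 200, "r_b": 56})
print(v["sum"], v["carry"], v["zero"])   # 0 1 1
```

Because this work happens in software, doing it on every cycle would erase the speed gained from hardware acceleration, which is why the text limits regeneration to user requests.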
In addition, as shown at step 115, the user can stop the hardware acceleration mode at any time. If the user wants to stop, the algorithm proceeds through steps 115 and 160 to the stop/value-inspection routine. At step 115, the user can stop the hardware-accelerated simulation at any time and inspect the resulting values, or continue the hardware-accelerated simulation. The stop/value-inspection routine proceeds through steps 160, 165, 170, and 175, described above. Returning to the main routine after step 125, the user decides at step 135 whether to continue hardware-accelerated simulation or to run pure simulation. If the user wants to simulate further, the algorithm proceeds to step 105. If not, the algorithm proceeds to post-simulation analysis at step 140.
At step 140, the SEmulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system records the values of all hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often output values are recorded: at 1/10,000 record/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for post-simulation analysis. Because the selected logging frequency directly affects SEmulation speed, the user should choose it carefully. A higher logging frequency lowers SEmulation speed, because the system must spend time and resources performing I/O operations to memory to record the output values before it can continue simulating.
For post-simulation analysis, the user selects the particular point at which simulation is desired. The user can then perform the analysis by running a software simulation with the logged inputs to the hardware model, computing the value changes and internal states of all hardware components after SEmulation. Note that the hardware accelerator is used to simulate the data at the selected log point in order to analyze the simulation results. This post-simulation analysis method can be linked to any simulation waveform viewer. It is discussed in more detail later.
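The logging and replay scheme can be illustrated with a toy design. In this Python sketch, a hypothetical 16-bit accumulator stands in for the hardware model, and `run_and_log` and `replay_to` are illustrative names. Every input is logged, the register state is recorded once every 10,000 cycles, and the state at any later cycle is reconstructed by replaying the logged inputs from the nearest earlier snapshot. Everything runs in software; the hardware accelerator is not modeled.

```python
LOG_INTERVAL = 10_000  # one snapshot per 10,000 cycles, i.e. 1/10,000 record/cycle


def step(state, inp):
    """Stand-in for one clock cycle of the design: a 16-bit accumulator register."""
    return (state + inp) & 0xFFFF


def run_and_log(inputs, interval=LOG_INTERVAL):
    """Run every cycle, recording register state every `interval` cycles.

    The full input list doubles as the input log.
    """
    state, snapshots = 0, {0: 0}          # cycle 0 holds the reset state
    for cycle, inp in enumerate(inputs, start=1):
        state = step(state, inp)
        if cycle % interval == 0:
            snapshots[cycle] = state
    return state, snapshots


def replay_to(target, inputs, snapshots):
    """Rebuild the state after `target` cycles from the nearest earlier snapshot."""
    start = max(c for c in snapshots if c <= target)
    state = snapshots[start]
    for inp in inputs[start:target]:      # replay only the logged inputs after the snapshot
        state = step(state, inp)
    return state


inputs = [i % 7 for i in range(100_000)]
final, snaps = run_and_log(inputs)
print(len(snaps) - 1)                     # 10 snapshots recorded over 100,000 cycles
print(replay_to(54_321, inputs, snaps))
```

A shorter interval shortens each replay but costs more snapshot I/O during the run, which is the trade-off the text describes for the logging frequency.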
At step 145, the user can choose to emulate the SEmulated circuit design in its target system environment. If the result of step 145 is "NO", the algorithm terminates and the SEmulation process ends at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain system I/O from it. The system I/O from the target system includes the signals exchanged between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the SEmulation system for further processing, and outputs the processed signals to the target system. Conversely, the emulated circuit design sends output signals to the target system, which processes them and may send the processed signals back to the emulated circuit design. In this way, the performance of the circuit design can be evaluated in its natural target system environment. After emulation with the target system, the user has results that either verify the circuit design or reveal its non-functional aspects. At this point, as shown at step 135, the user can simulate/emulate again, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the verified circuit design.
III. Simulation/Hardware Acceleration Modes
Fig. 3 shows a high-level diagram of software compilation and hardware configuration at compile time and at run time, according to one embodiment of the present invention. Fig. 3 presents two sets of information: one set distinguishes the operations performed at compile time from those performed at simulation/emulation run time; the other shows the division between the software model and the hardware model. At the outset, the SEmulation system according to one embodiment of the present invention requires the user circuit design as input data 200. The user circuit design is in the form of an HDL file (e.g., Verilog, VHDL). The SEmulation system parses the HDL file and reduces behavioral-level, register-transfer-level, and gate-level code to a form that the SEmulation system can use. The system generates a source design database for front-end processing step 205. The processed HDL file is now usable by the SEmulation system. As is known to those skilled in the art, parsing converts ASCII data into native binary data structures. See ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988), which is incorporated herein by reference.
Compile time is represented by process 225, and run time by process/element 230. As shown in process 225, at compile time the SEmulation system compiles the processed HDL file by performing component type analysis. Component type analysis classifies HDL components into combinational components, register components, clock components, memory components, and test-bench components. In essence, the system partitions the user circuit design into control components and evaluation components.
SEmulator compiler 210 essentially maps the control components of the simulation into software and the evaluation components into software and hardware. Compiler 210 generates a software model for all HDL components. The software model is cast into code 215. In addition, SEmulator compiler 210 uses the component type information of the HDL file to select or generate hardware logic blocks/elements from a library or module generator, and generates a hardware model for particular HDL components. The end result is a so-called "bitstream" configuration file 220.
In preparation for run time, the software model in code form is stored in main memory, where the application program associated with the SEmulator program according to one embodiment of the present invention is also stored. This code is processed in general-purpose processor or workstation 240. Substantially concurrently, configuration file 220 for the hardware model is used to map the user circuit design onto reconfigurable hardware board 250. Here, the portions of the circuit design modeled in hardware are mapped and partitioned into the FPGA chips on reconfigurable hardware board 250.
As explained above, user test-bench stimuli, test vector data, and other test-bench resources 235 are applied to general-purpose processor or workstation 240 for simulation purposes. In addition, the user can perform emulation of the circuit design under software control. Reconfigurable hardware board 250 contains the user's emulated circuit design. The SEmulation system lets the user switch selectively between software simulation and hardware emulation, and stop the simulation or emulation process at any time, on a cycle-by-cycle basis, to inspect the value of every component (register or combinational) in the model. Thus, the SEmulation system passes data between test bench 235 and processor/workstation 240 for simulation, and between test bench 235 and reconfigurable hardware board 250, via data bus 245 and processor/workstation 240, for emulation. If a user target system 260 is included, emulation data can pass between reconfigurable hardware board 250 and target system 260 via emulation interface 255 and data bus 245. Because the kernel resides in the software simulation model in the memory of processor/workstation 240, data must pass between processor/workstation 240 and reconfigurable hardware board 250 via data bus 245.
Fig. 4 shows a flowchart of the compilation process according to one embodiment of the present invention. In Fig. 3, the compilation process is represented by processes 205 and 210. The compilation process of Fig. 4 starts at step 300. Step 301 processes front-end information. Here, gate-level HDL code is generated. The user generates the gate-level HDL code either by writing it directly by hand or by using some form of schematic capture or synthesis tool to convert the initial circuit design into HDL form. The SEmulation system parses the HDL file (in ASCII format) into binary form, reducing behavioral-level, register-transfer-level (RTL), and gate-level code to an internal data structure that the SEmulation system can use. The system generates a source design database containing the parsed HDL code.
Step 302 performs component type analysis by classifying the HDL components into combinational components, register components, clock components, memory components, and test-bench components, as shown by component type resource 303. The SEmulation system generates hardware models for the register and combinational components, with some exceptions discussed below. Test-bench and memory components are mapped into software. Some clock components (e.g., derived clocks) are modeled in hardware, while others reside at the software/hardware boundary (e.g., software clocks).
Combinational components are stateless logic components whose output values are a function of the current input values and do not depend on the history of the input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selectors, adders, multipliers, shifters, and bus drivers.
Register components are simple storage elements. State transitions of a register are controlled by a clock signal. One form of register is edge-triggered: its state changes when an edge is detected. Another form is the latch, which is level-triggered. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.
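The difference between the two register forms shows up when the data input changes while the clock or enable stays high. The Python classes below are a minimal behavioral sketch with hypothetical names (`DFlipFlop`, `Latch`); they are not the SEmulation system's register models.

```python
class DFlipFlop:
    """Edge-triggered register: q takes d only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._last_clk = 0

    def tick(self, clk, d):
        if self._last_clk == 0 and clk == 1:   # rising edge detected
            self.q = d
        self._last_clk = clk
        return self.q


class Latch:
    """Level-sensitive latch: transparent while enable is high, holds otherwise."""

    def __init__(self):
        self.q = 0

    def tick(self, enable, d):
        if enable:
            self.q = d
        return self.q


trace = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (clock or enable, data) samples
ff, latch = DFlipFlop(), Latch()
ff_out = [ff.tick(c, d) for c, d in trace]
latch_out = [latch.tick(c, d) for c, d in trace]
print(ff_out)      # [0, 1, 1, 1, 0]
print(latch_out)   # [0, 1, 0, 0, 0]
```

At the third sample the clock is still high but the data has dropped to 0: the flip-flop holds its value, while the transparent latch follows the data.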
Clock components deliver periodic signals to logic devices to control their behavior. Typically, clock signals control the updating of registers. Primary clocks are generated by self-timed test-bench processes. For example, a typical Verilog test-bench process for generating a clock is as follows:
always begin      // test-bench clock: low for 5 time units, then high for 5
    Clock = 0;
    #5;
    Clock = 1;
    #5;
end
According to this code, the clock signal starts at logic 0. After 5 time units, the clock signal changes to logic 1. After another 5 time units, the clock signal returns to logic 0. Primary clock signals are usually generated in software, and a typical user circuit design has only a few primary clocks (e.g., 1 to 10). Derived or gated clocks are generated from networks of combinational logic and registers that are in turn driven by the primary clocks. A typical user circuit design has many derived clocks (e.g., 1,000 or more).
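The relationship between a primary clock and the clocks derived from it can be sampled in a few lines. This Python sketch reproduces the test-bench clock above (period of 10 time units) and derives two example clocks from it: a divide-by-2 clock produced by a toggle register and a gated clock produced by ANDing with an enable. `primary_clock`, `derived_clocks`, and the enable signal are hypothetical illustrations.

```python
def primary_clock(t, half_period=5):
    """Test-bench clock at time t: 0 for 5 time units, then 1 for 5, repeating."""
    return (t // half_period) % 2


def derived_clocks(times, enable):
    """Sample the primary clock plus two clocks derived from it.

    div2 is a toggle register clocked by the primary clock (a divide-by-2 clock);
    gated is the primary clock ANDed with an enable signal.
    """
    div2, prev, samples = 0, primary_clock(times[0]), []
    for t in times:
        clk = primary_clock(t)
        if prev == 0 and clk == 1:   # rising edge of the primary clock
            div2 ^= 1
        prev = clk
        samples.append((clk, div2, clk & enable(t)))
    return samples


samples = derived_clocks(list(range(0, 40, 5)), enable=lambda t: int(t < 20))
for t, s in zip(range(0, 40, 5), samples):
    print(t, s)   # (primary, div2, gated)
```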
Memory components are block storage components with address and control lines for accessing individual data in particular memory locations. Examples include ROM (read-only memory), asynchronous RAM (random-access memory), and synchronous RAM.
Test-bench components are software processes used to control and monitor the simulation process. Accordingly, these components are not part of the hardware circuit design under test. Test-bench components control the simulation by generating clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test-bench components also monitor the simulation by checking for changes in values, performing value-change dumps, checking asserted constraints on signal-value relationships, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.
The SEmulation system performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system classifies each element as one of the component types above. Continuous assignment statements are classified as combinational components. Gate primitives can be either combinational or latch-type register components, according to the language definition. Initialization code is treated as an initialization-type test bench.
An always process that drives nets without reading any nets is treated as a driver-type test bench. An always process that reads nets without driving any nets is treated as a monitor-type test bench. An always process with delay controls or multiple event controls is treated as a general-type test bench.
An always process with a single event control that drives a single net can be one of the following: (1) if the event control is an edge-triggered event, the process is an edge-triggered register component; (2) if the driven net is not defined on every possible execution path in the process, the net is a latch-type register; (3) if the driven net is defined on every possible execution path in the process, the net is a combinational component.
An always process with a single event control that drives multiple nets can be decomposed into several processes, each driving a single net, so that the component type of each can be determined separately. The component types are then determined from the decomposed processes.
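The classification rules above can be condensed into a single function. This Python sketch works on a hypothetical, already-extracted summary of each process (`Process`) rather than on a real source design database, and it does not cover gate primitives, whose type depends on the language definition. Where a process matches more than one test-bench rule (the clock generator shown earlier both drives without reading and contains delays), checking the driver rule first is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical summary of one HDL process taken from the source design database."""
    kind: str                                         # "assign", "initial" or "always"
    reads: set = field(default_factory=set)           # nets read
    drives: set = field(default_factory=set)          # nets driven
    event_controls: int = 0                           # number of @(...) event controls
    edge_triggered: bool = False                      # event uses posedge/negedge
    has_delay: bool = False                           # contains a #delay control
    fully_assigned: set = field(default_factory=set)  # nets assigned on every path


def classify(p):
    """Return (net, component type) pairs; net is None for whole-process test benches."""
    if p.kind == "assign":
        return [(net, "combinational") for net in sorted(p.drives)]
    if p.kind == "initial":
        return [(None, "initialization test bench")]
    # Remaining processes are always blocks; the order of these checks is an assumption.
    if p.drives and not p.reads:
        return [(None, "driver test bench")]
    if p.reads and not p.drives:
        return [(None, "monitor test bench")]
    if p.has_delay or p.event_controls > 1:
        return [(None, "general test bench")]
    if p.event_controls == 0:
        raise ValueError("always process without any timing control")
    # Single event control: decompose into one process per driven net.
    pairs = []
    for net in sorted(p.drives):
        if p.edge_triggered:
            pairs.append((net, "edge-triggered register"))
        elif net in p.fully_assigned:
            pairs.append((net, "combinational"))
        else:
            pairs.append((net, "latch register"))
    return pairs


clock_gen = Process("always", drives={"Clock"}, has_delay=True)
dff = Process("always", reads={"clk", "d"}, drives={"q"}, event_controls=1, edge_triggered=True)
mixed = Process("always", reads={"a", "s"}, drives={"y", "n"}, event_controls=1, fully_assigned={"y"})
print(classify(clock_gen))  # [(None, 'driver test bench')]
print(classify(dff))        # [('q', 'edge-triggered register')]
print(classify(mixed))      # [('n', 'latch register'), ('y', 'combinational')]
```

The `mixed` example shows the decomposition rule: one always block yields a latch for the net that is not assigned on every path and a combinational component for the net that is.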
Step 304 generates a software model for all HDL components, regardless of component type. Through an appropriate user interface, the user can simulate the entire circuit design with the complete software model. Test-bench processes drive the stimulus inputs and test vector patterns, control the overall simulation, and monitor the simulation process.
Step 305 performs clock analysis. Clock analysis comprises two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step maps the user's register components into the SEmulation system's hardware register model and then extracts the clock signals from the system's hardware register components. The clock network analysis step determines the primary clocks and the derived clocks based on the extracted clock signals, and separates the gated clock network from the gated data network. This is described in more detail in connection with Fig. 16.
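The primary/derived distinction in the clock network analysis step can be sketched as a backward trace from each register clock pin to the test-bench nets that ultimately drive it. In this Python sketch the netlist format (`NETLIST`, `CLOCK_PINS`) and the function names are hypothetical; tracing only through a register's clock input, not its data input, is the sketch's simplifying assumption. Separating the gated clock network from the gated data network is not shown.

```python
# Hypothetical netlist: net -> (driver kind, input nets).
# A register's inputs are (data, clock); "testbench" nets are driven by test-bench processes.
NETLIST = {
    "clk": ("testbench", ()),
    "d_en": ("testbench", ()),
    "en": ("register", ("d_en", "clk")),
    "gclk": ("and", ("clk", "en")),
    "div2_n": ("not", ("div2",)),
    "div2": ("register", ("div2_n", "clk")),
}
# Register instance -> net on its clock pin (the result of clock extraction).
CLOCK_PINS = {"r1": "clk", "r2": "gclk", "r3": "div2", "r_en": "clk", "r_div2": "clk"}


def clock_sources(net, netlist, seen=None):
    """Trace a clock net back to the primary (test-bench) clocks it is derived from."""
    seen = set() if seen is None else seen
    if net in seen:                      # already traced in this query (also breaks loops)
        return set()
    seen.add(net)
    kind, inputs = netlist[net]
    if kind == "testbench":
        return {net}
    # A register output changes only on its clock edge, so follow only its clock input.
    upstream = inputs[1:] if kind == "register" else inputs
    sources = set()
    for src in upstream:
        sources |= clock_sources(src, netlist, seen)
    return sources


def analyze_clocks(clock_pins, netlist):
    """Split register clock nets into primary clocks and derived clocks (with their sources)."""
    clock_nets = set(clock_pins.values())
    primary = {n for n in clock_nets if netlist[n][0] == "testbench"}
    derived = {n: clock_sources(n, netlist) for n in sorted(clock_nets - primary)}
    return primary, derived


primary, derived = analyze_clocks(CLOCK_PINS, NETLIST)
print(primary)   # {'clk'}
print(derived)   # {'div2': {'clk'}, 'gclk': {'clk'}}
```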
Step 306 performs residence selection. Together with the user, the system selects the components for the hardware model; that is, of all the hardware components that could be implemented in the hardware model of the user circuit design, some will not be modeled in hardware for a number of reasons. These reasons include component type, hardware resource limits (e.g., floating-point operations and large multiplications stay in software), simulation and communication overhead (e.g., small bridge logic between test-bench processes stays in software, as do signals monitored by test-bench processes), and user preference. For numerous reasons, including performance and simulation monitoring, the user can force particular components that would otherwise be modeled in hardware to stay in software.
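The residence criteria listed above can be expressed as a sequence of checks. This Python sketch is illustrative only: the component dictionary fields and the 32-bit multiplier threshold are assumptions, not values taken from the system.

```python
# Hypothetical residence rules; field names and the 32-bit threshold are assumptions.
HARDWARE_TYPES = {"combinational", "register", "derived clock"}


def select_residence(comp, forced_software=frozenset()):
    """Return "hardware" or "software" for one component description."""
    if comp["name"] in forced_software:              # user preference wins
        return "software"
    if comp["type"] not in HARDWARE_TYPES:           # test benches, memories, software clocks
        return "software"
    if comp.get("floating_point") or comp.get("multiplier_width", 0) > 32:
        return "software"                            # hardware resource limits
    if comp.get("test_bench_bridge") or comp.get("monitored_by_test_bench"):
        return "software"                            # simulation/communication overhead
    return "hardware"


components = [
    {"name": "alu", "type": "combinational"},
    {"name": "fmul", "type": "combinational", "floating_point": True},
    {"name": "state", "type": "register", "monitored_by_test_bench": True},
    {"name": "ctrl", "type": "register"},
    {"name": "ram0", "type": "memory"},
]
for comp in components:
    print(comp["name"], select_residence(comp, forced_software={"ctrl"}))
```

Checking the user's override first mirrors the text: a user can always keep a hardware-eligible component in software, but nothing here promotes a software-only type into hardware.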
Step 307 maps the selected hardware models onto the reconfigurable emulation hardware board. Specifically, step 307 takes the netlist and maps the circuit design into specific FPGA chips. This step involves grouping, or clustering, logic elements. The system then assigns each cluster to a unique FPGA chip, or several clusters to a single FPGA chip. In short, the system assigns clusters of elements to FPGA chips. This is described in more detail below in connection with Fig. 6. The system places the hardware model components on the grid of FPGA chips so as to minimize inter-chip communication overhead. In one embodiment, the array comprises a 4×4 FPGA array, a PCI interface unit, and a software clock control unit. The FPGA array implements the portion of the user circuit design determined in steps 302-306 of the software compilation process described above. The PCI interface unit allows the reconfigurable hardware emulation model to communicate with the workstation over the PCI bus. The software clock avoids race conditions when multiple clock signals enter the FPGA array. In addition, step 307 routes the FPGA chips according to the communication schedule among the hardware models.
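One simple way to keep heavily communicating clusters close together on the 4×4 chip grid is greedy constructive placement. The Python sketch below places clusters in order of total traffic, each on the free chip that minimizes traffic-weighted Manhattan distance to the clusters already placed. The technique choice, the `traffic` format, and `place_clusters` are assumptions for illustration; the text does not specify the system's placement algorithm, and clusters with no traffic are not placed by this sketch.

```python
from itertools import product

GRID = 4  # 4 x 4 FPGA array, as in the embodiment above


def place_clusters(traffic, grid=GRID):
    """Greedily place clusters on an FPGA grid to reduce weighted Manhattan distance.

    traffic maps frozenset({a, b}) -> number of signals exchanged by clusters a and b.
    """
    clusters = sorted({c for pair in traffic for c in pair})
    if len(clusters) > grid * grid:
        raise ValueError("more clusters than FPGA chips")

    def weight(a, b):
        return traffic.get(frozenset((a, b)), 0)

    placement = {}

    def cost(c, cell):
        return sum(weight(c, o) * (abs(cell[0] - r) + abs(cell[1] - k))
                   for o, (r, k) in placement.items())

    # Place the most heavily communicating clusters first.
    order = sorted(clusters, key=lambda c: -sum(weight(c, o) for o in clusters if o != c))
    free = list(product(range(grid), range(grid)))   # row-major order breaks ties
    for c in order:
        best = min(free, key=lambda cell: cost(c, cell))
        placement[c] = best
        free.remove(best)
    return placement


traffic = {frozenset({"A", "B"}): 10, frozenset({"B", "C"}): 8, frozenset({"A", "D"}): 1}
print(place_clusters(traffic))  # {'B': (0, 0), 'A': (0, 1), 'C': (1, 0), 'D': (0, 2)}
```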
Step 308 is inserted control circuit.These control circuits comprise I/O address pointer and data bus logical, it is used to get in touch the DMA engine (hereinafter will be in conjunction with Figure 11 to simulator, 12 and 14 discuss), and the estimation steering logic, with control hardware state-transition and the multiple transmission of lead (hereinafter will discuss) in conjunction with Figure 19 and 20.Known to the technical staff in the technical field, a direct memory access (DMA) (DMA) unit provides the additional data channel between peripherals and the primary memory, peripherals can directly be visited (that is, read, write) primary memory and do not needed the intervention of CPU therein.It is mobile that address pointer in each fpga chip allows data based bus size to be limited between software model and the hardware model.The estimation steering logic is essentially a finite state machine, and it guarantees that Clock enable is to the input of register to be asserted before the input of clock and data enters these registers.
Step 309 generates the configuration files used to map the hardware model onto the FPGA chips. In essence, step 309 assigns circuit design components to specific cells or gate-level components in each chip. Whereas step 307 determines the mapping of hardware model groups to specific FPGA chips, step 309 takes that mapping result and generates a configuration file for each FPGA chip.
Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the entire SEmulation system. The kernel cannot be generated before this point, because part of its code must update and evaluate the hardware components, and the proper mapping to the hardware model and the FPGA chips exists only after step 309. This is discussed in more detail below in connection with Fig. 5. Compilation ends at step 311.
As mentioned above in connection with Fig. 4, the software kernel code is generated at step 310 after the software and hardware models have been determined. The kernel is the piece of software in the SEmulation system that controls the operation of the overall system. The kernel controls the execution of both software simulation and hardware emulation. Because the kernel also resides at the center of the hardware model, the simulator is integrated with the emulator. Unlike other known co-simulation systems, the SEmulation system according to one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in Fig. 5.
Referring to Fig. 5, the kernel starts at step 330. Step 331 evaluates the initialization code. The control loop begins at step 332 and ends at decision step 339; it keeps cycling until the system observes no active test-bench processes, which indicates that the simulation or emulation session has finished. Step 332 evaluates the active test-bench components for simulation or emulation.
Step 333 evaluates the clock components. These clock components come from the test-bench processes. Usually, the user specifies the type of clock signal supplied to the simulation system. In one example (discussed above in connection with component type analysis and reproduced here), the clock component that the user designs in the test-bench process is as follows:
always begin      // test-bench clock: low for 5 time units, then high for 5
    Clock = 0;
    #5;
    Clock = 1;
    #5;
end
In this clock component example, the user has decided that a logic 0 signal is generated first and that, after 5 simulation time units, a logic 1 signal is generated. This clock-generation process cycles continuously until the user stops it. The kernel advances these simulation times.
Decision step 334 asks whether any active clock edge has been detected that would cause some kind of logic evaluation in the software, and possibly in the hardware model (if emulation is running). The clock signal that the kernel uses to detect active clock edges is the clock signal from the test-bench process. If decision step 334 evaluates to "NO", the kernel proceeds to step 337. If decision step 334 evaluates to "YES", step 335 updates the registers and memories, and step 336 propagates the combinational components. Step 336 essentially deals with combinational logic, which needs some time after the clock signal is asserted to propagate values through the combinational logic network. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.
Note that registers and combinational components are also modeled in hardware, so the kernel controls the emulator portion of the SEmulation system. In fact, the kernel can accelerate the evaluation of the hardware model in steps 334 and 335 whenever an active clock edge is detected. Therefore, unlike the prior art, the SEmulation system according to one embodiment of the present invention can accelerate the hardware emulator through the software kernel and based on component type (e.g., register, combinational). Furthermore, the kernel controls the execution of the software and hardware models on a cycle-by-cycle basis. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general-purpose processor that runs the simulation kernel. The coprocessor speeds up the simulation task.
Step 337 evaluates the active test-bench components. Step 338 advances the simulation time. Step 339 provides the boundary of the control loop that begins at step 332 by determining whether any test-bench process is active. If so, the simulation and/or emulation keeps running and more data is evaluated: the kernel loops back to step 332 to evaluate any active test-bench components. If no test-bench process is active, the simulation and emulation processes are complete, and step 340 ends the simulation/emulation process. In summary, the kernel is the main control loop that controls the operation of the entire SEmulation system. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges to update registers and memories and to propagate combinational logic data, and advances the simulation time.
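The control loop of Fig. 5 can be illustrated with a toy design that runs entirely in software. In this Python sketch the design (a counter register with an "odd" combinational output), the fixed end time of the clock test bench, and advancing time directly to the next clock transition are all assumptions; the real kernel also drives the hardware model, which is not modeled here.

```python
def run_kernel(end_time=40, half_period=5):
    """Toy event loop shaped like Fig. 5 for a hypothetical counter design."""
    regs = {"count": 0}                       # register component
    comb = {}                                 # combinational component values

    def propagate():                          # step 336: propagate combinational logic
        comb["odd"] = regs["count"] & 1

    propagate()                               # step 331: evaluate initialization code
    time, clk_prev, history = 0, None, []
    while time < end_time:                    # step 339: clock test bench still active?
        clk = (time // half_period) % 2       # steps 332-333: evaluate test bench and clock
        if clk_prev == 0 and clk == 1:        # step 334: active (rising) clock edge?
            regs["count"] += 1                # step 335: update registers
            propagate()
        history.append((time, clk, regs["count"], comb["odd"]))  # step 337: monitor
        clk_prev = clk
        time += half_period                   # step 338: advance simulation time
    return history


for entry in run_kernel():
    print(entry)   # (time, clock, count, odd)
```

Registers change only on detected rising edges, and the combinational output is recomputed immediately after each register update, matching the order of steps 335 and 336.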
Fig. 6 shows one embodiment of a method for automatically mapping the hardware model onto the reconfigurable circuit board. A netlist file provides the input to the hardware implementation process. The netlist describes the logic functions and their interconnections. Implementing the hardware model in FPGAs involves three independent tasks: mapping, placement, and routing. The tools that perform them are commonly called "place and route" tools. The design tools used can be Viewlogic Viewdraw (a schematic capture system) with Xilinx Xact place-and-route software, or Altera's MAX+PLUS II system.
The mapping task partitions the circuit design into logic blocks, I/O blocks, and other FPGA resources. Although some logic functions, such as flip-flops and buffers, can map directly into corresponding FPGA resources, other logic functions, such as combinational logic, must be implemented in logic blocks using mapping algorithms. The user can usually select mapping options for optimal density or optimal performance.
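Implementing a combinational function in a logic block often means filling a small look-up table (LUT) with the function's truth table. This Python sketch assumes a k-input LUT whose contents are a 2^k-bit value (bit i holds the output for input pattern i); `lut_init` is a hypothetical helper, and the bit ordering is a convention chosen here, not any vendor's configuration format.

```python
def lut_init(func, k=4):
    """Truth-table contents of a k-input look-up table that implements func.

    Bit i of the result is func evaluated with input j equal to bit j of i.
    """
    init = 0
    for i in range(2 ** k):
        inputs = [(i >> j) & 1 for j in range(k)]
        if func(*inputs):
            init |= 1 << i
    return init


print(hex(lut_init(lambda a, b, c, d: a & b & c & d)))                  # 0x8000
print(hex(lut_init(lambda a, b, c: (a & b) | (a & c) | (b & c), k=3)))  # 0xe8
```

Any function of up to k inputs fits in one such table; larger functions must first be decomposed across several logic blocks, which is where the mapping algorithms mentioned above come in.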
The placement task takes the logic and I/O blocks from the mapping task and assigns them to physical locations in the FPGA array. Current FPGA tools generally use a combination of three techniques: mincut, simulated annealing, and general force-directed relaxation (GFDR). These techniques essentially determine the optimal placement according to various cost functions, which depend on the total net length of the interconnections or the delay along a set of critical signal paths, among other variables. The Xilinx XC4000 series FPGA tools use a variation of the mincut technique for initial placement, followed by GFDR to refine the placement.
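Of the three techniques named above, simulated annealing is the simplest to sketch. The Python code below minimizes half-perimeter wirelength (HPWL), one common estimate of total net length, by randomly moving blocks on a square grid. It accepts any improving move and occasionally accepts worsening moves with a probability that shrinks as the temperature cools. The cooling schedule, move set, and parameter values are assumptions, and this is not the method used by the Xilinx or Altera tools mentioned in the text.

```python
import math
import random


def hpwl(placement, nets):
    """Half-perimeter wirelength: a common estimate of total interconnect length."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def move(placement, occupant, blk, target):
    """Move blk to target, swapping with any block already there; return blk's old cell."""
    src = placement[blk]
    other = occupant.get(target)
    placement[blk], occupant[target] = target, blk
    if other is not None and other != blk:
        placement[other], occupant[src] = src, other
    elif other is None:
        del occupant[src]
    return src


def anneal(blocks, nets, size, steps=5000, t0=5.0, seed=0):
    """Place blocks on a size x size grid by simulated annealing; returns the best placement."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    placement = dict(zip(blocks, rng.sample(cells, len(blocks))))  # random start
    occupant = {cell: blk for blk, cell in placement.items()}
    cost = hpwl(placement, nets)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-6        # linear cooling schedule
        blk = rng.choice(blocks)
        src = move(placement, occupant, blk, rng.choice(cells))
        delta = hpwl(placement, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta                            # accept the move
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            move(placement, occupant, blk, src)      # reject: undo the move
    return best, best_cost


blocks = [f"b{i}" for i in range(9)]
nets = [(blocks[i], blocks[i + 1]) for i in range(8)]   # a simple chain of 2-pin nets
best, best_cost = anneal(blocks, nets, size=3)
print(best_cost)
```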
The routing task determines the routing paths used to interconnect the mapped and placed blocks. One such router, called a maze router, finds the shortest path between two points. Because the routing task provides the direct interconnections between chips, the placement of circuits relative to the chips is critical.
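The classic maze router is Lee's algorithm: a breadth-first wavefront expands from the source until it reaches the target, and the path is then traced back. The Python sketch below routes one connection on a grid where "#" marks blocked cells. The grid representation and `maze_route` are illustrative assumptions; production routers add multiple layers, costs, and rip-up-and-reroute.

```python
from collections import deque


def maze_route(grid, start, goal):
    """Lee-style maze routing: breadth-first wavefront expansion, then backtrace.

    grid is a list of strings; '#' marks a blocked cell. Returns the list of
    cells on a shortest path from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:       # backtrace from goal to start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in prev:
                prev[nxt] = cell
                frontier.append(nxt)
    return None


layout = [
    "....",
    ".##.",
    "...#",
    "#...",
]
path = maze_route(layout, (0, 0), (3, 3))
print(len(path) - 1, path)   # 6 steps
```

Because the wavefront expands in order of distance, the first time it reaches the goal is along a shortest path.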
Initially, the hardware model can be described in gate-level netlist 350 or in RTL 357. RTL code can be further synthesized into a gate-level netlist. During mapping, synthesizer server 360, such as Altera's MAX+PLUS II programmable logic development tool system and software, can be used to produce output files for mapping purposes. Synthesizer server 360 can match the user's circuit design components with any standard existing logic elements in library 361 (e.g., standard adders or standard multipliers), generate any parameterized and frequently used logic modules 362 (e.g., non-standard multipliers or non-standard adders), and synthesize random logic elements 363 (e.g., look-up-table-based logic that implements custom logic functions). The synthesizer server also removes redundant and unused logic. In essence, the output file synthesizes and optimizes the logic of the user circuit design.
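Removing unused logic can be done with a backward reachability pass: any net whose value can never reach a primary output is dead. The Python sketch below assumes a hypothetical netlist format mapping each driven net to its fan-in nets; it covers only unused logic, not the removal of redundant (logically equivalent) logic.

```python
def remove_unused(netlist, outputs):
    """Drop logic whose value can never reach a primary output.

    netlist maps each driven net to the tuple of nets feeding its driver;
    primary inputs have no entry.
    """
    live, stack = set(), list(outputs)
    while stack:
        net = stack.pop()
        if net in live:
            continue
        live.add(net)
        stack.extend(netlist.get(net, ()))   # walk backward through the fan-in
    return {net: ins for net, ins in netlist.items() if net in live}


netlist = {
    "y": ("a", "b"),
    "a": ("in0",),
    "b": ("in1",),
    "t": ("a",),        # feeds only u ...
    "u": ("t", "in2"),  # ... and u feeds nothing that reaches an output
}
print(sorted(remove_unused(netlist, ["y"])))   # ['a', 'b', 'y']
```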
When some or all of the HDL is at the RTL level, the circuit design components are at a high enough level that the SEmulation system can easily model them with SEmulation registers or components. When some or all of the HDL is at the gate-level netlist level, the circuit design components may be more design-specific, which makes mapping the user's circuit design components to SEmulation components more difficult. The synthesizer server can therefore generate logic elements based on variants of the standard logic elements, as well as random logic elements that do not resemble either those variants or the library's standard logic elements.
If the circuit design is in gate-level netlist form, the SEmulation system first performs a clustering operation 351. Construction of the hardware model is based on the clustering process, because the combinational logic and registers are separated from the clock. Logic elements that share a common primary clock or gated clock signal are therefore better served by grouping them together and placing them together on a chip. The clustering algorithm is driven by connectivity, hierarchy extraction, and regular-structure extraction. If the description is in structured RTL 358, the SEmulation system can decompose its functions into smaller units, as represented by logic function decomposition operation 359. At any stage, if logic synthesis or logic optimization is required, the synthesizer server 360 can transform the circuit design into a more efficient representation according to the user's directions. The link between clustering operation 351 and the synthesizer server is represented by dotted arrow 364, the link between structured RTL 358 and synthesizer server 360 by arrow 365, and the link between logic function decomposition operation 359 and synthesizer server 360 by arrow 366.
Clustering operation 351 selectively groups logic components together based on function and size. It may produce only one cluster for a small circuit design, or several clusters for a large one. In either case, these clusters of logic elements are mapped into designated FPGA chips in later steps; that is, one cluster is targeted to a particular chip, and another cluster is targeted to a different chip or possibly to the same chip as the first. The logic elements of a cluster are usually placed together in one chip, but for optimization purposes a cluster may have to be split across multiple chips.
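As a rough illustration of clock-driven grouping with a size limit, the sketch below groups logic elements by the clock signal they share and then greedily splits any group that would overflow a chip. The `(name, clock, gate_count)` records, the `capacity` parameter, and the greedy split are all hypothetical; the connectivity, hierarchy, and regular-structure extraction mentioned above are not modeled.

```python
from collections import defaultdict

def cluster_by_clock(elements, capacity):
    """Hypothetical clustering: group by shared clock, then split by size.

    elements: iterable of (name, clock, gate_count) records.
    capacity: maximum gates per cluster (assumed chip budget).
    Returns a list of (clock, [names]) clusters.
    """
    by_clock = defaultdict(list)
    for name, clock, gates in elements:
        by_clock[clock].append((name, gates))

    clusters = []
    for clock, members in by_clock.items():
        current, used = [], 0
        for name, gates in members:
            if current and used + gates > capacity:
                clusters.append((clock, current))   # close the full cluster
                current, used = [], 0
            current.append(name)
            used += gates
        if current:
            clusters.append((clock, current))
    return clusters
```

An element larger than `capacity` still gets a cluster of its own, mirroring the case above in which a cluster may later have to be split across chips.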
After the clusters have been formed in clustering operation 351, the system performs placement and routing. First, a coarse-grain placement operation 352 places the clusters into FPGA chips, initially placing each cluster of logic elements in a selected FPGA chip. If needed, the system makes the synthesizer server 360 available to coarse-grain placement operation 352, as indicated by arrow 367. A fine-grain placement operation follows the coarse-grain placement to fine-tune the initial placement. For both the coarse-grain and fine-grain placement operations, the SEmulation system determines the optimal placement with a cost function based on pin-utilization requirements, gate-utilization requirements, and gate-to-gate hops.
How the clusters are placed in particular chips is determined by a placement cost. The placement cost is calculated by the cost function f(P, G, D) of two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective locations in the FPGA chip array, where P generally refers to pin usage/availability, G generally refers to gate usage/availability, and D is the distance, or number of gate-to-gate hops, as defined by the connectivity matrix M (shown in Figures 7 and 8). The user's circuit design modeled in the hardware model comprises the total combination of circuits CKTQ. Each cost function is defined so that the computed placement cost generally favors: (1) the minimum number of "hops" between any two circuits CKTN-1 and CKTN in the FPGA array, and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array that achieves minimum pin utilization.
In one embodiment, the cost function f(P, G, D) is defined as:
This equation can be reduced to following form:
f(P,G,D)=C0*P+C1*G+C2*D
The first term (i.e., C0*P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1*G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2*D) generates a third placement cost value based on the number of hops between the various interconnected gates in circuits CKTQ (i.e., CKT1, CKT2, ..., CKTN). Summing these three values iteratively yields the total placement cost value. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value produced by this cost function toward whichever factor or factors (pin usage, gate usage, or gate-to-gate hops) are most important in a given iteration of the placement cost calculation.
The system recomputes the placement cost repeatedly as it selects different relative values for the weighting constants C0, C1, and C2. In one embodiment, during the coarse-grain placement operation, the system first selects large values for C0 and C1 relative to C2. In this iteration, the system treats optimizing pin usage/availability and gate usage/availability as more important than optimizing gate-to-gate hops for the initial placement of circuits CKTQ in the FPGA chip array. In subsequent iterations, the system selects small values for C0 and C1 relative to C2, treating the optimization of gate-to-gate hops as more important than the optimization of pin usage/availability and gate usage/availability.
The fine-grain placement operation uses the same cost function. In one embodiment, the iterative selection of C0, C1, and C2 is the same as in the coarse-grain placement operation. In another embodiment, the fine-grain placement operation has the system select small values for C0 and C1 relative to C2.
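To show how the choice of weights can change which placement wins, the sketch below evaluates the reduced cost function f(P, G, D) = C0*P + C1*G + C2*D for two candidate placements under a "pins and gates first" weighting and a "hops first" weighting. The candidate names, their (P, G, D) values, and the specific weight values are hypothetical.

```python
def placement_cost(p, g, d, weights):
    """Reduced cost function f(P, G, D) = C0*P + C1*G + C2*D."""
    c0, c1, c2 = weights
    return c0 * p + c1 * g + c2 * d

def best_placement(candidates, weights):
    """Return the name of the candidate placement with the lowest cost.

    candidates: dict mapping a placement name to its (P, G, D) values.
    """
    return min(candidates, key=lambda name: placement_cost(*candidates[name], weights))

# Hypothetical (P, G, D) values for two candidate placements.
candidates = {
    "crowded": (0.9, 0.8, 2),   # heavily used chips, few hops
    "spread": (0.5, 0.5, 4),    # lightly used chips, more hops
}
FIRST_PASS_WEIGHTS = (10, 10, 1)   # C0, C1 large relative to C2
LATER_PASS_WEIGHTS = (1, 1, 10)    # C2 large relative to C0, C1
```

With the first weighting the "spread" placement costs 14 against 19, while with the second the "crowded" placement wins (21.7 against 41), which is the kind of shift in priorities the iterative weight selection is meant to produce.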
These variables and equations are explained next. To decide whether to place a particular circuit CKTQ in FPGA chip x or in FPGA chip y (among the other FPGA chips), the cost function examines pin usage/availability (P), gate usage/availability (G), and gate-to-gate hops (D). From the variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing that circuit CKTQ at a particular location in the FPGA array.
Pin usage/availability P also represents I/O capacity. P_used is the number of pins used by circuits CKTQ on each FPGA chip. P_available is the number of pins available on the FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnects/chip); in another embodiment, P_available is 265 (44 pins × 6 interconnects/chip + 1 extra pin). The exact number of available pins, however, depends on the type of FPGA chip used, the total number of interconnects per chip, and the number of pins used for each interconnect, so P_available can vary considerably. To evaluate the first term of the cost function f(P, G, D) (i.e., C0*P), the ratio P_used/P_available is calculated for each FPGA chip; for a 4 × 4 FPGA chip array, 16 such ratios are calculated. For a given number of available pins, the more pins are used, the higher the ratio. The highest of the 16 calculated ratios is selected, and the first placement cost value is obtained from the first term C0*P by multiplying this maximum ratio P_used/P_available by the weighting constant C0. Because the first term depends on the ratios P_used/P_available, and specifically on the maximum of the ratios calculated for the individual FPGA chips, higher pin utilization yields a higher placement cost value, all other factors being equal. The system selects the placement with the lowest placement cost. All other factors being equal, the placement whose maximum ratio P_used/P_available is the lowest among the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array.
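The first term can be sketched as a max-ratio computation over the chips of the array. The sketch assumes the 264-pin embodiment (44 pins × 6 interconnects per chip); the `pin_term` name and the per-chip pin counts in the usage note are hypothetical.

```python
P_AVAILABLE = 44 * 6   # 264 pins: 44 pins x 6 interconnects per chip (one embodiment)

def pin_term(pins_used, c0, p_available=P_AVAILABLE):
    """First cost term C0*P: C0 times the largest P_used/P_available ratio.

    pins_used: pins used on each FPGA chip (16 entries for a 4 x 4 array).
    """
    return c0 * max(used / p_available for used in pins_used)
```

For example, with C0 = 1, a placement that loads one chip to 198 pins (and 15 chips to 100) scores 0.75, while one that spreads 132 pins on every chip scores 0.5 and is preferred, all else being equal.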
Gate usage/availability G is based on the number of gates allowed on each FPGA chip. In one embodiment, based on the location of circuits CKTQ in the array, if the number of gates used, G_used, on any chip exceeds a fixed threshold, the second placement cost term (C1*G) is assigned a value indicating that the placement is infeasible. Conversely, if the number of gates used on each chip containing circuits CKTQ is at or below the fixed threshold, the second term (C1*G) is assigned a value indicating that the placement is feasible. Thus, if the system initially wants to place circuit CKT1 in a particular chip and that chip does not have enough gates to accommodate circuit CKT1, the cost function leads the system to conclude that this placement is infeasible. In general, a very high value of G (e.g., infinity) ensures that the cost function produces a high placement cost value, indicating that the desired placement of circuits CKTQ is infeasible and that an alternative placement should be found.
In another embodiment, based on the location of circuits CKTQ in the array, the ratio G_used/G_available is calculated for each chip, where G_used is the number of gates used by circuits CKTQ in each FPGA chip and G_available is the number of gates available in the FPGA chip. In one embodiment, the system uses FLEX 10K100 chips for the FPGA array. The FLEX 10K100 chip contains approximately 100,000 gates, so in this embodiment G_available is 100,000 gates. For a 4 × 4 FPGA chip array, 16 ratios G_used/G_available are calculated. For a given number of available gates, the more gates are used, the higher the ratio. The highest of the 16 calculated ratios is selected, and the second placement cost value is obtained from the second term C1*G by multiplying this maximum ratio G_used/G_available by the weighting constant C1. Because the second term depends on the ratios G_used/G_available, and specifically on the maximum of the ratios calculated for the individual FPGA chips, higher gate utilization yields a higher placement cost value, all other factors being equal. The system selects the placement with the lowest placement cost. All other factors being equal, the placement whose maximum ratio G_used/G_available is the lowest among the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array.
In another embodiment, the system first selects some value for C1. If the ratio G_used/G_available is greater than 1, the placement is infeasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). The system then sets C1 to a very large number (e.g., infinity), so the second term C1*G is also very large and the total placement cost value f(P, G, D) is very high. If, on the other hand, the ratio G_used/G_available is less than or equal to 1, the placement is feasible (i.e., each chip has enough gates to support the circuit implementation). The system then leaves C1 unmodified, and the second term C1*G takes a finite value.
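The ratio-based gate term, including the feasibility rule that forces C1 to infinity, can be sketched as follows. The sketch assumes the FLEX 10K100 figure of about 100,000 gates per chip; the `gate_term` name and the test values are hypothetical.

```python
import math

G_AVAILABLE = 100_000   # approximate gate count of a FLEX 10K100 chip

def gate_term(gates_used, c1, g_available=G_AVAILABLE):
    """Second cost term C1*G using the per-chip ratio G_used/G_available.

    gates_used: gates used on each FPGA chip.
    If any chip's ratio exceeds 1 the placement is infeasible, so C1 is
    replaced by infinity and the term (and the total cost) becomes infinite.
    """
    worst = max(used / g_available for used in gates_used)
    if worst > 1:
        c1 = math.inf   # at least one chip cannot hold the gates placed on it
    return c1 * worst
```

Returning infinity rather than raising an error lets an infeasible candidate simply lose the minimum-cost comparison against any feasible one.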
The third term C2*D represents the number of hops between all gates that require interconnection. The number of hops also depends on the connectivity matrix, which provides the basis for determining the circuit paths between any two gates that require chip-to-chip interconnection. Not every gate requires a gate-to-gate interconnection. Depending on the user's original circuit design and on how the clusters are partitioned into particular chips, some gates need no interconnection at all, because the logic elements connected to their input(s) and output(s) are located in the same chip. Other gates do need interconnection, because the logic elements connected to their input(s) and output(s) are located in different chips.
To understand "hops," refer to the connectivity matrix, shown in tabular form in Figure 7 and in graphical form in Figure 8. In Figure 8, each interconnect between chips, such as interconnect 602 between chip F11 and chip F14, represents 44 pins or 44 wires. In other embodiments, each interconnect represents more than 44 pins; in still other embodiments, fewer than 44 pins.
With this interconnect scheme, data can pass from one chip to any other within two "hops" or "jumps." For example, data can pass from chip F11 to chip F12 in one hop via interconnect 601, and from chip F11 to chip F33 in two hops via interconnects 600 and 606 or via interconnects 603 and 610. These are examples of the shortest hop paths between those pairs of chips. In some cases a signal path passes through several chips, so that the number of hops between a gate in one chip and a gate in another exceeds the shortest hop path. Only circuit paths that require interconnection need to be examined to determine the number of gate-to-gate hops.
Connectivity is represented by the sum of the hops between all gates that require chip-to-chip interconnection. With the connectivity matrix of Figures 7 and 8, the shortest path between any two chips can be represented by one or two hops. For a particular hardware model implementation, however, I/O capacity may limit the number of direct shortest-path connections between any two gates in the array, so some signals must take longer paths (and hence more than two hops) to reach their destinations. For some gate-to-gate connections, therefore, the number of hops may exceed two. In general, all other things being equal, fewer hops yield a lower placement cost.
The detailed form of the third term (i.e., C2*D) is as follows:
This third term is the product of the weighting constant C2 and a summation component (Σ...). The summation component is essentially the sum of all the hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As noted above, not all gates require chip-to-chip interconnection. For each pair of gates i and j that does, the number of hops is determined, and these hop counts are summed over all such pairs.
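Collecting that description into a formula, with our own notation (I for the set of gate pairs (i, j) that require chip-to-chip interconnection, and h(i, j) for the number of hops between gate i and gate j), the third term can be sketched as:

```latex
C_2 \cdot D \;=\; C_2 \sum_{(i,j) \in I} h(i,j)
```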
The distance calculation can also be defined as follows:
Here, M is the connectivity matrix; one embodiment of it is shown in Figure 7. A distance is calculated for each gate-to-gate connection that requires interconnection, so the connectivity matrix M is examined for each pair of gate i and gate j. More specifically,
a matrix is built containing all the chips in the array, with each chip given an identifying number. These identifying numbers are placed across the top of the matrix as column headers and, likewise, down one side of the matrix as row headers. The entry at the intersection of a row and a column gives the direct-connection data between the two chips identified by that row number and column number. For any distance calculation between chip i and chip j, the entry M_ij contains either "1" (direct connection) or "0" (no direct connection). The index k is the number of hops required to connect any gate in chip i to any gate in chip j that requires interconnection.
First, the connectivity matrix entry M_ij is examined for k = 1. If the entry is "1," there is a direct connection between the selected gate in chip i and the selected gate in chip j. The index, or hop count, k = 1 is therefore designated as the result for M_ij, and this result is the distance between the two gates. At this point, other gate-to-gate connections can be examined. If the entry is "0," however, there is no direct connection.
If there is no direct connection, the next k is examined. This new k (i.e., k = 2) is computed by multiplying the matrix M by itself; in other words, M^2 = M*M, where k = 2.
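Written with OR (∨) in place of addition and AND (∧) in place of multiplication, the repeated Boolean product and the resulting hop count k for a chip pair (i, j) can be sketched as follows; this notation, including the minimum over k, is an assumption that simply restates the procedure in symbols:

```latex
M^{1} = M, \qquad
(M^{k})_{ij} = \bigvee_{l} \left( (M^{k-1})_{il} \wedge m_{lj} \right), \qquad
k_{ij} = \min \{\, k \ge 1 : (M^{k})_{ij} = 1 \,\}
```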
This process of multiplying M by itself continues until the entry for the row of chip i and the column of chip j evaluates to "1," at which point the index k is taken as the number of hops. The operation consists of ANDing elements of matrix M and then ORing the results of those AND operations. If the AND of m_il and m_lj is logic "1" for some chip l, there is a connection between the selected gate in chip i and the selected gate in chip j through chip l within k hops; if not, there is no connection within this particular k hops, and further calculation is required. The matrices m_il and m_lj are the connectivity matrix M as defined for hardware modeling. For any given gate i and gate j requiring interconnection, the row of m_il for the FPGA chip containing gate i is ANDed, element by element, with the column of m_lj for the FPGA chip containing gate j. The individual AND results are then ORed together to determine whether the resulting value of M_ij for index (hop count) k is "1" or "0." If the result is "1," a connection exists and the index k is designated as the number of hops. If the result is "0," no connection exists.
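The repeated Boolean multiplication can be sketched in a few lines. Assumptions: M is a chip-level 0/1 adjacency matrix with zeros on its diagonal, gates have already been mapped to chips, and a pair of gates in the same chip needs 0 hops; the helper names are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is OR over l of (a[i][l] AND b[l][j])."""
    n = len(a)
    return [[int(any(a[i][l] and b[l][j] for l in range(n))) for j in range(n)]
            for i in range(n)]

def hop_count(m, i, j):
    """Smallest k with (M^k)[i][j] == 1, i.e. the number of hops from chip i to chip j.

    Returns 0 for the same chip and None if chip j cannot be reached.
    """
    if i == j:
        return 0                        # same chip: no chip-to-chip interconnect needed
    power = m                           # M^1
    for k in range(1, len(m)):          # a shortest path never needs more than n - 1 hops
        if power[i][j]:
            return k
        power = bool_matmul(power, m)   # M^(k+1) = M^k * M
    return None
```

For three chips connected in a line (0-1-2), `hop_count` gives 1 for the pair (0, 1) and 2 for (0, 2), because the entry for (0, 2) first becomes 1 in M^2. The first power at which an entry turns on equals the shortest-path length, since a shortest walk between two chips is always a simple path.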
The following example illustrates these principles; refer to Figures 35(A) through 35(D). Figure 35(A) shows a user's circuit design, represented by cloud 1090, which may be simple or complex. A portion of circuit design 1090 includes an OR gate 1091 and two AND gates 1092 and 1093. The outputs of AND gates 1092 and 1093 are coupled to the inputs of OR gate 1091. Gates 1091, 1092, and 1093 may also be coupled to other portions of circuit design 1090.
Referring to Figure 35(B), the components of circuit 1090, including the portion containing the three gates 1091, 1092, and 1093, are to be configured and placed in FPGA chips 1094, 1095, and 1096. This exemplary FPGA chip array has the interconnect scheme shown: one set of interconnects 1097 couples chip 1094 to chip 1095, and another set of interconnects 1098 couples chip 1095 to chip 1096. There is no direct interconnect between chip 1094 and chip 1096. When placing the components of circuit design 1090 in these chips, the system uses this predesigned interconnect scheme to connect circuit paths between the different chips.
Referring to Figure 35(C), one possible configuration and placement puts OR gate 1091 on chip 1094, AND gate 1092 on chip 1095, and AND gate 1093 on chip 1096. For simplicity, the rest of circuit 1090 is not shown. The connection between OR gate 1091 and AND gate 1092 requires an interconnect because the two gates are in different chips, so the set of interconnects 1097 is used; the number of hops for this connection is 1. The connection between OR gate 1091 and AND gate 1093 also requires an interconnect, so interconnect sets 1097 and 1098 are used; the number of hops is 2. For this placement, ignoring the other gates and interconnects in the remainder of circuit 1090 (not shown), the total number of hops is 3.
Figure 35(D) shows another placement. Here, OR gate 1091 is placed on chip 1094, and AND gates 1092 and 1093 are both placed on chip 1095. Again, the rest of circuit 1090 is not shown. The connection between OR gate 1091 and AND gate 1092 requires an interconnect because the two gates are in different chips, so the set of interconnects 1097 is used; the number of hops is 1. The connection between OR gate 1091 and AND gate 1093 also requires an interconnect, so interconnect set 1097 is used; the number of hops is again 1. For this placement, ignoring the other gates and interconnects in the remainder of circuit 1090, the total number of hops is 2. Based on the distance parameter D alone, and assuming all other factors are equal, the cost function calculated for the placement of Figure 35(D) is therefore lower than that for Figure 35(C). Other factors, however, are not necessarily equal. Most likely, the cost function for Figure 35(D) also depends on gate usage/availability G: chip 1095 in Figure 35(D) uses one more gate than the same chip in Figure 35(C). In addition, the pin usage/availability P of chip 1095 in the placement of Figure 35(C) is greater than that of the same chip in the placement of Figure 35(D).
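The hop totals for the two placements of Figure 35 can be checked with a small script. It models only the chip links of Figure 35(B) and the two AND-to-OR connections shown, ignoring the rest of circuit 1090 exactly as the text does; the dictionary layout and helper names are hypothetical.

```python
from collections import deque

# Chip-level links of Figure 35(B): 1094-1095 via set 1097, 1095-1096 via set 1098.
LINKS = {1094: [1095], 1095: [1094, 1096], 1096: [1095]}

# Gate-to-gate connections in the portion shown: both AND outputs feed OR gate 1091.
CONNECTIONS = [(1092, 1091), (1093, 1091)]

def chip_hops(a, b):
    """Breadth-first search for the number of chip-to-chip hops from chip a to chip b."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        chip = queue.popleft()
        for nxt in LINKS[chip]:
            if nxt not in dist:
                dist[nxt] = dist[chip] + 1
                queue.append(nxt)
    return dist[b]

def total_hops(placement):
    """Sum of hops over all gate connections; same-chip pairs contribute 0."""
    return sum(chip_hops(placement[g1], placement[g2]) for g1, g2 in CONNECTIONS)

placement_c = {1091: 1094, 1092: 1095, 1093: 1096}   # Figure 35(C)
placement_d = {1091: 1094, 1092: 1095, 1093: 1095}   # Figure 35(D)
```

Evaluating `total_hops` reproduces the totals given above: 3 for Figure 35(C) and 2 for Figure 35(D).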
After coarse-grain placement, the placement of the flattened clusters is fine-tuned to further optimize the result. Fine-grain placement operation 353 refines the placement initially selected by coarse-grain placement operation 352. Here, the initial clusters may be split up if doing so yields a better optimization. For example, assume that logic elements X and Y were originally part of cluster A and were assigned to FPGA chip 1. As a result of fine-grain placement operation 353, logic elements X and Y may now be designated as a separate cluster B, or become part of another cluster C, and be placed in FPGA chip 2. An FPGA netlist 354 that ties the user's circuit design to specific FPGAs is then generated.
How clusters are split up and placed in particular chips is likewise determined by the placement cost, calculated by the cost function f(P, G, D) of circuits CKTQ. In one embodiment, the fine-grain placement process uses the same cost function as the coarse-grain placement process. The only difference between the two processes is the size of the clusters being placed, not the process itself: coarse-grain placement works with larger clusters than fine-grain placement. In another embodiment, the coarse-grain and fine-grain placement processes use different cost functions, as described above in connection with the selection of the weighting constants C0, C1, and C2.
Once placement is complete, the inter-chip routing task 355 is performed. If the number of routing wires connecting circuits placed in different chips exceeds the number of available pins that those FPGA chips allocate to the circuits, time-division multiplexing (TDM) circuits can be used. For example, if each FPGA chip allows only 44 pins for connecting circuits in two different FPGA chips, and a particular model implementation requires 45 wires between the chips, a special time-division multiplexing circuit is placed in each chip. This TDM circuit couples at least two of the wires together. One embodiment of the TDM circuit is shown in Figures 9(A), 9(B), and 9(C) and is discussed below. Because the pins can always be arranged in a time-division multiplexed form between chips, the routing task can always be completed.
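A back-of-the-envelope pin plan for this situation can be sketched as follows. The text covers only the 2:1 case (45 wires over 44 pins); generalizing to a single multiplexing ratio shared by every TDM channel is an assumption, as is the `plan_tdm` name.

```python
import math

def plan_tdm(wires, pins):
    """Split `wires` inter-chip signals over `pins` physical pins (hypothetical planner).

    Returns (direct_wires, tdm_channels, ratio): each direct wire uses its own pin,
    and each TDM channel pin carries up to `ratio` multiplexed wires.
    Assumes every TDM channel uses the same multiplexing ratio.
    """
    if wires <= pins:
        return wires, 0, 1                    # no multiplexing needed
    ratio = math.ceil(wires / pins)           # smallest ratio that can fit at all
    # Each channel carrying `ratio` wires frees up ratio - 1 pins.
    channels = math.ceil((wires - pins) / (ratio - 1))
    direct = pins - channels
    return direct, channels, ratio
```

For the example in the text, `plan_tdm(45, 44)` yields 43 direct wires and one 2:1 TDM channel, which matches the arrangement of Figure 9(A): interconnect group 994 on 43 pins and interconnects 992 and 993 sharing the 44th.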
Once the placement and routing of each FPGA have been determined, each FPGA can be configured into the optimal working circuit, and the system generates a "bitstream" configuration file 356. In Altera terminology, the system generates one or more Programmer Object Files (.pof). Other generated files include SRAM Object Files (.sof), JEDEC Files (.jed), Hexadecimal (Intel-format) Files (.hex), and Tabular Text Files (.ttf). Altera's MAX+PLUS II Programmer uses POFs, SOFs, and JEDEC files together with Altera's hardware programming devices to program the FPGA array. Alternatively, the system generates one or more Raw Binary Files (.rbf), which the CPU modifies and uses to program the FPGA array over the PCI bus.
At this point, the configured hardware is ready for hardware start-up 370. This completes the automatic construction of the hardware model on the reconfigurable boards.
Returning to the TDM circuit: it allows groups of pin outputs to be time-division multiplexed together so that only one pin output is actually used. The TDM circuit is essentially a multiplexer with at least two inputs (one for each of the two wires), one output, and a pair of registers connected in a loop that serves as the selector signal. If the SEmulation system needs more wires to be grouped together, more inputs and loop registers can be provided. As the selector signal for the TDM circuit, the registers connected in a loop supply the appropriate signals to the multiplexer so that one input is selected as the output during one time period and another input during another time period. In this way the TDM circuit needs only one output line between the chips, so that the hardware model of the circuit can be implemented in a particular chip with 44 pins rather than 45. Pins can therefore always be arranged in a time-division multiplexed form between chips, and the routing task can always be completed.
Figure 9(A) shows an overview of the pin-out problem. Because a TDM circuit is needed, Figure 9(B) shows the TDM circuit at the transmitting end and Figure 9(C) shows the TDM circuit at the receiving end. These figures show only one particular example, in which the SEmulation system needs one wire between the chips instead of two. If more than two wires must be coupled through the time-division multiplexing arrangement, those of ordinary skill in the art can make the appropriate modifications in light of the following description.
Figure 9(A) shows one embodiment of the TDM circuit in which the SEmulation system couples two wires in a TDM arrangement. There are two chips, 990 and 991. Circuit 960, part of the complete user circuit design, is modeled and placed in chip 991. Circuit 973, also part of the complete user circuit design, is modeled and placed in chip 990. Several interconnects run between circuit 960 and circuit 973, including interconnect group 994, interconnect 992, and interconnect 993; in this example there are 45 interconnects in total. If, in one embodiment, each chip provides at most 44 pins for these interconnects, one embodiment of the present invention time-division multiplexes at least two of them so that only one interconnect is needed between chips 990 and 991 for those signals.
In this example, interconnect group 994 continues to use 43 pins. For the 44th and final pin, a TDM circuit according to one embodiment of the present invention couples interconnect 992 and interconnect 993 in time-division multiplexed form.
Figure 9(B) shows one embodiment of the TDM circuit. The modeled circuit 960 (or a portion of it) in FPGA chip 991 provides two signals on wires 966 and 967, which are outputs of circuit 960. These outputs would normally be coupled to the modeled circuit 973 in chip 990 (see Figures 9(A) and 9(C)). However, because only one pin is available for the two output wires 966 and 967, a direct pin-to-pin connection is not possible. Because outputs 966 and 967 are transmitted in one direction to the other chip, suitable transmit and receive TDM circuitry must be provided to couple these lines. Figure 9(B) shows one embodiment of the transmit-end TDM circuit.
The transmit-end TDM circuit includes AND gates 961 and 962, whose outputs are coupled to the inputs of OR gate 963. Output 972 of OR gate 963 is the chip output that is assigned to a pin and coupled to the other chip 990. One set of inputs to AND gates 961 and 962, inputs 966 and 967 respectively, is provided by circuit model 960. The other set of inputs, 968 and 969, is provided by a loop register circuit that serves as the time-division multiplexing selector signal.
The loop register circuit includes registers 964 and 965. Output 995 of register 964 is provided to the input of register 965 and to input 968 of AND gate 961. Output 996 of register 965 is coupled to the input of register 964 and to input 969 of AND gate 962. Registers 964 and 965 are driven by a common clock source. At any given instant, only one of outputs 995 and 996 is logic "1"; the other is logic "0." After each clock edge, therefore, the logic "1" moves between output 995 and output 996. This in turn supplies a logic "1" to either AND gate 961 or AND gate 962, "selecting" the signal on either wire 966 or wire 967. The data on wire 972 thus comes from circuit 960 via either wire 966 or wire 967.
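A cycle-level model of the transmit end can make the rotating one-hot selector concrete. It assumes a reset state in which register 964 holds "1" and register 965 holds "0" (the text only says exactly one of them is "1"); the `tdm_transmit` name and the list-of-samples interface are hypothetical.

```python
def tdm_transmit(wire_966, wire_967):
    """Cycle-level sketch of the transmit-end TDM circuit of Figure 9(B).

    wire_966, wire_967: per-cycle logic values (0/1) driven by circuit 960.
    Returns the per-cycle values on output wire 972.
    """
    r964, r965 = 1, 0   # assumed reset state: exactly one ring register holds 1
    line_972 = []
    for a, b in zip(wire_966, wire_967):
        # AND gates 961 and 962 gate each input with a selector; OR gate 963 merges them.
        line_972.append((a & r964) | (b & r965))
        # Clock edge: 964 feeds 965 and 965 feeds 964, so the single 1 rotates.
        r964, r965 = r965, r964
    return line_972
```

With wire 966 held at "1" and wire 967 at "0", the shared line alternates 1, 0, 1, 0: even cycles carry wire 966 and odd cycles carry wire 967.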
Figure 9(C) shows one embodiment of the receive-end portion of the TDM circuit. The signals on wires 966 and 967 from circuit 960 in chip 991 (Figures 9(A) and 9(B)) must be delivered to the appropriate wire, 985 or 986, to reach circuit 973 in Figure 9(C). The time-division multiplexed signal from chip 991 arrives on wire/pin 978. The receive-end TDM circuit routes the signal on wire/pin 978 to the appropriate wire, 985 or 986, to reach circuit 973.
The TDM circuit includes input registers 974 and 975. The signal on wire/pin 978 is provided to input registers 974 and 975 via wires 979 and 980, respectively. Output 985 of input register 974 is provided to the appropriate port of circuit 973, as is output 986 of input register 975. Input registers 974 and 975 are controlled by loop registers 976 and 977.
Output 984 of register 976 is coupled to the input of register 977 and to clock input 981 of register 974. Output 983 of register 977 is coupled to the input of register 976 and to clock input 982 of register 975. Registers 976 and 977 are driven by a common clock source. At any given instant, only one of the enable inputs 981 and 982 is logic "1"; the other is logic "0." After each clock edge, therefore, the logic "1" moves between enable input 981 and enable input 982. This in turn "selects" the signal on either wire 979 or wire 980, so the data from circuit 960 arriving on wire 978 is correctly delivered to circuit 973 via either wire 985 or wire 986.
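The transmit and receive ends can be modeled together to check that each input wire lands in the right register. This sketch assumes both loop-register rings share the common clock, start in step (964 and 976 holding "1"), and treats the clock inputs 981 and 982 of registers 974 and 975 as simple capture enables; the `tdm_link` name is hypothetical.

```python
def tdm_link(wire_966, wire_967):
    """Cycle-level sketch of the TDM link of Figures 9(B) and 9(C).

    Returns, for each cycle, the values on outputs (985, 986) after that cycle.
    """
    r964, r965 = 1, 0        # transmit ring (assumed reset state)
    r976, r977 = 1, 0        # receive ring, assumed to start in step with it
    q974 = q975 = 0          # input registers feeding circuit 973
    history = []
    for a, b in zip(wire_966, wire_967):
        pin_978 = (a & r964) | (b & r965)   # value on the single shared pin
        if r976:                            # enable 981 active: register 974 captures
            q974 = pin_978
        if r977:                            # enable 982 active: register 975 captures
            q975 = pin_978
        history.append((q974, q975))
        r964, r965 = r965, r964             # both rings rotate on the common clock
        r976, r977 = r977, r976
    return history
```

After two cycles both registers hold valid data, so in this model each of the two original signals reaches circuit 973 once every two clock cycles over a single pin.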
The simple address pointer according to one embodiment of the invention will now be discussed in detail with reference to Fig. 4. To reiterate, multiple address pointers are placed in each FPGA chip in the hardware model. In general, the main purpose of the address pointers is to allow the system to transfer data between software model 315 and specific FPGA chips in hardware model 325 over the 32-bit PCI bus 328 (see Figure 10). More specifically, the address pointers selectively control the data transfer between each address space in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK) and each FPGA chip in FPGA chip banks 326a-326d, subject to the bandwidth constraints of the 32-bit PCI bus. Even if a 64-bit PCI bus is installed, these address pointers are still needed to control the data transfer. Thus, if the software model has 5 address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), each FPGA chip has 5 address pointers corresponding to these 5 address spaces. Each FPGA needs these 5 address pointers because a particular selected word in the address space being processed may reside in any one or more of the FPGA chips.
FPGA I/O controller 381 selects the specific address space of the software/hardware boundary (i.e., REG, S2H, H2S, or CLK) by using the SPACE index. Once an address space has been selected, the address pointer in each FPGA chip that corresponds to the selected address space selects the particular word corresponding to the same word in the selected address space. The maximum size of the address spaces in the software/hardware boundary and of the address pointers in each FPGA chip depends on the memory capacity/word length of the chosen FPGA chip. For example, one embodiment of the invention uses Altera FLEX 10K series FPGA chips. In that embodiment, the expected maximum size of each address space is: REG, 3,000 words; CLK, 1 word; S2H, 10 words; H2S, 10 words. Each FPGA chip can hold about 100 words.
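The per-space address pointers can be pictured with the hypothetical model below. It is an illustrative sketch only: it assumes each word of an address space lives in exactly one FPGA chip (the text allows a word to span several chips, which this sketch does not model), and the class, field, and function names are invented for the example.

```python
from dataclasses import dataclass, field

# The five address spaces named in the text, one address pointer each per chip.
SPACES = ("REG_READ", "REG_WRITE", "S2H_READ", "H2S_WRITE", "CLK_WRITE")


@dataclass
class FpgaChip:
    name: str
    # words[space] maps a word index within that address space to the value held on this chip
    words: dict = field(default_factory=lambda: {s: {} for s in SPACES})
    # one address pointer per address space
    pointer: dict = field(default_factory=lambda: {s: 0 for s in SPACES})


def read_space(chips, space, size):
    """Read `size` words of one address space (selected by the SPACE index) across all chips.

    Every chip's pointer for `space` advances in lockstep; the chip that
    owns the currently pointed-to word supplies it to the buffer.
    """
    if space not in SPACES:
        raise ValueError(f"unknown address space {space!r}")
    buffer = []
    for index in range(size):
        for chip in chips:
            chip.pointer[space] = index          # pointers advance together
        owners = [c for c in chips if index in c.words[space]]
        if len(owners) != 1:
            raise RuntimeError(f"word {index} must live in exactly one chip in this sketch")
        buffer.append(owners[0].words[space][index])
    return buffer
```

With chip A holding REG words 0 and 2 and chip B holding word 1, read_space([a, b], "REG_READ", 3) assembles the three words in address order, regardless of which chip stores each one.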
The analog simulator system also allows the user, at any time during the emulation, to stop, assert input values, and inspect values. To provide this flexibility, the analog simulator must make all components visible to the user, regardless of whether a component is implemented internally in software or in hardware. In software, combinational components are modeled and their values are computed during simulation. These values are therefore clearly "visible" to the user and can be accessed at any time during the simulation.
However, the values of combinational components in the hardware model are not directly "visible" in this way. Although registers can easily be accessed (i.e., read/written) directly by the software kernel, combinational components are more difficult to observe. In an FPGA, most combinational components are modeled as look-up tables to achieve high gate utilization. Look-up-table mapping therefore provides an efficient hardware model, but loses the visibility of most combinational logic signals.
Despite this lack of visibility of combinational components, the Analog Simulation System can rebuild, or regenerate, the combinational components for user inspection after hardware-accelerated mode. If the user's circuit design has only combinational and register components, the values of all combinational components can be derived from the register components. That is, the combinational components are derived from the register components in various forms according to the specific logic functions required by the circuit design. Because the analog simulator's hardware model contains only register and combinational components, the analog simulator reads all register values from the hardware model and then rebuilds, or regenerates, all combinational components. Because this regeneration process requires extra overhead, combinational components are not always regenerated; instead, regeneration is performed on user demand. Indeed, one benefit of using the hardware model is to accelerate the simulation, and determining the combinational values in every cycle (or most cycles) would further slow the simulation. In any case, inspecting the register values alone is sufficient for most simulation analyses.
The process of regenerating combinational values from register values assumes that the Analog Simulation System was in hardware-accelerated mode or ICE mode. Otherwise, software simulation has already provided the combinational values to the user. The Analog Simulation System retains the combinational and register values that resided in the software model before hardware acceleration began. These values remain in the software model until the system overwrites them. Because the software model already holds the register and combinational values from the period immediately before the hardware-accelerated run began, the combinational regeneration process involves updating some or all of these values according to the updated register values.
The combinational regeneration process is as follows. First, at the user's request, the software kernel reads the output values of all hardware register components from the FPGA chips into the REG buffer. This involves transferring the register values in the FPGA chips by DMA (direct memory access) through the chain of address pointers for the REG address space. The register values from the hardware model are placed in the REG buffer located at the software/hardware boundary, allowing the software model to access the data for further processing.
Second, the software kernel compares the register values before the hardware-accelerated run with those after it. If the register values before the hardware-accelerated run are identical to those after it, the combinational values have not changed. Rather than spending time and resources regenerating combinational components, the system can read these values from the software model, which already holds the combinational values stored immediately before the hardware-accelerated run. If, on the other hand, one or more of these register values has changed, one or more combinational values that depend on the changed register values will also change. These combinational components must be regenerated through the following third step.
Third, for each register whose value differs before and after acceleration, the software kernel places the combinational components in that register's fan-out into the event queue. Here, an event is detected for each register whose value changed during the accelerated run. The combinational components that depend on these changed register values will most likely produce different values. Regardless of how these combinational values change, the system ensures that these combinational components evaluate the changed register values in the next step.
Fourth, the software kernel runs a standard event simulation algorithm to propagate the value changes from the registers to all combinational components in the software model. In other words, the register values that changed between the start and the end of acceleration are propagated to all downstream combinational components that depend on them, and these combinational components then evaluate the new register values. Following the fan-out and propagation principle, secondary combinational components located downstream of the first-level combinational components (those that depend directly on the changed register values) must in turn evaluate the changed data, if any. This process of propagating register values to other potentially affected downstream components extends to the end of the fan-out network. Thus, only those combinational components in the software model that are located downstream of, and affected by, the changed register values are updated; not all combinational values are affected. For example, if only one register value changed between the start and the end of acceleration, and only one combinational component is affected by that change, then only that combinational component re-evaluates its value based on the changed register value. The rest of the circuit is unaffected. For such a small change, the combinational regeneration process completes relatively quickly.
Finally, after event propagation is complete, the system is ready to run in any mode. Typically, the user wants to inspect values after a long run. After the combinational regeneration process, the user may continue with pure software simulation for debugging/testing. At other times, the user may wish to continue with hardware acceleration to reach the next point of interest. In still other cases, the user may wish to continue execution in ICE mode.
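The four steps above amount to a selective, event-driven re-evaluation. The following sketch shows that idea under stated assumptions: the netlist is acyclic, each combinational component is a Python function of named inputs, and the `fanout` map lists which components each signal drives. All names are hypothetical and do not come from the specification.

```python
from collections import deque


def regenerate(comb, fanout, before, after, values):
    """Regenerate combinational values after a hardware-accelerated run.

    comb:    {node: (function, [input names])} for combinational components
    fanout:  {signal: [combinational nodes it drives]}
    before:  register values saved in the software model before acceleration
    after:   register values read back from hardware into the REG buffer (step 1)
    values:  software-model values (registers and combinational), updated in place
    Returns the set of combinational nodes that were re-evaluated.
    """
    changed = [r for r in after if after[r] != before.get(r)]   # step 2: compare
    values.update(after)
    queue = deque()
    for reg in changed:                                         # step 3: schedule fan-out
        queue.extend(fanout.get(reg, []))
    evaluated = set()
    while queue:                                                # step 4: propagate events
        node = queue.popleft()
        fn, inputs = comb[node]
        new = fn(*(values[i] for i in inputs))
        evaluated.add(node)
        if new != values.get(node):
            values[node] = new
            queue.extend(fanout.get(node, []))  # only a changed output creates new events
    return evaluated
```

If no register changed, the queue stays empty and the stored software-model values are reused unchanged, which matches the second step; otherwise only the downstream cone of each changed register is touched.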
In summary, combinational regeneration involves using register values to update the combinational values in the software model. When any register value changes, the changed value is propagated through that register's fan-out network as the values are updated. When no register value has changed, the values in the software model do not change either, so the system need not regenerate the combinational components. Typically, a hardware-accelerated run lasts for some time, so many register values will change, affecting many combinational values located downstream in the fan-out networks of the changed registers. In that case, the combinational regeneration process is relatively slow. In other cases, only a few register values change after the hardware-accelerated run. The fan-out networks of the changed registers may then be small, so the combinational regeneration process is relatively fast.
IV. Emulation with a Target System
Figure 10 shows the Analog Simulation System architecture according to one embodiment of the invention. Figure 10 also shows the relationship among the software model, the hardware model, the emulation interface, and the target system when the system operates in in-circuit emulation mode. As described above, the Analog Simulation System comprises a general-purpose microprocessor and a reconfigurable hardware board interconnected by a high-speed bus such as a PCI bus. The Analog Simulation System compiles the user's circuit design and generates the emulation hardware configuration data for mapping the hardware model onto the reconfigurable circuit board. The user can then simulate the circuit on the general-purpose processor, accelerate the simulation in hardware, emulate the circuit design with the target system through the emulation interface, and afterward perform post-simulation analysis.
Software model 315 and hardware model 325 are determined during compilation. Emulation interface 382 and target system 387 are also provided in the system for in-circuit emulation mode. At the user's discretion, the emulation interface and the target system need not be coupled to the system at the outset.
Software model 315 comprises kernel 316, which controls the entire system, and four address spaces for the software/hardware boundary: REG, S2H, H2S, and CLK. The Analog Simulation System maps the hardware model into four address spaces in main memory according to component type and control function: REG space 317 is designated for the register components; CLK space 320 for the software clocks; S2H space 318 for the outputs of the software test-bench components to the hardware model; and H2S space 319 for the outputs of the hardware model to the software test-bench components. These dedicated I/O buffer spaces are mapped into the kernel's main memory space at system initialization.
The hardware model comprises several FPGA chip banks 326a-326d and FPGA I/O controller 327. Each bank (e.g., 326b) comprises at least one FPGA chip. In one embodiment, each bank comprises 4 FPGA chips. In a 4x4 FPGA chip array, banks 326b and 326d may be the low banks and banks 326a and 326c may be the high banks. The mapping, placement, and routing of specific hardware-modeled portions of the user's circuit design onto specific chips, and their interconnection, were discussed with reference to Fig. 6. Interconnect 328 between software model 315 and hardware model 325 is a PCI bus system. FPGA I/O controller 327 comprises a PCI interface 380 and a control unit 381 for controlling data communication between the PCI bus and FPGA chip banks 326a-326d while maintaining PCI bus throughput. Each FPGA chip also includes several address pointers, each corresponding to one address space in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK), for coupling data between each address space and each FPGA chip in FPGA chip banks 326a-326d.
Communication between software model 315 and hardware model 325 takes place through the DMA engine or the address pointers in the hardware model, or through both. The kernel initiates DMA transfers and evaluation requests through directly mapped I/O control registers. REG space 317, CLK space 320, S2H space 318, and H2S space 319 use I/O data path circuits 321, 322, 323, and 324, respectively, for data transfer between software model 315 and hardware model 325.
All primary inputs in the S2H and CLK spaces require double buffering, because these spaces take several clock cycles to complete the update process. Double buffering avoids race conditions that would disturb the internal state of the hardware model.
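Double buffering can be sketched as follows. This is a generic illustration under the assumption that the kernel writes a back buffer over several cycles while the hardware model reads only a front buffer, which is replaced in a single step; the class and method names are hypothetical and not taken from the specification.

```python
class DoubleBuffer:
    """Hypothetical double buffer for an S2H- or CLK-style input space."""

    def __init__(self, size):
        self.front = [0] * size  # what the hardware model sees
        self.back = [0] * size   # what the kernel is updating

    def write(self, index, value):
        # Partial updates land in the back buffer and stay invisible to the hardware.
        self.back[index] = value

    def commit(self):
        # Swap in one step; the new back buffer is a copy, so the next
        # update starts from the committed values rather than stale ones.
        self.front, self.back = self.back, list(self.back)

    def read(self, index):
        return self.front[index]
```

Because the hardware side never observes a half-written set of inputs, a multi-cycle update cannot create the race the text describes.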
The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds essentially all the register components and combinational components of the user's circuit design. In addition, the software clocks are modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test-bench components, and evaluates clock components. When the kernel detects any clock edge, the registers and latches are updated and the values are propagated through the combinational components. Thus, if hardware-accelerated mode is selected, any change of values in these spaces triggers the hardware model to change its logic state.
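The kernel's clock handling can be sketched as a loop over a software clock waveform. This is a minimal illustration, assuming the active edge is the rising edge, the clock starts at 0, and a callback stands in for "update registers and latches, then propagate through combinational logic"; the function name and waveform format are hypothetical.

```python
def run_kernel(clock_waveform, evaluate):
    """Hypothetical kernel loop driven by a software clock.

    clock_waveform: (sim_time, level) pairs sorted by time, i.e. the
                    software clock as written into the CLK space.
    evaluate:       callback invoked on each active (rising) edge.
    Returns the simulation times at which an edge triggered evaluation.
    """
    edges = []
    previous = 0                              # assumed initial clock level
    for sim_time, level in clock_waveform:    # the kernel advances simulation time
        if previous == 0 and level == 1:      # active edge detected
            evaluate(sim_time)
            edges.append(sim_time)
        previous = level
    return edges
```

Since the clock lives entirely in software, the kernel decides when each edge occurs; this is the property the ICE discussion relies on when it replaces the target system clock with the software clock.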
For in-circuit emulation mode, emulation interface 382 is coupled to PCI bus 328 so that it can communicate with hardware model 325 and software model 315. In both hardware-accelerated simulation mode and in-circuit emulation mode, kernel 316 controls not only the software model but also the hardware model. Emulation interface 382 is also coupled to target system 387 through cable 390. Emulation interface 382 further comprises interface port 385, emulation I/O control 386, target-to-hardware I/O buffer (T2H) 384, and hardware-to-target I/O buffer (H2T) 383.
Target system 387 comprises connector 389, signal-in/signal-out interface socket 388, and other modules or chips belonging to target system 387. For example, target system 387 could be an EGA video controller, and the user's circuit design could be a particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is fully modeled in software model 315 and partially modeled in hardware model 325.
Kernel 316 in software model 315 also controls in-circuit emulation mode. Control of the emulation clock remains in software, through the software clock, the gated-clock logic, and the gated-data logic, so setup and hold-time problems do not arise in in-circuit emulation mode. Thus, the user can start, stop, single-step, assert values, and inspect values at any time during in-circuit emulation.
To operate in this way, all clock nodes between the target system and the hardware model are identified. The clock generators in the target system are disabled, the clock ports from the target system are disconnected, or clock signals from the target system are otherwise prevented from reaching the hardware model. Instead, the clock signal comes from a test-bench program or another form of software-generated clock, so that the software kernel can detect active clock edges and trigger data evaluation. Therefore, in ICE mode, the Analog Simulation System uses the software clock rather than the target system clock to control the hardware model.
To operate the user's circuit design in the environment of the target system, the primary input (signal-in) and output (signal-out) signals between target system 40 and the modeled circuit design are provided to hardware model 325 for evaluation. This is accomplished through two buffers: the target-to-hardware buffer (T2H) 384 and the hardware-to-target buffer (H2T) 383. Target system 387 uses T2H buffer 384 to apply input signals to hardware model 325, and hardware model 325 uses H2T buffer 383 to deliver output signals to target system 387. In this in-circuit emulation mode, the hardware model receives and sends I/O signals through the T2H and H2T buffers rather than the S2H and H2S buffers, because the system now uses target system 387, rather than the test-bench program in software model 315, to drive the evaluation. Because the target system runs substantially faster than software simulation, in-circuit emulation mode also runs at a higher speed. These input and output signals are transferred over PCI bus 328.
In addition, a bus 61 is provided between emulation interface 382 and hardware model 325. This bus is similar to bus 61 in Fig. 1. Bus 61 allows emulation interface 382 and hardware model 325 to communicate through T2H buffer 384 and H2T buffer 383.
Normally, target system 387 is not coupled to the PCI bus. Such a coupling is feasible, however, if emulation interface 382 is incorporated into the design of target system 387. In that arrangement, cable 390 is not present, and signals between target system 387 and hardware model 325 still pass through the emulation interface.
V. Post-Simulation Analysis Mode
The Analog Simulation System of the present invention supports value change dump (VCD), a simulator feature widely used for post-simulation analysis. In essence, VCD provides a historical record of all inputs to the hardware model and of selected register outputs, so that the user can later review, during post-simulation analysis, the various inputs to the simulation and the resulting outputs. To support VCD, the system logs all inputs to the hardware model. For outputs, the system records all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded; at 1/10,000 record/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis; the lower the logging frequency, the less information is stored. Because the chosen logging frequency directly affects emulation speed, the user should select it carefully. A higher logging frequency reduces emulation speed, because the system must spend time and resources recording the output data through memory I/O operations before continuing the simulation.
For post-simulation analysis, the user selects a particular point of interest in the simulation. If the logging frequency is 1/500 record/cycle, the register values are recorded every 500 cycles, at points 0, 500, 1000, 1500, and so on. If the user wants the result at, for example, point 610, the user selects the recorded point 500 and simulates forward in time until point 610. During this analysis phase, the analysis speed equals the simulation speed, because the user first accesses the data at point 500 and then simulates forward to point 610. Note that at a higher logging frequency, more data is stored for post-simulation analysis. Thus, at a logging frequency of 1/300 record/cycle, data is stored every 300 cycles, at points 0, 300, 600, 900, and so on. To obtain the result at point 610, the user first selects the recorded point 600 and then simulates forward to point 610. Note that with a logging frequency of 1/300 rather than 1/500, the system reaches the desired point 610 faster in post-simulation analysis. This is not always the case, however: the particular analysis point, together with the logging frequency, determines how quickly the post-simulation analysis point can be reached. For example, the system reaches point 523 faster if the VCD logging frequency is 1/500 rather than 1/300.
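The trade-off in these examples is simple arithmetic: the analysis starts from the latest recorded point at or before the target and re-simulates the remaining cycles. The helper names below are hypothetical; the numbers are the ones used in the text.

```python
def nearest_checkpoint(target, interval):
    """Latest recorded point at or before `target` when register values are
    logged every `interval` cycles (logging frequency 1/interval)."""
    if target < 0 or interval <= 0:
        raise ValueError("target must be >= 0 and interval > 0")
    return (target // interval) * interval


def replay_cost(target, interval):
    """Cycles that must be simulated forward from that recorded point."""
    return target - nearest_checkpoint(target, interval)


for target in (610, 523):
    for interval in (500, 300):
        print(target, interval, nearest_checkpoint(target, interval), replay_cost(target, interval))
# 610 500 500 110
# 610 300 600 10
# 523 500 500 23
# 523 300 300 223
```

The output confirms both observations: for point 610 the 1/300 schedule leaves 10 cycles to simulate instead of 110, while for point 523 the 1/500 schedule leaves 23 instead of 223.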
The user can then perform software simulation driven by the input log of the hardware model to compute the value change dump of all hardware components, and thereby carry out analysis after emulation. The user can also select any register logging point in time and begin the value change dump forward in time from that point. This value change dump method can be linked to any simulation waveform viewer for post-simulation analysis.
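As an illustration of the output side, the sketch below writes a minimal VCD file (the textual format defined in IEEE 1364) for 1-bit signals, emitting a timestamp only when some value changes. It is a simplified sketch, not the system's dump routine: it omits optional header sections such as $date, $version, and $dumpvars, handles at most 94 signals, and its function name and sample format are assumptions.

```python
def write_vcd(signals, samples, timescale="1ns", scope="top"):
    """Return the text of a minimal VCD file for 1-bit signals.

    signals: list of signal names (at most 94 in this sketch)
    samples: list of (time, {name: 0 or 1}) in increasing time order
    Only value changes are written, as the VCD format intends.
    """
    ids = {name: chr(33 + i) for i, name in enumerate(signals)}  # printable ASCII ids: '!', '"', ...
    lines = [f"$timescale {timescale} $end", f"$scope module {scope} $end"]
    lines += [f"$var wire 1 {ids[n]} {n} $end" for n in signals]
    lines += ["$upscope $end", "$enddefinitions $end"]
    last = {}
    for time, values in samples:
        changes = [f"{values[n]}{ids[n]}" for n in signals
                   if n in values and values[n] != last.get(n)]
        if changes:                          # skip timestamps with no value change
            lines.append(f"#{time}")
            lines.extend(changes)
            last.update({n: values[n] for n in signals if n in values})
    return "\n".join(lines) + "\n"
```

A file produced this way can be opened by common waveform viewers, which is the kind of linkage to "any simulation waveform viewer" the paragraph describes.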
The VCD On-Demand System
One embodiment of the present of invention are VCD random selection systems that do not need to rerun simulation.According to one embodiment of present invention, VCD described herein comprises following high standard characteristic with selecting technology: (1) is based on the historical compress and record of parallel simulation of RCC, (2) generate based on the historical decompress(ion) of parallel simulation of RCC and VCD file, and (3) not have to simulate under the situation of reruning to the simulated target scope of a selection and design review (check) (DR) with selecting software regeneration to become.Each characteristic will go through hereinafter.
In a debug session, the EDA tool (hereinafter the RCC system, which incorporates various aspects of the present invention) records the primary inputs from a test-bench program so that any portion of the simulation can be reproduced. The user can then selectively command the EDA tool, or RCC system, to dump hardware state information from any simulation time range into a VCD file for later analysis. The user can then immediately begin debugging the design within the selected simulation time range. If the selected range does not contain the fault the user is trying to resolve, the user can select another simulation time range to dump into a VCD file and then analyze this new VCD file. Because of this on-demand VCD feature, the user can stop the simulation at any point and request another VCD file on demand, from any desired start simulation time to any desired end simulation time.
In a typical debug session, the user debugs the design with the RCC system shown in Figure 83. In the first simulation run, the user quickly simulates the design from a desired start simulation time to any desired end simulation time; this period is referred to here as the simulation session range. During this fast simulation run, a highly compressed form of the primary inputs is recorded in an "input history" file so that any portion of the simulation can be reproduced. At the end of the simulation session range, the RCC system can store the hardware state information at this end point in a "simulation history" file, so that the user can resume debugging the design beyond this end point if necessary.
At the end of the fast simulation run, the user analyzes the results and inevitably detects some problems in the design. The user then guesses that the root cause of a problem (i.e., the fault) lies within a particular narrow simulation time range, referred to here as the simulation target range, which lies within the wider simulation session range. For example, if the simulation session range comprises 1,000 simulation time steps, the narrower simulation target range may comprise only 100 simulation time steps at a particular location within the broader simulation session range.
Once the user has made a guess that isolates the fault to the precise location of the simulation target range, the RCC system quickly simulates from the very beginning by decompressing the compressed primary inputs in the input history file and sending the decompressed primary inputs to the hardware model for evaluation. When the RCC system reaches the simulation target range, it dumps the evaluation results (e.g., hardware node values and register states) to a VCD file. The user can then analyze this region more carefully by replaying the design from the start of the simulation target range using the VCD file, rather than rerunning the simulation from the start of the simulation session range, or even from the beginning of the simulation. This feature of storing the hardware state for the simulation target range in a VCD file saves the user a large amount of debug time that would otherwise be wasted rerunning the simulation.
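The record-then-replay flow can be sketched in a few lines. This sketch makes several assumptions not stated in the text: primary inputs are stored as per-step changes encoded in JSON and compressed with zlib (the actual compression scheme is not specified here), and a plain Python function `step_fn` stands in for the hardware model. All names are hypothetical.

```python
import json
import zlib


def record_inputs(inputs_per_step):
    """Compress the primary-input history of a fast simulation run.

    inputs_per_step: list indexed by simulation time step, each a dict of
    primary-input values. Only changes are kept, then zlib-compressed.
    """
    deltas, previous = [], {}
    for step, values in enumerate(inputs_per_step):
        change = {k: v for k, v in values.items() if previous.get(k) != v}
        if change:
            deltas.append([step, change])
        previous = {**previous, **values}
    return zlib.compress(json.dumps(deltas).encode())


def replay(history, n_steps, step_fn, state, target):
    """Decompress the input history, re-drive the model from step 0, and
    keep the state only inside the target range [start, end).

    step_fn(state, inputs) -> new state stands in for the hardware model.
    """
    deltas = {step: change for step, change in json.loads(zlib.decompress(history))}
    start, end = target
    inputs, dump = {}, []
    for step in range(n_steps):
        inputs.update(deltas.get(step, {}))
        state = step_fn(state, inputs)
        if start <= step < end:
            dump.append((step, state))       # the part that would go to the VCD file
        if step >= end - 1:
            break                            # nothing after the target range is needed
    return dump
```

For a counter model that adds the input en each step, with en held at 1, 0, and then 1 again, replaying the compressed history for target range (4, 7) returns only the three states inside that window; nothing outside it is dumped.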
Referring now to Figure 83, Figure 83 depicts a high-level diagram of an RCC system that incorporates one embodiment of the present invention. The RCC system comprises an RCC computing system 2600 and an RCC hardware accelerator 2620. As described elsewhere in this patent specification, RCC computing system 2600 contains the computational resources necessary to allow the user to simulate the entire software-modeled design in software and to control the hardware acceleration of the hardware-modeled portion of the design. To this end, RCC computing system 2600 includes CPU 2601, the various clocks 2602 required by the various components of the RCC system (including the software clocks described elsewhere in this patent specification), test-bench program 2603, and system disk 2604. In contrast with some conventional hardware-based event history buffers, the system disk, rather than a small hardware RAM buffer, is used to record the compressed data. Although not shown, RCC computing system 2600 includes other logic components and bus subsystems that provide the circuit designer with the computing power to run diagnostic programs, various software, and file management, among the other tasks a computing system performs.
RCC hardware accelerator 2620, also referred to elsewhere in this patent specification as the RCC array, comprises a reconfigurable array of logic elements (e.g., FPGAs) that can model at least a portion of the user's design in hardware, so that the user can accelerate the debug process. To this end, RCC hardware accelerator 2620 comprises an array 2621 of reconfigurable logic elements that provides the hardware model of a portion of the user's design. RCC computing system 2600 is tightly coupled to RCC hardware accelerator 2620 through the software clock and the bus system described elsewhere in this patent specification, portions of which are shown as lines 2610 and 2611 in Figure 83.
The VCD on-demand aspect of the present invention will be discussed with reference to Figure 84. Figure 84 shows a time sequence of several simulation times: t0, t1, t2, and t3. The simulation session range lies between simulation time t0 and simulation time t3 and includes simulation times t1 and t2. Simulation time t0 represents the first simulation time in the simulation session range, at which the fast simulation begins; in general, it represents the first simulation time of any separable simulation session or simulation session range. For example, suppose today's debug session involves examining the simulation session range from t=10,000 to t=12,000, and the user guesses that a particular fault lies somewhere between t=10,500 and t=10,750. For this simulation session range, simulation time t0 is t=10,000. Suppose the particular fault is found and resolved within the simulation session range t=10,000 to t=12,000. Tomorrow, the user moves on to the next simulation session range, t=12,000 to t=15,000. Here, simulation time t0 is t=12,000. In some cases, simulation time t0 represents the initial simulation time of the first debug session of the user's design; that is, t0 corresponds to t=0.
Similarly, simulation time t3 represents the last simulation time of the selected simulation session range. For example, suppose today's debug session involves examining the simulation session range from t=14,555 to t=16,750. For this simulation session range, simulation time t3 is t=16,750. Suppose the particular fault is found and resolved within the simulation session range t=14,555 to t=16,750. The user then moves on to the next simulation session range, t=16,750 to t=19,100. Here, simulation time t3 is t=19,100. In some cases, simulation time t3 represents the last simulation time of the last debug session of the user's design.
Although the user can continue the simulation beyond simulation time t3, unless there is a pressing need the user concentrates on debugging the design within simulation times t0 to t3, i.e., the current simulation session range. Typically, once the faults in the current simulation session range are resolved, the user moves on to the next simulation session range and continues simulating the design beyond simulation time t3.
In this abstract representation of the simulation session range, the simulation times t0-t3 need not be adjacent to one another; that is, simulation times t0 and t1 are not necessarily immediately adjacent. Indeed, simulation times t0 and t1 may be thousands of simulation time steps apart.
Because one embodiment of the present invention is implemented in the RCC system, reference will be made to the various components of the RCC system shown in Figure 83. First, the input and simulation history generation operation of the RCC system will be discussed. This generation operation comprises some form of data compression of the primary inputs and recording of the compressed primary inputs. Second, the VCD generation operation of the RCC system will be discussed. This operation comprises decompressing the primary inputs to replicate the simulation history and dumping the hardware state for a simulation target range into a VCD file. Third, the VCD file review procedure will be discussed. Although the term "simulation history" is sometimes used, this does not mean that the entire debug session involves software simulation. Indeed, the RCC system generates the VCD file from hardware state and uses the software model only for the subsequent VCD file analysis.
Input and Simulation History Generation: Compression and Recording
First, the user models the design in software in RCC computing system 2600 of Figure 83. For some portions of the design, RCC computing system 2600 automatically generates a hardware model of the design based on the hardware description language (e.g., VHDL). The hardware model is configured in the array of reconfigurable logic elements 2621, which is part of RCC hardware accelerator 2620. With this arrangement, the user can simulate the design in software in RCC computing system 2600, use RCC hardware accelerator 2620 to accelerate portions of the design (i.e., particular simulation time steps or particular physical portions of the circuit), or use a combination of simulation and hardware acceleration.
The user has just completed his latest circuit design, which must now be debugged to find faults. If the user has previously debugged an older version of the design, he may know where a fault is likely to occur. If, on the other hand, this is the first debug session for the new design, the user must guess where a potential fault may occur. In either case, some guesswork is probably needed to locate the fault. For the purposes of discussion, assume that this is the first debug session for the design.
To debug the design, the user selects a simulation session range. In theory, the simulation session range can span a simulation time period of any length. In practice, however, it should be short enough to isolate a few faults in the design, yet long enough to keep the debug process fast and to minimize the number of debug sessions needed to debug the design fully. A simulation session range of only two or three simulation time steps is unlikely to reveal any fault, and such a small range would force the user through many slow iterations of the debug process. Conversely, if the selected simulation session range is 1,000,000 simulation time steps, too many faults may surface, and the user will find it difficult to focus on solving any one problem.
As shown in Figure 84, once the user has selected a simulation session range, he commands the RCC system to fast-simulate from simulation time t0 to simulation time t3. As mentioned earlier, the interval from t0 to t3 can be any selected range; simulation time t0 represents the start of the simulation, and simulation time t3 represents the last simulation time of this simulation session range.
At simulation time t0, fast simulation begins in the RCC computing system 2600. Fast simulation, rather than the normal simulation mode, is used from simulation time t0 to simulation time t3 because the software model does not need to be regenerated during this period. As described elsewhere in this patent specification, the regeneration operation requires the RCC computing system 2600 to receive hardware state information (e.g., node values and register states) so that the more complex logic components (e.g., combinational logic) can be regenerated in software for further analysis by the user. Of course, some users may want to examine the software model during the simulation, in which case the RCC computing system 2600 does not perform fast simulation. The simulation is then slower, because the RCC computing system 2600 needs extra time to regenerate the software model from the primary outputs of the hardware model.
At the beginning, at simulation time t0, the complete state of the design, such as the software model state and the hardware model register and node values, is stored on the system disk in a file called the "simulation history". This allows the user, at any time in the future, to load the state of the design into the RCC system for debugging. During the fast simulation of the simulation session range from simulation time t0 to simulation time t3, the RCC computing system 2600 handles the primary inputs I_p in two separate processes running in parallel. In the first, the raw primary inputs from the test bench process 2603 are provided on line 2610 to the RCC hardware accelerator 2620 for evaluation. In the second, the same primary inputs from the test bench process are compressed and recorded on the system disk as a separate file called the "input history", so that the entire history of the primary inputs is collected and the user can reproduce any portion of the simulation. In particular, the primary inputs corresponding to simulation time t0 through simulation time t3 are compressed and stored on the system disk.
As the RCC hardware accelerator 2620 receives the primary inputs I_p from the test bench process 2603, it processes them. Consequently, as the various logic and other circuit elements evaluate the data, the hardware state in the hardware model is likely to change. From simulation time t0 to simulation time t3, the RCC system does not wait for the RCC computing system 2600 to perform its logic regeneration, because the user is not interested in debugging the design in detail during this fast simulation. For the same reason, the RCC system does not store the primary outputs (e.g., hardware node values and register states). Note that while the RCC computing system 2600 compresses the primary inputs for recording in the "input history" file, the RCC hardware accelerator 2620 evaluates the original, uncompressed primary inputs. In other embodiments, the RCC system records the primary inputs in the input history file without compressing them.
Why does the RCC computing system 2600 send the primary inputs to the RCC hardware accelerator for evaluation during fast simulation, even though it does not store the resulting outputs? The RCC system needs to store the hardware state of the design at simulation time t3 as produced by evaluating the primary inputs from the beginning of the simulation up to simulation time t3. An accurate snapshot of the hardware model state at simulation time t3 can be obtained only if the hardware model has evaluated the entire history of primary inputs from the beginning up to t3, not just the inputs at t3. Logic circuits have memory: the sequence of inputs affects the evaluation result. Thus, if only the primary inputs at simulation time t3 (or at the simulation time immediately preceding t3) were fed to the hardware model, the hardware model would likely present an incorrect state at simulation time t3.
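The point that state depends on the whole input sequence can be illustrated with a toy model. This Python sketch assumes a hypothetical 4-bit accumulator register in place of the real hardware model; it is not the RCC system's logic, only a demonstration that replaying a truncated input history yields a different state:

```python
def accumulate(inputs, state=0):
    # Toy sequential circuit (hypothetical): a 4-bit accumulator register
    # whose next value depends on its current value and the new input.
    for x in inputs:
        state = (state + x) & 0xF
    return state

history = [3, 5, 7, 2]                 # hypothetical primary inputs from t0 up to t3
correct_t3 = accumulate(history)       # evaluate the entire input history
wrong_t3 = accumulate(history[-1:])    # evaluate only the last input before t3
print(correct_t3, wrong_t3)            # prints: 1 2
```

Only the first value is a valid snapshot at t3; the second is what a model fed with just the most recent input would report.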
Why store the hardware model state at simulation time t3? A large design with more than 1,000,000 gates and more than 1,000,000 simulation time steps cannot be debugged in a relatively short time; the user needs multiple simulation sessions to debug it. To move quickly from one simulation session to the next, the RCC system stores the hardware state at simulation time t3 (together with the compressed primary inputs), so that the user can debug the next simulation session range beginning at simulation time t3. With the stored hardware model state, the user does not need to rerun the simulation from the beginning; instead, after debugging the design from simulation time t0 to simulation time t3, he can quickly and easily jump to simulation time t3. The hardware model state at simulation time t3 is stored in the simulation history file and represents a correct snapshot of his design, one that reflects the entire history of primary inputs up to that point.
If necessary and desired by the user, the hardware model in the RCC hardware accelerator 2620 provides its internal hardware states to the RCC computing system 2600 on line 2611, so that the RCC computing system 2600 can build or regenerate the various logic components (e.g., combinational logic) in the software model. As noted above, however, the user is not interested in observing the software simulation during the fast simulation of the simulation session range. Because the user will not be examining the internal hardware states to find faults at this stage, these internal hardware states from the RCC hardware accelerator are not stored on the system disk.
At simulation time t3, the end of the simulation session range, this particular fast simulation run stops. The evaluation results, or primary outputs (e.g., register values), of the hardware model of the design in the RCC hardware accelerator 2620 corresponding to simulation time t3 are stored in the simulation history file. Thus, after debugging the design from simulation time t0 to simulation time t3, the user can, if necessary, proceed directly to simulation time t3 for further debugging. To debug his design beyond simulation time t3, the user does not need to rerun the simulation from simulation time t0.
In summary, from simulation time t0 to simulation time t3 (i.e., the simulation session range), the primary inputs from the test bench process 2603 are fed on line 2610 to the RCC hardware accelerator 2620 to accelerate the design substantially, while the same primary inputs are simultaneously compressed and stored on the system disk for future reference. The RCC computing system 2600 must store the primary inputs (compressed or otherwise) in the input history file so that the debug session can be reproduced. The compression operation takes place in parallel with the data evaluation in the RCC hardware accelerator 2620. Finally, at simulation time t3, the end of the simulation session range, the RCC system stores the state information of the hardware model in the simulation history file.
In one embodiment of the invention, the recorded compressed primary inputs from the simulation session range are part of the same file that is later amended with the hardware state information from simulation time t3. In another embodiment, the information recorded during the simulation session range and the hardware state information from simulation time t3 are stored as separate files on the system disk. Similarly, either of these files can later be amended with the VCD-on-demand information generated for a simulation target range. Alternatively, the VCD-on-demand information can be stored in a separate VCD file on the system disk, apart from the compressed primary input file and the simulation time t3 hardware state file. In other words, according to one embodiment of the present invention, the input history file, the simulation history file, and the VCD file may be combined into a single file. Alternatively, the input history file and the simulation history file may be combined into one file that is separate from the VCD file.
The compression scheme is now discussed. According to one embodiment of the present invention, the compression logic of the RCC system achieves a compression ratio of 20X for primary input events, assuming that 10% of the inputs change (i.e., produce input events) in each simulation time step. For example, a large ASIC design with more than 1,000,000 gates may require 200 primary inputs. With 10% input events per simulation time step, about 20 inputs need to be compressed and recorded. If each input signal is 2 bytes long, the 20 input signals produce 40 bytes of primary input data to process in each simulation time step. At a compression ratio of 20X, these 40 bytes per simulation time step are compressed to 2 bytes. Thus, for a design requiring about 1,000,000 simulation time steps, the RCC system compresses the primary inputs into 2 megabytes of data. A file of this size can easily be handled by any computer file system and waveform viewer. In one embodiment, ZIP compression is used.
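The size estimate above is straightforward arithmetic, reproduced here as a Python sketch. The function name is hypothetical, and integer division assumes the figures divide evenly, as they do in the example:

```python
def compressed_history_bytes(num_inputs, event_percent, bytes_per_input,
                             compression_ratio, time_steps):
    """Back-of-the-envelope size of the compressed input history (integer arithmetic)."""
    events_per_step = num_inputs * event_percent // 100       # inputs that change per step
    raw_bytes_per_step = events_per_step * bytes_per_input     # bytes to record per step
    compressed_per_step = raw_bytes_per_step // compression_ratio
    return compressed_per_step * time_steps

size = compressed_history_bytes(num_inputs=200, event_percent=10, bytes_per_input=2,
                                compression_ratio=20, time_steps=1_000_000)
print(size)  # 2000000 bytes, i.e. about 2 MB
```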
According to one embodiment, primary input compression is performed at the same time as the primary input evaluation in the RCC hardware accelerator 2620; that is, the input history file is generated concurrently with primary input evaluation. The compression scheme therefore has no direct negative impact on the performance of the RCC system. The only possible bottleneck is the process of recording the compressed primary inputs on the system disk. However, because the data is highly compressed, the slowdown of the RCC system will be less than 5% for most designs running at 50,000 simulation time steps per second.
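The parallel structure described above can be pictured as one producer feeding two consumers. The following Python sketch is an illustration under stated assumptions, not the RCC implementation: each time step's primary inputs arrive as a fixed-width bytes object, a toy register-folding function stands in for the RCC hardware accelerator 2620, and zlib (the DEFLATE algorithm commonly used inside ZIP archives) stands in for the compression logic. `run_session` and the toy evaluation are hypothetical:

```python
import queue
import threading
import zlib

def run_session(steps):
    """Feed each step's raw inputs to a toy evaluator while a second thread compresses them."""
    to_hw = queue.Queue()    # raw, uncompressed inputs for evaluation
    to_rec = queue.Queue()   # the same inputs, destined for the input history
    result = {}

    def evaluator():
        reg = 0
        while (item := to_hw.get()) is not None:
            for b in item:                        # toy stand-in for hardware evaluation
                reg = (reg * 31 + b) & 0xFFFFFFFF
        result["state"] = reg

    def recorder():
        comp = zlib.compressobj()
        chunks = []
        while (item := to_rec.get()) is not None:
            chunks.append(comp.compress(item))
        chunks.append(comp.flush())
        result["history"] = b"".join(chunks)

    workers = [threading.Thread(target=evaluator), threading.Thread(target=recorder)]
    for w in workers:
        w.start()
    for step in steps:           # the test bench produces one input vector per time step
        to_hw.put(step)
        to_rec.put(step)
    to_hw.put(None)              # sentinel: end of the simulation session range
    to_rec.put(None)
    for w in workers:
        w.join()
    return result["state"], result["history"]

steps = [bytes([i % 7, 0]) for i in range(1000)]  # hypothetical 2-byte input per step
state, history = run_session(steps)
print(state, len(history))
```

The threads show the shape of the data flow rather than its performance; because the inputs have a fixed width per step, the decompressed stream can be split back into time steps.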
As for the specific way recording is controlled in the RCC system, according to one embodiment of the present invention, the user first issues the $rcc(record) command to initialize the RCC recording function:
$rcc(record,name,<disk space>,<checkpoint control>);
The arguments name, <disk space>, and <checkpoint control> are now explained. The "name" argument is the record name of the current simulation session range. Different names are needed to distinguish different simulation runs of the same design. A separate record name is especially necessary for offline VCD on demand.
The <disk space> argument is an optional parameter that specifies the maximum disk space, in megabytes (MB), allocated to the RCC system recording process. The default value is 100 MB. The RCC system records only the most recent portion of the current simulation session range within the specified disk space. In other words, if the <disk space> value is specified as 100 MB but the current simulation session range occupies 140 MB, the RCC system keeps only the last 100 MB of compressed primary inputs and discards the first 40 MB. This feature of the invention benefits fault analysis. In one embodiment of the invention, the test bench process has self-test functions that detect simulation failures and stop the simulation; the most recent history of the RCC simulation provides the most information for such fault analysis.
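The keep-the-newest-data policy behaves like a byte-bounded FIFO. This Python sketch assumes the compressed history arrives in chunks and drops whole chunks from the oldest end when the budget is exceeded; `BoundedRecord` is a hypothetical name, not part of the RCC system:

```python
from collections import deque

class BoundedRecord:
    """Keep only the newest `limit` bytes of compressed input history (hypothetical)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.chunks = deque()   # oldest chunk on the left
        self.size = 0

    def append(self, chunk: bytes) -> None:
        self.chunks.append(chunk)
        self.size += len(chunk)
        while self.size > self.limit:          # over budget: discard the oldest data
            self.size -= len(self.chunks.popleft())

# Scaled-down version of the 100 MB / 140 MB example: 1 byte stands for 1 MB.
rec = BoundedRecord(limit=100)
for i in range(140):
    rec.append(bytes([i]))
print(rec.size, rec.chunks[0])
```

After 140 units are written, the record holds the last 100 and the first 40 are gone. Replaying such a truncated record would also require a stored state at the start of the retained window, which this sketch does not model.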
The <checkpoint control> argument is an optional parameter that specifies the number of simulation time steps between full-state checkpoints. The default is 1,000,000 time steps. As with most conventional compression algorithms, compression of the primary inputs is based on the state differences between consecutive simulation time steps. For long simulation runs, checkpointing the complete RCC state at a fixed, low frequency greatly speeds up extraction of the simulation history. With a decompression rate of 20K to 200K simulation time steps per second in the RCC system and one checkpoint every 1,000,000 steps, the RCC system can extract any simulation history (i.e., replay the primary inputs and generate the selected VCD file) within 5 to 50 seconds.
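The checkpoint trade-off can be made concrete with a small model. This Python sketch stores a full snapshot (state plus input vector) every `interval` steps and only the changed inputs in between, then reconstructs the state at any step by replaying from the nearest earlier checkpoint. The toy `step` function and all names are hypothetical; the real RCC checkpoint holds the complete hardware state. The last function reproduces the 5-to-50-second bound from the figures above:

```python
def step(state, vec):
    # Toy stand-in for one hardware evaluation (hypothetical).
    return (state * 3 + sum(vec)) & 0xFFFF

def record(inputs, interval):
    checkpoints = {}  # t -> (state at start of step t, full input vector at step t)
    deltas = {}       # t -> {input index: new value}, only the inputs that changed
    state, prev = 0, None
    for t, vec in enumerate(inputs):
        if t % interval == 0:
            checkpoints[t] = (state, list(vec))
        else:
            deltas[t] = {i: v for i, (p, v) in enumerate(zip(prev, vec)) if p != v}
        state = step(state, vec)
        prev = vec
    return checkpoints, deltas

def state_at(t, checkpoints, deltas, interval):
    """State at the start of step t, replaying at most `interval` steps."""
    base = (t // interval) * interval
    state, vec = checkpoints[base]
    vec = list(vec)
    for s in range(base, t):
        if s > base:
            for i, v in deltas[s].items():   # rebuild the input vector from its delta
                vec[i] = v
        state = step(state, vec)
    return state

def worst_case_seconds(interval, steps_per_second):
    # At most `interval` steps must be replayed from the nearest checkpoint.
    return interval / steps_per_second

print(worst_case_seconds(1_000_000, 200_000), worst_case_seconds(1_000_000, 20_000))  # 5.0 50.0
```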
When the $rcc(record) command is invoked, the RCC system records the simulation history; that is, it compresses the primary inputs and records them in a file stored on the system disk. Because the software logic does not need to be rebuilt during this time, the primary outputs from the RCC hardware accelerator are ignored. Recording can be terminated with the command $rcc(stop) or $rcc(off), at which point the RCC system switches simulation control back to the software model. From then on, the primary outputs are processed to rebuild the software logic.
VCD generation: decompression and dumping
As described above, the RCC system has stored the software model and hardware model at the beginning of the simulation session range at simulation time t0, has recorded the compressed primary inputs for the entire simulation session range in the input history file, and, at simulation time t3, has stored the hardware model state of the design at the end of the simulation session range in the simulation history file. The user now has enough information to load the design, as of simulation time t0, at the beginning of the simulation session range. With the compressed primary inputs, the user can software-simulate any portion of his design. Because of the VCD-on-demand feature, however, the user may not want to software-simulate his design at this point. Instead, the user wants to generate a VCD file for a selected simulation target range for detailed analysis, to isolate and fix faults. Indeed, with the recorded compressed primary inputs, the RCC system can reproduce any point within the simulation session range. And, if needed, the RCC system can simulate beyond the current simulation session range by loading the hardware state information previously stored at simulation time t3.
After fast-simulating the design, the user examines the results to determine whether a fault exists. If there is no obvious fault, the design may be fault-free within the current simulation session range. The user can then continue the simulation beyond the current simulation session range into the next one, whatever range is selected. If, however, the user determines that the design has some problem, he must simulate more carefully to isolate and fix the fault. Because the entire simulation session range is too large for careful, detailed analysis, the user must target a specific, narrower range for further study. Based on his familiarity with the design and perhaps on past debugging efforts, the user makes a reasoned guess about the location of the fault within the simulation session range. He then focuses on a selected simulation target range, which should correspond to his guess of the fault location (or of where the fault is likely to appear). Here, the user determines that the simulation target range lies between simulation time t1 and simulation time t2, as shown in Figure 84.
Using the configuration information previously stored at simulation time t0, the RCC system loads the software model of the design into the RCC computing system 2600 and the hardware model into the RCC hardware accelerator 2620. The RCC system then fast-simulates from simulation time t0 to simulation time t1. In this fast simulation run, the RCC computing system loads the previously stored file containing the compressed primary inputs, decompresses them, and sends the decompressed primary inputs to the RCC hardware accelerator 2620 for evaluation. As in the initial fast simulation run that compressed and stored the primary inputs of the simulation session range, the fast simulation run from simulation time t0 to simulation time t1 does not store the primary outputs (e.g., hardware model node values and register states) produced by evaluation.
Once the fast simulation run reaches the start of the simulation target range, simulation time t1, the RCC system dumps the evaluation results (i.e., the primary outputs O_p) of the hardware model in the RCC hardware accelerator 2620 into a VCD file on the system disk. Unlike the initial fast simulation run of the simulation session range, this run performs no compression in the RCC computing system 2600. Moreover, because the user does not need to examine the evaluation results at this time, the RCC computing system 2600 does not perform any software model regeneration. By skipping software model regeneration, the RCC system can generate the VCD file quickly.
In other embodiments, however, the user may wish to examine the software model of his design for the simulation time period from t1 to t2 while the primary outputs are being stored. In that case, the RCC computing system 2600 performs software model regeneration so that the user can examine every state of any aspect of his design.
At simulation time t2, the RCC computing system 2600 stops storing the evaluation outputs from the RCC hardware accelerator 2620 in the VCD file. At this point, the user can stop the fast simulation. The RCC system now has a VCD file covering the complete simulation target range, and the user can go on to analyze the VCD file in more detail.
When the user wants to analyze the VCD file, he does not need to rerun the simulation from the beginning (e.g., simulation time t0). Instead, he can command the RCC system to load the hardware state information stored from the start of the simulation target range and examine the simulation results with the software model. This is described in more detail in the section on simulation history review.
Analysis of the VCD file may or may not reveal the fault. If the fault is found, the user can of course begin correcting the design. If no fault is found, the user may have guessed wrong about the simulation target range suspected of containing the fault. He must then repeat the decompression and VCD dump procedure described above, making another guess in the hope of selecting a better simulation target range within the simulation session range. The RCC system then fast-simulates from the beginning of the simulation session range to the beginning of the new simulation target range, decompressing the primary inputs and sending them to the RCC hardware accelerator 2620 for evaluation. When the RCC system reaches the start of the new simulation target range, the primary outputs from the RCC hardware accelerator 2620 are dumped into the VCD file. At the end of the new simulation target range, the RCC system stops dumping hardware state information into the VCD file. The user can then examine the VCD file to isolate the fault.
In summary, from simulation time t0 to simulation time t1, the RCC system fast-simulates the design by decompressing the previously compressed primary inputs and sending them to the hardware model for evaluation. During the simulation target range from simulation time t1 to simulation time t2, the RCC system dumps the primary outputs from the hardware model into the VCD file. At the end of the simulation target range, the user can stop the fast simulation of the design. The user can then examine the VCD file by going directly to simulation time t1, without rerunning the simulation from the beginning at simulation time t0.
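The two phases summarized above, silent fast simulation up to t1 followed by dumping from t1 through t2, can be sketched as a single replay loop. This Python sketch assumes an 8-bit accumulator named q as a stand-in for the hardware model and uses zlib for the recorded input history. Only the VCD keywords ($timescale, $scope, $var, $upscope, $enddefinitions, #time and b-value changes) follow the IEEE 1364 VCD format; the function name and toy model are hypothetical:

```python
import io
import zlib

def replay_to_vcd(inputs, t1, t2, out):
    """Replay recorded inputs and dump one 8-bit output only inside [t1, t2]."""
    out.write("$timescale 1ns $end\n")
    out.write("$scope module top $end\n")
    out.write("$var wire 8 ! q $end\n")
    out.write("$upscope $end\n")
    out.write("$enddefinitions $end\n")
    state, last_dumped = 0, None
    for t, x in enumerate(inputs):
        state = (state + x) & 0xFF   # evaluate one simulation time step
        if t < t1:
            continue                 # fast simulation: outputs are not stored
        if t > t2:
            break                    # past the simulation target range: stop dumping
        if state != last_dumped:     # VCD records value changes only
            out.write(f"#{t}\nb{state:b} !\n")
            last_dumped = state

history = zlib.compress(bytes([1] * 10))   # hypothetical recorded input history
buf = io.StringIO()
replay_to_vcd(zlib.decompress(history), t1=3, t2=5, out=buf)
print(buf.getvalue())
```

Iterating over the decompressed bytes yields one integer input per time step, so the same loop serves both phases; only the dump condition changes at t1 and t2.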
After examining this simulation target range and isolating and eliminating the fault, the user can proceed to the next simulation session range, which begins at simulation time t3. The user selects a specific length for the new simulation session range; it can be as long as the previous simulation session range. The RCC system loads the previously stored hardware state information corresponding to simulation time t3 and is then ready for fast simulation of the new simulation session range. Note that this new simulation session range plays the role of the range from simulation time t0 to t3, with the loaded hardware state now corresponding to simulation time t0. The fast simulation, VCD-on-demand dump, and VCD review processes are similar to those described above.
According to one embodiment of present invention, the decompress(ion) step does not cause negative effect to performance.The RCC system can be historical with the simulation of the speed decompress(ion) of per second 20,000 to 200,000 simulated time steps (i.e. compression with primary input record).The control of suitable checkpoint has been arranged, and (promptly reproducing the simulation that the VCD file by primary input and the selection generates) simulation of can extracting in 50 seconds of RCC system is historical.
To control VCD on demand in the RCC system, the user issues the $axis_rpd command. $axis_rpd is an interactive command that extracts the recorded RCC simulation and generates a VCD file on demand. Unlike conventional simulation rewind techniques, executing the $axis_rpd command neither rewinds the internal simulation state nor disturbs the external or file I/O states. After invoking the command, the user can continue the simulation exactly as he could before invoking it.
When no arguments are specified, the $axis_rpd command displays all the simulation time periods available in the simulation session range, from which the user can select a simulation target range. The time unit is the same as the time unit of the command-line interface. An example of a simulation record is as follows:
C1>$rcc(record,r1);
C2>#1000$rcc(xt0,run);
C3>#50000$rcc(off);
C4>#50500$rcc(run);
C5>#60000$rcc(stop);
---Start RCC engine at 100500.
---Back to SIM:stop RCC engine at 5000000.
---Start RCC engine at 5050500.
---Back to SIM:stop RCC engine at 6000000.
Interrupt at simulation time 60000.0000ns
C6>$axis_rpd;
available simulation history:
1005.000000 to 50000.000000
50505.000000 to 60000.000000
Interrupt at simulation time 60000.0000ns
In this simulation record, the user ran the RCC engine from just after time 1000 to 50000 and from just after time 50500 to 60000. Accordingly, $axis_rpd displays these recorded simulation windows.
To generate a VCD file from the simulation history, the user issues the $axis_rpd command with the following control arguments:
$axis_rpd(start-time,end-time,"dump-file-name",<level and scope control>);
start-time and end-time specify the simulation time window of the VCD file, i.e., the simulation target range. The time control arguments are expressed in the time unit of the command-line interface. "dump-file-name" is the name of the VCD file. The <level and scope control> parameters are equivalent to those of the standard $dumpvars command in IEEE Verilog.
The following is an example of the $axis_rpd command:
C7>$axis_rpd(50505,50600,"f1.dump");
---Start RCC VCD at 50505.010000!!
---End RCC VCD at 50600.000000!!
Interrupt at simulation time 60000.0000ns
This $axis_rpd command generates a VCD file named "f1.dump" for the simulation target range from simulation time 50505 to 50600. As with $dumpvars, if no level and scope control parameters are given, the $axis_rpd command dumps the entire hardware state, i.e., all primary outputs.
Another example of the $axis_rpd command is as follows:
C8>$axis_rpd(40444,50600,"f1.dump",2,dp0);
---Start RCC VCD at 40000.000000!!
---Skip at time 50000.000000.
---Continue at time 50505.000000!!
---End RCC VCD at 50600.000000!!
Interrupt at simulation time 60000.0000ns
This $axis_rpd command generates a VCD file "f2.dump" covering 2 levels of scope dp0 from time 40000 to 50600. Because simulation control was switched back to the software model from time 50000 to 50500, no simulation record is available for that window, so $axis_rpd skips it.
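The skipping behavior amounts to clipping the requested window against the recorded windows listed by $axis_rpd. A minimal Python sketch, assuming the recorded windows are available as (start, end) pairs; `dump_windows` is a hypothetical helper, not part of the RCC command set:

```python
def dump_windows(start, end, available):
    """Clip a requested VCD window to the recorded history windows; gaps are skipped."""
    windows = []
    for lo, hi in available:
        s, e = max(start, lo), min(end, hi)
        if s < e:                 # keep only non-empty overlaps
            windows.append((s, e))
    return windows

recorded = [(1005, 50000), (50505, 60000)]      # windows reported by $axis_rpd above
print(dump_windows(40000, 50600, recorded))     # [(40000, 50000), (50505, 50600)]
```

The two resulting pieces match the Start, Skip, Continue, and End messages in the log above.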
VCD on demand is also available after the user ends the simulation session. To obtain offline VCD on demand, the user starts the simulator program named "vlg" with the +rccplay option. This option instructs the RCC system to extract the simulation record instead of performing the normal simulation initialization sequence. Once inside the simulator program, the user can issue the same $axis_rpd command to obtain VCD on demand. An example of this process is as follows:
axis15:3-dpO_rtlc>vlg +rccplay+r1 -s
---Start replay record ./Axis Work/r1 at time 100500
C1>$axis_rpd;
available simulation history:
1005.000000 to 50000.000000
50505.000000 to 60000.000000
Interrupt at simulation time 100500
C2>$axis_rpd(40000,45000,"f2.dump");
---Start RCC VCD at 40000.000000!!
---End RCC VCD at 45000.000000!!
Interrupt at simulation time 4500000
C3>
In the above example, the simulation history is extracted from simulation record "r1", and a VCD file of the entire design from time 40000 to 45000 is generated.
Simulation history review
Once the RCC system has generated the VCD file for the simulation target range (e.g., simulation time t1 to t2), the user does not need to fast-simulate from simulation time t2 to t3. Instead, the RCC system allows the user to stop the simulation and go directly to the beginning of the simulation target range, simulation time t1. Thus, unlike in the prior art, the user does not need to rerun the simulation from the beginning (e.g., simulation time t0). The hardware state dumped into the VCD file reflects the evaluation of the entire history of primary inputs from simulation time t0, including the primary inputs from simulation time t1 to t2.
The RCC system loads the VCD file. The stored primary outputs are then sent to the RCC computing system 2600 so that the software model, with all its combinational logic circuits, can be rebuilt with the correct state information. The user then debugs by examining the software model with a waveform viewer. Using the VCD file, the user can debug his software model carefully, step by step, until the fault is isolated.
With this VCD-on-demand feature, the user can select any simulation target range within the simulation session range and perform software simulation to isolate faults. If no fault is found in the selected simulation target range, the user can select a different simulation target range as needed. Because all primary inputs from the test bench process have been recorded for the entire simulation session range, any portion of the simulation can be reproduced and examined as needed. This feature allows the user to focus repeatedly on different simulation target ranges within the simulation session range until the fault is fixed.
In addition, the VCD-on-demand feature is supported both online, during the simulation session, and offline, after the simulation session ends. Online support is possible because the hardware state at simulation time t0 can be stored on the system disk and the primary inputs for a simulation session range of any length can be compressed and recorded. The user can then specify a simulation target range for a more focused analysis of the primary outputs.
Offline support is possible because the hardware state at simulation time t0, all primary inputs of the simulation session range, and the hardware state at simulation time t3 are all stored on the system disk. Thus, the user can resume debugging his design by loading the design state corresponding to simulation time t0 and then specifying a simulation target range. Alternatively, the user can proceed directly to the next simulation session range by loading the hardware state corresponding to simulation time t3.
VI. Hardware implementation
A. Overview
The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions and maps each selected portion of the user's circuit design and places and routes it onto the FPGA chips. Thus, for example, a 4×4 array of 16 chips can model a large circuit spread across these 16 chips. The interconnect scheme allows each chip to reach any other chip within 2 "jumps", or links.
Each FPGA chip has an address pointer for each I/O address space (i.e., REG, CLK, S2H, H2S). All address pointers associated with a particular address space are chained together. During a data transfer, the data words in each chip are therefore selected sequentially for transfer to or from the main FPGA bus and the PCI bus, one word at a time from the selected address space in each chip and then in the next chip, until the desired data of the selected address space has been accessed. This sequential selection of data words is accomplished by a propagating word-select signal. The word-select signal travels through the address pointer in one chip and then on to the address pointer in the next chip, continuing until it reaches the last chip or until the system reinitializes the address pointers.
The FPGA bus system on the reconfigurable board operates with twice the bandwidth of the PCI bus but at only half its speed. The FPGA chips are therefore grouped into banks to take advantage of the wider bus. The throughput of this FPGA bus system matches that of the PCI bus system, so no performance is lost because of the lower bus speed. Expansion is possible through larger boards containing more FPGA chips or through piggyback boards that extend the bank length.
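Why halving the clock while doubling the width costs nothing in throughput is simple arithmetic. Writing w for the PCI bus width and f for its clock frequency (symbols introduced here only for illustration):

```latex
\text{throughput}_{\text{FPGA bus}} = (2w)\cdot\frac{f}{2} = w\,f = \text{throughput}_{\text{PCI}}
```

This idealized comparison ignores protocol overheads such as arbitration.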
B. Address Pointer
Figure 11 shows one embodiment of the address pointer of the present invention. All I/O operations go through DMA streaming. Because the system has only one bus, it accesses data sequentially, one word at a time. The address pointer embodiment therefore uses a shift register chain to access the selected words in these address spaces sequentially. Address pointer 400 comprises flip-flops 401-405, AND gate 406, and a pair of control signals, INITIALIZE 407 and MOVE 408.
Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) that select, out of the n possible words in each FPGA chip, the word corresponding to the same word in the selected address space. The number of words n depends on the particular user circuit design being modeled, and for a given circuit design, n also varies from one FPGA chip to another. In Figure 11, address pointer 400 handles only 5 words (i.e., n = 5). Thus, the particular FPGA chip containing this 5-word address pointer for a particular address space has only 5 words available to select. Needless to say, address pointer 400 can accommodate any number n of words. Each output signal Wi is also called a word selection signal. When the word selection signal reaches the output of the last flip-flop in the address pointer, it is called the OUT signal and is passed on to the input of the address pointer in the next FPGA chip.
When the INITIALIZE signal is asserted, the address pointer is initialized: the first flip-flop 401 is set to "1" and all other flip-flops 402-405 are set to "0". Initialization by itself does not select any word; that is, after initialization all Wi outputs are still "0". The address pointer initialization procedure is discussed with reference to Figure 12.
The MOVE signal controls the pointer's word selection. The MOVE signal is derived from the READ, WRITE, and SPACE index control signals from the FPGA I/O controller. Because every operation is essentially either a read or a write, the SPACE index signal effectively determines which address pointer receives the MOVE signal. The system therefore activates only one address pointer at a time, the one associated with the selected I/O address space, and applies the MOVE signal only to that pointer. Generation of the MOVE signal is discussed further with reference to Figure 13. Referring to Figure 11, when the MOVE signal is asserted, it is provided to an input of AND gate 406 and to the enable inputs of flip-flops 401-405. Thus, on every system clock cycle a logic "1" moves from word output Wi to Wi+1; that is, on each clock cycle the pointer advances from Wi to Wi+1 to select a particular word. When the shifting word selection signal reaches output 413 of the last flip-flop 405 (labeled "OUT"), this OUT signal then passes to the next FPGA chip through the multiplexed cross-chip address pointer chain (discussed with reference to Figures 14 and 15), unless the address pointer is initialized again.
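The shift-register behavior described above can be sketched behaviorally as follows. This is a minimal model, not the gate-level circuit of Figure 11: the class name `AddressPointer` is hypothetical, and to reconcile "the first flip-flop is set to 1" with "all Wi outputs are still 0", it assumes the initialization token sits in a staging bit (`armed`) that is not itself a word-select output.

```python
from typing import List


class AddressPointer:
    """Behavioral sketch of an n-word address pointer (cf. Figure 11).

    Hypothetical model: the '1' loaded by INITIALIZE is held in a staging
    bit ('armed') that is not one of the word-select outputs W0..Wn-1.
    """

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("an address pointer needs at least one word")
        self.n = n
        self.initialize()

    def initialize(self) -> None:
        # INITIALIZE: load the token but select no word yet (all Wi = 0).
        self.armed = True
        self.w: List[bool] = [False] * self.n

    def clock(self, move: bool) -> None:
        # One system clock edge; the register shifts only while MOVE
        # drives the flip-flop enables.
        if not move:
            return
        self.w = [self.armed] + self.w[:-1]
        self.armed = False

    @property
    def out(self) -> bool:
        # OUT is the last word-select output, Wn-1.
        return self.w[-1]

    def selected(self) -> int:
        """Index i of the asserted Wi, or -1 when no word is selected."""
        return self.w.index(True) if True in self.w else -1
```

For n = 5, five MOVE cycles walk the selection from W0 to W4, at which point `out` is true; cycles without MOVE leave the selection where it is.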
The address pointer initialization procedure is now described. Figure 12 shows the state transition diagram for initializing the address pointer of Figure 11. Initially, the system is in idle state 460. When DATA_XSFR is set to "1", the system enters state 461, where the address pointers are initialized. Here the INITIALIZE signal is asserted: the first flip-flop in each address pointer is set to "1" and all other flip-flops in the address pointer are set to "0". Initialization by itself does not select any word; all Wi outputs are still "0". The next state is wait state 462, while DATA_XSFR is still "1". When DATA_XSFR becomes "0", initialization of the address pointers is complete and the system returns to idle state 460.
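The three-state sequence of Figure 12 can be sketched as a small state machine. The names `InitState`, `next_state`, and `run` are hypothetical; the text does not say what happens if DATA_XSFR drops while still in state 461, so this sketch assumes a return to idle in that case.

```python
from enum import Enum, auto
from typing import List


class InitState(Enum):
    IDLE = auto()   # state 460
    INIT = auto()   # state 461: INITIALIZE asserted
    WAIT = auto()   # state 462


def next_state(state: InitState, data_xsfr: int) -> InitState:
    """One transition of the address pointer initialization FSM (cf. Figure 12)."""
    if state is InitState.IDLE:
        return InitState.INIT if data_xsfr else InitState.IDLE
    if state is InitState.INIT:
        # Assumption: INIT lasts one cycle, then WAIT while DATA_XSFR stays 1.
        return InitState.WAIT if data_xsfr else InitState.IDLE
    # WAIT: hold until DATA_XSFR returns to 0.
    return InitState.WAIT if data_xsfr else InitState.IDLE


def initialize_signal(state: InitState) -> bool:
    """INITIALIZE is asserted only in state INIT."""
    return state is InitState.INIT


def run(data_xsfr_samples: List[int]) -> List[InitState]:
    """Feed one DATA_XSFR sample per cycle and record the resulting states."""
    state = InitState.IDLE
    trace = []
    for sample in data_xsfr_samples:
        state = next_state(state, sample)
        trace.append(state)
    return trace
```

With DATA_XSFR held high for several cycles, INITIALIZE is asserted for exactly one cycle, so the pointers are loaded once per transfer.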
The MOVE signal generator, which produces the various MOVE signals for the address pointers, is now described. The SPACE index generated by the FPGA I/O controller (327 in Figure 10; Figure 22) selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, or CLK write). Within that address space, the system of the present invention selects particular words sequentially for access. The sequential word selection in each address pointer is accomplished by the MOVE signal.
Figure 13 shows one embodiment of the MOVE signal generator. Each FPGA chip 450 has address pointers corresponding to the various software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK). In addition to these address pointers and the portion of the user circuit design modeled and implemented in FPGA chip 450, the chip also contains MOVE signal generator 470. MOVE signal generator 470 comprises an address space decoder 451 and AND gates 452-456. Its inputs are the FPGA read signal (F_RD) on wire line 457, the FPGA write signal (F_WR) on wire line 458, and address space signal 459. Depending on which address space's address pointer is in use, the output MOVE signal for each address pointer is REGR-move on wire line 464, REGW-move on wire line 465, S2H-move on wire line 466, H2S-move on wire line 467, or CLK-move on wire line 468. These output signals correspond to the MOVE signal on wire line 408 (Figure 11).
Address space decoder 451 receives a 3-bit input signal 459; it can also receive just a 2-bit input signal. A 2-bit signal provides 4 possible address spaces, and a 3-bit signal provides 8. In one embodiment, CLK is assigned "00", S2H "01", H2S "10", and REG "11". Depending on input signal 459, the decoder outputs a "1" on one of wire lines 460-463, which correspond to REG, S2H, H2S, and CLK, respectively, while the remaining wire lines are set to "0". Thus, if any of these output wire lines 460-463 is "0", the output of its corresponding AND gate 452-456 is also "0". Likewise, if any of wire lines 460-463 is "1", the output of its corresponding AND gate 452-456 is "1" when the associated read (F_RD) or write (F_WR) signal is also asserted. For example, if address space signal 459 is "10", the H2S address space is selected. Wire line 461 is "1" while the remaining wire lines 460, 462, and 463 are "0". Accordingly, wire line 466 is "1" while the remaining output wire lines 464, 465, 467, and 468 are "0". Similarly, if wire line 460 is "1", the REG address space is selected, and depending on whether a read (F_RD) or write (F_WR) operation is selected, either the REGR-move signal on wire line 464 or the REGW-move signal on wire line 465 is "1".
As explained above, the SPACE index is generated by the FPGA I/O controller. In code form, the MOVE controls are:
Pointer for REG space read: REGR-move = (SPACE-index == #REG) & READ;
Pointer for REG space write: REGW-move = (SPACE-index == #REG) & WRITE;
Pointer for S2H space read: S2H-move = (SPACE-index == #S2H) & READ;
Pointer for H2S space write: H2S-move = (SPACE-index == #H2S) & WRITE;
Pointer for CLK space write: CLK-move = (SPACE-index == #CLK) & WRITE;
This code is equivalent to the logic diagram of the MOVE signal generator shown in Figure 13.
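The decode-then-AND structure of these equations can be checked with a short sketch. The function name `move_signals` is hypothetical; the 2-bit SPACE encoding is the one given for the embodiment of Figure 13, and the dictionary keys simply mirror the signal names.

```python
from typing import Dict

# 2-bit SPACE encoding from the embodiment described above.
SPACE_CODES = {0b00: "CLK", 0b01: "S2H", 0b10: "H2S", 0b11: "REG"}


def move_signals(space_code: int, f_rd: bool, f_wr: bool) -> Dict[str, bool]:
    """Decode SPACE, then AND each decoded line with the read or write strobe."""
    space = SPACE_CODES[space_code]              # address space decoder 451
    return {
        "REGR-move": space == "REG" and f_rd,    # REG read
        "REGW-move": space == "REG" and f_wr,    # REG write
        "S2H-move": space == "S2H" and f_rd,     # S2H read
        "H2S-move": space == "H2S" and f_wr,     # H2S write
        "CLK-move": space == "CLK" and f_wr,     # CLK write
    }
```

At most one MOVE output is asserted for any combination of SPACE code and a single read or write strobe, which is what lets only one address pointer chain advance at a time.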
As described above, each FPGA chip has as many address pointers as there are address spaces at the software/hardware boundary. If the software/hardware boundary has 4 address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has 4 address pointers corresponding to these 4 address spaces. Each FPGA needs all 4 address pointers because the particular selected word being processed in the selected address space may reside in any one or more of the FPGA chips, or the data in the selected address space may affect various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed by the appropriate circuit elements in the appropriate FPGA chip, each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, or CLK) is "chained" together across the FPGA chips. The shifting or propagation of the word selection signal is still accomplished by the MOVE signal, as described above with reference to Figure 11, except that in this "chained" embodiment, the address pointer associated with a particular address space in one FPGA chip is "chained" to the address pointer associated with the same address space in the next FPGA chip.
Chaining the address pointers with 4 input pins and 4 output pins would achieve the same purpose, but that implementation uses resources inefficiently: it requires 4 wires between each pair of chips, plus 4 input pins and 4 output pins in each chip. One embodiment of the system according to the present invention uses multiplexed cross-chip address pointer chaining, which lets the hardware model use only one wire between chips and only 1 input pin and 1 output pin (2 I/O pins in total) in each chip. One embodiment of multiplexed cross-chip address pointer chaining is shown in Figure 14.
In the embodiment shown in Figure 14, the user's circuit design is mapped and partitioned into three FPGA chips 415-417 on reconfigurable hardware board 470. The address pointers are represented by blocks 421-432. Each address pointer, for example address pointer 427, has a structure and function similar to the address pointer shown in Figure 11, except that the number of words Wn, and therefore the number of flip-flops, may vary according to how many words each chip needs for the user's custom circuit design.
For the REGR address space, FPGA chip 415 has address pointer 421, FPGA chip 416 has address pointer 425, and FPGA chip 417 has address pointer 429. For the REGW address space, FPGA chip 415 has address pointer 422, FPGA chip 416 has address pointer 426, and FPGA chip 417 has address pointer 430. For the S2H address space, FPGA chip 415 has address pointer 423, FPGA chip 416 has address pointer 427, and FPGA chip 417 has address pointer 431. For the H2S address space, FPGA chip 415 has address pointer 424, FPGA chip 416 has address pointer 428, and FPGA chip 417 has address pointer 432.
Chips 415-417 have multiplexers 418-420, respectively. Note that multiplexers 418-420 may be only a model; the actual implementation may be a combination of registers and logic elements, as known to those skilled in the art. For example, the multiplexer may take the form of several AND gates feeding an OR gate, as shown in Figure 15. Multiplexer 487 comprises four AND gates 481-484 and an OR gate 485. The inputs to multiplexer 487 are the OUT and MOVE signals from each address pointer in the chip. Output 486 of multiplexer 487 is the chain-out signal that is passed to the input of the next FPGA chip.
In Figure 15, this particular FPGA chip has four address pointers 475-478 corresponding to the I/O address spaces. The outputs of the address pointers, namely the OUT and MOVE signals, are the inputs to multiplexer 487. For example, address pointer 475 has an OUT signal on wire line 479 and a MOVE signal on wire line 480. These signals are inputs to AND gate 481, whose output is one input of OR gate 485. The output of OR gate 485 is the output of multiplexer 487. In operation, the OUT signal at the output of each address pointer 475-478, together with its corresponding MOVE signal and the SPACE index, serves as the selector signal for multiplexer 487; that is, both the OUT signal and the MOVE signal (which is derived from the SPACE index signal) must be asserted (i.e., logic "1") for the word selection signal to pass out of the multiplexer onto the chain-out wire line. The MOVE signal is asserted periodically to shift the word selection signal through the flip-flops in the address pointer, so that the word selection signal can be regarded as the data input of the multiplexer.
Returning to Figure 14, multiplexers 418-420 each have four sets of inputs and one output. Each set of inputs includes: (1) the OUT signal on the last output (Wn-1) wire line (e.g., wire line 413 of the address pointer in Figure 11) of the address pointer associated with a particular address space, and (2) the MOVE signal. The output of each multiplexer 418-420 is the chain-out signal. When the word selection signal Wn, propagating through the flip-flops of an address pointer, reaches the output of the last flip-flop, it becomes the OUT signal. The chain-out signal on wire lines 433-435 is "1" only when the OUT signal and the MOVE signal associated with the same address pointer are both asserted (i.e., "1").
For multiplexer 418, the inputs are MOVE signals 436-439 and OUT signals 440-443, which correspond to the MOVE and OUT signals from address pointers 421-424, respectively. For multiplexer 419, the inputs are MOVE signals 444-447 and OUT signals 452-455, corresponding to the MOVE and OUT signals from address pointers 425-428, respectively. For multiplexer 420, the inputs are MOVE signals 448-451 and OUT signals 456-459, corresponding to the MOVE and OUT signals from address pointers 429-432, respectively.
In operation, for any given shift of the word Wn, only the address pointers, or the address pointer chain, associated with the one selected I/O address space at the software/hardware boundary are active. Thus, in Figure 14, for a given shift, only the address pointers in chips 415, 416, and 417 associated with one of the address spaces REGR, REGW, S2H, or H2S are active. Likewise, for a given shift of the word selection signal Wn through the flip-flops, the selected words are accessed sequentially because of the limited bus bandwidth. In one embodiment, the bus is 32 bits wide and a word is also 32 bits, so only one word can be accessed at a time and delivered to the appropriate resource.
While an address pointer is still propagating or shifting the word selection signal through its flip-flops, the output chain-out signal is not activated (i.e., not "1"), so the multiplexer in that chip is not yet ready to pass the word selection signal to the next FPGA chip. When the OUT signal is asserted (i.e., "1") together with its MOVE signal, the chain-out signal is asserted (i.e., "1"), indicating that the system is ready to propagate or shift the word selection signal to the next FPGA chip. Chips are therefore accessed one at a time; that is, the word selection signal is shifted through the flip-flops in one chip before the word-selection shift operation is performed for another chip. In fact, the chain-out signal is asserted only when the word selection signal reaches the end of the address pointer in each chip. In code form, the chain-out signal is:
Chain-out = (REGR-move & REGR-out) | (REGW-move & REGW-out) | (S2H-move & S2H-out) | (H2S-move & H2S-out);
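The chain-out expression is exactly the AND-OR multiplexer of Figure 15: one AND term per address pointer, combined by a single OR. A minimal sketch, where the function name `chain_out` and the dictionary representation of the four MOVE and OUT signals are illustrative assumptions:

```python
from typing import Mapping

# The four chained address spaces feeding the multiplexer (CLK is not chained here).
CHAINED_SPACES = ("REGR", "REGW", "S2H", "H2S")


def chain_out(move: Mapping[str, bool], out: Mapping[str, bool]) -> bool:
    """AND gates 481-484 (MOVE & OUT per pointer) feeding OR gate 485."""
    return any(move[space] and out[space] for space in CHAINED_SPACES)
```

Because an OUT signal only propagates when its own MOVE is asserted, a pointer of an inactive address space that happens to hold its token at Wn-1 cannot leak onto the shared chain wire.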
In summary, for the X I/O address spaces in the system (i.e., REG, S2H, H2S, and CLK), each FPGA has X address pointers, one for each address space. The size of each address pointer depends on the number of words needed in each FPGA chip to model the user's custom circuit design. If a particular FPGA chip needs n words, its address pointer also has n words, and that address pointer has n outputs (i.e., W0, W1, W2, ..., Wn-1). These outputs Wi are also called word selection signals. When a particular word Wi is selected, the Wi signal is asserted (i.e., "1"). The word selection signal shifts or propagates downstream through the chip's address pointer until it reaches the end of the address pointer in that chip, where it triggers generation of the chain-out signal, which starts the word selection signal propagating through the address pointer of the next chip. In this way, a chain of address pointers associated with a given I/O address space can be implemented across all the FPGA chips on the reconfigurable hardware board.
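The end-to-end effect of chaining can be sketched for one address space spread over several chips. The function `chained_word_select` is hypothetical and assumes every chip has at least one word, MOVE is asserted on every cycle, and INITIALIZE stages a token ahead of chip 0's W0. Each chip's first stage loads its chain-in, which is the previous chip's chain-out (MOVE & OUT) sampled before the shared clock edge.

```python
from typing import List, Optional, Tuple


def chained_word_select(words_per_chip: List[int],
                        moves: int) -> List[Optional[Tuple[int, int]]]:
    """(chip, word) selected after each MOVE cycle for one chained address space."""
    state = [[False] * n for n in words_per_chip]   # one-hot W0..Wn-1 per chip
    staged = True                                   # token loaded by INITIALIZE
    trace: List[Optional[Tuple[int, int]]] = []
    for _ in range(moves):
        # Chain-in of chip k is chain-out of chip k-1 = MOVE & OUT (its Wn-1),
        # sampled before this clock edge; chip 0 takes the staged token.
        chain_in = [staged] + [chip[-1] for chip in state[:-1]]
        staged = False
        # Every pointer shifts on the same edge: W0 <- chain-in, Wi <- Wi-1.
        state = [[cin] + chip[:-1] for cin, chip in zip(chain_in, state)]
        hits = [(c, w) for c, chip in enumerate(state)
                for w, bit in enumerate(chip) if bit]
        trace.append(hits[0] if hits else None)
    return trace
```

For chips needing 2, 3, and 1 words, the trace is (0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0), and then nothing is selected: exactly one word per cycle, finishing one chip before moving to the next.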
C. Gated Data/Clock Network Analysis
Various embodiments of the present invention perform clock analysis in conjunction with analysis of gated data logic and gated clock logic. Identifying the gated clock logic (or clock network) and the gated data network is critical to successfully implementing software clocks and to evaluating logic in the hardware model during simulation. As described with reference to Figure 4, clock analysis is performed in step 305. To explain this clock analysis process further, Figure 16 shows a flowchart according to one embodiment of the present invention. Figure 16 also shows the gated data analysis.
The Analog Simulation System has a complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. These hardware portions include clock components, especially derived clocks. Timing problems arise because clocks are delivered across this software/hardware boundary. Because the complete model resides in software, the software can detect clock edges that affect register values. Besides their software model, these registers must also physically exist in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at input D to output Q), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. The software clock essentially controls the enable input of the hardware register, rather than the clock input of the hardware register component. As a result, the software clock avoids race conditions, and precise timing control to avoid hold-time violations is unnecessary. The clock network and gated data logic analysis process shown in Figure 16 provides a way to model and implement the clock and data delivery system for the hardware registers that avoids race conditions and yields a flexible software/hardware boundary implementation.
As described above, primary clocks are the clock signals from the test bench process. All other clocks, such as those derived from combinational components, are derived or gated clocks. A primary clock can give rise to both gated clocks and gated data signals. In most cases, only a few (e.g., 1-10) derived or gated clocks are present in the user's circuit design. These derived clocks can be implemented as software clocks and kept in software. If the circuit design contains a relatively large number (e.g., more than 10) of derived clocks, the Analog Simulation System can model them in hardware to reduce I/O overhead and maintain system performance. Gated data are the data or control inputs of a register, other than the clock, that are driven from a primary clock through some combinational logic.
The gated data/clock analysis process begins at step 500. Step 501 takes the usable source design database code generated from the HDL code and maps the user's register elements to the register components of the Analog Simulation System. This one-to-one mapping of user registers to analog simulation registers facilitates the subsequent modeling steps. In some cases, the mapping is needed to handle user circuit designs that describe register elements with specific primitives. For RTL-level code, the analog simulation registers can be used quite readily because RTL code is at a high enough level to allow various lower-level implementations. For a gate-level netlist, the Analog Simulation System accesses the component cell library and modifies it so that it suits the specific logic elements of the particular circuit design.
Step 502 extracts the clock signals from the register components of the hardware model. This step lets the system determine the primary clocks and the derived clocks. It also determines all clock signals required by the various components in the circuit design. Information from this step facilitates the software/hardware clock modeling step.
Step 503 determines the primary clocks and the derived clocks. Primary clocks come from test bench components and are modeled only in software. Derived clocks come from combinational logic, which is in turn driven by primary clocks. By default, the Analog Simulation System of the present invention keeps derived clocks in software. If the number of derived clocks is small (e.g., fewer than 10), these derived clocks can be modeled as software clocks. Because the number of combinational components generating these derived clocks is small, keeping those components in software adds little I/O overhead. If, however, the number of derived clocks is large (e.g., more than 10), these derived clocks can be modeled in hardware to minimize I/O overhead. Sometimes the user's circuit design uses a large number of derived clock components driven from primary clocks; the system then builds those clocks in hardware to keep the number of software clocks small.
Decision step 504 requires the system to determine whether any derived clocks are present in the user's circuit design. If not, step 504 resolves to "No" and clock analysis ends at step 508, because all clocks in the user's circuit design are primary clocks and are simply modeled in software. If derived clocks are found in the user's circuit design, step 504 resolves to "Yes" and the algorithm proceeds to step 505.
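Steps 502-504 and the default placement policy can be sketched as follows. All names are hypothetical: `register_clock` maps each register to its clock net, and `clock_source` labels a clock net "testbench" when it is a primary clock. The limit of 10 is the text's example value treated as a tunable threshold, and the sketch folds the step 504 branch and the placement decision into one function for illustration.

```python
from typing import Dict, Set, Tuple

# "e.g., 10" in the text; treated here as a tunable threshold.
DERIVED_CLOCK_LIMIT = 10


def classify_clocks(register_clock: Dict[str, str],
                    clock_source: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Steps 502-503: collect register clock nets, split into primary and derived."""
    clocks = set(register_clock.values())
    primary = {c for c in clocks if clock_source[c] == "testbench"}
    return primary, clocks - primary


def derived_clock_placement(derived: Set[str]) -> str:
    """Step 504 plus the default modeling policy for derived clocks."""
    if not derived:
        return "done"        # step 504 "No": analysis ends at step 508
    if len(derived) <= DERIVED_CLOCK_LIMIT:
        return "software"    # few derived clocks: keep them as software clocks
    return "hardware"        # many derived clocks: model them in hardware
```

Counting distinct clock nets rather than registers matters here: a design with thousands of registers on one gated clock still has only one derived clock to manage.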
Step 505 determines the fan-out combinational components from the primary clocks to the derived clocks. In other words, this step traces the clock signal paths from the primary clocks through the combinational components. Step 506 determines the fan-in combinational components of the derived clocks. In other words, this step traces the clock signal paths from the combinational components to the derived clocks. The system determines the fan-out and fan-in sets recursively in software. The fan-in set of a net N is determined as follows:
FanIn Set of a net N:
    find all the components driving net N;
    for each component X driving net N do:
        if the component X is not a combinational component then
            return;
        else
            for each input net Y of the component X
                add the FanIn Set W of net Y to the FanIn Set of net N
            end for
            add the component X into N;
        end if
    end for
Gated clock or gated data logic networks are identified by recursively determining the fan-in set and the fan-out set of net N and then their intersection. The goal here is to determine the so-called fan-in set of net N. Net N is typically a clock input node, used to determine gated clock logic from the fan-in perspective. To determine gated data logic from the fan-in perspective, net N is the clock input node associated with the data input at hand. If the node is on a register, net N is the clock input of that register for its associated data input. The system finds all components driving net N. For each component X driving net N, the system determines whether component X is a combinational component. If none of the components X is combinational, the fan-in set of net N contains no combinational components and net N is a primary clock.
If, however, at least one component X is combinational, the system then determines the input nets Y of component X. Here, the system traces further back through the circuit design by finding the input nodes of component X. For each input net Y of each component X, there may be a fan-in set W associated with net Y. The fan-in set W of net Y is added to the fan-in set (FanIn Set) of net N, and component X is then added to set N.
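The FanIn Set procedure can be sketched in Python. Two deliberate changes from the pseudocode, both assumptions: an explicit worklist replaces recursion (with a visited set so combinational loops terminate), and a non-combinational driver ends only its own branch of the trace instead of returning from the whole procedure. The `Component` record is a hypothetical netlist representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Component:
    name: str
    combinational: bool
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def fan_in(net: str, components: List[Component]) -> Set[str]:
    """Names of the combinational components in the fan-in cone of `net`.

    Backward tracing stops at non-combinational drivers (registers, test bench).
    """
    drivers: Dict[str, List[Component]] = {}
    for comp in components:
        for out_net in comp.outputs:
            drivers.setdefault(out_net, []).append(comp)

    cone: Set[str] = set()
    visited: Set[str] = set()
    pending = [net]
    while pending:
        current = pending.pop()
        if current in visited:          # guards against combinational loops
            continue
        visited.add(current)
        for comp in drivers.get(current, []):
            if not comp.combinational:
                continue                # boundary: do not trace through it
            cone.add(comp.name)
            pending.extend(comp.inputs)  # continue backward via input nets Y
    return cone
```

An empty result means no combinational component drives the net, which matches the text's test for a primary clock.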
The fan-out set of net N is determined in a similar way, as follows:
FanOut Set of a net N:
    find all the components using net N;
    for each component X using the net N do:
        if the component X is not a combinational component then
            return;
        else
            for each output net Y of the component X
                add the FanOut Set of net Y to the FanOut Set of net N
            end for
            add the component X into N;
        end if
    end for
Again, gated clock or gated data logic networks are identified by recursively determining the fan-in set and the fan-out set of net N and then their intersection. The goal here is to determine the so-called fan-out set of net N. Net N is typically a clock output node, used to determine gated clock logic from the fan-out perspective; the set of all logic elements that use net N is therefore determined. To determine gated clock logic from the fan-out perspective, net N is the clock output node associated with the data output at hand. If the node is on a register, net N is the output of that register for its associated primary-clock-driven input. The system finds all components that use net N. For each component X using net N, the system determines whether component X is a combinational component. If none of the components X is combinational, the fan-out set of net N contains no combinational components and net N is a primary clock.
If, however, at least one component X is combinational, the system then determines the output nets Y of component X. Here, the system traces further forward from the primary clock through the circuit design by finding the output nodes of component X. For each output net Y of each component X, there may be a fan-out set W associated with net Y. The fan-out set W of net Y is added to the fan-out set (FanOut Set) of net N, and component X is then added to set N.
Step 507 is determined clock network or gated clock logic.Clock network is the common factor of fan-in and fan-out combiner.
Similarly, the same fan-in and fan-out principles can be used to determine gated data logic. As with gated clocks, gated data are the data or control inputs (other than the clock) of a register that are driven by a primary clock through some combinational logic. Gated data logic is the intersection of the fan-in of the gated data and the fan-out of the primary clock. Clock analysis and gated data analysis thus yield a gated clock network/logic through some combinational logic, and a gated data logic. As described below, identifying the gated clock network and the gated data network is critical to successfully implementing software clocks and to evaluating logic in the hardware model during simulation. The clock/data network analysis ends at step 508.
Figure 17 shows the basic building block of the hardware model according to one embodiment of the present invention. For register components, the Analog Simulation System uses a D-type flip-flop with asynchronous load control as the basic block for building hardware models of both edge-triggered (i.e., flip-flop) and level-sensitive (i.e., latch) registers. This register model building block has the following ports: Q (output state); A_E (asynchronous enable); A_D (asynchronous data); S_E (synchronous enable); S_D (synchronous data); and, of course, System.clk (system clock).
This analog simulation register model is triggered by a positive edge of the system clock or a positive level on the asynchronous enable (A_E) input. When either trigger event occurs, the register model checks the asynchronous enable (A_E) input. If A_E is enabled, output Q takes the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, output Q takes the value of the synchronous data (S_D). If neither A_E nor S_E is enabled, output Q is not evaluated, even though a positive edge of the system clock is detected. In this way, the inputs to the enable ports control the operation of the basic building block register model.
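The evaluation rule of the Figure 17 building block can be sketched in a few lines. The class and argument names are hypothetical; each call represents one trigger opportunity, with `clk_posedge` standing for a positive edge on System.clk.

```python
class RegisterModel:
    """Behavioral sketch of the Figure 17 block: D flip-flop with asynchronous load."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def evaluate(self, clk_posedge: bool, a_e: bool, a_d: int,
                 s_e: bool, s_d: int) -> int:
        if a_e:                      # positive level on A_E: asynchronous load
            self.q = a_d
        elif clk_posedge and s_e:    # System.clk edge with S_E enabled
            self.q = s_d
        # Otherwise Q is not evaluated and keeps its value, even on a clock edge.
        return self.q
```

A_E takes priority over S_E, and a system clock edge alone never changes Q; that is why driving only the enable ports is enough to control every register model.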
The system uses software clocks, which are special enable registers, to control the enable inputs of these register models. A complex user circuit design contains millions of elements, so the analog simulator system implements millions of elements in the hardware model. Controlling all of these elements individually would be very costly, because sending millions of control signals to the hardware model would take longer than evaluating those elements in software. Even such a complex circuit design, however, usually uses only a few (1-10) clocks, and the clocks alone are sufficient to control the state transitions of a system consisting only of registers and combinational components. The hardware model of the analog simulator system uses only registers and combinational components, and the system controls evaluation of the hardware model through software clocks. In the analog simulator system, the hardware models for registers do not have clocks connected directly to other hardware components; instead, the software kernel controls the values of all clocks. By controlling a few clock signals, the kernel has complete control over hardware model evaluation, while the coprocessor intervention overhead remains negligible.
Depending on whether the register model is used as a latch or a flip-flop, the software clock is fed to the asynchronous enable (A_E) or the synchronous enable (S_E) wire line. Application of the software clock from the software model to the hardware model is triggered by edge detection on clock components. When the software kernel detects an edge of a clock component, it sets the clock edge register through the CLK address space. This clock edge register controls the enable input, not the clock input, of the hardware register model. The global system clock still provides the clock input to the hardware register model. The clock edge register, however, delivers the software clock signal to the hardware register model through a double-buffered interface. As explained later, the double-buffered interface from the software clock to the hardware model guarantees that all register models are updated synchronously with respect to the global system clock. Using software clocks thus eliminates the risk of hold-time violations.
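The software-clock path can be sketched as kernel-side edge detection feeding a double-buffered register whose contents change on a single system clock edge. This is an illustration only: the names `rising_edges`, `ClockEdgeRegister`, and `update_registers` are hypothetical, and the exact buffering and timing of the actual double-buffered interface are not specified here.

```python
from typing import List


def rising_edges(prev: List[int], curr: List[int]) -> List[bool]:
    """Software-side edge detection: True where a clock went from 0 to 1."""
    return [p == 0 and c == 1 for p, c in zip(prev, curr)]


class ClockEdgeRegister:
    """Hypothetical double-buffered clock edge register, one bit per software clock."""

    def __init__(self, n_clocks: int) -> None:
        self.front = [False] * n_clocks   # written by the kernel via the CLK space
        self.back = [False] * n_clocks    # drives the register-model enable inputs

    def kernel_write(self, edges: List[bool]) -> None:
        self.front = list(edges)

    def system_clock_tick(self) -> List[bool]:
        # Transfer on one global system clock edge so every enable changes
        # at the same instant, then clear the front buffer.
        self.back = self.front
        self.front = [False] * len(self.back)
        return list(self.back)


def update_registers(q: List[int], d: List[int], clock_of: List[int],
                     enables: List[bool]) -> List[int]:
    # All register models share the system clock; only enabled ones load D.
    return [d[i] if enables[clock_of[i]] else q[i] for i in range(len(q))]
```

Because kernel writes land in the front buffer and reach the enables only at a system clock edge, registers on the same software clock always load together, regardless of when during the cycle the kernel performed its write.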
Figures 18(A) and 18(B) show building block register models implementing a latch and a flip-flop. These register models are controlled by the software clock through the appropriate enable input. Depending on whether the register model is used as a latch or a flip-flop, the asynchronous ports (A_E, A_D) and the synchronous ports (S_E, S_D) are used for either the software clock or I/O operations. Figure 18(A) shows the register model used as a latch. A latch is level-sensitive; that is, while the clock signal is asserted (e.g., "1"), output Q follows input D. Here, the software clock signal is provided to the asynchronous enable (A_E) input, and the data input is provided to the asynchronous data (A_D) input. For I/O operations, the software kernel uses the synchronous enable (S_E) and synchronous data (S_D) inputs to download values to the Q port. The S_E port is used as the REG space address pointer, and the S_D port is used to transfer data to and from the local data bus.
Figure 18(B) shows the register model used as a design flip-flop. The design flip-flop uses the following ports to determine its next-state logic: data (D), set (S), reset (R), and enable (E). All of the design flip-flop's next-state logic is factored into the hardware combinational component that feeds the synchronous data (S_D) input. The software clock is applied to the synchronous enable (S_E) input. For I/O operations, the software kernel uses the asynchronous enable (A_E) and asynchronous data (A_D) inputs to load a value into the Q port. The A_E port serves as the REG space write address pointer, and the A_D port is used to access data to and from the local data bus.
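The flip-flop configuration can be sketched the same way. This is an assumption-laden illustration: the helper and class names are hypothetical, and the priority of reset over set over enable is an illustrative choice, since the text only says that D, S, R, and E are factored into the logic feeding S_D.

```python
def flip_flop_next_state(q, d, s, r, e):
    """Next-state logic folded into the S_D input (hypothetical helper).

    Assumed priority, not stated in the text: reset, then set, then enable.
    """
    if r:
        return 0
    if s:
        return 1
    return d if e else q


class FlipFlopRegisterModel:
    """Behavioural sketch of the flip-flop-configured register model (Fig. 18(B))."""

    def __init__(self):
        self.q = 0

    def system_clock_tick(self, s_e, d, s=0, r=0, e=1):
        # S_E carries the software clock: Q updates on a system-clock tick
        # only when the software clock is asserted.
        if s_e:
            self.q = flip_flop_next_state(self.q, d, s, r, e)
        return self.q

    def io_write(self, a_e, a_d):
        # Kernel I/O through the asynchronous port: A_E addresses the
        # register in REG space and A_D supplies the value from the bus.
        if a_e:
            self.q = a_d
        return self.q


ff = FlipFlopRegisterModel()
print(ff.system_clock_tick(0, 1))       # software clock low: Q stays 0
print(ff.system_clock_tick(1, 1))       # software clock high: Q <- D = 1
print(ff.system_clock_tick(1, 1, r=1))  # reset wins: Q <- 0
```

Compared with the latch sketch, the roles of the two ports are swapped: the software clock now drives S_E, and the kernel's I/O uses A_E/A_D.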
The software clock is now discussed. In one embodiment of the present invention, the software clock is a clock enable signal to the hardware register models, so that the data at the inputs of these hardware register models are evaluated together and in synchronization with the system clock. This eliminates race conditions and hold-time violations. One embodiment of the software clock logic includes clock edge detection logic in software, which triggers additional logic in hardware upon detection of a clock edge. This enable logic generates an enable signal at the enable inputs of the hardware register models before the data arrive at those registers. The gated clock network and the gated data network are critical both to the successful implementation of the software clock and to the logic evaluation of the hardware model in hardware-accelerated mode. As described earlier, the clock network, or gated clock logic, is the intersection of the gated clock fan-in and the primary clock fan-out. Similarly, the gated data logic is the intersection of the gated data fan-in and the primary clock fan-out of the data signals. The concepts of fan-in and fan-out were discussed above in connection with Figure 16.
As indicated above, the primary clocks are generated by the test-bench process in software. Derived, or gated, clocks are generated by a network of combinational logic and registers driven by the primary clocks. By default, the SEmulation system of the present invention also keeps the derived clocks in software. If the number of derived clocks is small (e.g., fewer than 10), these derived clocks can be modeled as software clocks. Because the combinational components that generate these derived clocks are few, modeling them in software does not add significant I/O overhead. If the number of derived clocks is large (e.g., more than 10), however, the derived clocks and their combinational components can be modeled in hardware to minimize I/O overhead.
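The placement rule for derived clocks amounts to a threshold test. The sketch below assumes the example value of 10 from the text; the function name is hypothetical, and sending a count of exactly 10 to hardware is an arbitrary choice, because the text does not cover that boundary.

```python
# Example threshold from the text ("e.g., fewer than 10" / "more than 10").
DERIVED_CLOCK_THRESHOLD = 10


def place_derived_clocks(num_derived_clocks, threshold=DERIVED_CLOCK_THRESHOLD):
    """Decide where derived clocks and their combinational logic are modeled.

    Few derived clocks: keep them in software (little I/O overhead).
    Many derived clocks: move them, with their logic, into hardware to
    minimize I/O overhead. A count equal to the threshold goes to hardware,
    an arbitrary choice for this sketch.
    """
    return "software" if num_derived_clocks < threshold else "hardware"


for n in (3, 25):
    print(n, place_derived_clocks(n))
```

In practice the threshold would be a tuning parameter traded off against the I/O cost of crossing the software/hardware boundary.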
Finally, in accordance with one embodiment of the present invention, clock edge detection that occurs in software (at the input to the primary clock register) can be translated into clock edge detection in hardware (at the input to the clock edge register). A clock edge detected in software triggers an event in hardware so that the registers in the hardware model receive a clock enable signal before they receive the data signal, ensuring that the data signals are evaluated in synchronization with the system clock and that hold-time violations are avoided.
As described earlier, the SEmulation system has a complete model of the user circuit design in software and portions of the user circuit design in hardware. As specified in the kernel, the software can detect clock edges that affect the values of hardware registers. To ensure that the hardware registers also evaluate their corresponding inputs, the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate in synchronization with the system clock and without hold-time violations. In effect, the software clock controls the enable input of the hardware register components rather than their clock input. The double-buffering approach used to implement the software clock ensures that register evaluation is synchronous with the system clock, avoids race conditions, and removes the need for precise timing control, thereby avoiding hold-time violations.
Figure 19 shows one embodiment of a clock implementation system according to the present invention. Initially, the gated clock logic and the gated data logic are determined by the SEmulator system, as described in connection with Figure 16. The gated clock logic and the gated data logic are then separated. When implementing the double buffer, the driving source and the double-buffered primary logic must also be separated. Accordingly, based on the fan-in and fan-out analysis, the gated data logic 513 and the gated clock logic 514 are separated.
The modeled primary clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. The primary clock is modeled in software, but its double-buffer implementation is modeled in both software and hardware. Clock edge detection occurs in the primary clock register 510 in software and triggers generation of the software clock signal for the hardware model. Data and address enter the first buffer 511 on wire lines 519 and 520, respectively. The Q output of the first buffer 511 is coupled via wire line 521 to the D input of the second buffer 512. The Q output of the first buffer 511 is also provided via wire line 522 to the gated clock logic 514, which ultimately drives the clock input of the first buffer 516 of the clock edge register 515. The Q output of the second buffer 512 is provided via wire line 523 to the gated data logic 513, which ultimately drives, via wire line 530, the input of register 518 in the user's custom-designed circuit model. The enable input of the second buffer 512 of the primary clock register 510 is the INPUT-EN signal on wire line 533 from a state machine, which determines the evaluation cycles and controls the various signals accordingly.
The clock edge register 515 also includes a first buffer 516 and a second buffer 517. The clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (at the input to the primary clock register 510), the same clock edge detection can be triggered in hardware (by the clock edge register 515). The D input of the first buffer 516 on wire line 524 is tied to "1". The clock signal on wire line 525 comes from the gated clock logic 514 and ultimately from the output of the first buffer 511 of the primary clock register 510 on wire line 522; this clock signal on wire line 525 is a gated clock signal. The enable wire line 526 of the first buffer 516 carries the ~EVAL signal from the state machine that controls the I/O and evaluation cycles (described below). The first buffer 516 also has a RESET signal on wire line 527, and the same RESET signal is provided to the second buffer 517 of the clock edge register 515. The Q output of the first buffer 516 is provided on wire line 529 to the D input of the second buffer 517. The second buffer 517 also has an enable input for the CLK-EN signal on wire line 528 and a RESET input on wire line 527. The Q output of the second buffer 517 is provided via wire line 532 to the enable input of register 518 in the user's custom-designed circuit model. Buffers 511, 512, and 517, together with register 518, are clocked by the system clock. Only buffer 516 of the clock edge register 515 is clocked by the gated clock from the gated clock logic 514.
Register 518 is a typical D-type register model simulated in hardware and is part of the user's custom circuit design. Its evaluation is strictly controlled by this embodiment of the clock implementation of the present invention. The ultimate goal of this clock arrangement is to ensure that the clock enable signal on wire line 532 reaches register 518 before the data signal on wire line 530, so that the register evaluates the data signal in synchronization with the system clock and without race conditions.
To reiterate, the modeled primary clock register 510 is modeled in software, but its double-buffer implementation is modeled in both software and hardware. The clock edge register 515 is implemented in hardware. Based on the fan-in and fan-out analysis, the gated data logic 513 and the gated clock logic 514 are also separated for modeling purposes, and they can be modeled in software (if the number of gated data and gated clocks is small) or in hardware (if that number is large). The gated clock network and the gated data network are critical to the successful implementation of the software clock and to the logic evaluation of the hardware model during hardware acceleration.
Implementation of the software clock depends mainly on the clock arrangement shown in Figure 19 and on the timing of the assertion of the signals ~EVAL, INPUT-EN, CLK-EN, and RESET. The primary clock register 510 detects clock edges in order to trigger generation of the software clock for the hardware model. This clock edge detection event, through the clock input on wire line 525, the gated clock logic 514, and wire line 522, triggers "activation" of the clock edge register 515, so that the clock edge register 515 also detects the same clock edge. In this way, clock detection that occurs in software (via inputs 519 and 520 of the primary clock register 510) can be translated into clock edge detection in hardware (via input 525 of the clock edge register 515). At this point, neither the INPUT-EN wire line 533 of the second buffer 512 of the primary clock register 510 nor the CLK-EN wire line 528 of the second buffer 517 of the clock edge register 515 has been asserted, so no data evaluation takes place. Clock edge detection therefore occurs before the data are evaluated in the hardware register model. Note that at this stage the data from the data bus on wire line 519 have not yet propagated through the gated data logic 513 to the hardware-modeled user register 518. In fact, the data have not even reached the second buffer 512 of the primary clock register 510, because the INPUT-EN signal on wire line 533 has not yet been asserted.
During the I/O phase, the ~EVAL signal on wire line 526 is asserted to enable the first buffer 516 of the clock edge register 515. While ~EVAL is asserted, the gated clock signal passing through the gated clock logic 514 to the clock input of the first buffer 516 on wire line 525 is monitored. Thus, as described below in connection with the four-state evaluation state machine, the ~EVAL signal can be held for as long as necessary to stabilize the data and clock signals in the portion of the system shown in Figure 19.
After the signals have stabilized, when I/O is complete or the system is ready to evaluate data, ~EVAL is deasserted to disable the first buffer 516. The CLK-EN signal is asserted and applied via wire line 528 to the second buffer 517 to enable it, which passes the logic value "1" on wire line 529 to the Q output on wire line 532 and thus to the enable input of register 518. Register 518 is now enabled, and any data on wire line 530 are clocked into register 518 synchronously by the system clock. As the reader can observe, the enable signal to register 518 is asserted ahead of the evaluation of the data signal input to register 518.
At this point the INPUT-EN signal on wire line 533 has not yet been asserted to the second buffer 512. Also, the RESET edge-register signal on wire line 527 is asserted to buffers 516 and 517 of the clock edge register 515 to reset them and ensure that their outputs are logic "0". The INPUT-EN signal is now asserted to buffer 512, so the data on wire line 521 are passed to the gated data logic 513 and, via wire line 530, to the user circuit register 518. Because the enable input of register 518 is now logic "0", the data on wire line 530 cannot be clocked into register 518. The previous data, however, were already clocked in by the enable signal on wire line 532, which had been asserted before the RESET signal was asserted to disable register 518. The input data to register 518, and the inputs to the other registers that are part of the user's hardware-modeled circuit design, are therefore stable at their respective register input ports. When a clock edge is subsequently detected in software, the primary clock register 510 and the clock edge register 515 in hardware activate the enable input of register 518, so that the data waiting at the input of register 518, together with the data waiting at the inputs of the other registers, are clocked in synchronously with the system clock.
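The ordering described above can be checked with a small cycle-level model of Figure 19. This is a hypothetical sketch, not the patented circuit: the gated clock logic 514 and gated data logic 513 are assumed to be identity functions, the software is assumed to write one (clock, data) pair into buffer 511 per evaluation cycle, and the class name is invented for illustration.

```python
class SoftwareClockPath:
    """Cycle-level sketch of the Fig. 19 clock structure (hypothetical model).

    Assumptions not taken from the text: gated clock logic 514 and gated
    data logic 513 are identity functions, and software writes one
    (clock, data) pair into buffer 511 per evaluation cycle.
    """

    def __init__(self):
        self.buf511 = (0, 0)  # primary clock register 510, first buffer
        self.buf512 = (0, 0)  # primary clock register 510, second buffer (INPUT-EN)
        self.buf516 = 0       # clock edge register 515, first buffer (D tied to 1)
        self.buf517 = 0       # clock edge register 515, second buffer (CLK-EN)
        self.reg518 = 0       # user register in the hardware model

    def run_cycle(self, clock, data):
        # I/O phase (~EVAL asserted): software loads buffer 511; buffer 516,
        # clocked by the gated clock, latches "1" on a rising clock edge.
        prev_clock = self.buf511[0]
        self.buf511 = (clock, data)
        if prev_clock == 0 and clock == 1:
            self.buf516 = 1
        # CLK-EN asserted: 516 -> 517 enables register 518, which captures
        # the data already stable on wire line 530 (sourced from buffer 512).
        self.buf517 = self.buf516
        if self.buf517:
            self.reg518 = self.buf512[1]
        # RESET clears the clock edge register; INPUT-EN then releases the new
        # data toward register 518, whose enable input is now 0.
        self.buf516 = self.buf517 = 0
        self.buf512 = self.buf511
        # The new data settle on wire line 530 until the next detected edge.
        return self.reg518


path = SoftwareClockPath()
outputs = [path.run_cycle(c, d) for c, d in [(1, 5), (0, 7), (1, 9)]]
print(outputs)  # [0, 0, 7]
```

On the third rising edge, register 518 captures 7, the value that was already stable before that edge, not 9, the value written in the same step as the edge. That is the race-free ordering the double buffer is meant to guarantee.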
As described above, implementation of the software clock depends mainly on the clock arrangement shown in Figure 19 and on the timing of the assertion of ~EVAL, INPUT-EN, CLK-EN, and RESET. Figure 20 shows a four-state finite state machine that controls the software clock logic of Figure 19 in accordance with one embodiment of the present invention.
In state 540, the system is idle or some I/O operation is in progress. The EVAL signal is logic "0". The EVAL signal defines the evaluation cycle; it is generated by the system controller and can last as many clock cycles as needed to stabilize the logic in the system. Typically, the duration of the EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the length of the longest segmented multiplexed wire (i.e., TDM circuit). During evaluation, the EVAL signal is logic "1".
In state 541, the clock is enabled. The CLK-EN signal is asserted to logic "1", thereby asserting the enable signal of the hardware register models. At this point, the gated data previously present at the hardware register models are evaluated synchronously, with no risk of hold-time violations.
In state 542, new data are enabled as the INPUT-EN signal is asserted to logic "1". The RESET signal is also asserted to remove the enable signal from the hardware register models. However, the new data admitted through the gated data logic network continue to propagate toward, or have already reached, their intended hardware register model destinations, where they wait to be clocked into the hardware register models when the enable signal is asserted again.
In state 543, the newly propagated data settle logically while the EVAL signal remains at logic "1". As described above for the time-division multiplexed (TDM) circuitry in connection with Figures 9(A), 9(B), and 9(C), the multiplexed wire is also at logic "1". When the EVAL signal is deasserted, i.e., set to logic "0", the system returns to idle state 540 and waits for the next evaluation upon software detection of a clock edge.
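The four states of Figure 20 can be written down as a small transition table. This sketch is illustrative only: the state names, the `start_eval` and `settle_left` inputs, and the helper functions are hypothetical, and any signal level the text does not state explicitly (for example, CLK-EN outside state 541) is assumed to be 0.

```python
from enum import Enum


class State(Enum):
    IDLE_IO = 540       # idle or I/O; EVAL = 0, so ~EVAL enables buffer 516
    CLOCK_ENABLE = 541  # CLK-EN = 1: enable the hardware register models
    NEW_DATA = 542      # INPUT-EN = 1 and RESET = 1: release new data, clear enables
    SETTLE = 543        # EVAL stays at 1 while the new data settle


# Signal levels per state. Levels the text does not state explicitly
# are assumed to be 0.
SIGNALS = {
    State.IDLE_IO:      {"EVAL": 0, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},
    State.CLOCK_ENABLE: {"EVAL": 1, "CLK_EN": 1, "INPUT_EN": 0, "RESET": 0},
    State.NEW_DATA:     {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 1, "RESET": 1},
    State.SETTLE:       {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},
}


def next_state(state, start_eval, settle_left):
    """Transition function; start_eval and settle_left are hypothetical inputs."""
    if state is State.IDLE_IO:
        return State.CLOCK_ENABLE if start_eval else State.IDLE_IO
    if state is State.CLOCK_ENABLE:
        return State.NEW_DATA
    if state is State.NEW_DATA:
        return State.SETTLE
    # SETTLE: hold EVAL high until the compiled settle time has elapsed.
    return State.SETTLE if settle_left > 0 else State.IDLE_IO


def evaluation_sequence(settle_cycles):
    """States visited by one evaluation (settle_cycles >= 1), idle to idle."""
    state, left = State.IDLE_IO, settle_cycles
    seq = [state]
    state = next_state(state, start_eval=True, settle_left=left)
    while state is not State.IDLE_IO:
        seq.append(state)
        if state is State.SETTLE:
            left -= 1
        state = next_state(state, start_eval=False, settle_left=left)
    seq.append(state)
    return seq


print([s.value for s in evaluation_sequence(2)])  # [540, 541, 542, 543, 543, 540]
```

The `settle_cycles` parameter stands in for the EVAL duration that, according to the text, the compiler derives from the longest direct wire and the longest TDM wire.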
D. FPGA Array and Control
The SEmulator system first compiles the user circuit design data into software and hardware models based on various controls, including component type. During hardware compilation, as described in connection with Figure 6, the system performs mapping, placement, and routing to lay out and interconnect the various components of the user circuit design with an optimal partitioning. Using known programming tools, the hardware board containing many FPGA chips is then reconfigured with a bitstream configuration file, i.e., a programmable object file (.pof) (or, alternatively, a raw binary file (.rbf)). Each chip contains a portion of the hardware model corresponding to the user circuit design.
In one embodiment, the SEmulator system uses a 4×4 array of FPGA chips, 16 chips in total. Examples of FPGA chips include Xilinx XC4000 series FPGA logic devices and Altera FLEX 10K devices.
Xilinx XC4000 series FPGAs that can be used include the XC4000, XC4000A, XC4000D, XC4000H, XC4000E, XC4000EX, XC4000L, and XC4000XL. Specific FPGAs include the Xilinx XC4005H, XC4025, and XC4028EX. The Xilinx XC4028EX FPGA can drive close to 500,000 gates on a single PCI board. Details of these Xilinx FPGAs can be found in their data book, [Xilinx, The Programmable Logic Data Book] (9/96), which is incorporated herein by reference. For Altera FPGAs, details can be found in [Altera, The 1996 Data Book] (June 1996), which is incorporated herein by reference.
A brief description of the XC4025 FPGA follows. Each array chip consists of a 240-pin Xilinx chip. An array board populated with Xilinx XC4025 chips contains approximately 440,000 configurable gates and can perform computation-intensive tasks. The Xilinx XC4025 FPGA contains 1,024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic, and two strobed registers. Unstrobed I/O registers are provided around the periphery of the chip. An XC4005H can be used in place of the XC4025; this yields a lower-cost array board with 120,000 configurable gates. The XC4005H device has high-power 24 mA drive circuits but lacks the input/output flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs can be obtained from their published data sheets, which are incorporated herein by reference.
The functionality of Xilinx XC4000 series FPGAs can be customized by loading configuration data into internal memory cells. The values stored in these memory cells determine the logic functions and interconnections in the FPGA. The configuration data for these FPGAs can be stored in on-chip memory and loaded from external memory. An FPGA can either read its configuration data from an external serial or parallel PROM, or have configuration data written into it by an external device. These FPGAs can be reprogrammed many times, which is useful when the hardware changes dynamically or when the user wants the hardware adapted to different applications.
XC4000 series FPGAs have up to 1,024 CLBs. Each CLB has two levels of look-up tables, in which two 4-input look-up tables (function generators F and G) provide some of the inputs to a third, 3-input look-up table (function generator H), plus two flip-flops or latches. The outputs of these look-up tables can be driven independently of the flip-flops or latches. A CLB can implement any of the following combinations of Boolean functions: (1) any function of four or five variables; (2) any function of four variables, together with any second function of up to four unrelated variables and any third function of up to three unrelated variables; (3) one function of four variables and another function of six variables; (4) any two functions of four variables; and (5) some functions of nine variables. Two D-type flip-flops or latches can be used to register the CLB inputs or to store the look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either of the two flip-flops or latches, and H1 can drive the other through the H function generator.
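Case (1), any function of five variables, follows from the two-level F/G/H structure: F and G each evaluate the function for one value of the fifth variable, and H selects between them. The sketch below models a look-up table as an integer truth table; it is an illustration of that Shannon-expansion idea under stated assumptions, with hypothetical helper names, and not the Xilinx mapping software's algorithm.

```python
def lut(truth_table, *inputs):
    """Evaluate a look-up table stored as an integer truth table.

    Bit i of truth_table is the output for input pattern i, with inputs[0]
    as the least-significant address bit.
    """
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (truth_table >> index) & 1


def make_table(func, k):
    """Build the 2**k-bit truth table of a k-input function returning 0 or 1."""
    return sum(func(*[(i >> j) & 1 for j in range(k)]) << i for i in range(1 << k))


def clb_five_input(func):
    """Map any 5-input function onto F, G and H (Shannon expansion on e)."""
    f_table = make_table(lambda a, b, c, d: func(a, b, c, d, 0), 4)  # F: case e = 0
    g_table = make_table(lambda a, b, c, d: func(a, b, c, d, 1), 4)  # G: case e = 1
    h_table = make_table(lambda f, g, e: g if e else f, 3)          # H: 2:1 mux, e on H1

    def evaluate(a, b, c, d, e):
        f = lut(f_table, a, b, c, d)
        g = lut(g_table, a, b, c, d)
        return lut(h_table, f, g, e)

    return evaluate


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


clb = clb_five_input(parity5)
print(clb(1, 0, 1, 1, 0), clb(1, 1, 1, 1, 1))  # 1 1
```

Each 4-input table holds 16 bits and the 3-input H table holds 8 bits, which matches the two-level arrangement described above.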
Each 4-input function generator in a CLB (i.e., F and G) includes dedicated arithmetic logic for fast generation of carry and borrow signals, and the pair can be configured as a 2-bit adder with carry-in and carry-out. These function generators can also be configured as read/write random access memory (RAM), in which case the four input lines serve as the RAM address lines.
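The 2-bit adder configuration can be illustrated behaviourally. In this hypothetical sketch, bit 0 is assumed to come from one function generator and bit 1 from the other, and the ripple of `carry` between them stands in for the dedicated carry logic; it models the arithmetic result only, not the CLB's internal wiring.

```python
def two_bit_adder(a, b, carry_in):
    """Behavioural sketch of a CLB configured as a 2-bit adder.

    a and b are 2-bit values (0-3). Bit 0 is assumed to come from one function
    generator and bit 1 from the other; the carry between them stands in for
    the dedicated carry logic. Returns (sum_bits, carry_out).
    """
    sum_bits, carry = 0, carry_in
    for i in range(2):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        sum_bits |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))  # generate or propagate
    return sum_bits, carry


print(two_bit_adder(3, 1, 0))  # (0, 1): 3 + 1 = 4
print(two_bit_adder(2, 1, 1))  # (0, 1): 2 + 1 + 1 = 4
```

Chaining the carry-out of one CLB into the carry-in of the next extends this to wider adders, which is the purpose of the dedicated carry path.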
The Altera FLEX 10K chips are similar in concept. These chips are SRAM-based programmable logic devices (PLDs) with multiple 32-bit buses. More specifically, each FLEX 10K100 chip contains approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs) of 8 logic elements (LEs) each (4,992 LEs in total), 5,392 flip-flops or latches, 406 I/O pins, and 503 pins in total.
The Altera FLEX 10K chip includes an embedded array made up of embedded array blocks (EABs) and a logic array made up of logic array blocks (LABs). An EAB can be used to implement various memories (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). When used for memory, an EAB provides 2,048 bits. When used for logic, an EAB provides 100 to 600 gates.
LABs, through their LEs, can be used to implement medium-sized blocks of logic. Each LAB represents about 96 logic gates and contains 8 LEs and a local interconnect. An LE contains a 4-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be built include counters, address decoders, and small state machines.
A more detailed description of the Altera FLEX 10K chips can be found in [Altera, 1996 DATA BOOK] (June 1996), which is incorporated herein by reference. The data book also contains details of the supported program development software.
Figure 8 shows one embodiment of the 4×4 FPGA array and its interconnections. Note that this embodiment of the SEmulator does not use crossbar or partial crossbar switches to connect the FPGA chips. The FPGA chips include chips F11 to F14 in the first row, chips F21 to F24 in the second row, chips F31 to F34 in the third row, and chips F41 to F44 in the fourth row. In one embodiment, each FPGA chip (e.g., chip F23) has the following pins for the FPGA I/O controller interface of the SEmulator system:
Interface | Pins |
Data bus | 32 |
SPACE index | 3 |
READ, WRITE, EVAL | 3 |
DATA XSFR | 1 |
Address pointer chain | 2 |
Thus, in one embodiment, each FPGA chip uses only 41 pins to interface with the SEmulator system. These pins are discussed further in connection with Figure 22.
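The 41-pin figure is simply the sum of the per-interface pin counts in the table above. The dictionary below restates those counts; the variable names are illustrative.

```python
# Per-chip pin budget for the FPGA I/O controller interface
# (counts taken from the interface table).
INTERFACE_PINS = {
    "data bus": 32,
    "SPACE index": 3,
    "READ, WRITE, EVAL": 3,
    "DATA XSFR": 1,
    "address pointer chain": 2,
}

total_pins = sum(INTERFACE_PINS.values())
print(total_pins)  # 41
```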
These FPGA chips are interconnected through a non-crossbar, non-partial-crossbar interconnect. Each interconnect between chips, for example interconnect 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnect represents more than 44 pins. In still other embodiments, each interconnect represents fewer than 44 pins.
Each chip has six interconnects. For example, chip F11 has interconnects 600 to 605. Likewise, chip F33 has interconnects 606 to 611. These interconnects run horizontally along rows and vertically along columns. Each interconnect provides a direct connection between two chips in the same row or the same column. Thus, for example, interconnect 600 directly connects chips F11 and F13; interconnect 601 directly connects chips F11 and F12; interconnect 602 directly connects chips F11 and F14; interconnect 603 directly connects chips F11 and F31; interconnect 604 directly connects chips F11 and F21; and interconnect 605 directly connects chips F11 and F41.
Similarly, for chip F33, which is not at the edge of the array (unlike, e.g., chip F11), interconnect 606 directly connects chips F33 and F13; interconnect 607 directly connects chips F33 and F23; interconnect 608 directly connects chips F33 and F34; interconnect 609 directly connects chips F33 and F43; interconnect 610 directly connects chips F33 and F31; and interconnect 611 directly connects chips F33 and F32.
Because chip F11 is within one hop of chip F13, interconnect 600 is labeled "1". Because chip F11 is within one hop of chip F12, interconnect 601 is labeled "1". Similarly, because chip F11 is within one hop of chip F14, interconnect 602 is labeled "1". Likewise, all interconnects of chip F33 are labeled "1".
This interconnect scheme allows each chip to communicate with any other chip in the array within two "hops," or interconnects. Thus, chip F11 can connect to chip F33 through either of two paths: (1) interconnect 600 to interconnect 606; or (2) interconnect 603 to interconnect 610. In general, the path can be either (1) first along a row and then along a column, or (2) first along a column and then along a row.
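The two-hop property can be checked with a breadth-first search over the array. In this sketch, chip Frc is assumed to sit at zero-based coordinates (r - 1, c - 1), and each chip is modeled as directly connected to every other chip in its row and column, which gives exactly the six interconnects per chip described above. Function names are hypothetical.

```python
from collections import deque
from itertools import product

ROWS = COLS = 4  # chip Frc sits at (r - 1, c - 1), e.g. F11 -> (0, 0), F33 -> (2, 2)


def neighbours(chip):
    """Chips directly interconnected with `chip`: every other chip in its row or column."""
    r, c = chip
    return [(r, j) for j in range(COLS) if j != c] + [(i, c) for i in range(ROWS) if i != r]


def hops(src, dst):
    """Breadth-first search for the number of interconnects from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        chip = queue.popleft()
        if chip == dst:
            return dist[chip]
        for n in neighbours(chip):
            if n not in dist:
                dist[n] = dist[chip] + 1
                queue.append(n)
    raise ValueError("destination not reachable")


chips = list(product(range(ROWS), range(COLS)))
print(len(neighbours((0, 0))))                        # 6 interconnects per chip
print(hops((0, 0), (2, 2)))                           # F11 -> F33: 2
print(max(hops(a, b) for a in chips for b in chips))  # worst case: 2
```

Any chip pair that shares neither a row nor a column needs exactly one intermediate chip, and there are always two choices (row first or column first), matching the two paths listed for F11 to F33.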
Although Figure 8 shows the FPGA chips configured as a 4×4 array with horizontal and vertical interconnects, the actual physical implementation on the board relies on low and high banks that can be expanded with piggyback boards. Thus, in one embodiment, chips F41-F44 and F21-F24 are in the low bank, and chips F31-F34 and F11-F14 are in the high bank. The piggyback board contains chips F11-F14 and chips F21-F24. Thus, to expand the array, a piggyback board containing several chips (e.g., 8) is added to the banks above the row currently containing chips F11-F14. In another embodiment, the piggyback board expands the array below the row currently containing chips F41-F44. Another embodiment allows expansion to the right of chips F14, F24, F34, and F44. Yet another embodiment allows expansion to the left of chips F11, F21, F31, and F41.
Figure 7 shows the connectivity matrix of the 4×4 FPGA (field programmable gate array) array of Figure 8, expressed in "0"s and "1"s. This connectivity matrix is used to generate the placement cost from the cost function used in the hardware mapping, placement, and routing process of the SEmulation system. The cost function was introduced above in connection with Figure 6. For example, chip F11 is within one hop of chip F13, so the connectivity matrix entry for F11-F13 is "1".
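A 0/1 connectivity matrix of this kind can be generated directly from the row/column rule. This sketch assumes row-major chip indexing (F11 at index 0 through F44 at index 15) and a zero diagonal; the text does not say how Figure 7 orders the chips or what it places on the diagonal, so both choices, like the function names, are illustrative.

```python
def connectivity_matrix(rows=4, cols=4):
    """0/1 matrix: entry [a][b] is 1 when chips a and b share a row or column.

    Chips are indexed row-major (F11 -> 0, F12 -> 1, ..., F44 -> 15). The
    diagonal is set to 0, an assumption since the text does not say.
    """
    chips = [(r, c) for r in range(rows) for c in range(cols)]
    return [[1 if a != b and (a[0] == b[0] or a[1] == b[1]) else 0 for b in chips]
            for a in chips]


def chip_index(name, cols=4):
    """Map a chip name such as 'F13' to its row-major matrix index."""
    return (int(name[1]) - 1) * cols + (int(name[2]) - 1)


m = connectivity_matrix()
print(m[chip_index("F11")][chip_index("F13")])  # 1: one hop apart
print(m[chip_index("F11")][chip_index("F33")])  # 0: two hops apart
```

A placement cost function can then penalize nets whose endpoints fall on chips with a "0" entry, since those nets need an extra hop.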
Figure 21 shows the interconnect pin-out of a single FPGA chip in accordance with one embodiment of the present invention. Each chip has six sets of interconnects, each containing a specific number of pins. In one embodiment, each set has 44 pins. The interconnects of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The westward set is labeled W[43:0], the eastward set E[43:0], the northward set N[43:0], and the southward set S[43:0]. These four sets are used for connections to adjacent chips; that is, these interconnects do not "hop" over any chip. For example, in Figure 8, N[43:0] of chip F33 is interconnect 607, E[43:0] is interconnect 608, S[43:0] is interconnect 609, and W[43:0] is interconnect 611.
Returning to Figure 21, there are also two additional sets of interconnects. One set is for the vertical non-adjacent interconnects, YH[21:0] and YH[43:22]. The other set is for the horizontal non-adjacent interconnects, XH[21:0] and XH[43:22]. Each set, YH[...] and XH[...], is divided into two halves of 22 pins each. This configuration allows every chip to be manufactured identically. Thus, each chip can interconnect within one hop to the non-adjacent chips above, below, to the left of, and to the right of it. The FPGA chip also shows pins for global signals, the FPGA bus, and JTAG signals.
The FPGA I/O controller is now discussed. This controller 327 was briefly introduced earlier in Figure 10. The FPGA I/O controller manages data and control communication between the PCI bus and the FPGA array.
Figure 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array, together with the banks of FPGA chips. The FPGA I/O controller 700 includes a CTRL_FPGA unit 701, a clock buffer 702, a PCI controller 703, an EEPROM 704, an FPGA serial configuration interface 705, a boundary scan test interface 706, and a buffer 707. Appropriate power regulation circuitry known to those of ordinary skill in the art is provided. Examples of the power supply include Vcc, which is coupled to a voltage detector/regulator and a sense amplifier to maintain the voltage under varying conditions. A fast-acting thin-film fuse is placed between each FPGA chip and its Vcc. Vcc-HI is provided to the CONFIG# pin of all FPGA chips and to LINTI# of LOCAL_BUS 708.
The CTRL_FPGA unit 701 is the main controller of the FPGA I/O controller 700 and handles the various control, test, and bulk-data read/write operations between the different units and buses. The CTRL_FPGA unit 701 is coupled to the low and high banks of FPGA chips. FPGA chips F41-F44 and F21-F24 (i.e., the low bank) are coupled to the low-bank FPGA bus 718. FPGA chips F31-F34 and F11-F14 (i.e., the high bank) are coupled to the high-bank FPGA bus 719. These FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 correspond to the identically numbered FPGA chips in Figure 8.
Thick-film chip resistors are placed between FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 and the low-bank bus 718 and high-bank bus 719 for proper loading. Resistor bank 713, which includes, for example, resistors 716 and 717, is coupled to the low-bank bus 718. Resistor bank 712, which includes, for example, resistors 714 and 715, is coupled to the high-bank bus 719.
If expansion is desired, more FPGA chips can be placed on the low-bank bus 718 and high-bank bus 719 to the right of FPGA chips F11 and F21. In one embodiment, expansion is accomplished with a piggyback board such as piggyback board 720. Thus, if the banks initially contain only the eight FPGA chips F41-F44 and F31-F34, further expansion can be achieved by adding piggyback board 720, which contains FPGA chips F24-F21 in the low bank and chips F14-F11 in the high bank. Piggyback board 720 also includes additional low- and high-bank buses and thick-film chip resistors.
The PCI controller 703 is the main interface between the FPGA I/O controller 700 and the 32-bit PCI bus 709. If the PCI bus is expanded to 64 bits and/or 66 MHz, the system can be adjusted accordingly without departing from the spirit and scope of the present invention. These adjustments are described below. An example of a PCI controller 703 that can be used in the system is the PCI9080 or PCI9060 from PLX Technology. The PCI9080 has an appropriate local bus interface, control registers, FIFOs (first-in, first-out buffers), and a PCI interface to the PCI bus. The PLX Technology data book, [PCI9080 Data Sheet] (version 0.93, February 28, 1997), is incorporated herein by reference.
The PCI controller 703 passes data between the CTRL_FPGA unit 701 and the PCI bus 709 via LOCAL_BUS 708. LOCAL_BUS includes a control bus portion, an address bus portion, and a data bus portion for control signals, address signals, and data signals, respectively. If the PCI bus is expanded to 64 bits, the data bus portion of LOCAL_BUS 708 can also be expanded to 64 bits. The PCI controller 703 is coupled to EEPROM 704, which contains the configuration data for the PCI controller 703. An example of EEPROM 704 is the 93CS46 from National Semiconductor.
The PCI bus 709 provides a 33 MHz clock signal to the FPGA I/O controller 700. The clock signal is provided via wire line 710 to the clock buffer 702 for synchronization and low clock skew. The output of the clock buffer 702 is a 33 MHz global clock (GL_CLK) signal, which is provided to all FPGA chips via wire line 711 and to the CTRL_FPGA unit 701 via wire line 721. If the PCI bus is expanded to 66 MHz, the clock buffer also provides a 66 MHz signal to the system.
FPGA serial configuration interface 705 provides configuration data to configure FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44. The Altera data book, Altera 1996 Data Book (June 1996), provides details on the configuration devices and process. FPGA serial configuration interface 705 is also coupled to LOCAL_BUS 708 and parallel port 721. In addition, FPGA serial configuration interface 705 couples the CTRL_FPGA unit 701 to FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via CONF_INTF line 723.
Boundary scan test interface 706 provides a JTAG implementation of a specified test command set for testing the logic units and circuits of a processor or system externally through software. Interface 706 complies with the IEEE Std 1149.1-1990 standard. For more information, see the Altera 1996 Data Book (June 1996) and Application Note 39 (JTAG Boundary-Scan Testing in Altera Devices), which are incorporated herein by reference. Boundary scan test interface 706 is also coupled to LOCAL_BUS 708 and parallel port 722. In addition, boundary scan test interface 706 couples the CTRL_FPGA unit 701 to FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via BST_INTF line 724.
CTRL_FPGA unit 701 transfers data to and from the low bank FPGA chips (chips F41-F44 and F21-F24) and the high bank FPGA chips (chips F31-F34 and F11-F14) via the 32-bit low bank bus 718 and the 32-bit high bank bus 719, respectively, through buffers 707 and F_BUS 725 (for the 32 bits FD[31:0] of the low bank) and F_BUS 726 (for the 32 bits FD[63:32] of the high bank).
In one embodiment, the low bank bus 718 and the high bank bus 719 together provide twice the data width of the PCI bus 709. The PCI bus 709 is 32 bits wide at 33 MHz, so its throughput is 132 MB/s (33 MHz × 4 bytes). The low bank bus 718 is 32 bits wide and runs at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The high bank bus 719 is likewise 32 bits wide at half the PCI bus frequency (16.5 MHz). The combined 64-bit low and high bank buses therefore also achieve 132 MB/s (16.5 MHz × 8 bytes). Thus, the performance of the low and high bank buses matches that of the PCI bus. In other words, the performance limitation lies in the PCI bus, not in the low and high bank buses.
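The throughput comparison above can be checked with a short calculation. This is a minimal sketch that assumes each bus transfers one full-width word per clock cycle and that MB/s means 10^6 bytes per second; the helper name `peak_throughput_mb_s` is illustrative only.

```python
def peak_throughput_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Peak throughput in MB/s (10**6 bytes/s), assuming one word per clock."""
    return clock_mhz * (width_bits // 8)


PCI_CLOCK_MHZ = 33.0

# 32-bit PCI bus 709 at 33 MHz.
pci_bus = peak_throughput_mb_s(PCI_CLOCK_MHZ, 32)

# Low bank bus 718 and high bank bus 719: 32 bits each at half the PCI clock,
# operating side by side as a single 64-bit path.
bank_clock_mhz = PCI_CLOCK_MHZ / 2
bank_buses = peak_throughput_mb_s(bank_clock_mhz, 32 + 32)

print(pci_bus, bank_buses)  # 132.0 132.0
```

Halving the clock while doubling the width leaves the product unchanged, which is why the bank buses do not become the bottleneck.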
According to one embodiment of the invention, each FPGA chip implements a number of address pointers, one for each software/hardware boundary address space. These address pointers are chained together across multiple FPGA chips by the multiplexed cross-chip address pointer chain. See the description of address pointers above in connection with Figures 9, 11, 12, 14, and 15. To chain the address pointers associated with a given address space across chips, and to shift the word selection signal across multiple chips, chain-connection lines are required. These chain-connection lines are shown as arrows between the chips. One chain-connection line for the low bank is line 730 between chips F23 and F22. Another chain-connection line, for the high bank, is line 731 between chips F31 and F32. The chain-connection line 732 at the end chip F21 of the low bank is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_L. The chain-connection line 733 at the end chip F11 of the high bank is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_H. As the word selection signals propagate through the FPGA chips, LAST_SHIFT_L and LAST_SHIFT_H are the word selection signals of their respective banks. When either LAST_SHIFT_L or LAST_SHIFT_H provides a logic 1 to the CTRL_FPGA unit 701, the word selection signal has propagated to the end chip of the respective bank.
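The chained address pointers of one bank behave like a single long shift register whose stages are distributed across chips. The following is a minimal behavioral sketch, assuming one pointer stage per word and that LAST_SHIFT reads 1 once the word selection token has shifted out of the end chip; the class `BankPointerChain` and the word counts are hypothetical and not taken from the described hardware.

```python
from typing import List, Optional, Tuple


class BankPointerChain:
    """Hypothetical model of one bank's chained address pointers.

    Each chip holds a slice of the address space (words_per_chip). A single
    word-selection token starts at the first word of the first chip and
    advances one word per move; after it leaves the last word of the end
    chip, the bank's LAST_SHIFT output reads 1.
    """

    def __init__(self, words_per_chip: List[int]) -> None:
        if not words_per_chip or any(n <= 0 for n in words_per_chip):
            raise ValueError("each chip must hold at least one word")
        self.words_per_chip = list(words_per_chip)
        self.total_words = sum(self.words_per_chip)
        self.position: Optional[int] = None
        self.last_shift = 0

    def initialize(self) -> None:
        """Point at the first word of the first chip and clear LAST_SHIFT."""
        self.position = 0
        self.last_shift = 0

    def move(self) -> None:
        """Advance the token by one word, crossing chip boundaries as needed."""
        if self.position is None:
            return
        self.position += 1
        if self.position == self.total_words:
            self.position = None  # token has shifted out of the end chip
            self.last_shift = 1

    def selected(self) -> Optional[Tuple[int, int]]:
        """Return (chip index, word index within chip) of the selected word."""
        if self.position is None:
            return None
        offset = self.position
        for chip, n_words in enumerate(self.words_per_chip):
            if offset < n_words:
                return chip, offset
            offset -= n_words
        return None


chain = BankPointerChain([2, 3])  # hypothetical bank: 2 words in chip 0, 3 in chip 1
chain.initialize()
visited = []
while not chain.last_shift:
    visited.append(chain.selected())
    chain.move()
print(visited)  # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
```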
CTRL_FPGA unit 701 sends signals to and receives signals from the FPGA chips over the following lines: the write signal (F_WR) on line 734, the read signal (F_RD) on line 735, the DATA_XSFR signal on line 736, the EVAL signal on line 737, and the SPACE[2:0] signal on line 738. CTRL_FPGA unit 701 receives the EVAL_REQ# signal on line 739. The write signal (F_WR), read signal (F_RD), DATA_XSFR signal, and SPACE[2:0] signal together serve the address pointers in the FPGA chips. The write signal (F_WR), read signal (F_RD), and SPACE[2:0] signal are used to generate the MOVE signal for the address pointer associated with the address space selected by the SPACE index (SPACE[2:0]). The DATA_XSFR signal is used to initialize the address pointers and to begin the word-by-word data transfer process.
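The MOVE generation described above can be sketched as a small decoder. This is a hypothetical illustration, assuming that the pointer whose address space matches SPACE[2:0] advances whenever either F_WR or F_RD is active and that all other pointers hold; the function `move_signals` is not part of the described design.

```python
from typing import List

NUM_SPACES = 8  # a 3-bit SPACE[2:0] index can name up to eight address spaces


def move_signals(f_wr: int, f_rd: int, space: int) -> List[int]:
    """Return one MOVE bit per address space (hypothetical decode).

    Assumption: the pointer selected by SPACE[2:0] advances (MOVE = 1) on any
    read or write strobe; all other pointers receive MOVE = 0.
    """
    if not 0 <= space < NUM_SPACES:
        raise ValueError("SPACE[2:0] must be in the range 0..7")
    strobe = 1 if (f_wr or f_rd) else 0
    return [strobe if s == space else 0 for s in range(NUM_SPACES)]


print(move_signals(f_wr=1, f_rd=0, space=3))  # [0, 0, 0, 1, 0, 0, 0, 0]
print(move_signals(f_wr=0, f_rd=0, space=3))  # [0, 0, 0, 0, 0, 0, 0, 0]
```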
If any FPGA chip asserts the EVAL_REQ# signal, this signal is used to restart the evaluation cycle. For example, to evaluate data, the data is transferred, or written, from the main memory of the host processor's computing station to the FPGAs via the PCI bus. At the end of the transfer, the evaluation cycle begins, which includes initializing the address pointers and operating the software clock to facilitate the evaluation process. However, for various reasons, a particular FPGA chip may need to evaluate the data again. That FPGA chip asserts the EVAL_REQ# signal, and the CTRL_FPGA unit 701 starts the evaluation cycle again.
Figure 23 shows a detailed view of the CTRL_FPGA unit 701 and the buffers 707 shown in Figure 22. The input/output signals of the CTRL_FPGA unit 701 and their reference numerals are the same in Figure 23 as in Figure 22. However, other signals and lines/buses not shown in Figure 22 are given new reference numerals, for example, SEM_FPGA output enable 1016, local interrupt output (Local INTO) 708a, local read/write 708b, local address bus 708c, local interrupt input (Local INTI#) 708d, and local data bus 708e.
CTRL_FPGA unit 701 includes transfer done check logic (XSFR_DONE logic) 1000, evaluation control logic (EVAL logic) 1001, DMA descriptor block 1002, control registers 1003, evaluation timer logic (EVAL timer) 1004, address decoder 1005, write flag sequencer logic 1006, FPGA chip read/write control logic (SEM_FPGA R/W logic) 1007, demultiplexer and latch (DEMUX logic) 1008, and latches 1009 to 1012, which correspond to the buffers 707 in Figure 22. The global clock signal (CTRL_FPGA_CLK) on line/bus 721 is provided to all logic elements/blocks in the CTRL_FPGA unit 701.
Transfer done check logic (XSFR_DONE logic) 1000 receives LAST_SHIFT_H 733, LAST_SHIFT_L 732, and Local INTO 708a. XSFR_DONE logic 1000 outputs a transfer done signal (XSFR_DONE) on line/bus 1013 to EVAL logic 1001. Based on LAST_SHIFT_H 733 and LAST_SHIFT_L 732, XSFR_DONE logic 1000 checks for completion of the data transfer so that the evaluation cycle can begin as needed.
EVAL logic 1001 receives the EVAL_REQ# signal on line/bus 739 and the WR_XSFR/RD_XSFR signal on line/bus 1015, in addition to the transfer done signal (XSFR_DONE) on line/bus 1013. EVAL logic 1001 generates two output signals: Start EVAL on line/bus 1014 and DATA_XSFR on line/bus 736. The EVAL logic indicates when data transfer between the FPGA bus and the PCI bus will begin by initializing the address pointers. It receives the XSFR_DONE signal when the data transfer is complete. The WR_XSFR/RD_XSFR signal indicates whether the transfer is a read or a write operation. Once an I/O cycle has ended (or before an I/O cycle begins), the EVAL logic can begin the evaluation cycle with the Start EVAL signal, which starts the EVAL timer. The EVAL timer defines the duration of the evaluation cycle. By keeping the evaluation cycle active for as long as it takes the data to propagate stably to all registers and combinational components, it ensures the successful operation of the software clock mechanism.
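The decision made by the XSFR_DONE logic and the EVAL logic can be summarized as a pair of Boolean functions. This is a minimal sketch under stated assumptions: a transfer is treated as done only when both LAST_SHIFT_L and LAST_SHIFT_H read 1, and EVAL_REQ# is treated as active low, so any chip driving it to 0 requests re-evaluation. The function names are hypothetical, and timing, WR_XSFR/RD_XSFR, and DATA_XSFR are not modeled.

```python
from typing import Sequence, Tuple


def xsfr_done(last_shift_l: int, last_shift_h: int) -> int:
    """Assumed rule: a transfer is done once both banks report LAST_SHIFT = 1."""
    return 1 if (last_shift_l and last_shift_h) else 0


def eval_request_active(eval_req_n: Sequence[int]) -> bool:
    """EVAL_REQ# is active low: the request is active if any chip drives 0."""
    return any(level == 0 for level in eval_req_n)


def eval_logic(last_shift_l: int, last_shift_h: int,
               eval_req_n: Sequence[int]) -> Tuple[int, int]:
    """Return (XSFR_DONE, Start EVAL) for one decision point (hypothetical).

    Start EVAL is issued when a data transfer has completed, or when any
    FPGA chip requests another evaluation through EVAL_REQ#.
    """
    done = xsfr_done(last_shift_l, last_shift_h)
    start_eval = 1 if (done or eval_request_active(eval_req_n)) else 0
    return done, start_eval


print(eval_logic(1, 1, [1, 1, 1, 1]))  # (1, 1): transfer finished
print(eval_logic(1, 0, [1, 1, 1, 1]))  # (0, 0): high bank still shifting
print(eval_logic(0, 0, [1, 0, 1, 1]))  # (0, 1): chip 1 requested re-evaluation
```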
DMA descriptor block 1002 receives the local bus address on line/bus 1019 and the write enable signal on line/bus 1020 from address decoder 1005, and the local bus data on line/bus 1029 via local data bus 708e. Its output is the DMA descriptor output on line/bus 1046, which enters DEMUX logic 1008 via line/bus 1045. DMA descriptor block 1002 contains descriptor block information corresponding to information in host memory, including the PCI address, the local address, the transfer count, the transfer direction, and the address of the next descriptor block. The host sets up the address of the initial descriptor block in the descriptor pointer register of the PCI controller. A transfer can be started by setting a control bit. The PCI controller loads the first descriptor block and begins the data transfer. The PCI controller continues to load descriptor blocks and transfer data until it detects the end-of-chain bit in the next descriptor pointer register.
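The chained-descriptor behavior described above amounts to walking a linked list held in host memory. The sketch below is a simplified model, not the PCI9080's actual register format: the `DmaDescriptor` layout, the separate `end_of_chain` field, and the addresses used are all hypothetical, and the transfers are recorded rather than performed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DmaDescriptor:
    """Fields listed in the text; this layout is an illustrative assumption."""
    pci_address: int
    local_address: int
    transfer_count: int   # bytes to move
    pci_to_local: bool    # transfer direction
    next_descriptor: int  # host-memory address of the next descriptor
    end_of_chain: bool    # simplified stand-in for the end-of-chain bit


def run_descriptor_chain(host_memory: Dict[int, DmaDescriptor],
                         first_descriptor: int) -> List[Tuple[int, int, int, bool]]:
    """Load a descriptor, record its transfer, then follow the next pointer
    until a descriptor with the end-of-chain bit is reached."""
    transfers = []
    visited = set()
    address = first_descriptor
    while True:
        if address in visited:
            raise RuntimeError("descriptor chain loops back on itself")
        visited.add(address)
        d = host_memory[address]
        transfers.append((d.pci_address, d.local_address,
                          d.transfer_count, d.pci_to_local))
        if d.end_of_chain:
            return transfers
        address = d.next_descriptor


# Hypothetical two-descriptor chain starting at host address 0x1000.
memory = {
    0x1000: DmaDescriptor(0x8000_0000, 0x000, 256, True, 0x1010, False),
    0x1010: DmaDescriptor(0x8000_0100, 0x100, 128, True, 0x0000, True),
}
result = run_descriptor_chain(memory, 0x1000)
print(result)
```

The loop guard is an addition for safety in the model; it is not described for the hardware.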
Address decoder 1005 receives and transmits local R/W control signals on bus 708b and local address signals on bus 708c. Address decoder 1005 generates a write enable signal for DMA descriptor 1002 on line/bus 1020, a write enable signal for control registers 1003 on line/bus 1021, the FPGA address SPACE index on line/bus 738, a control signal on line/bus 1027, and another control signal for DEMUX logic 1008 on line/bus 1024.
Control registers 1003 receive the write enable signal on line/bus 1021 from address decoder 1005 and the data on line/bus 1030 via local data bus 708e. Control registers 1003 generate the WR_XSFR/RD_XSFR signal for EVAL logic 1001 on line/bus 1015, the Set EVAL time signal for EVAL timer 1004 on line/bus 1041, and the SEM_FPGA output enable signal for the FPGA chips on line/bus 1016. The system uses the SEM_FPGA output enable signal to selectively turn on, or activate, each FPGA chip. Typically, the system activates one FPGA chip at a time.
EVAL timer 1004 receives the Start EVAL signal on line/bus 1014 and the Set EVAL time signal on line/bus 1041. EVAL timer 1004 generates the EVAL signal on line/bus 737, the evaluation done (EVAL_DONE) signal on line/bus 1017, and the Start write flag signal on line/bus 1018, which goes to write flag sequencer logic 1006. In one embodiment, the EVAL timer is 6 bits long.
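The EVAL timer's role can be illustrated as a per-clock trace. This is a hypothetical sketch, assuming the Set EVAL time value is a count of CTRL_FPGA_CLK cycles, that EVAL stays high for exactly that many cycles, and that EVAL_DONE and Start write flag pulse together for one cycle when the count expires. Under that assumption, a 6-bit timer limits the evaluation window to at most 63 cycles.

```python
from typing import List, Tuple

TIMER_BITS = 6
MAX_EVAL_TIME = (1 << TIMER_BITS) - 1  # 63: largest count a 6-bit timer holds


def eval_timer_trace(eval_time: int) -> List[Tuple[int, int, int]]:
    """Per-clock (EVAL, EVAL_DONE, Start write flag) after Start EVAL.

    Hypothetical model: EVAL is held high for eval_time cycles, then
    EVAL_DONE and Start write flag pulse for one cycle.
    """
    if not 1 <= eval_time <= MAX_EVAL_TIME:
        raise ValueError(
            f"EVAL time must be 1..{MAX_EVAL_TIME} for a {TIMER_BITS}-bit timer")
    trace = []
    remaining = eval_time
    while remaining > 0:
        trace.append((1, 0, 0))  # evaluation in progress
        remaining -= 1
    trace.append((0, 1, 1))      # expired: EVAL low, EVAL_DONE, Start write flag
    return trace


print(eval_timer_trace(3))  # [(1, 0, 0), (1, 0, 0), (1, 0, 0), (0, 1, 1)]
```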
Write flag sequencer logic 1006 receives the Start write flag signal from EVAL timer 1004 on line/bus 1018. Write flag sequencer logic 1006 generates a local R/W control signal on line/bus 1022 to local R/W line/bus 708b, a local address signal on line/bus 1023 to local address bus 708c, a local data signal on line/bus 1028 to local data bus 708e, and Local INTI# on line/bus 708d. Upon receiving the Start write flag signal, the write flag sequencer logic begins a sequence of control signals to start a memory write cycle to the PCI bus.
SEM_FPGA R/W control logic 1007 receives the control signal on line/bus 1027 from address decoder 1005 and the local R/W control signal on line/bus 1047 via local R/W control bus 708b. SEM_FPGA R/W control logic 1007 generates an enable signal for latch 1009 on line/bus 1035, a control signal for DEMUX logic 1008 on line/bus 1025, an enable signal for latch 1011 on line/bus 1037, an enable signal for latch 1012 on line/bus 1040, the F_WR signal on line/bus 734, and the F_RD signal on line/bus 735. SEM_FPGA R/W control logic 1007 controls the various write and read data transfers to and from the FPGA low bank and high bank buses.
DEMUX logic 1008 is a multiplexer and latch that receives four groups of input signals and outputs one group of signals on line/bus 1026 to local data bus 708e. The selector signals are the control signal on line/bus 1025 from SEM_FPGA R/W control logic 1007 and the control signal on line/bus 1024 from address decoder 1005. DEMUX logic 1008 receives one group of input signals consisting of the EVAL_DONE signal on line/bus 1042, the XSFR_DONE signal on line/bus 1043, and the EVAL signal on line/bus 1044. This group of signals is labeled with reference numeral 1048. In any given time period, only one of the three signals EVAL_DONE, XSFR_DONE, and EVAL is provided to DEMUX logic 1008 for possible selection. As its other three groups of input signals, DEMUX logic 1008 also receives the DMA descriptor output signal from DMA descriptor block 1002 on line/bus 1045, the data output from latch 1012 on line/bus 1039, and another data output from latch 1010 on line/bus 1034.
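The DEMUX logic's job, picking one of four input groups and holding it for the local data bus, can be sketched as a selectable latch. The select encoding in `DemuxSource` and the sample input values are hypothetical; the text states only that the selection is controlled by the signals on lines/buses 1024 and 1025.

```python
from enum import IntEnum
from typing import Dict


class DemuxSource(IntEnum):
    """Hypothetical encoding of the select lines 1024/1025."""
    STATUS = 0          # group 1048: EVAL_DONE, XSFR_DONE, or EVAL
    DMA_DESCRIPTOR = 1  # line/bus 1045 from DMA descriptor block 1002
    HIGH_BANK = 2       # line/bus 1039 from latch 1012
    LOW_BANK = 3        # line/bus 1034 from latch 1010


class DemuxLatch:
    """Selects one of four input groups and holds it for local data bus 708e."""

    def __init__(self) -> None:
        self.output = 0  # value driven on line/bus 1026

    def clock(self, select: int, inputs: Dict[DemuxSource, int]) -> int:
        """Latch the selected input group and return the new output."""
        self.output = inputs[DemuxSource(select)]
        return self.output


demux = DemuxLatch()
inputs = {
    DemuxSource.STATUS: 0b001,
    DemuxSource.DMA_DESCRIPTOR: 0x1000,
    DemuxSource.HIGH_BANK: 0xCAFE,
    DemuxSource.LOW_BANK: 0xBEEF,
}
print(hex(demux.clock(DemuxSource.HIGH_BANK, inputs)))  # 0xcafe
```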
The data buffers between the CTRL_FPGA unit 701 and the low and high FPGA bank buses comprise latches 1009 to 1012. Latch 1009 receives local bus data on line/bus 1032 via line/bus 1031 and local data bus 708e, and an enable signal on line/bus 1035 from SEM_FPGA R/W control logic 1007. Latch 1009 outputs data to latch 1010 via line/bus 1033.
Latch 1010 receives data from latch 1009 on line/bus 1033, and an enable signal on line/bus 1036 from SEM_FPGA R/W control logic 1007 via line/bus 1037. Latch 1010 outputs data to the FPGA low bank bus via line/bus 725 and to DEMUX logic 1008 via line/bus 1034.
Latch 1011 receives data from local data bus 708e on line/bus 1031, and an enable signal on line/bus 1037 from SEM_FPGA R/W control logic 1007. Latch 1011 outputs data to the FPGA high bank bus via line/bus 726 and to latch 1012 via line/bus 1038.
Latch 1012 receives data from latch 1011 on line/bus 1038, and an enable signal on line/bus 1040 from SEM_FPGA R/W control logic 1007. Latch 1012 outputs data to DEMUX logic 1008 via line/bus 1039.
Figure 24 shows the 4 × 4 FPGA array, its relationship to the FPGA banks, and its expansion capability. Like Figure 8, Figure 24 shows the same 4 × 4 array. CTRL_FPGA unit 740 is also shown. The low bank chips (chips F41-F44 and F21-F24) and the high bank chips (chips F31-F34 and F11-F14) are arranged in alternating rows. Thus, from the bottom row to the top row, the FPGA chip rows are: low bank, high bank, low bank, high bank. A data transfer chain is formed along each bank in a predetermined order. Arrow 741 represents the data transfer chain of the low bank. Arrow 742 represents the data transfer chain of the high bank. Arrow 743 represents the JTAG configuration chain, which runs through all 16 chips of the array, from F41 to F44, F34 to F31, F21 to F24, and F14 to F11, and then back to CTRL_FPGA unit 740.
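The JTAG chain order given above follows a serpentine path through the rows. The sketch below reproduces that order, assuming rows are visited from bottom (F4x) to top (F1x) and that the direction alternates on each row; the function `jtag_chain_order` is illustrative only.

```python
from typing import List, Sequence


def jtag_chain_order(rows_bottom_to_top: Sequence[int] = (4, 3, 2, 1),
                     columns: int = 4) -> List[str]:
    """Serpentine JTAG order through the array of Figure 24.

    Even-indexed rows run Fr1..Fr4; odd-indexed rows run Fr4..Fr1.
    """
    order = []
    for i, row in enumerate(rows_bottom_to_top):
        cols = range(1, columns + 1) if i % 2 == 0 else range(columns, 0, -1)
        order.extend(f"F{row}{c}" for c in cols)
    return order


chain = jtag_chain_order()
print(" -> ".join(chain + ["CTRL_FPGA 740"]))
```

Alternating the direction keeps each hop between physically adjacent chips, which is a common reason to route a scan chain this way.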
Expansion can be accomplished with piggyback boards. Assuming that the original FPGA chip array in Figure 24 comprises F41-F44 and F31-F34, two more rows of chips, F21-F24 and F11-F14, can be added with piggyback board 745. Piggyback board 745 also has the appropriate buses to extend the banks. Further expansion is possible by placing additional piggyback boards on top of the other boards in the array.
Figure 25 shows one embodiment of the hardware start-up method. Step 800 begins with a power-on or warm boot procedure. In step 801, the PCI controller reads the EEPROM for initialization. Step 802 performs read and write operations on the PCI controller registers according to the initialization sequence. Step 803 performs a boundary scan test on all FPGA chips in the array. Step 804 configures the CTRL_FPGA unit in the FPGA I/O controller. Step 805 performs read and write operations on the registers in the CTRL_FPGA unit. Step 806 sets up the PCI controller for DMA master read/write mode, after which data is transferred and verified. Step 807 configures all FPGA chips with a test design and verifies its correctness. In step 808, the hardware is ready for use. At this point, the system assumes that all preceding steps have confirmed that the hardware is operational; otherwise, the system would not have reached step 808.
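The ordering rule of Figure 25, in which step 808 is reached only after every earlier step succeeds, can be sketched as a simple sequencer. This is a minimal illustration; the callback `run_step` and the short step descriptions are hypothetical stand-ins for the actual hardware operations.

```python
from typing import Callable, List, Tuple

# Steps 800-807 of Figure 25, paraphrased from the text.
STARTUP_STEPS = [
    (800, "power-on or warm boot"),
    (801, "PCI controller reads the EEPROM for initialization"),
    (802, "read/write PCI controller registers in initialization order"),
    (803, "boundary scan test of every FPGA chip in the array"),
    (804, "configure the CTRL_FPGA unit in the FPGA I/O controller"),
    (805, "read/write the CTRL_FPGA unit registers"),
    (806, "DMA master read/write mode: transfer and verify data"),
    (807, "configure all FPGA chips with a test design and verify it"),
]
HARDWARE_READY = 808


def hardware_startup(run_step: Callable[[int], bool]) -> Tuple[bool, List[int]]:
    """Run the steps in order and stop at the first failure.

    run_step is a hypothetical callback that performs one step and reports
    whether it succeeded. Step 808 is reached only if all steps pass.
    """
    completed: List[int] = []
    for step, _description in STARTUP_STEPS:
        if not run_step(step):
            return False, completed
        completed.append(step)
    completed.append(HARDWARE_READY)
    return True, completed


print(hardware_startup(lambda step: True))
print(hardware_startup(lambda step: step != 803))  # (False, [800, 801, 802])
```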
E. Alternative Embodiment Using Denser FPGA Chips
In one embodiment of the present invention, the FPGA logic devices are mounted on a single board. If modeling a user's circuit design requires more FPGA logic devices than can fit on one board, multiple boards carrying additional FPGA logic devices can be provided. The ability to add boards to the simulation system is a desirable feature of the present invention. In this embodiment, denser FPGA chips (e.g., Altera 10K130V and 10K250V) are used. Using these chips changes the board design so that each board carries only four denser FPGA chips instead of eight less dense FPGA chips (e.g., Altera 10K100).
The coupling between these boards and the simulation system motherboard must therefore be addressed, and the interconnection and connection scheme must compensate for the absence of a backplane. The FPGA array in the simulation system is mounted on the motherboard through a particular board interconnect structure. Each chip can have up to eight sets of interconnects, arranged according to the adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and the one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), both within the same board and across different boards, excluding the local bus connections. Each chip can be interconnected directly with an adjacent chip, or through one hop with a non-adjacent chip located above, below, to the left, or to the right. The array is toroidal in the X direction (east-west) and a mesh in the Y direction (north-south).
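The neighbor relationships described above (toroidal in X, mesh in Y, with direct and one-hop links) can be expressed as a small coordinate function. This is a hypothetical sketch that assumes four chips per row in the X direction, one row per board in the Y direction, x increasing eastward, and y = 0 at the north end; chips at the north and south edges simply lack the missing links. The function name and key labels are illustrative.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (x, y): x = east-west column, y = north-south row


def interconnect_neighbors(x: int, y: int, columns: int = 4,
                           rows: int = 6) -> Dict[str, Coord]:
    """Chips reachable from (x, y) over direct and one-hop interconnects.

    X wraps around like a torus; Y is a mesh with no wrap-around.
    """
    links: Dict[str, Coord] = {
        "E": ((x + 1) % columns, y),
        "W": ((x - 1) % columns, y),
        "XH_east": ((x + 2) % columns, y),
        "XH_west": ((x - 2) % columns, y),
    }
    if y - 1 >= 0:
        links["N"] = (x, y - 1)
    if y + 1 < rows:
        links["S"] = (x, y + 1)
    if y - 2 >= 0:
        links["NH"] = (x, y - 2)
    if y + 2 < rows:
        links["SH"] = (x, y + 2)
    return links


print(interconnect_neighbors(0, 0))
```

With four columns, the eastward and westward one-hop links reach the same chip, which is consistent with the text's note that the XH pins can be used on the west side and/or the east side.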
These interconnects alone can couple logic devices and other components within a single board. However, inter-board connectors are provided to couple the boards and their interconnects across different boards in order to transfer data (1) between the PCI bus on the motherboard and the array boards, and (2) between any two array boards. Each board contains its own FPGA bus FD[63:0], which couples the FPGA logic devices, the SRAM memory devices, and the CTRL_FPGA unit (FPGA I/O controller) together. The FPGA bus FD[63:0] is not shared among multiple boards. Although the FPGA interconnects are not related to the FPGA bus, they provide connectivity between FPGA logic devices across multiple boards. The local bus, on the other hand, is shared among all boards.
Motherboard connectors couple the boards to the motherboard, and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for a direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while boards 2, 4, and 6 are connected to the motherboard through their neighboring boards. Thus, every other board is directly connected to the motherboard, and the interconnects and local bus of these boards are coupled together through inter-board connectors arranged from the solder side to the component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. The various inter-board connectors on the solder side and the component side allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
Figure 56 shows a high-level block diagram of an FPGA chip array according to one embodiment of the present invention. The CTRL_FPGA unit 1200 described above is coupled to bus 1210 via line 1209. In one embodiment, CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip (e.g., an Altera 10K50 chip). Bus 1210 couples CTRL_FPGA unit 1200 to the other simulation array boards (if any) and to other chips (e.g., the PCI controller, EEPROM, and clock buffer). Figure 56 also shows the other major functional blocks in the form of logic devices and memory devices. In one embodiment, the logic device is a PLD in the form of an FPGA chip (e.g., an Altera 10K130V or 10K250V chip). The 10K130V and 10K250V chips are pin-compatible, and both come in 599-pin PGA packages. Therefore, unlike the array embodiment described above with eight Altera FLEX 10K100 chips, this embodiment uses only four Altera FLEX 10K130 chips. One embodiment of the present invention describes the board containing these four logic devices and their interconnections.
Because the user design is modeled and configured in any number of these logic devices in the array, communication between the FPGA logic devices is necessary to connect one part of the user's circuit design to another. Initial configuration information and boundary scan testing are also supported by the inter-FPGA interconnects. Finally, the necessary simulation system control signals must be accessible between the simulation system and the FPGA logic devices.
Figure 36 shows the hardware architecture of an FPGA logic device used in the present invention. FPGA logic device 1500 includes 102 top I/O pins, 102 bottom I/O pins, 111 left I/O pins, and 110 right I/O pins, for a total of 425 interconnect pins. In addition, 45 more I/O pins are dedicated to the following signals: GCLK, FPGA bus FD[31:0] (FD[63:32] for the high bank), F_RD, F_WR, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], EVAL, EVAL_REQ_N, DEVICE_OE (a signal from the CTRL_FPGA unit that turns on the output pins of the FPGA logic device), and DEV_CLRN (a signal from the CTRL_FPGA unit that clears all internal flip-flops before simulation begins). Any data and control signals passing between any two FPGA logic devices are therefore carried by these interconnects. The remaining pins are dedicated to power and ground.
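The pin counts stated above can be cross-checked by summing the side I/O pins and the widths of the dedicated signals. This sketch uses only the numbers and signal names from the text; the helpers `signal_width` and `total_pins` are illustrative, and a bus written as NAME[hi:lo] is counted as hi - lo + 1 pins.

```python
from typing import Iterable


def signal_width(name: str) -> int:
    """Pins used by a signal; a bus such as 'FD[31:0]' counts hi - lo + 1."""
    if "[" not in name:
        return 1
    hi, lo = name[name.index("[") + 1:name.index("]")].split(":")
    return abs(int(hi) - int(lo)) + 1


def total_pins(names: Iterable[str]) -> int:
    return sum(signal_width(n) for n in names)


side_io_pins = {"top": 102, "bottom": 102, "left": 111, "right": 110}
dedicated = ["GCLK", "FD[31:0]", "F_RD", "F_WR", "DATAXSFR", "SHIFTIN",
             "SHIFTOUT", "SPACE[2:0]", "EVAL", "EVAL_REQ_N", "DEVICE_OE",
             "DEV_CLRN"]

interconnect_pins = sum(side_io_pins.values())
print(interconnect_pins, total_pins(dedicated))  # 425 45
```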
Figure 37 shows the FPGA interconnect pinout of a single FPGA chip according to one embodiment of the present invention. Each chip 1510 can have up to eight sets of interconnects, each set comprising a specific number of pins. Depending on their positions on the board, some chips may have fewer than eight sets of interconnects. In a preferred embodiment, all chips have seven sets of interconnects, although the specific sets used vary from chip to chip depending on their positions on the board. The interconnects of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The west interconnect set is labeled W[73:0], the east set E[73:0], the north set N[73:0], and the south set S[73:0]. These neighbor interconnect sets connect only to adjacent chips and do not skip over any chip. For example, in Figure 39, chip 1570 has interconnect 1540 as N[73:0], interconnect 1542 as W[73:0], interconnect 1543 as E[73:0], and interconnect 1545 as S[73:0]. Note that FPGA chip 1570, i.e., the FPGA2 chip, has all four sets of adjacent interconnects: N[73:0], S[73:0], W[73:0], and E[73:0]. The west interconnect of FPGA0 is connected to the east interconnect of FPGA3 through line 1539 in a toroidal fashion. Thus, line 1539 couples chip 1569 (FPGA0) and chip 1572 (FPGA3) directly, as if the east and west ends of the board were wrapped around to meet each other.
Returning to Figure 37, there are also four sets of "hop" interconnects. Two of these sets, NH[27:0] and SH[27:0], interconnect non-adjacent chips in the vertical direction. For example, see NH interconnect 1541 and SH interconnect 1546 of FPGA2 chip 1570 in Figure 39. Returning to Figure 37, the other two sets, XH[36:0] and XH[72:37], interconnect non-adjacent chips in the horizontal direction. For example, see XH interconnect 1544 of FPGA2 chip 1570 in Figure 39.
Turning to Figure 37, the vertical hop interconnects NH[27:0] and SH[27:0] have 28 pins each. The horizontal hop interconnects XH[36:0] and XH[72:37] have 73 pins in total. The horizontal hop pins (XH[36:0] and XH[72:37]) can be used on the west side (e.g., interconnect 1605 of FPGA3 chip 1576 in Figure 39) and/or the east side (e.g., interconnect 1602 of FPGA0 chip 1573 in Figure 39). This configuration allows every chip to be manufactured identically. Thus, each chip can connect through one hop to non-adjacent chips located above, below, to the left, and to the right.
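The interconnect group widths add up to the 425 interconnect pins counted for FPGA logic device 1500. The following sketch simply computes each width from its NAME[hi:lo] notation; the helper `bus_width` is illustrative.

```python
def bus_width(name: str) -> int:
    """Width of a bus written as NAME[hi:lo]."""
    hi, lo = name[name.index("[") + 1:name.index("]")].split(":")
    return abs(int(hi) - int(lo)) + 1


groups = ["N[73:0]", "S[73:0]", "W[73:0]", "E[73:0]",  # direct neighbors
          "NH[27:0]", "SH[27:0]",                       # vertical one-hop
          "XH[36:0]", "XH[72:37]"]                      # horizontal one-hop

widths = {g: bus_width(g) for g in groups}
print(widths["N[73:0]"], widths["NH[27:0]"],
      widths["XH[36:0]"] + widths["XH[72:37]"])  # 74 28 73
print(sum(widths.values()))                       # 425
```

The sum is 4 × 74 + 2 × 28 + 73 = 425, matching the top, bottom, left, and right I/O pin total.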
Figure 39 shows an FPGA array layout with direct-neighbor and one-hop-neighbor interconnects for six boards on a single motherboard according to one embodiment of the present invention. The figure is used to illustrate two possible configurations: a six-board system and a two-board system. Position indicator 1550 shows that the "Y" direction is north-south and the "X" direction is east-west. The array is toroidal in the X direction and a mesh in the Y direction. Figure 39 shows only the boards, FPGA logic devices, interconnects, and connectors at a high level; the motherboard, other supporting components (such as SRAM memory devices), and wiring (such as the FPGA bus) are not shown.
Note that Figure 39 presents an array view of the boards, components, interconnects, and connectors. The actual physical configuration and installation involve standing the boards on edge, with the component side of each board facing the solder side of the next. Approximately half of the boards are connected directly to the motherboard, while the other half are connected to their respective adjacent boards.
In the six-board embodiment of the present invention, the six boards 1551 (board 1), 1552 (board 2), 1553 (board 3), 1554 (board 4), 1555 (board 5), and 1556 (board 6) are mounted on the motherboard (not shown) as part of the reconfigurable hardware unit 20 in Figure 1. Each board contains a nearly identical set of components and connectors. Thus, for illustration, the sixth board 1556 contains FPGA logic devices 1565 to 1568 and connectors 1557 to 1560 and 1581; the fifth board 1555 contains FPGA logic devices 1569 to 1572 and connectors 1582 and 1583; and the fourth board 1554 contains FPGA logic devices 1573 to 1576 and connectors 1584 and 1585.
In this six-board configuration, the first board 1551 and the sixth board 1556 are "bookend" boards that contain the Y-mesh terminations, such as R-pack terminations 1557 to 1560 on the sixth board 1556 and terminations 1591 to 1594 on the first board 1551. The boards located between them (i.e., 1552 (board 2), 1553 (board 3), 1554 (board 4), and 1555 (board 5)) complete the array.
As explained above, the interconnects are arranged according to the direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and the one-hop-neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), both within the same board and across different boards, excluding the local bus connections. These interconnects can independently couple the logic devices and other components within a single board. However, inter-board connectors 1581 to 1590 allow the logic devices on different boards (boards 1 through 6) to communicate. The FPGA bus is part of the inter-board connectors 1581 to 1590. Connectors 1581 to 1590 are 600-pin connectors that carry 520 signals and 80 power/ground connections between two adjacent array boards.
In Figure 39, the boards are coupled asymmetrically with respect to inter-board connectors 1581 to 1590. For example, inter-board connectors 1589 and 1590 lie between boards 1551 and 1552. Interconnect 1515 couples FPGA logic devices 1511 and 1577 together, and this connection is symmetric with respect to connectors 1589 and 1590. However, interconnect 1603 is asymmetric: it couples an FPGA logic device in the third board 1553 to FPGA logic device 1577 in board 1551. With respect to connectors 1589 and 1590, this connection is asymmetric. Similarly, interconnect 1600 is also asymmetric with respect to connectors 1589 and 1590, because it couples FPGA logic device 1511 to termination 1591, which in turn is coupled to FPGA logic device 1577 through interconnect 1601. Other similar interconnects further illustrate this asymmetry.
Because of this asymmetry, the interconnects passing through the inter-board connectors are routed in two different ways: a symmetric type, such as interconnect 1515, and an asymmetric type, such as interconnects 1603 and 1600. Figures 40(A) and 40(B) show the interconnect routing scheme.
In Figure 39, an example of a direct-neighbor connection within a single board is interconnect 1543, which couples logic device 1570 and logic device 1571 on board 1555 in the east-west direction. Another example of a direct-neighbor connection within a single board is interconnect 1607, which couples logic device 1573 and logic device 1576 on board 1554. An example of a direct-neighbor connection between two different boards is interconnect 1545, which couples logic device 1570 on board 1555 and logic device 1574 on board 1554 in the north-south direction through connectors 1583 and 1584. Here, two inter-board connectors, 1583 and 1584, carry the signals.
An example of a one-hop interconnect within a single board is interconnect 1544, which couples logic device 1570 and logic device 1572 on board 1555 in the east-west direction. An example of a one-hop interconnect between two different boards is interconnect 1599, which couples logic device 1565 on board 1556 and logic device 1573 on board 1554 through connectors 1581 to 1584. Here, four inter-board connectors, 1581 to 1584, carry the signals.
Some boards, particularly those at the north and south ends of the motherboard, also contain 10-ohm R-packs to terminate some connections. Thus, the sixth board 1556 contains 10-ohm R-pack terminations 1557 to 1560, and the first board 1551 contains 10-ohm R-pack terminations 1591 to 1594. On the sixth board 1556, R-pack termination 1557 serves interconnects 1970 and 1971, R-pack termination 1558 serves interconnects 1972 and 1541, R-pack termination 1559 serves interconnects 1973 and 1974, and R-pack termination 1560 serves interconnects 1975 and 1976. In addition, interconnects 1561 to 1564 are not connected to any device. Unlike the east-west toroidal interconnects, these north-south interconnects are of the mesh type.
These mesh terminations increase the number of direct north-south interconnects. Without them, the interconnects at the north and south ends of the FPGA mesh would be wasted. For example, in addition to the set of direct interconnects 1515, FPGA logic devices 1511 and 1577 have additional interconnects through R-pack 1591 and interconnects 1600 and 1601. That is, R-pack 1591 connects interconnects 1600 and 1601 together, which increases the number of direct connections between FPGA logic devices 1511 and 1577.
Connections between boards are also provided. Logic devices 1577, 1578, 1579 and 1580 on board 1551 are connected to logic devices 1511, 1512, 1513 and 1514 on board 1552 by interconnects 1515, 1516, 1517 and 1518 and inter-board connectors 1589 and 1590. Thus, through connectors 1589 and 1590, interconnect 1515 connects logic device 1511 on board 1552 to logic device 1577 on board 1551; interconnect 1516 connects logic device 1512 on board 1552 to logic device 1578 on board 1551; interconnect 1517 connects logic device 1513 on board 1552 to logic device 1579 on board 1551; and interconnect 1518 connects logic device 1514 on board 1552 to logic device 1580 on board 1551.
Some interconnects, such as 1595, 1596, 1597 and 1598, are not connected to any device because they are unused. However, as described above, for logic devices 1511 and 1577, R-pack 1591 connects interconnects 1600 and 1601 and thereby increases the number of north-south interconnects.
A two-board embodiment of the present invention is shown in Figure 44. In this embodiment, only two boards are needed to model the user's design in the simulation system. Like the six-board configuration of Figure 39, the two-board configuration of Figure 44 uses the same two boards as "bookends", namely board 1551 and board 1556. They are mounted on the motherboard and form part of the reconfigurable hardware unit of Figure 1. In Figure 44, one bookend board is the first board and the other is the sixth board. The sixth board used in Figure 44 is similar to the sixth board of Figure 39. That is, bookend boards such as the first and sixth boards should carry the terminations required for the north-south mesh connections.
This two-board configuration includes four FPGA logic devices on the first board 1551, namely 1577 (FPGA0), 1578 (FPGA1), 1579 (FPGA2) and 1580 (FPGA3), and four FPGA logic devices on the sixth board 1556, namely 1565 (FPGA0), 1566 (FPGA1), 1567 (FPGA2) and 1568 (FPGA3). The two boards are connected through inter-board connectors 1581 and 1590.
These boards include 10-ohm R-packs that terminate certain connections. In the two-board embodiment, both boards are bookend boards. Board 1551 includes 10-ohm R-packs 1591, 1592, 1593 and 1594 as resistive terminations. The other board includes 10-ohm R-packs 1557 to 1560.
Board 1551 and board 1556 carry connectors 1590 and 1581, respectively, for communication between the boards. Interconnects that join the two boards, such as 1600, 1971, 1977, 1541 and 1540, pass through connectors 1590 and 1581; in other words, inter-board connectors 1590 and 1581 allow interconnects 1600, 1971, 1977, 1541 and 1540 to complete connections between components on different boards. Inter-board connectors 1590 and 1581 carry control data and control signals on the FPGA bus.
In the four-board configuration, the first board and the sixth board are the bookend boards, and the second board 1552 and the third board 1553 (see Figure 39) are the intermediate boards. When the boards are connected to the motherboard according to the present invention (as discussed with reference to Figures 38(A) and 38(B)), board 1 is paired with board 2 and board 3 is paired with board 6.
In the six-board configuration, as described above, the first board and the sixth board are the bookend boards, and the second board 1552, the third board 1553, the fourth board 1554 and the fifth board 1555 (see Figure 39) are the intermediate boards. When the boards are connected to the motherboard according to the present invention (as discussed with reference to Figures 38(A) and 38(B)), board 1 is paired with board 2, board 3 is paired with board 4, and board 5 is paired with board 6.
More boards can be installed if necessary. However, regardless of how many boards are added to the system, the bookend boards (such as board 1 and board 6 in Figure 39) should have the terminations required to complete the connections of the mesh array. In one embodiment, the minimum configuration is the two-board configuration shown in Figure 44. The number of boards can be increased by adding boards in pairs. If the initial configuration consists of the first board and the sixth board, a later change to a four-board configuration involves moving the sixth board out, pairing the first board with the second board, and pairing the third board with the sixth board.
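The pairing rule above can be captured in a few lines. The following is a minimal sketch, assuming that boards are always added in pairs, that boards 1 and 6 remain the bookends, and that the first board of each pair is the one plugged directly into the motherboard (as stated for the two- and six-board configurations; the four-board case is an assumption that follows the same every-other-board pattern). The function names `board_pairs` and `directly_connected` are hypothetical.

```python
# Hypothetical sketch of the board pairing rule for 2-, 4- and 6-board
# configurations; not taken from the patent's actual implementation.
from typing import List, Tuple

def board_pairs(num_boards: int) -> List[Tuple[int, int]]:
    """Return the (motherboard-connected, partner) board pairs.

    Assumes boards 1 and 6 are always the bookends and that intermediate
    boards 2..5 are inserted between them as the configuration grows.
    """
    if num_boards not in (2, 4, 6):
        raise ValueError("sketch covers only 2-, 4- and 6-board configurations")
    middle = list(range(2, num_boards))  # 2 -> [], 4 -> [2, 3], 6 -> [2, 3, 4, 5]
    order = [1] + middle + [6]
    # Consecutive boards in stacking order form a pair.
    return [(order[i], order[i + 1]) for i in range(0, len(order), 2)]

def directly_connected(num_boards: int) -> List[int]:
    """Boards assumed to plug straight into a motherboard connector."""
    return [first for first, _ in board_pairs(num_boards)]

if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, board_pairs(n), directly_connected(n))
```

Growing from two to four boards in this model moves board 6 from the first pair to the second, which mirrors the reconfiguration described in the paragraph above.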
As described above, each logic device is connected both to its adjacent logic devices and to non-adjacent logic devices one hop away. Thus, in Figures 39 and 44, logic device 1577 is connected to adjacent logic device 1578 through interconnect 1547. Logic device 1577 is also connected to non-adjacent logic device 1579 through single-hop interconnect 1548. However, because interconnect 1549 provides a wrap-around (toroidal) connection, logic devices 1580 and 1577 are considered adjacent.
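The neighbor relations above can be sketched as index arithmetic. This is an illustrative model only, assuming four FPGA columns per board (FPGA0 to FPGA3) with east-west links that wrap around, and boards stacked as mesh rows whose north-south links stop at the bookend boards. The helper names are hypothetical.

```python
# Sketch only: an assumed model of the FPGA array, not the patent's routing.
# Columns: FPGA0..FPGA3 on one board (east-west, toroidal).
# Rows: boards in stacking order (north-south, mesh without wrap-around).
from typing import Set

COLS = 4  # four FPGA logic devices per board

def ew_neighbors(col: int) -> Set[int]:
    """Directly adjacent east-west devices; the ring closes FPGA3 -> FPGA0."""
    return {(col - 1) % COLS, (col + 1) % COLS}

def ew_single_hop(col: int) -> Set[int]:
    """East-west devices two positions away (single-hop interconnect)."""
    return {(col - 2) % COLS, (col + 2) % COLS}

def ns_neighbors(row: int, rows: int) -> Set[int]:
    """Directly adjacent boards north and south; the mesh ends at the bookends."""
    return {r for r in (row - 1, row + 1) if 0 <= r < rows}

def ns_single_hop(row: int, rows: int) -> Set[int]:
    """Boards two positions away north and south, if they exist."""
    return {r for r in (row - 2, row + 2) if 0 <= r < rows}

if __name__ == "__main__":
    # FPGA0 (device 1577): adjacent to FPGA1 (1578) and, through the wrap,
    # FPGA3 (1580); FPGA2 (1579) is reached by a single hop.
    print(ew_neighbors(0), ew_single_hop(0))
```

The modulo in the east-west helpers is what makes devices 1577 and 1580 adjacent, while the range check in the north-south helpers models the mesh ends that the R-packs terminate.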
Figure 42 shows a top view (component side) of the components and connectors on a single board. In one embodiment of the invention, only one board is needed to model the user's design in the simulation system. In other embodiments, multiple boards (that is, at least two boards) are required. For example, Figure 39 shows six boards 1551 to 1556 connected together through various 600-pin connectors 1581 to 1590. At the top and bottom, boards 1551 and 1556 are each terminated by another set of 10-ohm R-packs.
Returning to Figure 42, board 1820 includes four FPGA logic devices: 1822 (FPGA0), 1823 (FPGA1), 1824 (FPGA2) and (FPGA3). It also includes two SRAM memory devices, 1828 and 1829. These two SRAM devices hold the memory blocks mapped from the logic devices on this board; that is, the memory simulation of the present invention maps the memory blocks of the logic devices on this board into the SRAM devices on this board. Other boards include other logic devices and memory devices that perform similar mapping. In one embodiment, memory mapping is board-dependent; that is, the memory mapping of the first board is limited to the logic devices and memory devices on that board and is independent of the other boards. In other embodiments, memory mapping is board-independent. In that case, only a few large memory devices are needed, and memory blocks from the logic devices on one board can be mapped onto the memory devices of another board.
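The difference between board-dependent and board-independent mapping can be illustrated with a simple allocator. This is a hypothetical sketch: the first-fit policy, the `Sram` record, word-based capacities and the `map_blocks` name are all assumptions for illustration; the text does not describe how blocks are actually placed.

```python
# Hypothetical first-fit allocator contrasting board-dependent and
# board-independent memory mapping. Capacities are in assumed "words".
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Sram:
    name: str
    board: int
    capacity: int
    used: int = 0

def map_blocks(blocks: Dict[str, Tuple[int, int]], srams: List[Sram],
               board_dependent: bool) -> Dict[str, str]:
    """Assign each memory block to an SRAM device.

    blocks maps a block name to (board of its logic device, size in words).
    With board_dependent=True a block may only use SRAM on its own board.
    """
    placement: Dict[str, str] = {}
    for name, (board, size) in blocks.items():
        candidates = [s for s in srams if not board_dependent or s.board == board]
        for s in candidates:
            if s.used + size <= s.capacity:
                s.used += size
                placement[name] = s.name
                break
        else:  # no candidate had room
            raise ValueError(f"no SRAM can hold block {name}")
    return placement
```

In board-dependent mode a block that is larger than any SRAM on its own board cannot be placed, whereas board-independent mode lets it spill onto a larger device on another board.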
Light-emitting diodes (LEDs) 1821 are also provided to indicate certain conditions. According to one embodiment of the invention, the LED indications are as shown in Table A below:
Table A: LED indications
Various other control chips, such as the PLX PCI controller 1826 and the CTRL_FPGA unit 1827, control communication between the FPGAs and the PCI bus. Examples of a PLX PCI controller 1826 that may be used in the system are the PCI9080 or 9060 from PLX Technology. The PCI9080 provides a suitable local bus interface, control registers, FIFOs and a PCI interface to the PCI bus. The contents of PLX Technology's data book, PCI 9080 Data Sheet (version 0.93, February 28, 1997), are incorporated herein by reference. An example of the CTRL_FPGA unit 1827 is a programmable logic device in the form of an FPGA chip, such as the Altera 10K50. In a multi-board configuration, only the first board, which is connected to the PCI bus, includes a PCI controller.
Connector 1830 connects board 1820 to the motherboard (not shown) and thereby to the PCI bus, power and ground. On some boards, connector 1830 is not used for a direct connection to the motherboard. Thus, in a two-board configuration, only the first board is directly connected to the motherboard. In a six-board configuration, only boards 1, 3 and 5 are directly connected to the motherboard, while boards 2, 4 and 6 are connected to the motherboard through their adjacent boards. Inter-board connectors J1 to J28 are also provided; as the name implies, connectors J1 to J28 establish connections between different boards.
Connector J1 connects an external power supply and ground. Table B below lists the pins and their descriptions for external power connector J1 according to one embodiment of the invention.
Table B: External power connector J1
Connector J2 is used for the parallel port connection. Connectors J1 and J2 are used during production for stand-alone boundary-scan testing of each individual board. Table C below lists the pins and their descriptions for parallel JTAG port J2 according to one embodiment of the invention.
Table C: Parallel JTAG port J2

J2 pin | Signal | I/O | Parallel port (DB-25) pin | Parallel port signal
7 | PARA_TDI | I | 4 | D2
9 | PARA_NR | I | 5 | D3
19 | PARA_TDO | O | 10 | NACK
10, 12, 14, 16, 18, 20, 22, 24 | GND | | 18-25 | GND
Connectors J3 and J4 are used for the local bus connection between boards. Connectors J5 to J16 form one set of FPGA interconnect connections. Connectors J17 to J28 form another set of FPGA interconnect connections. When boards are stacked component side to solder side, these connectors establish working connections between components on different boards. Tables D and E below give a complete list and description of connectors J1 to J28 according to one embodiment of the invention.
Table D: Connectors J1 to J28
Shaded connectors are through-hole connectors. Note that in Table D, the number in brackets [ ] denotes the FPGA logic device number, 0 to 3. Thus, S[0] denotes the 74-bit south interconnect of FPGA0 (that is, S[73:0] in Figure 37).
Table E: Local bus connectors J3 and J4
The I/O direction is given with respect to board 1.
Figure 43 is a legend for connectors J1 to J28 in Figures 41(A) to 41(F) and Figure 42. In general, white blocks denote surface-mount parts, while grey-filled blocks denote through-hole parts. In addition, blocks with a solid outline denote connectors on the component side, and blocks with a dashed outline denote connectors on the solder side. Thus, white block 1840 with a solid outline denotes a 2×30 header, surface mounted on the component side. White block 1841 with a dashed outline denotes a 2×30 socket, surface mounted on the solder side of the board. Grey block 1842 with a solid outline denotes a 2×30 or 2×45 header, through-hole mounted on the component side. Grey block 1843 with a dashed outline denotes a 2×30 or 2×45 socket, through-hole mounted on the solder side. In one embodiment, the simulation system uses Samtec SFM and TFM series 2×30 or 2×45 micro-strip connectors, which are available in surface-mount and through-hole versions. Hatched block 1844 with a solid outline denotes an R-pack, surface mounted on the component side of the board. Hatched block 1845 with a dashed outline denotes an R-pack, surface mounted on the solder side. The contents of the Samtec specifications in the Samtec product catalog on the Samtec website are incorporated herein by reference. Returning to Figure 42, connectors J3 to J28 are of the types specified in the legend of Figure 43.
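The legend reduces to two independent lookups: block fill gives the part and mounting type, and outline style gives the board side (and, for connectors, header versus socket). The sketch below encodes exactly the cases listed for blocks 1840 to 1845; the string keys and the `decode_legend` name are hypothetical.

```python
# Sketch of the Figure 43 legend as a lookup; keys and names are hypothetical.
OUTLINE_SIDE = {"solid": "component side", "dashed": "solder side"}

def decode_legend(fill: str, outline: str) -> str:
    """Describe a legend block from its fill ('white', 'grey', 'hatched')
    and outline ('solid', 'dashed')."""
    side = OUTLINE_SIDE[outline]
    if fill == "hatched":
        # Blocks 1844 and 1845: R-packs, always surface mounted.
        return f"surface-mount R-pack, {side}"
    if fill not in ("white", "grey"):
        raise ValueError(f"unknown fill: {fill}")
    mount = "surface-mount" if fill == "white" else "through-hole"
    # Solid outline (component side) is a header; dashed (solder side) a socket.
    part = "header" if outline == "solid" else "socket"
    return f"{mount} {part}, {side}"

if __name__ == "__main__":
    for fill in ("white", "grey", "hatched"):
        for outline in ("solid", "dashed"):
            print(fill, outline, "->", decode_legend(fill, outline))
```

Because headers sit on the component side and sockets on the solder side, a header on one board mates with the socket of the board stacked against it.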
Figures 41(A) to 41(F) show top views of each board and its respective connectors. Figure 41(A) shows the connectors of the sixth board; board 1660 includes connectors 1661 to 1681 and motherboard connector 1682. Figure 41(B) shows the connectors of the fifth board; board 1690 includes connectors 1691 to 1708 and motherboard connector 1709. Figure 41(C) shows the connectors of the fourth board; board 1715 includes connectors 1716 to 1733 and motherboard connector 1734. Figure 41(D) shows the connectors of the third board; board 1740 includes connectors 1741 to 1758 and motherboard connector 1759. Figure 41(E) shows the connectors of the second board; board 1765 includes connectors 1766 to 1783 and motherboard connector 1784. Figure 41(F) shows the connectors of the first board; board 1790 includes connectors 1791 to 1812 and motherboard connector 1813. As specified in the legend of Figure 43, the connectors on these six boards are various combinations of the following: (1) surface mount or through-hole, (2) component side or solder side, and (3) header, socket or R-pack.
In one embodiment, these connectors are used for communication between boards. Related buses and signals are grouped together and carried between any two boards through the connectors. Only half of the boards are directly connected to the motherboard. In Figure 41(A), the sixth board 1660 includes connectors 1661 to 1668 designated for one set of FPGA interconnects, connectors 1669 to 1674, 1676 and 1679 designated for another set of FPGA interconnects, and connector 1681 designated for the local bus. Because the sixth board 1660 sits at one end of the motherboard (the first board 1790 of Figure 41(F) is at the other end), connectors 1675, 1677, 1678 and 1680 are designated for connections to the 10-ohm R-packs that terminate the north-south interconnects. In addition, motherboard connector 1682 is not used on the sixth board 1660; as shown in Figure 38(B), the sixth board 1535 connects to the fifth board 1534 rather than directly to motherboard 1520.
In Figure 41(B), the fifth board 1690 includes connectors 1691 to 1698 designated for one set of FPGA interconnects, connectors 1699 to 1706 designated for another set of FPGA interconnects, and connectors 1707 and 1708 designated for the local bus. Connector 1709 connects the fifth board 1690 to the motherboard.
In Figure 41(C), the fourth board 1715 includes connectors 1716 to 1723 designated for one set of FPGA interconnects, connectors 1724 to 1731 designated for another set of FPGA interconnects, and connectors 1732 and 1733 designated for the local bus. Connector 1734 is not used to connect the fourth board 1715 directly to the motherboard. This configuration is also shown in Figure 38(B), where the fourth board 1533 connects directly to the third board 1532 and the fifth board 1534 rather than directly to motherboard 1520.
In Figure 41(D), the third board 1740 includes connectors 1741 to 1748 designated for one set of FPGA interconnects, connectors 1749 to 1756 designated for another set of FPGA interconnects, and connectors 1757 and 1758 designated for the local bus. Connector 1759 connects the third board 1740 to the motherboard.
In Figure 41(E), the second board 1765 includes connectors 1766 to 1773 designated for one set of FPGA interconnects, connectors 1774 to 1781 designated for another set of FPGA interconnects, and connectors 1782 and 1783 designated for the local bus. Connector 1784 is not used to connect the second board 1765 directly to the motherboard. This configuration is also shown in Figure 38(B), where the second board 1525 connects directly to the first board 1526 and the third board 1532 rather than directly to motherboard 1520.
In Figure 41(F), the first board 1790 includes connectors 1791 to 1798 designated for one set of FPGA interconnects, connectors 1799 to 1804, 1806 and 1809 designated for another set of FPGA interconnects, and connectors 1811 and 1812 designated for the local bus. Connector 1813 connects the first board 1790 to the motherboard. Because the first board 1790 sits at one end of the motherboard (the sixth board 1660 of Figure 41(A) is at the other end), connectors 1805, 1807, 1808 and 1810 are designated for connections to the 10-ohm R-packs that terminate the north-south interconnects.
In one embodiment of the invention, multiple boards are connected to the motherboard and to each other in a unique way. The boards are connected together in sequence, component side to solder side. One of the boards, for example the first board, connects to the motherboard through a motherboard connector and from there to the PCI bus. The FPGA interconnect bus on the first board is connected to the FPGA interconnect bus of another board (such as the second board) through a pair of FPGA interconnect connectors. The FPGA interconnect connector of the first board is on its component side, and the FPGA interconnect connector of the second board is on its solder side. Together, the connector on the component side of the first board and the connector on the solder side of the second board allow the FPGA interconnect buses to be connected to each other.
Similarly, the local buses of the two boards are connected through local bus connectors. The local bus connector on the first board is on its component side, and the local bus connector on the second board is on its solder side. Thus, the connector on the component side of the first board and the connector on the solder side of the second board allow the local buses to be connected to each other.
More boards can be added. A third board can be added with its solder side facing the component side of the second board. This establishes similar FPGA interconnect and local bus connections between the boards. The third board also connects to the motherboard through another connector, but that connector provides only power and ground to the third board, as discussed below.
The component-side-to-solder-side connectors of the two-board configuration are discussed with reference to Figure 38(A), which shows a side view of FPGA boards connected on the motherboard according to one embodiment of the invention. Figure 38(A) shows a two-board configuration; as the name implies, only two boards are used. The two boards 1525 (second board) and 1526 (first board) in Figure 38(A) correspond to the two boards 1552 and 1551 in Figure 39. Reference numeral 1989 denotes the component side of boards 1525 and 1526, and reference numeral 1988 denotes their solder side. As shown in Figure 38(A), boards 1525 and 1526 connect to motherboard 1520 through motherboard connector 1523. Additional motherboard connectors 1521, 1522 and 1524 may also be provided for expansion. Signals between the PCI bus and boards 1525 and 1526 pass through motherboard connector 1523. PCI signals between this two-board structure and the PCI bus pass first through the first board 1526. Thus, a signal sent from the PCI bus reaches the first board 1526 before it reaches the second board 1525. Likewise, signals from the two-board structure to the PCI bus are sent through the first board 1526. A power supply unit (not shown) also provides power to this configuration through motherboard connector 1523.
As shown in Figure 38(A), board 1526 includes several components and connectors. One of the components is FPGA logic device 1530. The board also carries connectors 1528A and 1531A. Similarly, board 1525 includes several components and connectors. One of the components is FPGA logic device 1529. The board also carries connectors 1528B and 1531B.
In one embodiment, 1528A and 1528B are inter-board connectors for the FPGA bus (such as 1590 and 1581; see Figure 44). These inter-board connectors establish connections between boards for the various FPGA interconnects, such as N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0] and XH[72:37], but not for the local bus.
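The bus names above use the usual NAME[msb:lsb] range notation. As an illustration, assuming each index in a range is one signal, the sketch below parses the listed ranges and adds up their widths; `bus_width` and `FPGA_BUSES` are hypothetical names, and the total counts only the ranges named in this paragraph, not every pin on the connectors.

```python
# Sketch: widths of the FPGA interconnect ranges listed above, assuming one
# signal per index in NAME[msb:lsb] notation.
import re
from typing import Tuple

BUS_RE = re.compile(r"^([A-Z]+)\[(\d+):(\d+)\]$")

def bus_width(spec: str) -> Tuple[str, int]:
    """Return (bus name, number of signals) for a spec such as 'S[73:0]'."""
    m = BUS_RE.match(spec)
    if m is None:
        raise ValueError(f"not a NAME[msb:lsb] range: {spec}")
    name, msb, lsb = m.group(1), int(m.group(2)), int(m.group(3))
    return name, msb - lsb + 1

FPGA_BUSES = ["N[73:0]", "S[73:0]", "W[73:0]", "E[73:0]",
              "NH[27:0]", "SH[27:0]", "XH[36:0]", "XH[72:37]"]

TOTAL = sum(bus_width(b)[1] for b in FPGA_BUSES)

if __name__ == "__main__":
    for b in FPGA_BUSES:
        print(b, bus_width(b))
    print("total:", TOTAL)  # 4*74 + 2*28 + 37 + 36 = 425
```

Under this reading the four direct buses carry 74 signals each, NH and SH carry 28 each, and the two XH ranges split 73 signals into 37 and 36.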
In addition, connectors 1531A and 1531B are inter-board connectors for the local bus. The local bus handles signals between the PCI bus (through the PCI controller) and the FPGA bus (through the FPGA I/O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary-scan test information among the PCI controller, the FPGA logic devices and the FPGA I/O controller (CTRL_FPGA) unit.
In summary, the motherboard connector connects one board of a pair to the PCI bus and power. One set of connectors joins the FPGA interconnects from the component side of one board to the solder side of the other board. Another set of connectors joins the local bus from the component side of one board to the solder side of the other board.
Another embodiment of the invention uses more than two boards. Figure 38(B) shows a six-board configuration. This configuration is similar to that of Figure 38(A): every other board is connected to the motherboard, and the FPGA interconnects and local buses of the boards are connected through inter-board connectors in a solder-side-to-component-side arrangement.
Figure 38(B) shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board) and 1535 (sixth board). These six boards are connected to motherboard 1520 through the connectors on 1526 (first board), 1532 (third board) and 1534 (fifth board). The other boards, 1525 (second board), 1533 (fourth board) and 1535 (sixth board), are not directly connected to the motherboard; each reaches the motherboard indirectly through its adjacent board.
The various inter-board connectors between solder sides and component sides establish connections among the PCI bus components, the FPGA logic devices, the memory devices and the various simulation system control circuits. The first set of inter-board connectors 1990 corresponds to connectors J5 to J16 in Figure 42. The second set of inter-board connectors 1991 corresponds to connectors J17 to J28 in Figure 42. The third set of inter-board connectors 1992 corresponds to connectors J3 and J4 in Figure 42.
Motherboard connectors 1521 to 1524 on motherboard 1520 connect the six boards to the motherboard (and the PCI bus). As described above, boards 1526 (first board), 1532 (third board) and 1534 (fifth board) are connected directly to connectors 1523, 1522 and 1521, respectively. The other boards, 1525 (second board), 1533 (fourth board) and 1535 (sixth board), are not directly connected to motherboard 1520. Because the six boards need only one PCI controller in total, only the first board 1526 includes a PCI controller. Motherboard connector 1523, which is connected to the first board 1526, provides access to and from the PCI bus. Connectors 1522 and 1521 connect only to power and ground. In one embodiment, the center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm.
For boards 1526 (first board), 1532 (third board) and 1534 (fifth board), which are connected directly to connectors 1523, 1522 and 1521 respectively, connectors J5 to J16 are on the component side, connectors J17 to J28 are on the solder side, and local bus connectors J3 and J4 are on the component side. For boards 1525 (second board), 1533 (fourth board) and 1535 (sixth board), which are not directly connected to motherboard connectors 1523, 1522 and 1521, connectors J5 to J16 are on the solder side, connectors J17 to J28 are on the component side, and local bus connectors J3 and J4 are on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), some of connectors J17 to J28 are 10-ohm R-pack terminations.
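The side assignments above follow from two facts: the role of each connector group (Figure 42) and whether the board plugs directly into the motherboard. The sketch below assumes the six-board configuration of Figure 38(B), in which boards 1, 3 and 5 are directly connected; J1 and J2 are excluded because their side is not stated here. The function names are hypothetical.

```python
# Sketch of the connector-side rule for the six-board configuration of
# Figure 38(B); names are hypothetical.
DIRECT_BOARDS = {1, 3, 5}  # plugged into motherboard connectors 1523, 1522, 1521

def connector_role(j: int) -> str:
    """Role of connector J<j> as described for Figure 42."""
    if j == 1:
        return "external power"
    if j == 2:
        return "parallel JTAG port"
    if j in (3, 4):
        return "local bus"
    if 5 <= j <= 16:
        return "FPGA interconnect set 1"
    if 17 <= j <= 28:
        return "FPGA interconnect set 2"
    raise ValueError(f"no connector J{j}")

def connector_side(board: int, j: int) -> str:
    """Board side carrying J<j> on board <board> (1..6)."""
    direct = board in DIRECT_BOARDS
    role = connector_role(j)
    if role in ("local bus", "FPGA interconnect set 1"):
        on_component = direct
    elif role == "FPGA interconnect set 2":
        on_component = not direct
    else:
        raise ValueError("the side of J1 and J2 is not specified here")
    return "component side" if on_component else "solder side"

if __name__ == "__main__":
    for board in range(1, 7):
        print(board, [connector_side(board, j) for j in (3, 5, 17)])
```

Because the rule alternates from board to board, each connector on one board faces the opposite side of the same connector on the next board in the stack, which is what lets the boards mate without a backplane.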
Figures 40(A) and 40(B) show the array connections between boards. To simplify production, all boards use the same design. As explained above, boards connect to other boards through connectors, without a backplane. Figure 40(A) shows two example boards, 1611 (second board) and 1610 (first board). The component side of board 1610 faces the solder side of board 1611. Board 1611 includes many FPGA logic devices, other components and wiring. Particular nodes of these logic devices and other components on board 1611 are denoted node A' (reference numeral 1612) and node B' (reference numeral 1614). Node A' is connected to connector pad 1616 through PCB trace 1620. Similarly, node B' is connected to connector pad 1617 through PCB trace 1623.
Similarly, board 1610 also includes many FPGA logic devices, other components and wiring. Particular nodes of these logic devices and other components on board 1610 are denoted node A (reference numeral 1613) and node B (reference numeral 1615). Node A is connected to connector pad 1618 through PCB trace 1625. Similarly, node B is connected to connector pad 1619 through PCB trace 1622.
We now discuss the routing between nodes on different boards using surface-mount connectors. In Figure 40(A), the desired connections are (1) between node A' and node B, as indicated by imaginary paths 1620, 1621 and 1622, and (2) between node B' and node A, as indicated by imaginary paths 1623, 1624 and 1625. These connections are used for paths such as asymmetric interconnect 1600 between board 1551 and board 1552 in Figure 39. Other asymmetric interconnects include the NH-to-SH interconnects 1977, 1979 and 1981 on both sides of connectors 1589 and 1590.
Symmetric interconnects such as A-A' and B-B' (N, S) correspond to interconnect 1515. Through-hole connectors are used for the N and S interconnects, whereas SMD connectors are used for the asymmetric NH and SH interconnects. See Table D for details.
The actual implementation using surface-mount connectors is now discussed with reference to Figure 40(B), in which the same reference numerals denote the same parts. In Figure 40(B), on board 1611, node A' on the component side is connected to connector pad 1636 on the component side through PCB trace 1620. Connector pad 1636 on the component side is connected to connector pad 1639 on the solder side through conductive path 1651. Connector pad 1639 on the solder side is connected to connector pad 1642 on the component side of board 1610 through conductive path 1648. Finally, connector pad 1642 on the component side is connected to node B through PCB trace 1622. In this way, node A' on board 1611 is connected to node B on board 1610.
Likewise, in Figure 40(B), on board 1611, node B' on the component side is connected to connector pad 1638 on the component side through PCB trace 1623. Connector pad 1638 on the component side is connected to connector pad 1637 on the solder side through conductive path 1650. Connector pad 1637 on the solder side is connected to connector pad 1640 on the component side of board 1610 through conductive path 1645. Finally, connector pad 1640 on the component side is connected to node A through PCB trace 1625. In this way, node B' on board 1611 is connected to node A on board 1610. Because all boards share the same design, conductive paths 1652 and 1653 on board 1610 can be used, in the same way as conductive paths 1650 and 1651, to connect to the next adjacent board. This establishes a unique inter-board connection scheme that uses surface-mount and through-hole connectors and requires no switching components.
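The crossed routing of Figure 40(B) can be checked as a small graph problem. The sketch below lists the traces and conductive paths described above as undirected edges (node and pad labels such as `pad1636` are hypothetical names for the numbered parts) and uses breadth-first search to confirm which nodes each starting node reaches.

```python
# Sketch: the Figure 40(B) connections as an undirected graph, with a BFS
# reachability check. Labels for pads are hypothetical names.
from collections import deque
from typing import Dict, List, Set, Tuple

EDGES: List[Tuple[str, str, str]] = [
    ("A'", "pad1636", "PCB trace 1620"),
    ("pad1636", "pad1639", "conductive path 1651"),
    ("pad1639", "pad1642", "conductive path 1648"),
    ("pad1642", "B", "PCB trace 1622"),
    ("B'", "pad1638", "PCB trace 1623"),
    ("pad1638", "pad1637", "conductive path 1650"),
    ("pad1637", "pad1640", "conductive path 1645"),
    ("pad1640", "A", "PCB trace 1625"),
]

def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = {}
    for a, b, _label in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes connected to start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(sorted(reachable(GRAPH, "A'")))
```

The two nets stay separate: A' reaches B but not A, and B' reaches A but not B, which is the crossover the asymmetric interconnects need.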
F. Timing-insensitive glitch-free logic devices
One embodiment of the invention have solved the problem of holding time and clock glitch two aspects.According to one embodiment of the invention, in the process of the hardware model of user's design configurations being gone into reconfigurable computing system, the standard logical devices of finding in user's design is (as latch, trigger) by emulation logic device or timing-insensitive and glitch-free (timing-insensitive glitch-free, TIGF) logical unit replacement.In one embodiment, the trigger pip that is incorporated in the EVAL signal is used for upgrading stored value in these TIGF logical units.Wait for various inputs and other signal in the hardware model of user's design transmission and in estimation process, reach steady state (SS) after, can produce the trigger pip that is used for upgrading the stored or value that latchs of TIGF logical unit.After this, begun the new estimation cycle.In one embodiment, this from the stage that is estimated to triggering be round-robin.
The hold-time problem mentioned above is now discussed briefly. Those of ordinary skill in the art know that hold-time violations are a common problem in logic circuit design. The hold time is the minimum time for which the data input of a logic element must remain stable after a change on its control input (such as the clock input) has caused the value indicated by the data input to be latched, captured, or stored; otherwise the logic element will not operate correctly.
The hold-time requirement is now illustrated with a shift register example. Figure 75(A) shows an exemplary shift register of three D flip-flops connected in series; that is, the output of flip-flop 2400 is connected to the input of flip-flop 2401, and the output of flip-flop 2401 is connected to the input of flip-flop 2402. The overall input signal S_in is connected to the input of flip-flop 2400, and the overall output signal S_out is produced by the output of flip-flop 2402. The three flip-flops receive a common clock signal at their respective clock inputs. This shift register is designed under the following assumptions: (1) the clock signal arrives at all flip-flops simultaneously, and (2) after a clock edge is detected, the inputs of the flip-flops do not change during the hold-time period.
The timing diagram of Figure 75(B) illustrates the hold-time assumption for a system that does not violate the hold-time requirement. Hold times can differ from one logic element to another, but each is specified in the corresponding data sheet. At time t0, the clock input changes from logic 0 to logic 1. As shown in Figure 75(A), the clock input is provided to each of flip-flops 2400 to 2402. Starting from this clock edge at t0, input S_in must remain stable during the hold-time period T_H, from time t0 to t1. Similarly, the inputs of flip-flop 2401 (i.e., D2) and flip-flop 2402 (i.e., D3) must also remain stable during the hold-time period that begins at the triggering edge of the clock signal. Because Figures 75(A) and 75(B) satisfy this requirement, input S_in is shifted into flip-flop 2400, input D2 (logic 0) is shifted into flip-flop 2401, and input D3 (logic 1) is shifted into flip-flop 2402. Those of ordinary skill in the art know that, once the clock edge has been applied and assuming the hold-time requirement is satisfied, the new value at the input of flip-flop 2401 (input D2, logic 1) and the new value at the input of flip-flop 2402 (input D3, logic 0) will be shifted in and stored on the next clock cycle. The following table summarizes the operation of the shift register for these example values:
| | D1 | D2 | D3 | Q3 |
|---|---|---|---|---|
| Before the clock edge | 1 | 0 | 1 | 0 |
| After the clock edge | 1 | 1 | 0 | 1 |
In practice, the clock signal cannot reach all logic elements at exactly the same time. Rather, the circuit is designed so that the clock signal arrives at all logic elements almost, or substantially, simultaneously. The circuit must be designed so that the clock skew, i.e., the difference between the times at which the clock signal arrives at the various flip-flops, is smaller than the hold-time requirement. All logic elements then capture the proper input values. In the example of Figures 75(A) and 75(B) above, a hold-time violation caused by the clock signal arriving at flip-flops 2400 to 2402 at different times may cause some flip-flops to capture old input values while other flip-flops capture new input values. As a result, the shift register does not operate correctly.
In a reconfigurable logic device (such as an FPGA) implementing the same shift register design, if the clock is generated directly from a primary input, the circuit can be designed so that a low-skew network distributes the clock signal to all logic elements, and these logic elements then detect the clock edge at substantially the same time. Primary clocks are generated by the test-bench program. Master clock signals are usually generated in software, and a typical user circuit design has only a few primary clocks (e.g., 1 to 10).
However, if the clock signal is generated internally by logic rather than by a primary input, the hold-time problem becomes much more important. Derived or gated clocks are produced by networks of combinational logic and registers driven by the primary clocks. A typical user circuit design contains many (e.g., 1,000 or more) derived clocks. Without preventive or other control measures, these clock signals may arrive at the various logic elements at different times, and the clock skew may exceed the hold time. This causes circuit designs such as the shift register of Figures 75(A) and 75(B) to fail.
Hold-time violations are now discussed using the same shift-register circuit shown in Figure 75(A). Here, the individual flip-flops of the shift register are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in Figure 76(A). The first FPGA chip 2411 contains internally derived clock logic 2410, which feeds its clock signal CLK to some of FPGA chips 2412 to 2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400 to 2402 of the shift-register circuit. Chip 2412 contains flip-flop 2400, chip 2415 contains flip-flop 2401, and chip 2416 contains flip-flop 2402. Two other chips, 2413 and 2414, are used to illustrate the hold-time violation.
Clock logic 2410 in chip 2411 receives a primary clock input (or possibly another derived clock input) and generates an internal clock signal CLK. This internal clock signal CLK is transmitted to chip 2412, where it is labeled CLK1. The internal clock signal CLK from clock logic 2410 is also transmitted through chips 2413 and 2414 to chip 2415, where it is labeled CLK2. As shown in the figure, CLK1 is input to flip-flop 2400, and CLK2 is input to flip-flop 2401. CLK1 and CLK2 may both experience trace delays, so the edges of CLK1 and CLK2 are delayed relative to the edge of the internal clock signal CLK. In addition, CLK2 incurs further delay because it passes through two other chips, 2413 and 2414.
Referring to the timing diagram of Figure 76(B), the edge of the internal clock signal CLK is generated at time t2. Because of trace delay, CLK1 does not reach flip-flop 2400 in chip 2412 until time t3; this delay is labeled T1. As shown in the table above, output Q1 (i.e., input D2) is logic 0 before the CLK1 clock edge arrives. After flip-flop 2400 detects the edge of CLK1, input D1 must remain stable for the required hold time H2 (i.e., until time t4). Flip-flop 2400 then shifts in, or stores, logic 1, so that output Q1 (i.e., D2) becomes logic 1.
While this is happening at flip-flop 2400, clock signal CLK2 is still traveling toward flip-flop 2401 in chip 2415. The delay T2 introduced by chips 2413 and 2414 causes CLK2 to arrive at flip-flop 2401 at time t5. By this time, input D2 is already logic 1. After the required hold time of flip-flop 2401, this logic 1 appears at output Q2 (i.e., D3). Thus, output Q2 is logic 1 before CLK2 arrives and is still logic 1 after CLK2 arrives. This result is incorrect: the shift register should have shifted in logic 0. While flip-flop 2400 correctly shifted in the old input value (logic 1), flip-flop 2401 mistakenly shifted in the new input value (logic 1). This is the typical failure that occurs when the clock skew (or timing delay) is larger than the hold time. In this example, T2 > T1 + H2. In general, unless preventive measures are taken, hold-time violations may occur whenever one chip generates a clock signal and distributes it to logic elements on different chips, as shown in Figure 76(A).
The clock-glitch problem is now discussed with reference to Figures 77(A) and 77(B). In general, when the inputs of a circuit change, its output may take on a random value for a short time before settling to the correct value. If another circuit detects and reads the output at exactly that wrong moment, the result will be incorrect and difficult to debug. Such a random value, which adversely affects another circuit, is called a glitch. In typical logic circuits, one circuit may generate the clock signal for another. If uncompensated timing delays exist in one or both circuits, a clock glitch (i.e., an unexpected clock edge) may be produced, leading to an erroneous result. As with hold-time violations, clock glitches arise because certain logic elements in the circuit design change value at different times.
Figure 77(A) shows an exemplary logic circuit in which some logic elements generate the clock signal for another logic element; that is, D flip-flop 2420, D flip-flop 2421, and exclusive-OR (XOR) gate 2422 generate a clock signal (CLK3) for D flip-flop 2423. Flip-flop 2420 receives its input data D1 on line 2425 and outputs data Q1 on line 2427. It receives its clock input (CLK1) from clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK1 refers to the same signal as delayed on its way to flip-flop 2420.
Flip-flop 2421 receives its input data D2 on line 2426 and outputs data Q2 on line 2428. It receives its clock input (CLK2) from clock logic 2424. As mentioned above, CLK refers to the clock signal originally generated by clock logic 2424, and CLK2 refers to the same signal as delayed on its way to flip-flop 2421.
The outputs of flip-flops 2420 and 2421, on lines 2427 and 2428 respectively, are inputs to XOR gate 2422. XOR gate 2422 drives its output, labeled CLK3, to the clock input of flip-flop 2423. Flip-flop 2423 also receives input data D3 on line 2429 and outputs data Q3.
The clock-glitch problem that this circuit may cause is now discussed with reference to the timing diagram of Figure 77(B). The CLK signal is triggered at time t0. The clock signal arrives at flip-flop 2420 (as CLK1) at time t1 and at flip-flop 2421 (as CLK2) at time t2.
Assume that inputs D1 and D2 are both logic 1. When CLK1 arrives at flip-flop 2420 at time t1, output Q1 becomes logic 1 (as shown in Figure 77(B)). CLK2 arrives at flip-flop 2421 later, at time t2, so output Q2 on line 2428 remains at logic 0 from time t1 to time t2. Even though the desired signal is logic 0 (1 XOR 1 = 0), XOR gate 2422 produces a logic 1 from time t1 to time t2, which appears at the clock input of flip-flop 2423 as CLK3. This CLK3 pulse between times t1 and t2 is a clock glitch. As a result, whatever logic value is present on D3, on input line 2429 of flip-flop 2423, is stored whether or not this was intended, and flip-flop 2423 is then ready to receive its next input on line 2429. In a correct design, the delays of CLK1 and CLK2 are minimized so that no clock glitch is produced, or at least so that the glitch is too short to affect the rest of the circuit. In the latter case, if the clock skew between CLK1 and CLK2 is short enough and the XOR gate delay is long enough, the glitch is filtered out and does not affect the rest of the circuit.
Two known methods for solving the hold-time violation problem are (1) timing adjustment and (2) timing resynthesis. Timing adjustment, discussed in U.S. Patent 5,475,830, requires inserting enough delay elements (such as buffers) in certain signal paths to lengthen the hold time of the logic elements. For example, adding enough delay to inputs D2 and D3 of the shift-register circuit above can avoid hold-time violations. Accordingly, Figure 78 shows the same shift-register circuit with delay elements 2430 and 2431 added to inputs D2 and D3, respectively. Delay element 2430 can then be designed so that time t4 occurs after time t5, so that T2 < T1 + H2 (Figure 76(B)) and no hold-time violation occurs.
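The arithmetic behind timing adjustment can be illustrated with a short sketch. It assumes that an inserted delay d simply postpones the moment at which D2 changes, so the safe condition becomes T2 < T1 + H2 + d. The function name and the `margin` parameter are hypothetical. With `margin=0` the result is only the boundary value, so a real inserted delay would have to exceed it.

```python
def required_extra_delay(t1: float, t2: float, h2: float,
                         margin: float = 0.0) -> float:
    """Smallest extra delay d on D2 such that T1 + H2 + d >= T2 + margin.

    t1: clock delay to flip-flop 2400 (T1)
    t2: clock delay to flip-flop 2401 (T2)
    h2: hold time of flip-flop 2400 (H2)
    """
    return max(0.0, t2 - (t1 + h2) + margin)

print(required_extra_delay(1, 5, 1))       # 3.0: boundary value only
print(required_extra_delay(1, 5, 1, 0.5))  # 3.5: includes a safety margin
print(required_extra_delay(1, 2, 1))       # 0.0: no violation, no delay needed
```

Note that the result depends directly on T1, T2, and H2, which is exactly the sensitivity to chip and wire delays criticized in the next paragraph.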
One potential problem with timing adjustment is that it depends too heavily on the technical parameters of the FPGA chips. Those of ordinary skill in the art know that reconfigurable logic chips such as FPGAs implement logic elements with look-up tables. The look-up table delay is specified in the chip's data sheet, and a designer who uses timing adjustment to avoid hold-time violations relies on that specific delay. However, this delay is only an estimate and can vary from chip to chip. Another potential problem with timing adjustment is that the designer must also compensate for the wire delays present throughout the circuit design. Although not impossible, estimating wire delays is time-consuming and error-prone. In addition, timing adjustment does not solve the clock-glitch problem.
Another solution is timing resynthesis, introduced by IKOS in its VirtualWires technology. Timing resynthesis transforms a user's circuit design into a functionally equivalent design that uses finite state machines and registers to strictly control the timing of clock and output-pin signals. Timing resynthesis re-times the user's circuit design by introducing a single high-frequency clock. It also converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop-based synchronous design with a single clock. Timing resynthesis thus uses registers at the input and output pins of each chip to precisely control the movement of signals between chips and to avoid inter-chip hold-time violations. It also uses a finite state machine in each chip to schedule, based on the reference clock, the inputs from other chips, the outputs to other chips, and the updates of internal flip-flops.
Figure 79 shows an example of a circuit produced by applying timing resynthesis to the same shift register introduced in the discussion of Figures 75(A), 75(B), 76(A), and 76(B). The basic three-flip-flop shift-register design has been transformed into a functionally equivalent design. Chip 2430 contains the original internal clock-generation logic 2435, which is connected to a register 2443 by line 2448. Clock logic 2435 generates the CLK signal. A first finite state machine 2438 is also connected to register 2443, by line 2449. Both register 2443 and first finite state machine 2438 are controlled by a design-independent global reference clock.
The CLK signal also passes through chips 2432 and 2433 before arriving at chip 2434. In chip 2432, a second finite state machine 2440 controls a register 2445 by line 2462. The CLK signal is delivered from register 2443 to register 2445 by line 2461. Register 2445 outputs the CLK signal to the next chip, 2433, by line 2463. Chip 2433 contains a third finite state machine 2441, which controls register 2446 by line 2464. Register 2446 outputs the CLK signal to chip 2434.
Chip 2431 contains the original flip-flop 2436. Register 2444 receives input S_in and outputs it by line 2452 to input D1 of flip-flop 2436. Output Q1 of flip-flop 2436 is connected to register 2466 by line 2454. A fourth finite state machine 2439 controls register 2444 by line 2451, controls register 2466 by line 2455, and controls flip-flop 2436 by latch-enable line 2453. The fourth finite state machine 2439 also receives the original clock signal CLK from chip 2430 by line 2450.
Chip 2434 contains the original flip-flop 2437, which receives at its input D2, by line 2456, the signal from register 2466 in chip 2431. Output Q2 of flip-flop 2437 is connected to register 2447 by line 2457. A fifth finite state machine 2442 controls register 2447 by line 2459 and controls flip-flop 2437 by latch-enable line 2458. The fifth finite state machine 2442 also receives the original clock signal CLK from chip 2430 through chips 2432 and 2433.
With timing resynthesis, finite state machines 2438 to 2442, registers 2443 to 2447 and 2466, and the single global reference clock are used to control the flow of signals across multiple chips and the updating of internal flip-flops. Thus, in chip 2430, the distribution of the CLK signal to the other chips is scheduled by first finite state machine 2438 through register 2443. Similarly, in chip 2431, fourth finite state machine 2439 schedules the transfer of input S_in to flip-flop 2436 through register 2444, and the transfer of output Q1 through register 2466. The latch function of flip-flop 2436 is also controlled by a latch-enable signal from fourth finite state machine 2439. The same principle applies to the logic in the other chips, 2432 to 2434. Because inter-chip input transfer times, inter-chip output transfer times, and internal flip-flop state updates are all strictly controlled, inter-chip hold-time violations are eliminated.
However, timing resynthesis requires the user's circuit design to be transformed into a functionally equivalent circuit that includes additional finite state machines and registers, and this equivalent circuit is considerably larger. In general, the additional logic needed to implement this technique can consume up to 20% of the useful logic of each chip. Moreover, this technique does not completely avoid the clock-glitch problem. To avoid clock glitches, designers using timing resynthesis must also take additional preventive measures. A conservative approach is to design the circuit so that the inputs of logic devices with gated clocks never change at the same time. An aggressive approach is to use gate delays to filter out glitches so that they cannot affect the rest of the circuit. Either way, as described above, timing resynthesis requires additional measures to avoid clock glitches.
Various embodiments of the present invention that solve the hold-time and clock-glitch problems are now discussed. According to one embodiment of the invention, when a user design is mapped into the software model of the RCC computing system and the hardware model of the RCC array, the latch shown in Figure 18(A) is emulated with a timing-insensitive glitch-free (TIGF) latch. Similarly, according to one embodiment of the invention, the design flip-flop shown in Figure 18(B) is emulated with a TIGF flip-flop. These TIGF logic devices, whether in latch or flip-flop form, may also be called emulation logic devices. Updates of the TIGF latches and flip-flops are controlled by a global trigger signal.
In one embodiment of the invention, not all logic devices in the user's design circuit are replaced by TIGF devices. The user's design circuit includes portions that are enabled or clocked by primary clocks and other portions that are controlled by gated or derived clocks. Hold-time violations and clock glitches are problems of the latter portions, in which logic devices are controlled by gated or derived clocks. Therefore, according to one embodiment of the invention, only the logic devices controlled by gated or derived clocks are replaced by TIGF logic devices. In other embodiments, all logic devices in the user's design circuit are replaced by TIGF logic devices.
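The selection rule above can be sketched as a filter over a netlist. The netlist representation here, a dictionary mapping each latch or flip-flop instance to the name of the clock net that drives it, is a hypothetical simplification, as are the function and instance names.

```python
from typing import Dict, List, Set

def select_tigf(clock_of: Dict[str, str], primary_clocks: Set[str],
                replace_all: bool = False) -> List[str]:
    """Return the storage instances to replace with TIGF devices.

    clock_of: hypothetical map from latch/flip-flop instance to its clock net.
    By default only instances clocked by gated or derived clocks (i.e., not
    by a primary clock) are selected; replace_all selects every instance.
    """
    return sorted(name for name, clk in clock_of.items()
                  if replace_all or clk not in primary_clocks)

netlist = {"ff_a": "sysclk", "ff_b": "gated_clk", "lat_c": "derived_clk"}
print(select_tigf(netlist, {"sysclk"}))                    # ['ff_b', 'lat_c']
print(select_tigf(netlist, {"sysclk"}, replace_all=True))  # ['ff_a', 'ff_b', 'lat_c']
```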
Before the TIGF latch and flip-flop embodiments of the invention are discussed, the global trigger signal is described. In general, the global trigger signal causes the TIGF latches and flip-flops to hold their state (i.e., keep their old input values) during the evaluation period and to update their state (i.e., store new input values) during a short trigger period. In one embodiment, the global trigger signal shown in Figure 82 is separated and derived from the EVAL signal described above. In this embodiment, the global trigger signal has a long evaluation period followed by a short trigger period. The global trigger signal tracks the EVAL signal during the evaluation period and, when the EVAL cycle ends, produces a short trigger pulse that updates the TIGF latches and flip-flops. In another embodiment, the EVAL signal itself serves as the global trigger signal: it is in one logic state (e.g., logic 0) during the evaluation period and in the other logic state (e.g., logic 1) during the non-evaluation, TIGF latch/flip-flop update phase.
As described above in the discussion of the RCC computing system and RCC hardware array, the evaluation period is used to propagate changes in all primary inputs and flip-flop/latch devices through the entire user design from one simulation cycle to the next. During this propagation, the RCC system waits until all signals in the system reach a steady state. The evaluation period is computed after the user design has been mapped and configured into the appropriate reconfigurable logic devices (e.g., FPGA chips) of the RCC array. The evaluation period is therefore design-specific; that is, different user designs may have different evaluation periods. The evaluation period must be long enough to ensure that all signals propagate through the entire system and reach a steady state before the next short trigger period.
As shown in Figure 82, the short trigger period is adjacent to the evaluation period. In one embodiment, the short trigger period follows the evaluation period. Before the short trigger period, input signals propagate through the configured hardware-model portion of the user's design circuit during the evaluation period. According to one embodiment of the invention, a change in the logic state of the EVAL signal marks the short trigger period, which controls all TIGF latches and flip-flops in the user design so that they can be updated with the new values that have propagated and reached a steady state during the evaluation period. This short trigger period is distributed globally over a low-skew network, and its duration (shown in Figure 82 as t0 to t1 and t2 to t3) is only as long as the reconfigurable logic devices require for correct operation. During this short trigger period, new primary inputs are sampled at the input stage of each TIGF latch and flip-flop, and the old values stored in those same TIGF latches and flip-flops can be output to the next stage of the RCC hardware model of the user design. In the following discussion, the portion of the global trigger signal that occurs during the short trigger period is referred to as the TIGF trigger, the TIGF trigger signal, the trigger signal, or simply the trigger.
Figure 80(A) shows the latch 2470 originally shown in Figure 18(A). This latch operates according to the following procedure:
if (#S), Q ← 1
else if (#R), Q ← 0
else if (en), Q ← D
else Q keeps its old value
Because this latch is level-sensitive and asynchronous, output Q follows input D for as long as the latch enable (clock) input is asserted.
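The procedure above translates directly into a next-state function. This minimal sketch treats S, R, and en as active-high "asserted" flags with the same priority as the pseudocode; the gate-level polarity of the actual circuit is not modeled, and the function names are hypothetical. The trace shows why such a latch is glitch-sensitive: a brief pulse on the enable is enough to capture a transient D value.

```python
from typing import List, Tuple

def latch_next(q: int, d: int, en: int, s: int = 0, r: int = 0) -> int:
    """Next state of the level-sensitive latch of Figure 80(A)."""
    if s:
        return 1   # set has the highest priority
    if r:
        return 0   # then reset
    if en:
        return d   # transparent: Q follows D while enable is asserted
    return q       # opaque: Q keeps its old value

def latch_trace(samples: List[Tuple[int, int]], q: int = 0) -> List[int]:
    """Apply a sequence of (d, en) samples and record Q after each one."""
    trace = []
    for d, en in samples:
        q = latch_next(q, d, en)
        trace.append(q)
    return trace

# A one-sample glitch on the enable captures D = 1 permanently.
print(latch_trace([(1, 0), (1, 1), (0, 0)]))  # [0, 1, 1]
```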
Figure 80(B) shows a TIGF latch according to one embodiment of the invention. Like the latch of Figure 80(A), the TIGF latch has a D input, an enable input, a set (S) input, a reset (R) input, and an output Q. In addition, it has a trigger input. The TIGF latch includes a D flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.
D flip-flop 2471 receives its input on line 2476 from the output of AND gate 2474. D flip-flop 2471 is clocked at its clock input by the trigger signal on line 2477, which the RCC system distributes globally according to a strict schedule that depends on the evaluation cycle. The output of D flip-flop 2471 is connected to multiplexer 2472 by line 2478. The other input of multiplexer 2472 is connected to the TIGF latch D input on line 2475. The multiplexer is controlled by the enable signal on line 2484. The output of multiplexer 2472 is connected to one input of OR gate 2473 by line 2479. The other input of OR gate 2473 is connected to the set (S) input on line 2480. The output of OR gate 2473 is connected to one input of AND gate 2474 by line 2481. The other input of AND gate 2474 is connected to the reset (R) signal on line 2482. As mentioned above, the output of AND gate 2474 is fed back to the input of D flip-flop 2471 by line 2476.
The operation of the TIGF latch embodiment of the invention is now discussed. In this TIGF latch embodiment, D flip-flop 2471 holds the current state (i.e., the old value) of the TIGF latch. Line 2476, at the input of D flip-flop 2471, carries the new input value that is to be latched into the TIGF latch. Line 2476 carries the new value because the primary input (D input) of the TIGF latch on line 2475 passes through multiplexer 2472 (once the correct enable signal is eventually provided on line 2484), then through OR gate 2473, and finally through AND gate 2474 to line 2483, which feeds the new input signal of the TIGF latch back to D flip-flop 2471 on line 2476. The trigger signal on line 2477 updates the TIGF latch by clocking the new input value on line 2476 into D flip-flop 2471. Thus, the output of D flip-flop 2471 on line 2478 represents the current state (i.e., the old value) of the TIGF latch, while the input on line 2476 represents the new input value to be latched into the TIGF latch.
Multiplexer 2472 receives the current state from D flip-flop 2471 and the new input value on line 2475. Enable line 2484 serves as the select signal of multiplexer 2472. Because the TIGF latch is updated (i.e., stores a new input value) only when the trigger signal appears on line 2477, the TIGF latch D input on line 2475 and the enable value on line 2484 may arrive at the TIGF latch in any order. Suppose this TIGF latch, together with the other latches in the hardware model of the user design, encounters a situation that would normally cause a hold-time violation in a circuit built from conventional latches, such as one clock signal arriving much later than another, as in Figures 76(A) and 76(B) described above. The TIGF latch still operates correctly, because it holds the correct old value until the trigger signal appears on line 2477.
The trigger signal is distributed over a low-skew global clock network.
This TIGF latch also solves the clock-glitch problem. Note that in the TIGF latch, the clock signal is replaced by the enable signal. The enable signal on line 2484 may glitch during the evaluation period, but the TIGF latch continues to hold its current state and does not fail. In one embodiment, the only mechanism by which the TIGF latch is updated is the trigger signal, which is generated after the evaluation period, once the signals have reached a steady state.
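The TIGF latch can be modeled behaviorally as a stored value (standing in for D flip-flop 2471) plus a combinational output path (standing in for multiplexer 2472, OR gate 2473, and AND gate 2474). This sketch assumes active-high set and reset with the same priority as the latch pseudocode, rather than modeling the gate-level polarities of Figure 80(B); the class and method names are hypothetical. It replays the enable glitch that corrupts an ordinary latch and shows that the stored state is untouched until the trigger.

```python
class TIGFLatch:
    """Behavioral sketch of the TIGF latch of Figure 80(B)."""

    def __init__(self, q: int = 0) -> None:
        self.state = q  # stands in for D flip-flop 2471 (the old value)

    def output(self, d: int, en: int, s: int = 0, r: int = 0) -> int:
        """Combinational path, recomputed freely during evaluation.
        It never modifies the stored state."""
        if s:
            return 1
        if r:
            return 0
        return d if en else self.state  # enable acts as the mux select

    def trigger(self, d: int, en: int, s: int = 0, r: int = 0) -> int:
        """Short trigger period: store the settled new value."""
        self.state = self.output(d, en, s, r)
        return self.state

lat = TIGFLatch(0)
# Evaluation period with a glitch on the enable: the output may move...
print([lat.output(d, en) for d, en in [(1, 0), (1, 1), (0, 0)]])  # [0, 1, 0]
print(lat.state)          # 0: ...but the stored state has not changed
print(lat.trigger(0, 0))  # 0: settled inputs say "hold", so the old value stays
```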
Figure 81(A) shows the flip-flop 2490 originally shown in Figure 18(B). This flip-flop operates according to the following procedure:
if (#S), Q ← 1
else if (#R), Q ← 0
else if (positive edge of CLK), Q ← D
else Q keeps its old value
Because this flip-flop is edge-triggered, output Q takes the value of input D at each positive edge of the clock signal, provided that the flip-flop's enable input is asserted.
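The edge-triggered procedure can be sketched on sampled waveforms. The sketch omits set, reset, and enable for brevity, assumes that D is sampled in the same time step as the rising clock edge, and uses the hypothetical name `dff_trace`. The second usage shows that an ordinary flip-flop treats a short clock glitch as a genuine edge.

```python
from typing import List

def dff_trace(d_samples: List[int], clk_samples: List[int], q: int = 0) -> List[int]:
    """Ordinary positive-edge D flip-flop of Figure 81(A) (set/reset omitted).
    Q takes the value of D at every 0 -> 1 transition of CLK."""
    prev_clk = clk_samples[0]
    trace = []
    for d, clk in zip(d_samples, clk_samples):
        if clk and not prev_clk:  # positive edge of CLK
            q = d
        # otherwise Q keeps its old value
        prev_clk = clk
        trace.append(q)
    return trace

print(dff_trace([1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]))  # [0, 1, 1, 1, 1, 0]
# A one-sample glitch on CLK is indistinguishable from a real edge:
print(dff_trace([1, 1, 1], [0, 1, 0]))                    # [0, 1, 1]
```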
Figure 81(B) shows a TIGF D flip-flop according to one embodiment of the invention. Like the flip-flop of Figure 81(A), the TIGF flip-flop has a D input, a clock input, a set (S) input, a reset (R) input, and an output Q. In addition, it has a trigger input. The TIGF flip-flop includes three D flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.
Flip-flop 2491 receives the TIGF D input on line 2498 and the trigger input on line 2499, and provides its Q output on line 2500. Output line 2500 also serves as one input of multiplexer 2493. The other input of multiplexer 2493 is the Q output of flip-flop 2492, on line 2503. The output of multiplexer 2493 is connected to one input of OR gate 2494 by line 2505. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is connected to one input of AND gate 2495 by line 2507. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495, which is also the overall TIGF Q output, is connected to the input of flip-flop 2492 by line 2501. Flip-flop 2492 also has a trigger input on line 2502.
Returning to multiplexer 2493, its select input is connected to the output of AND gate 2497 by line 2509. One input of AND gate 2497 is the CLK signal on line 2510, and the other input is the output of flip-flop 2496 on line 2512. Flip-flop 2496 receives the CLK signal on line 2511 as its input and receives its trigger signal on line 2513.
The operation of the TIGF flip-flop embodiment of the invention is now discussed. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: at D flip-flop 2491 via line 2499, at D flip-flop 2492 via line 2502, and at D flip-flop 2496 via line 2513.
The TIGF flip-flop stores an input value only when an edge of the clock signal is detected. According to one embodiment of the invention, the desired edge is the positive edge of the clock signal. Edge detector 2515 is provided to detect this positive edge. Edge detector 2515 includes D flip-flop 2496 and AND gate 2497. The edge detector is also updated by the trigger signal applied to D flip-flop 2496 on line 2513.
D flip-flop 2491 holds the new input value of the TIGF flip-flop and blocks any change on the D input on line 2498 until the trigger signal appears on line 2499. Thus, before each evaluation period of the TIGF flip-flop, the new value is stored in D flip-flop 2491. The TIGF flip-flop therefore avoids hold-time violations by pre-storing the new value until the TIGF flip-flop is updated by the trigger signal.
D flip-flop 2492 holds the current value (i.e., the old value) of the TIGF flip-flop until the trigger signal appears on line 2502. This value is the state of the emulated TIGF flip-flop after its last update and before the next evaluation period. The input of D flip-flop 2492 on line 2501 carries the new value (which is also the value on line 2500 during the evaluation period).
Multiplexer 2493 receives the new input value on line 2500 and the old value currently stored in the TIGF flip-flop on line 2503. Based on the select signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. Until all the signals propagating through the hardware model of the user design reach a steady state, this output may change with any clock glitches. Thus, at the end of the evaluation period, the input on line 2501 provides the new value stored in flip-flop 2491. When the TIGF flip-flop receives the trigger signal, flip-flop 2492 stores the new value now present on line 2501, and flip-flop 2491 stores the next new value on line 2498. In this way, according to an embodiment of the invention, the TIGF flip-flop is not adversely affected by clock glitches.
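The three storage elements of Figure 81(B) map naturally onto three fields of a behavioral model. In this sketch, `pending` stands in for D flip-flop 2491, `state` for D flip-flop 2492, and `prev_clk` for D flip-flop 2496. It assumes that edge detector 2515 asserts the multiplexer select when CLK is high now and was low at the last trigger (i.e., that the stored clock value is used inverted), and that set and reset are active-high with the pseudocode's priority. All names are hypothetical.

```python
class TIGFFlipFlop:
    """Behavioral sketch of the TIGF D flip-flop of Figure 81(B)."""

    def __init__(self, q: int = 0, d: int = 0, clk: int = 0) -> None:
        self.pending = d     # ~ flip-flop 2491: D value captured at the last trigger
        self.state = q       # ~ flip-flop 2492: current (old) value
        self.prev_clk = clk  # ~ flip-flop 2496: CLK value at the last trigger

    def output(self, clk: int, s: int = 0, r: int = 0) -> int:
        """Combinational output during evaluation (mux 2493, OR 2494, AND 2495)."""
        if s:
            return 1
        if r:
            return 0
        edge = bool(clk) and not self.prev_clk  # edge detector 2515
        return self.pending if edge else self.state

    def trigger(self, d: int, clk: int, s: int = 0, r: int = 0) -> int:
        """Short trigger period: all three storage elements update together."""
        self.state = self.output(clk, s, r)  # 2492 <- line 2501
        self.pending = d                     # 2491 <- line 2498
        self.prev_clk = clk                  # 2496 <- CLK
        return self.state

ff = TIGFFlipFlop(q=0, d=1, clk=0)
print(ff.trigger(d=0, clk=1))  # 1: positive edge, pre-stored D = 1 is captured
print(ff.trigger(d=0, clk=1))  # 1: clock still high, no new edge
print(ff.trigger(d=0, clk=0))  # 1: falling edge is ignored
print(ff.trigger(d=1, clk=1))  # 0: next positive edge captures pre-stored D = 0
```

Because the captured value is the D value stored at the previous trigger, a D input that changes later in the same evaluation period cannot be picked up, which is how the structure sidesteps hold-time violations.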
More generally, the TIGF flip-flop is immune to clock glitches. Those of ordinary skill in the art will appreciate the consequence: if flip-flops 2420, 2421 and 2423 in Figure 77(A) are replaced with the TIGF flip-flop of the Figure 81(B) embodiment, clock glitches will not affect any circuit that uses the TIGF flip-flop. Referring again to Figures 77(A) and 77(B), a clock glitch adversely affects the circuit of Figure 77(A) because flip-flop 2423 records a new value during the interval from t1 to t2, when it should not. The skew between the CLK1 and CLK2 signals forces XOR gate 2422 to produce a logic 1 state from t1 to t2, and this drives the clock line of the next flip-flop, 2423. According to one embodiment of the invention, using the TIGF flip-flop prevents the clock glitch from causing a new value to be recorded. Suppose flip-flop 2423 is replaced with a TIGF flip-flop. Once the signals reach steady state in the evaluation cycle, the short trigger pulse causes the TIGF flip-flop to store the new value held in flip-flop 2491 (Figure 81(B)). Thereafter, a clock glitch such as the one in Figure 77(B), in the interval from t1 to t2, cannot cause a new value to be recorded. The TIGF flip-flop updates only on the trigger signal. The trigger signal is supplied to the TIGF flip-flop only after the evaluation cycle, when the signals propagating in the circuit have reached steady state.
Although this particular TIGF embodiment is a D flip-flop, other flip-flops (such as T, JK, and SR flip-flops) are also within the scope of the invention. Other types of edge-triggered flip-flops can be derived from the D flip-flop by adding AND/OR logic before the D input.
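As an illustration of deriving other flip-flop types from a D flip-flop, the textbook characteristic equations give the logic to place before the D input. These equations are standard digital-logic results and are not taken from this specification. They also need inverters in addition to AND/OR gates. The function names are hypothetical.

```python
def inv(x):
    # Inverter on a single bit (0 or 1).
    return 1 - x


def d_input_for_jk(j, k, q):
    # JK: D = J·Q' + K'·Q  (hold, reset, set, toggle)
    return (j & inv(q)) | (inv(k) & q)


def d_input_for_t(t, q):
    # T: D = T·Q' + T'·Q  (toggle when T = 1)
    return (t & inv(q)) | (inv(t) & q)


def d_input_for_sr(s, r, q):
    # SR: D = S + R'·Q  (S = R = 1 is a disallowed input combination)
    return s | (inv(r) & q)
```

Feeding one of these functions into the D input of the TIGF flip-flop would, under this assumption, yield a glitch-tolerant JK, T, or SR flip-flop.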
VII. Simulation Server
According to another embodiment of the present invention, a simulation server allows multiple users to access the same reconfigurable hardware unit. This lets them efficiently simulate and accelerate the same or different user designs in a time-shared manner. A high-speed simulation job scheduler and state-swapping mechanism give the simulation server efficient, high-throughput simulation. The server allows multiple users or processes to access the reconfigurable hardware unit for acceleration and hardware state swapping. Once acceleration is complete or the hardware state has been retrieved, each user or process can continue simulating in software alone. Doing so releases control of the reconfigurable hardware unit so that other users or processes can take control of it.
In the simulation server portion of this specification, terms such as "job" and "process" are used, and the two are generally interchangeable. Historically, batch systems executed "jobs" and time-shared systems stored and executed "processes" or programs. In today's systems, jobs and processes are largely similar. In this specification, therefore, the term "job" is not limited to batch systems and the term "process" is not limited to time-shared systems. At one extreme, a "job" is equivalent to a "process" when the "process" can be executed within one time slice without interruption by any other time-sharing intervention. At the other extreme, if a "job" requires multiple time slices to complete, the "job" is a subset of the "process". For example, a "process" may need multiple time slices to finish executing because other users or processes of equal priority are present; such a process is divided into "jobs". A "process" is equivalent to a "job" if it is short enough to finish within one time slice, or if it belongs to the only user with the highest priority. A user can thus interact with one or more "processes" or programs loaded and executing in the simulation system. Each "process" may require one or more "jobs" to complete in a time-shared system.
In one system configuration, multiple users use the same multiprocessor workstation through remote terminals in a non-network environment. They access the same reconfigurable hardware unit to review or debug the same or different user circuit designs. In a non-network environment, the remote terminals rely on a host computer system for their processing functions. This non-network configuration allows multiple users to access the same user design for parallel debugging. Access is accomplished through time-sharing: a scheduler determines the users' access priority, swaps jobs, and selectively locks the hardware unit for the scheduled user. In other cases, multiple users access the same reconfigurable hardware unit through the server to debug their own separate and different user designs. In this configuration, the multiple users or processes share the multiple microprocessors in the workstation with the operating system. In another configuration, users or processes on separate microprocessor-based workstations access the same reconfigurable hardware unit over a network to review or debug the same or different user circuit designs. Here too, access is accomplished through time-sharing, with the scheduler determining access priority, swapping jobs, and selectively locking the hardware unit for the scheduled user. In a network environment, the scheduler listens for network requests through UNIX socket system calls. The operating system uses sockets to send commands to the scheduler.
As mentioned above, the job scheduler uses a preemptive multiple-priority round-robin algorithm. Users or processes with higher priority are served first, until that user or process completes its job and terminates. Among users or processes of equal priority, a preemptive round-robin algorithm is used. Each user or process is assigned an equal time slice in which to execute its operations, until completion. The time slice is short enough that multiple users or processes do not wait too long before being served. It is also long enough that sufficient work is done before the scheduler in the simulation server interrupts one user or process to swap in and execute a new user's job. In one embodiment, the default time slice is 5 seconds and can be set by the user. In one embodiment, the scheduler makes specific calls to the operating system's built-in scheduler.
Figure 45 shows a non-network environment with a multiprocessor workstation according to one embodiment of the invention. Figure 45 is a modification of Figure 1, so identical components and units carry the same reference numerals. Workstation 1100 includes local bus 1105, host/PCI bridge 1106, memory bus 1107, and main memory 1108. A cache memory subsystem (not shown) may also be present. Other user interface units (such as a display and keyboard) are also present but are not shown in Figure 45. Workstation 1100 also includes multiple microprocessors 1101, 1102, 1103, and 1104, which are coupled to local bus 1105 through scheduler 1117 and connection/path 1118. As is known to those of ordinary skill in the art, operating system 1121 provides the user-hardware interface foundation for the entire computing environment. It manages files and allocates resources for the various users, processes, and devices in that environment. For conceptual clarity, operating system 1121 and bus 1122 are shown. References on operating systems include Abraham Silberschatz and James L. Peterson, OPERATING SYSTEM CONCEPTS (1988), and William Stallings, MODERN OPERATING SYSTEMS (1996), which are incorporated herein by reference.
In one embodiment, workstation 1100 is a Sun Microsystems Enterprise 450 system, which uses UltraSPARC II processors. The Sun 450 system does not access memory over a local bus. Instead, it accesses memory through dedicated buses connected to the memory via a crossbar switch. Multiple processes can therefore run, with the microprocessors executing their respective instructions, without accessing memory over a local bus. The specifications of the Sun 450 system and the UltraSPARC are incorporated herein by reference. The Sun Ultra 60 system is another example of a multiprocessor system, although it allows only two processors.
Scheduler 1117 provides time-shared access to reconfigurable hardware unit 20 through device driver 1119 and connection/path 1120. Scheduler 1117 is implemented mostly in software, which interacts with the operating system of the host computer system. Part of it is implemented in hardware, which interacts with the simulation server by supporting simulation job interrupts and swapping simulation sessions in and out. Scheduler 1117 and device driver 1119 are discussed in more detail below.
In workstation 1100, each of microprocessors 1101 to 1104 can process independently of the others. In one embodiment of the invention, workstation 1100 runs under a UNIX-based operating system. In other embodiments, workstation 1100 can run under a Windows-based or Macintosh-based operating system. UNIX-based systems provide X Windows as the user interface for managing programs, tasks, and files as needed. For details of the UNIX operating system, see Maurice J. Bach, THE DESIGN OF THE UNIX OPERATING SYSTEM (1986).
In Figure 45, multiple users can access workstation 1100 through remote terminals. Sometimes each user runs a process on a particular CPU; in other cases, each user uses different CPUs depending on resource limits. Generally, operating system 1121 determines such access, and the operating system itself may move from one CPU to another to accomplish its tasks. Time-sharing is handled as follows:
(1) The scheduler listens for network requests through socket system calls.
(2) The scheduler issues system calls to operating system 1121.
(3) Operating system 1121 in turn generates an interrupt signal to reconfigurable hardware unit 20 through device driver 1119 to start preemption.
Generating the interrupt is one of many steps in the scheduling algorithm. The others include stopping the current job, saving state information for the stopped job, swapping jobs, and executing the new job. The server scheduling algorithm is discussed below.
Sockets and socket system calls are now discussed briefly. In one embodiment, the UNIX operating system operates in time-sharing mode. The UNIX kernel allocates the CPU to a process for a period of time (a time slice). At the end of the time slice, it preempts that process and schedules another process for the next time slice. The process preempted in the earlier time slice is rescheduled for execution in a later time slice.
One of method that realizes and promote the communication between processing and allow use complex network agreement is to adopt socket.Kernel has the layer of three performance functions under client/server mode, comprises socket layer, protocol layer and mechanical floor.Top layer (socket layer) provides the interface between system call and other layers (protocol layer and mechanical floor).In general, socket layer has the end points (end points) of coupling connection CLIENT PROGRAM and server program.The socket end points can be positioned on the different machines.The middle layer is the protocol mode that protocol layer provides communication, such as TCP and IP.Bottom is the device driver that mechanical floor comprises control network devices.A device driver be exemplified as Ethernet driver based on Ethernet.
Processes communicate using the client/server model. In this model, a server process listens on a socket at one end of a two-way communication path. A client process communicates with the server process through a socket at the other end. The kernel maintains the internal connections among the three layers for each client and server, and routes data from client to server as needed.
Sockets involve several system calls. The socket system call establishes an endpoint of a communication path, and many system calls use the resulting socket descriptor sd. The other system calls are:
- bind, which associates a name with a socket descriptor sd;
- connect, which requests that the kernel make a connection to an existing socket;
- close, which closes a socket;
- shutdown, which closes a socket connection;
- send and recv, which transfer data over a connected socket.
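The sequence of socket system calls can be illustrated with Python's standard `socket` module, which wraps the same BSD calls. This is a sketch under stated assumptions. The loopback address stands in for the network, and the `"ack:"` reply format is invented for the example. The server side also needs `listen` and `accept`, which the text above does not name.

```python
import socket
import threading


def serve_once(server_sock):
    # accept(): wait for a client and obtain a new descriptor for the connection.
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024)          # recv(): read from the connected socket
        conn.sendall(b"ack:" + request)    # send(): reply over the same connection


def main():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket(): create an endpoint
    server.bind(("127.0.0.1", 0))          # bind(): attach a name; port 0 picks a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))    # connect(): attach to the server's endpoint
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)        # shutdown(): no more data will be sent
    reply = client.recv(1024)
    client.close()                         # close(): release the descriptor
    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(main())
```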
Figure 46 shows another embodiment of the invention, in which multiple workstations share a single simulation system on a time-shared basis over a network. The workstations are coupled to the simulation system through scheduler 1117. In the computing environment of the simulation system, a single CPU 11 is coupled to local bus 12 in workstation 1110. The system may also be equipped with multiple CPUs. As is known to those of ordinary skill in the art, an operating system 1121 is also provided, and nearly all programs and applications reside on top of it. For conceptual clarity, operating system 1121 is shown together with bus 1122.
In Figure 46, workstation 1110 includes the components/units of Figure 1. These are coupled to local bus 12, together with scheduler 1117 and scheduler bus 1118, through operating system 1121. Scheduler 1117 controls time-shared access by user stations 1111, 1112, and 1113 by issuing socket calls to operating system 1121. Scheduler 1117 is implemented mostly in software and partly in hardware.
This figure shows only three users who can access the simulation system over the network. Other system configurations can, of course, have more or fewer than three users. Each user accesses the system through remote workstation 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to scheduler 1117 through network connections 1114, 1115, and 1116, respectively.
As is known to those of ordinary skill in the art, device driver 1119 is coupled between PCI bus 50 and reconfigurable hardware unit 20. A connection or conductive path 1120 lies between device driver 1119 and reconfigurable hardware unit 20. In this networked multi-user embodiment of the invention, scheduler 1117 communicates with device driver 1119 through operating system 1121. It does so to communicate with and control reconfigurable hardware unit 20, both for hardware acceleration and for simulation after the hardware state is restored.
In addition, in one embodiment, simulation workstation 1110 is a Sun Microsystems Enterprise 450 system, which uses multiple UltraSPARC II processors. The Sun 450 system does not access memory over a shared local bus. Instead, it connects the processors to memory through dedicated buses via a crossbar switch.
Figure 47 shows the high-level structure of a simulation server according to a network embodiment of the invention. The operating system is not shown explicitly here. As those of ordinary skill in the art know, however, an operating system is always present to manage files and allocate resources for the various users, processes, and devices in the simulation computing environment. Simulation server 1130 includes scheduler 1137, one or more device drivers 1138, and reconfigurable hardware unit 1139. Figures 45 and 46 do not explicitly show the simulation server as a single integral unit, but it includes scheduler 1117, device driver 1119, and reconfigurable hardware unit 20. Returning to Figure 47, simulation server 1130 is coupled to three workstations 1131, 1132, and 1133 through network connections/paths 1134, 1135, and 1136, respectively. As mentioned above, more or fewer than three workstations can be coupled to simulation server 1130.
The scheduler in the simulation server is based on a preemptive round-robin algorithm. In essence, the round-robin scheme lets several users or programs take turns executing, in priority order, until they complete. Each simulation job is assigned a priority level and a fixed time slice in which to execute. A simulation job is associated with a workstation in a network environment, or with a user/process in a non-network multiprocessing environment.
Generally, higher-priority jobs are executed first. At one extreme, every user has a different priority. The user with the highest priority is then served first until its job ends, and the user with the lowest priority is served last. No time-sharing is used in this case, because each user has a different priority and the scheduler simply serves users in priority order. This case resembles one in which a single user at a time accesses the simulation system until it is finished.
At the other extreme, all users have the same priority, and time slices are used with first-in, first-out (FIFO) queuing. For jobs of equal priority, each job runs until it either finishes or its fixed time slice expires, whichever comes first. If a job does not finish within its time slice, the simulation image associated with its completed work must be saved for later restoration and execution. The unfinished job then moves to the end of the queue. In the next time slice, the saved simulation image of the next job (if any) is restored and executed.
Higher-priority jobs can preempt lower-priority jobs. Jobs of equal priority run in round-robin fashion, one time slice at a time, until they finish. Lower-priority jobs then run in round-robin fashion. If a higher-priority job is inserted into the queue while a lower-priority job is running, the higher-priority job takes precedence until it finishes. Higher-priority jobs therefore run to completion before lower-priority jobs begin. If a lower-priority job has already started, it is suspended until the higher-priority job finishes.
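The scheduling policy described above can be sketched in Python under simplifying assumptions. Time advances in abstract units. A job's "simulation image" is reduced to its remaining work count, and requeueing a job stands for saving and later restoring that image. A lower priority number means a higher priority. A preempted job rejoins the tail of its priority queue. The function name `run_scheduler` and the job tuple layout are hypothetical.

```python
from collections import deque


def run_scheduler(jobs, time_slice):
    """Preemptive multiple-priority round-robin sketch.

    jobs: iterable of (name, priority, work_units, arrival_time); a LOWER
    priority number means a HIGHER priority (an assumption of this sketch).
    Returns the name of the job holding the hardware in each time unit
    (None while nothing is runnable).
    """
    arrivals = sorted(jobs, key=lambda job: job[3])
    queues = {}          # priority -> FIFO of [name, remaining_work]
    running = None       # (priority, [name, remaining_work]) or None
    used = 0             # time units used in the running job's current slice
    timeline = []
    t = 0
    while arrivals or running or any(queues.values()):
        # New requests join the tail of the FIFO for their priority.
        while arrivals and arrivals[0][3] <= t:
            name, prio, work, _ = arrivals.pop(0)
            queues.setdefault(prio, deque()).append([name, work])

        if running is not None:
            prio, job = running
            top = min((p for p, q in queues.items() if q), default=None)
            # Preempt for a higher-priority job, or when the slice is used up
            # and a job of equal priority is waiting.
            if top is not None and (top < prio or (top == prio and used >= time_slice)):
                queues[prio].append(job)   # "save the image": requeue with remaining work
                running = None

        if running is None:
            top = min((p for p, q in queues.items() if q), default=None)
            if top is not None:
                running = (top, queues[top].popleft())
                used = 0

        if running is None:
            timeline.append(None)
        else:
            prio, job = running
            timeline.append(job[0])
            job[1] -= 1
            used += 1
            if job[1] == 0:
                running = None
        t += 1
    return timeline
```

For example, two equal-priority jobs A (3 units) and B (2 units) with a 2-unit slice alternate as A, A, B, B, A. A low-priority job L is interrupted as soon as a higher-priority job H arrives and resumes when H finishes.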
In one embodiment, the UNIX operating system provides a basic preemptive round-robin scheduling algorithm. According to one embodiment of the invention, the simulation server's scheduling algorithm works together with the operating system's scheduling algorithm. In UNIX-based systems, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To make the time-sharing scheme workable, the job scheduler uses a preemptive multiple-priority round-robin algorithm on top of the operating system's own scheduling algorithm.
According to one embodiment of the invention, the relationship between the multiple users and the simulation server follows the client/server model. The users are the clients and the simulation server is the server, and they communicate through socket calls. Referring briefly to Figure 55, the client includes client program 1109, socket system call component 1123, UNIX kernel 1124, and TCP/IP protocol component 1125. The server includes TCP/IP protocol component 1126, UNIX kernel 1127, socket system call component 1128, and simulation server 1129. Multiple clients can request that simulation jobs be run on the server, using UNIX socket calls issued by the client application.
In one embodiment, a typical sequence of events involves multiple clients sending requests to the server through the UNIX socket protocol. For each request, the server acknowledges whether the command succeeded. For a queue-status request, the server replies with the current queue status so that it can be presented to the user. Table F below lists the relevant client socket commands.
Table F: Client socket commands
| Command | Description |
|---|---|
| 0 | Start simulation <design> |
| 1 | Suspend simulation <design> |
| 2 | Exit simulation <design> |
| 3 | Reassign priority of simulation session |
| 4 | Save design simulation state |
| 5 | Queue status |
For each socket call, each integer-coded command is followed by any additional parameters, such as <design>, which names the design. The simulation server responds with 0 if the command executes successfully and with 1 if it fails. For command 5, which requests the queue status, one embodiment returns ASCII text terminated by a null ('\0') character, which is displayed on the user's screen. With these socket system calls, the appropriate communication protocol signals can be sent to or received from the reconfigurable hardware unit through the device driver.
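A client-side encoder and decoder for the Table F commands might look like the following. This is a sketch only. The specification does not define the wire format, so the space-separated text request, the newline terminator, and the ASCII digit responses are all assumptions. The names `Command`, `encode_request`, and `decode_response` are hypothetical.

```python
from enum import IntEnum


class Command(IntEnum):
    # Command codes from Table F.
    START_SIMULATION = 0
    SUSPEND_SIMULATION = 1
    EXIT_SIMULATION = 2
    REASSIGN_PRIORITY = 3
    SAVE_SIMULATION_STATE = 4
    QUEUE_STATUS = 5


def encode_request(command, *params):
    # Hypothetical wire format: integer code, then space-separated parameters.
    return " ".join([str(int(command)), *params]).encode("ascii") + b"\n"


def decode_response(command, payload):
    if command == Command.QUEUE_STATUS:
        # Queue status: ASCII text terminated by a null ('\0') character.
        return payload.split(b"\0", 1)[0].decode("ascii")
    # Every other command: "0" means success, "1" means failure.
    return payload.strip() == b"0"
```

For instance, `encode_request(Command.START_SIMULATION, "cpu_core")` produces `b"0 cpu_core\n"` under this assumed format.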
Figure 48 shows the structure of the simulation server according to one embodiment of the invention. As mentioned above, a single simulation server can serve multiple users or processes, simulating their user designs and providing hardware acceleration in a time-shared manner. User/processes 1147, 1148, and 1149 are therefore coupled to simulation server 1140 through interprocess communication paths 1150, 1151, and 1152, respectively. These paths can reside within the same workstation, for multiprocessor configurations and operation, or in a network, for multiple workstations. To communicate with the reconfigurable hardware unit, each simulation session contains a software simulation state and a hardware state. Interprocess communication among the software processes takes place through UNIX sockets or system calls. The simulation session can therefore reside on the same workstation in which the simulator plug-in card is installed, or on a separate workstation connected over a TCP/IP network. Communication with the simulation server is then initiated automatically.
In Figure 48, simulation server 1140 includes server monitor 1141, simulation job queue table 1142, priority sorter 1143, job swapper 1144, device driver 1145, and reconfigurable hardware unit 1146. Simulation job queue table 1142, priority sorter 1143, and job swapper 1144 make up scheduler 1137, shown in Figure 47.
Server monitor 1141 provides user interface functions for the system administrator. The user can monitor the state of the simulation server by commanding the system to display the simulation jobs in the queue, scheduling priorities, usage history, and simulation job swap efficiency. Other utility functions include editing job priorities, deleting simulation jobs, and resetting the simulation server state.
Simulation job queue table 1142 lists all outstanding simulation requests that the scheduler has inserted into the queue. The table includes the following fields:
- job number;
- software simulation process number;
- software simulation image;
- hardware simulation image file;
- design configuration file;
- priority number;
- hardware size;
- software size;
- cumulative simulation run time;
- owner identification.
The job queue is implemented as a first-in, first-out (FIFO) queue, so a newly requested job is placed at the end of the queue.
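One possible in-memory form of the simulation job queue table is sketched below. The field names paraphrase the list above, while their types and default values are assumptions. The classes `SimulationJob` and `JobQueueTable` are hypothetical, and FIFO order is provided by `collections.deque`.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SimulationJob:
    # One row of the simulation job queue table (names and types are illustrative).
    job_number: int
    owner: str
    priority: int
    sim_process_number: int = 0
    software_sim_image: str = ""
    hardware_sim_image_file: str = ""
    design_config_file: str = ""
    hardware_size: int = 0
    software_size: int = 0
    total_sim_time: float = 0.0


class JobQueueTable:
    """FIFO list of outstanding simulation requests."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job):
        self._jobs.append(job)        # a new request goes to the end of the queue

    def next_job(self):
        return self._jobs.popleft()   # the oldest request leaves first

    def owners(self):
        return [job.owner for job in self._jobs]

    def __len__(self):
        return len(self._jobs)
```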
Priority sorter 1143 decides which simulation job in the queue to execute. In one embodiment, the simulation job priority scheme is user-definable (that is, controlled and set by the system administrator) and determines which simulation process has priority for current execution. In one embodiment, the priority level depends on the urgency of a particular process or the importance of a particular user. In another embodiment, priority is dynamic and can change during simulation. In a preferred embodiment, priority is set according to user ID. Typically, one user has a high priority and the other users share a lower but equal priority.
Priority levels can be set by the system administrator. The simulation server obtains all user information from the UNIX environment, typically from the UNIX user file named "/etc/passwd". Adding a new user to the simulation server is consistent with adding a new user to the UNIX system. After all users are defined, the server monitor can be used to adjust the priority levels of the different users.
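Deriving priorities from the UNIX user list can be sketched as follows. The sketch parses text in the standard `/etc/passwd` layout (name:password:UID:GID:GECOS:home:shell). On a live UNIX system, Python's `pwd.getpwall()` returns the same records. The override table and the "lower number means higher priority" convention are assumptions, and both function names are hypothetical.

```python
def load_users(passwd_text):
    """Return {user_name: uid} from text in /etc/passwd format."""
    users = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        users[fields[0]] = int(fields[2])
    return users


def assign_priorities(users, overrides, default_priority=2):
    # Administrator-set priorities; every other user shares one lower priority.
    # Lower numbers mean higher priority (an assumption of this sketch).
    return {name: overrides.get(name, default_priority) for name in users}
```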
Job swapper 1144 temporarily replaces a simulation job associated with one process or workstation with a simulation job associated with a different process or workstation. It does so according to the priorities programmed into the scheduler. If multiple users are simulating the same design, the job swapper only swaps in the stored simulation state for the simulation session. If multiple users are simulating multiple designs, however, the job swapper first loads the design into the hardware configuration and then swaps in the simulation state. In one embodiment, this job-swapping mechanism improves the performance of the time-sharing embodiment of the invention, because job swapping is needed only for access to the reconfigurable hardware unit. If one user needs to perform software simulation for some period of time, the server can swap in another job for another user. That other user can then access the reconfigurable hardware unit for hardware acceleration. The user can adjust and set the job-swapping frequency. The device driver also communicates with the reconfigurable hardware unit to swap jobs.
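The swap decision described above (reconfigure only when the incoming job uses a different design, always restore the incoming job's simulation state) can be sketched in Python. The hardware interface (`save_state`, `load_design`, `restore_state`) is hypothetical. `RecordingHardware` is a stand-in that only logs requests, and jobs are plain dictionaries with assumed `"design"` and `"state"` keys.

```python
class RecordingHardware:
    """Stand-in for the reconfigurable hardware unit that only records requests."""

    def __init__(self):
        self.log = []

    def save_state(self):
        self.log.append("save")
        return f"state-{len(self.log)}"

    def load_design(self, design):
        self.log.append(f"load {design}")

    def restore_state(self, state):
        self.log.append(f"restore {state}")


def swap_jobs(hardware, loaded_design, outgoing, incoming):
    """Swap `incoming` onto the hardware; return the design now configured."""
    if outgoing is not None:
        outgoing["state"] = hardware.save_state()   # keep the outgoing job's state
    if incoming["design"] != loaded_design:
        hardware.load_design(incoming["design"])    # different design: reconfigure first
        loaded_design = incoming["design"]
    hardware.restore_state(incoming["state"])       # then swap in the simulation state
    return loaded_design
```

Skipping `load_design` when two users share a design is what makes same-design swaps cheaper than cross-design swaps in this sketch.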
The operation of the simulation server is now discussed. Figure 49 is a flowchart of the simulation server during operation. Initially, at step 1160, the system is idle. Idle here does not necessarily mean that the simulation server is dormant or that no simulation is running. The idle state can mean one of the following: (1) no simulation is running; (2) only one user/workstation is active in a single-processor environment, so no time-sharing is needed; or (3) only one user or workstation is active in a multiprocessing environment, and only one process is running. In cases (2) and (3), the simulation server has only one job to handle, so queuing jobs, determining priority, and swapping jobs are unnecessary. The simulation server is idle because it has received no requests from other workstations or processes.
A simulation request arises when one or more workstations in a multi-user environment, or microprocessors in a multiprocessor environment, issue one or more request signals. The simulation server then queues the one or more incoming simulation jobs at step 1162. The scheduler maintains a simulation job queue table, in which all outstanding simulation requests are inserted and listed. For batch simulation jobs, the scheduler in the server queues all incoming simulation requests and processes the jobs automatically, without human intervention.
Next, at step 1163, the simulation server sorts the queued jobs to determine the priority of each. This step is especially important when there are multiple jobs, because the server must decide which job has priority to access the reconfigurable hardware unit. The priority sorter decides which simulation job in the queue to execute. In one embodiment, the simulation job priority scheme is user-definable (that is, controlled and defined by the system administrator). When resource contention occurs, it determines which simulation process has priority for current execution.
After priority sorting at step 1163, the server swaps simulation jobs at step 1164 if necessary. This step temporarily replaces a simulation job associated with one process or workstation with a simulation job associated with another process or workstation. It follows the priorities set for the scheduler in the server. If multiple users are simulating the same design, the job swapper only swaps in the stored simulation state for the simulation session. If multiple users are simulating different designs, the job swapper first loads the design and then swaps in the simulation state. Here, too, the device driver communicates with the reconfigurable hardware unit to swap jobs.
In one embodiment, the job-swapping mechanism improves the time-sharing performance of the invention, because job swapping is needed only when the reconfigurable hardware unit is accessed. If one user is performing software simulation for some period of time, the server can swap in another job for another user, who can then access the reconfigurable hardware unit for hardware acceleration. For example, suppose user 1 and user 2 both access the reconfigurable hardware unit through the simulation server:
- User 1 first accesses the system for a period of time to debug his or her user design.
- If user 1 is debugging in software mode only, the server can release the reconfigurable hardware unit, which user 2 can then access.
- The server swaps in user 2's job, so user 2 can run in software simulation or hardware acceleration mode.
- Depending on the priorities of user 1 and user 2, user 2 can continue to access the reconfigurable hardware unit for a predetermined time.
- Alternatively, when user 1 needs the reconfigurable hardware unit for acceleration, the server can preempt user 2's job and swap in user 1's job for hardware acceleration by the reconfigurable hardware unit.
The predetermined time governs preemption among multiple pending simulator jobs of equal priority. In one embodiment, the system default is 5 minutes, although the user can change it. This 5-minute setting is a form of timeout timer. The simulation system of the invention uses the timeout timer to stop execution of the current simulation job when that job is taking too long. The system then determines that other pending jobs of the same priority should access the reconfigurable hardware model.
In step 1164, after the job swap is complete, the device driver in the server locks the reconfigurable hardware unit so that only the user or program of the currently scheduled job can simulate with and use the hardware model. Locking and simulation occur in step 1165.
Once the simulation completes or the current simulation process is suspended at event 1166, the server returns to priority classification step 1163 to determine the priority of each pending simulation job, and then swaps simulation jobs if necessary. Similarly, the server may preempt execution of the current simulation job in step 1167 and return to priority classification state 1163. Preemption occurs only under certain conditions. One such condition is that a job of higher priority is pending. Another is that the system is running a computation-intensive simulation task; in that case the scheduler can be designed to use the timeout timer to stop the current job and serve another job of the same priority. In one embodiment, the timeout timer is set to 5 minutes. If the current job has run for 5 minutes, the system preempts it and swaps in a pending job of equal priority.
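The two preemption conditions can be expressed as a small predicate. This sketch assumes that a smaller number means a higher priority and that elapsed time is tracked in seconds; the `Job` record and `should_preempt` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT_S = 5 * 60  # 5-minute default from the text; user-adjustable


@dataclass
class Job:
    name: str
    priority: int  # assumption: smaller number = higher priority


def should_preempt(running, elapsed_s, pending, timeout_s=DEFAULT_TIMEOUT_S):
    """Decide whether the scheduler preempts the running job.

    Condition 1: a pending job has strictly higher priority.
    Condition 2: the running job has held the hardware for at least
    timeout_s and a pending job of equal priority is waiting.
    """
    if any(j.priority < running.priority for j in pending):
        return True
    if elapsed_s >= timeout_s and any(j.priority == running.priority for j in pending):
        return True
    return False
```

A pending job of lower priority never triggers preemption, even after the timeout expires.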
Figure 50 shows a flowchart of the job swapping process. The job swapping function is performed in step 1164 of Figure 49 and appears in the hardware of the simulation server as job swapper 1144 in Figure 48. In Figure 50, when one simulation job must be swapped with another, the job swapper sends an interrupt to the reconfigurable hardware unit in step 1180. If the reconfigurable hardware unit is not currently running any job (that is, the system is idle or the user is operating only in software mode, without any hardware acceleration), the interrupt immediately prepares the reconfigurable hardware unit for the job swap. If, however, the reconfigurable hardware unit is running a job and is executing an instruction or processing data, the interrupt signal is acknowledged, but the reconfigurable hardware unit continues to execute the current pending instruction and to process the data of the current job. If the reconfigurable hardware unit receives the interrupt signal while it is not in the middle of executing an instruction or processing data for the current job, the signal effectively stops the operation of the reconfigurable hardware unit immediately.
In step 1181, the simulation system saves the current simulation image (that is, the software and hardware states). Because this image is saved, the user can later resume the simulation from this save point without rerunning the entire simulation up to it.
In step 1182, the simulation system configures the reconfigurable hardware unit with the new user design. This configuration step is needed only when the user design associated with the new job differs from the user design that was configured and loaded in the reconfigurable hardware unit for the job just suspended. After configuration is complete, the saved hardware simulation image is reloaded in step 1183, and the saved software simulation image is reloaded in step 1184. If the new simulation job is associated with the same design, no reconfiguration is needed. For the same design, the simulation system loads in step 1183 the desired hardware simulation image associated with the new simulation job, because the simulation image of the new job may differ from that of the job just suspended. The configuration step is described in detail elsewhere in this patent specification. The associated software simulation image is then reloaded in step 1184. After the software and hardware simulation images are reloaded, simulation of the new job begins in step 1185. The previously suspended job can then run only in software simulation mode, because it temporarily has no access to the reconfigurable hardware unit.
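The sequence of steps 1180 to 1185 can be sketched with a toy model of the hardware unit. Everything here is a hypothetical illustration: `ReconfigurableUnit`, `swap_job`, the image dictionary layout, and the operation log stand in for the real device driver and hardware, and the "finish the pending instruction" behavior of step 1180 is not modeled.

```python
class ReconfigurableUnit:
    """Hypothetical stand-in for the reconfigurable hardware unit."""

    def __init__(self):
        self.design = None    # user design currently configured in the FPGAs
        self.hw_state = None  # current hardware simulation image
        self.log = []         # record of operations, for illustration only

    def interrupt(self):
        # Step 1180. A busy unit would first finish its pending
        # instruction; that detail is not modeled here.
        self.log.append("interrupt")

    def configure(self, design):
        self.design = design
        self.log.append(f"configure:{design}")

    def load_hw_state(self, hw_image):
        self.hw_state = hw_image
        self.log.append("load_hw_image")


def swap_job(unit, images, current, current_sw, new):
    """Swap job `current` out and job `new` in (steps 1180-1185).

    images maps a job name to {"design", "hw", "sw"}; current_sw is the
    software image of the job being suspended. Returns the new job's
    software image, which the host side would reload.
    """
    unit.interrupt()                                  # step 1180
    images[current] = {"design": unit.design,         # step 1181: save image
                       "hw": unit.hw_state,
                       "sw": current_sw}
    target = images[new]
    if unit.design != target["design"]:               # step 1182, only if needed
        unit.configure(target["design"])
    unit.load_hw_state(target["hw"])                  # step 1183
    unit.log.append("load_sw_image")                  # step 1184 (host side)
    unit.log.append(f"run:{new}")                     # step 1185
    return target["sw"]
```

When the new job uses the design already loaded, the log shows no `configure` entry, matching the text's point that reconfiguration is skipped for the same design.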
Figure 51 shows the signals between the device driver and the reconfigurable hardware unit. Device driver 1171 provides the interface between scheduler 1170 and reconfigurable hardware unit 1172. As shown in Figures 45 and 46, device driver 1171 also provides the interface between the entire computing environment (that is, the workstations, PCI bus, and PCI devices) and reconfigurable hardware unit 1172, but Figure 51 shows only the simulation server portion. The signals between the device driver and the reconfigurable hardware unit include: bidirectional handshake signals; unidirectional design configuration information passed from the computing environment through the scheduler to the reconfigurable hardware unit; swapped-in simulation state information; swapped-out simulation state information; and an interrupt signal passed from the device driver to the reconfigurable hardware unit to swap simulation jobs.
Line 1173 carries the bidirectional handshake signals. These signals and the handshake protocol are discussed with reference to Figures 53 and 54.
Line 1174 carries the unidirectional design configuration information passed from the computing environment through scheduler 1170 to reconfigurable hardware unit 1172. Initially, the configuration information for modeling can be passed to reconfigurable hardware unit 1172 over line 1174. In addition, when users are modeling and simulating different user designs, configuration information must be sent to reconfigurable hardware unit 1172 for each time slice. When different users are modeling the same user design, no new design configuration is needed; however, different simulation runs may require passing different hardware simulation states associated with the same design to reconfigurable hardware unit 1172.
Line 1175 carries the swapped-in simulation state information to reconfigurable hardware unit 1172. Line 1176 carries the swapped-out simulation state information from the reconfigurable hardware unit to the computing environment (typically memory). The swapped-in simulation state information includes the previously saved hardware model state information and the hardware memory state that reconfigurable hardware unit 1172 needs for acceleration. The swapped-in state information is sent at the beginning of a time slice, so that the scheduled current user can access the reconfigurable hardware unit for acceleration. The swapped-out simulation state information includes the hardware model and memory state information that must be saved to memory at the end of a time slice, after reconfigurable hardware unit 1172 receives an interrupt signal and before it moves on to the next time slice, which belongs to a different user or program. Saving this state information allows the current user or program to restore the state at a later time (for example, in the next time slice allocated to that user or program).
Line 1177 carries the interrupt signal from device driver 1171 to the reconfigurable hardware unit to swap simulation jobs. The interrupt signal is sent between time slices, so that the current simulation job can be swapped out at the end of the current time slice and a new simulation job swapped in for the next time slice.
The handshake protocol according to one embodiment of the present invention is now discussed with reference to Figures 53 and 54. Figure 53 shows the handshake signals transmitted between the device driver and the reconfigurable hardware unit through a handshake logic interface. Figure 54 is a state diagram of the protocol. Figure 51 shows the handshake signals on line 1173; Figure 53 shows these handshake signals between the device driver and the reconfigurable hardware unit in more detail.
In Figure 53, handshake logic interface 1234 is located in reconfigurable hardware unit 1172. Alternatively, handshake logic interface 1234 can be placed outside reconfigurable hardware unit 1172. Four groups of signals pass between device driver 1171 and handshake logic interface 1234: a 3-bit SPACE signal on line 1230, a 1-bit read/write signal on line 1231, a 4-bit COMMAND signal on line 1232, and a 1-bit DONE signal on line 1233. The handshake logic interface contains the logic that processes these signals and places the reconfigurable hardware unit in the appropriate mode for each operation to be performed. This interface is connected to the CTRL_FPGA unit (or FPGA I/O controller).
For the 3-bit SPACE signal, data transfers between the simulation system's computing environment on the PCI bus and the reconfigurable hardware unit are assigned to specific I/O address spaces at the software/hardware boundary: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software). As mentioned earlier, the simulation system maps the hardware model into four address spaces of main memory according to component type and control function: the REG space is assigned to register components; the CLK space is assigned to the software clock; the S2H space is assigned to the outputs of the software test bench components to the hardware model; and the H2S space is assigned to the outputs of the hardware model to the software test bench components. At system initialization, these dedicated I/O buffer spaces are mapped into the kernel's main memory space.
Table G below describes each SPACE signal:
Table G: SPACE signal
The read/write signal on line 1231 indicates whether the data are being read or written. The DONE signal on line 1233 indicates the end of a DMA data transfer period.
The 4-bit COMMAND signal indicates whether the data transfer operation is a read, a write, the configuration of a new user design into the reconfigurable hardware unit, or a simulation interrupt. The COMMAND protocol is shown in Table H below:
Table H: COMMAND signal
Referring now to Figure 54, the handshake protocol is discussed with reference to a state diagram. In state 1400, the device driver and simulation system are idle. As long as no new command is issued, the system remains idle, as shown by path 1401. When a new command is issued, the command processor processes it in state 1402. In this embodiment, the command processor is the FPGA I/O controller.
If COMMAND=0000 or COMMAND=0001, the system reads from or writes to the space designated by the SPACE index, as shown in state 1403. If COMMAND=0010, the system begins configuring the FPGAs in the reconfigurable hardware unit with a user design in state 1404, or reconfigures the FPGAs with a new user design. The system sequences the configuration information for all the FPGAs so as to model the portion of the user design that can be simulated in hardware. If, however, COMMAND=0011, the system interrupts the reconfigurable hardware unit in state 1405 to interrupt the simulation system, because the time slice has expired and the system must prepare to swap in a new simulation state for a new user or program. After state 1403, 1404, or 1405 completes, the simulation system enters DONE state 1406 to generate the DONE signal, then returns to state 1400 and remains idle until a new command arrives.
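The state sequence of Figure 54 can be sketched as a function that returns the states visited for one COMMAND value. The state numbers and COMMAND codes come from the text; the function name is hypothetical, and because the text does not say which of 0000 and 0001 means read and which means write, the sketch treats them together.

```python
# State numbers follow Figure 54.
IDLE, COMMAND_PROC, READ_WRITE, CONFIGURE, INTERRUPT, DONE = (
    1400, 1402, 1403, 1404, 1405, 1406)

READ_WRITE_CMDS = {0b0000, 0b0001}  # read or write the space selected by SPACE
CMD_CONFIGURE = 0b0010              # configure FPGAs with a (new) user design
CMD_INTERRUPT = 0b0011              # time slice expired: interrupt the simulation


def handshake_states(command):
    """Return the Figure 54 states visited for one COMMAND value.

    command is None when no new command has been issued.
    """
    if command is None:
        return [IDLE]                      # path 1401: remain idle
    if command in READ_WRITE_CMDS:
        action = READ_WRITE
    elif command == CMD_CONFIGURE:
        action = CONFIGURE
    elif command == CMD_INTERRUPT:
        action = INTERRUPT
    else:
        raise ValueError(f"undefined COMMAND {command:04b}")
    # Command processor, action state, DONE, then back to idle.
    return [IDLE, COMMAND_PROC, action, DONE, IDLE]
```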
The time-sharing feature of the simulation server when handling multiple jobs at different priority levels is now discussed. Figure 52 gives an example in which four jobs (job A, job B, job C, and job D) are waiting to be executed in the simulation job queue. The four jobs do not all have the same priority: jobs A and B are assigned high priority I, while jobs C and D are assigned low priority II. As the timeline of Figure 52 shows, time-shared use of the reconfigurable hardware depends on the priority levels of the pending jobs in the queue. At time 1190, simulation begins with job A, which is granted access to the reconfigurable hardware unit. At time 1191, job A is preempted by job B; because jobs A and B have equal priority, the scheduler gives both jobs equal time-shared access. Job B now accesses the reconfigurable hardware unit. At time 1192, job A preempts job B and runs to completion at time 1193. At time 1193, job B resumes and runs to completion at time 1194. At time 1194, job C, which is next in the job queue but has lower priority than jobs A and B, accesses the reconfigurable hardware unit and begins executing. At time 1195, job D preempts job C for time-shared access, because the two jobs have equal priority. Job D keeps access until time 1196, when it is preempted by job C. Job C completes at time 1197, and job D then regains access at time 1197 and completes at time 1198.
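The Figure 52 pattern can be reproduced with a simple priority-level round-robin. This sketch assumes all jobs are queued at the start, that priority I is encoded as 1 and priority II as 2, and that each job needs a known number of equal time slices; `timeshare_order` is a hypothetical name, and the model ignores higher-priority jobs that arrive while lower-priority jobs run.

```python
from collections import deque


def timeshare_order(jobs, slices):
    """Return which job holds the hardware in each time slice.

    jobs: list of (name, priority) in queue order; lower number = higher priority.
    slices: dict name -> number of time slices the job needs.
    Jobs at the highest remaining priority share the hardware round-robin;
    lower-priority jobs run only after every higher-priority job completes.
    """
    remaining = dict(slices)
    order = []
    for level in sorted({p for _, p in jobs}):
        ready = deque(name for name, p in jobs if p == level)
        while ready:
            name = ready.popleft()
            order.append(name)
            remaining[name] -= 1
            if remaining[name] > 0:
                ready.append(name)  # preempted at the end of its slice, requeued
    return order
```

With two slices per job, `timeshare_order([("A", 1), ("B", 1), ("C", 2), ("D", 2)], {"A": 2, "B": 2, "C": 2, "D": 2})` gives the order A, B, A, B, C, D, C, D, which matches the sequence of access changes described for Figure 52.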
VIII. Memory Simulation
The memory simulation, or memory mapping, feature of the present invention gives the simulation system an efficient way to manage the multiple memory blocks associated with the configured hardware model of a user design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. With embodiments of the present invention, the memory simulation scheme requires no dedicated pins on the FPGA chips to handle memory accesses.
As used here, "memory access" refers to a read or write access between an FPGA logic device, in which the user design is configured, and an SRAM memory device, which stores all the memory blocks associated with the user design. Thus, a write operation involves a data transfer from an FPGA logic device to an SRAM memory device, and a read operation involves a data transfer from an SRAM memory device to an FPGA logic device. Referring to Figure 56, the FPGA logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2), and the SRAM memory devices include memory devices 1205 and 1206.
Likewise, in addition to its ordinary meaning as understood by those skilled in the art, "DMA data transfer" also refers to a data transfer between the computing system and the simulation system. The computing system shown in Figures 1, 45, and 46 is an entire PCI-based system with memory that supports the simulation system residing in software and in reconfigurable hardware. Selected device drivers and the socket/system calls to and from the operating system are also part of the simulation system and provide the appropriate interface to the operating system and the reconfigurable hardware unit. In one embodiment of the present invention, a DMA read transfer involves a data transfer from the FPGA logic devices (and from the FPGA SRAM memory devices, for initialization and memory content dumps) to the host computing system. A DMA write transfer involves a data transfer from the host computing system to the FPGA logic devices (and to the FPGA SRAM memory devices, for initialization and memory content dumps).
The terms "FPGA data bus," "FPGA bus," "FD bus," and their variations as used here refer to the high-bank bus FD[63:32] and the low-bank bus FD[31:0], which connect the SRAM memory devices and the FPGA logic devices containing the configured and programmed user design to be debugged.
The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic, which control and interface with: (1) the host computer system and its associated memory system, (2) the SRAM memory devices connected to the FPGA bus in the simulation system, and (3) the FPGA logic devices containing the configured and programmed user design being debugged.
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, interface logic connected to the user's own memory interface in the user design. Together these handle: (1) data evaluation among the FPGA logic devices, and (2) read/write memory accesses between the SRAM memory devices and the FPGA logic devices. The FPGA I/O controller side, which is coupled to the FPGA logic device side, includes the memory state machine and interface logic and handles DMA, read, and write operations between (1) the host computer system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
The operation of the memory simulation system according to one embodiment of the present invention is generally as follows. The simulation read/write cycle is divided into three phases: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates the DMA data transfer phase, during which the computing system and the SRAM memory units transfer data to each other over the FPGA data bus, that is, high-bank bus 1212 (FD[63:32]) and low-bank bus 1213 (FD[31:0]).
During the evaluation phase, logic circuitry in each FPGA logic device generates the proper software clock, input enable, and multiplexer enable signals to evaluate data in the user design logic. Communication among the FPGA logic devices occurs during this phase.
During the memory access phase, the memory simulation system waits for the high-bank and low-bank FPGA logic devices to place their respective address and control signals on their respective FPGA data buses. The CTRL_FPGA unit latches these address and control signals. For a write operation, the address, control, and data signals are sent from the respective FPGA logic devices to the SRAM memory devices. For a read operation, the address and control signals are provided to the designated SRAM memory devices, and the data signals are passed from the SRAM memory devices to the respective FPGA logic devices. After all the required memory blocks in all the FPGA logic devices have been accessed, the memory simulation read/write cycle is complete, and the memory simulation system remains idle until the next memory simulation read/write cycle begins.
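The memory access phase can be modeled as servicing posted requests one memory block at a time. This is a behavioral sketch only: the SRAM is a Python dictionary, unwritten locations read as 0 in this model, and `MemOp` and `memory_access_phase` are hypothetical names rather than parts of the described hardware.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    """Hypothetical request posted by one memory block of an FPGA."""
    block: str
    addr: int
    write: bool
    data: int = 0


def memory_access_phase(sram, requests):
    """Service posted requests device by device, one memory block at a time.

    sram: dict address -> word, standing in for an SRAM device.
    requests: list of (fpga_name, [MemOp, ...]) in access order.
    Returns {(fpga_name, block): word} for every read.
    """
    read_data = {}
    for fpga, ops in requests:
        for op in ops:
            if op.write:   # address, control and data go to the SRAM
                sram[op.addr] = op.data
            else:          # address and control go out, data comes back
                read_data[(fpga, op.block)] = sram.get(op.addr, 0)
    return read_data
```

Because requests are serviced in order, a read that follows a write to the same address within one cycle sees the newly written word.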
Figure 56 shows a high-level block diagram of the memory simulation configuration according to one embodiment of the present invention. Signals, connections, and buses unrelated to the memory simulation of the present invention are not shown. The CTRL_FPGA unit 1200 described above is connected to bus 1210 through line 1209. In one embodiment, CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Local bus 1210 connects CTRL_FPGA unit 1200 to other simulation array boards (if any) and to other chips (such as the PCI controller, EEPROM, and clock buffer). Line 1209 carries the DONE signal, which indicates completion of the simulation DMA data transfer phase.
Figure 56 also shows the other major functional blocks in the form of logic devices and memory devices. In one embodiment of the present invention, the logic devices are programmable logic devices (PLDs) in the form of FPGA chips, such as Altera 10K130 or 10K250 chips. Thus, unlike the embodiment described earlier, whose array contained eight Altera FLEX 10K100 chips, this embodiment uses only four Altera FLEX 10K130 chips. The memory devices are synchronous pipelined cache SRAMs, such as Cypress 128Kx32 CY7C1335 or CY7C1336 chips. The logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM chips include low-bank memory device 1205 (L-SRAM) and high-bank memory device 1206 (H-SRAM).
These logic devices and memory devices are connected to CTRL_FPGA unit 1200 through high-bank bus 1212 (FD[63:32]) and low-bank bus 1213 (FD[31:0]). Logic devices 1201 (FPGA1) and 1202 (FPGA3) are connected to high-bank bus 1212 through buses 1223 and 1225, respectively, and logic devices 1203 (FPGA0) and 1204 (FPGA2) are connected to low-bank data bus 1213 through buses 1224 and 1226, respectively. High-bank memory device 1206 is connected to high-bank bus 1212 through bus 1220, and low-bank memory device 1205 is connected to low-bank bus 1213 through bus 1219. This dual-bank bus structure lets the simulation system access the high-bank devices and the low-bank devices in parallel, with higher throughput. The dual-bank data bus structure also carries other signals, such as control and address signals, so that the simulation read/write cycle can be controlled.
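The dual-bank split of the 64-bit FD bus can be illustrated with simple bit manipulation. The bank membership table restates the connections above; `split_fd` and `join_fd` are hypothetical helpers showing only how FD[63:0] divides into the two 32-bit banks, not how the hardware drives the buses.

```python
# Bank membership of the Figure 56 devices, as described in the text.
BANK_OF = {
    "FPGA1": "high", "FPGA3": "high", "H-SRAM": "high",  # FD[63:32]
    "FPGA0": "low", "FPGA2": "low", "L-SRAM": "low",     # FD[31:0]
}

MASK32 = 0xFFFFFFFF


def split_fd(word64):
    """Split a 64-bit FD value into (FD[63:32], FD[31:0])."""
    return (word64 >> 32) & MASK32, word64 & MASK32


def join_fd(high, low):
    """Combine the two 32-bit bank values back into FD[63:0]."""
    return ((high & MASK32) << 32) | (low & MASK32)
```

For example, `split_fd(0x1122334455667788)` yields `(0x11223344, 0x55667788)`: the high bank carries the upper word and the low bank the lower word, each independently of the other.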
Turning to Figure 61, each simulation read/write cycle includes a DMA data transfer phase, an evaluation phase, and a memory access phase. The combination of various control signals indicates which phase the simulation system is in. DMA data transfers between the host computer system and logic devices 1201 to 1204 of the reconfigurable hardware unit take place over the PCI bus (such as bus 50 in Figure 46), local buses 1210 and 1236, and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). Memory devices 1205 and 1206 take part in DMA data transfers for initialization and memory content dumps. Evaluation data transfers among logic devices 1201 to 1204 in the reconfigurable hardware unit take place over the interconnects (described above) and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). Memory accesses between logic devices 1201 to 1204 and memory devices 1205 and 1206 take place over FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]).
Returning to Figure 56, CTRL_FPGA unit 1200 provides and receives a number of control and address signals to control the simulation read/write cycle. CTRL_FPGA unit 1200 provides the DATAXSFR and EVAL signals on line 1211 to logic devices 1201 and 1203 through line 1221, and to logic devices 1202 and 1204 through line 1222. CTRL_FPGA unit 1200 also provides the MA[18:2] address signals to low-bank memory device 1205 and high-bank memory device 1206 through buses 1229 and 1214, respectively. To control the modes of these memory devices, CTRL_FPGA unit 1200 provides chip select and read (and write) signals to low-bank memory device 1205 and high-bank memory device 1206 through buses 1216 and 1215, respectively. The memory simulation system can send the DONE signal to, or receive it from, CTRL_FPGA unit 1200 and the computing system on line 1209 to indicate completion of a DMA data transfer.
As mentioned earlier in connection with Figures 9, 11, 12, 14, and 15, logic devices 1201 to 1204 are connected together by a multiplexed cross-chip address pointer chain, which appears in Figure 56 as two sets of SHIFTIN/SHIFTOUT lines: lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217. At the start of each chain, these lines are tied to Vcc on lines 1207 and 1208. The SHIFTIN signal from the preceding FPGA logic device in the bank starts the memory access of the current FPGA logic device. After the shifting through a given chain is complete, the last logic device sends a LAST signal (LASTL or LASTH) to CTRL_FPGA unit 1200. For the high bank, logic device 1202 sends its shift-out signal as LASTH to CTRL_FPGA unit 1200 on line 1218; for the low bank, logic device 1204 sends the LASTL signal to CTRL_FPGA unit 1200 on line 1217.
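The order in which the pointer chain hands memory access from device to device can be sketched as follows. This is a hypothetical behavioral model (the name `chain_access_order` and the event strings are illustrative); it captures only the ordering, not the signal timing.

```python
def chain_access_order(banks):
    """Model the SHIFTIN/SHIFTOUT pointer chain of each bank.

    banks: dict bank -> list of (fpga_name, num_memory_blocks) in chain order.
    The first device's SHIFTIN is tied to Vcc, so it starts at once; each
    device hands SHIFTOUT to the next device's SHIFTIN when its blocks are
    done; the last device's SHIFTOUT is reported as LAST<bank>.
    Returns dict bank -> ordered list of events.
    """
    events = {}
    for bank, chain in banks.items():
        log = []
        for fpga, nblocks in chain:
            # This device holds the pointer: access its blocks one at a time.
            log.extend(f"{fpga}:block{b}" for b in range(nblocks))
        log.append(f"LAST{bank}")  # LASTH on line 1218, LASTL on line 1217
        events[bank] = log
    return events
```

For Figure 56, the high-bank chain would be `[("FPGA1", n1), ("FPGA3", n3)]` and the low-bank chain `[("FPGA0", n0), ("FPGA2", n2)]`, where the block counts are design-dependent; the two chains proceed independently.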
Regarding the circuit board implementation of Figure 56, one embodiment of the present invention integrates the components (such as logic devices 1201-1204, memory devices 1205-1206, and CTRL_FPGA unit 1200) and the buses (such as FPGA buses 1212-1213 and local bus 1210) on a single board. This board is connected to the motherboard through a motherboard connector. Thus, one board carries four logic devices (two per bank), two memory devices (one per bank), and the buses. A second board contains its own logic devices (typically four), memory devices (typically two), FPGA I/O controller (CTRL_FPGA unit), and buses. The PCI controller, however, is installed only on the first board. The mother-daughter board connectors mentioned above are provided between the boards so that the logic devices on all boards can be connected together and can communicate with one another during evaluation; a local bus also runs across all the boards. Each board has its own FPGA bus FD[63:0], but the FPGA bus does not span multiple boards.
In this board arrangement, the simulation system performs memory mapping between the logic devices and memory devices on the same board, but does not support memory mapping across boards. Thus, the logic devices on board 5 can map memory blocks only to the memory devices on board 5, not to memory devices on other boards. In other embodiments, however, the simulation system can map memory blocks from the logic devices on one board to the memory devices on another board.
The operation of the memory simulation system according to one embodiment of the present invention is roughly as follows. The simulation read/write cycle is divided into three phases: DMA data transfer, evaluation, and memory access. To indicate the completion of a simulation read/write cycle, the memory simulation system can send the DONE signal to, or receive it from, CTRL_FPGA unit 1200 and the computing system on line 1209. The DATAXSFR signal on bus 1211 indicates the DMA data transfer phase, during which the computing system and FPGA logic devices 1201 to 1204 transfer data to each other over the FPGA data bus, that is, high-bank bus 1212 (FD[63:32]) and low-bank bus 1213 (FD[31:0]). In general, DMA transfers occur between the host computer system and the FPGA logic devices. For initialization and memory content dumps, DMA transfers occur between the host computer system and SRAM memory devices 1205 and 1206.
During the evaluation phase, logic circuitry in each of FPGA logic devices 1201-1204 generates the appropriate software clock, input enable, and multiplexer enable signals to evaluate data for the user design logic. Communication among the FPGA logic devices occurs during this phase. CTRL_FPGA unit 1200 also starts the evaluation counter to control the duration of the evaluation period. The system sets the count, and thus the duration of the evaluation period, by determining the longest signal path. This path corresponds to a specific number of steps, and the system uses this step count to calculate the count needed for the evaluation cycle to complete.
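One way to obtain a step count for the longest signal path is a longest-path computation over an acyclic signal graph. The text does not say how the system finds this path; the sketch below simply uses a topological-order longest-path pass from Python's standard `graphlib` module (Python 3.9+). The graph format, `eval_count`, and the `cycles_per_step` factor are hypothetical, and combinational loops (cycles) are not handled.

```python
from graphlib import TopologicalSorter


def longest_path_steps(fanout):
    """Number of steps (edges) on the longest path of an acyclic signal graph.

    fanout: dict node -> iterable of nodes it drives.
    """
    preds = {}
    for src, dsts in fanout.items():
        preds.setdefault(src, set())
        for d in dsts:
            preds.setdefault(d, set()).add(src)
    depth = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        depth[node] = max((depth[p] + 1 for p in preds[node]), default=0)
    return max(depth.values(), default=0)


def eval_count(fanout, cycles_per_step=1):
    """Hypothetical EVAL counter preset: longest-path steps times cycles per step."""
    return longest_path_steps(fanout) * cycles_per_step
```

For a graph in which `in` drives `g1` and `g2`, both drive `g3`, and `g3` drives `out`, the longest path has 3 steps, so with 2 cycles per step the counter would be preset to 6.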
During the memory access phase, the memory simulation system waits for high-bank and low-bank FPGA logic devices 1201-1204 to place their respective address and control signals on their respective FPGA data buses. CTRL_FPGA unit 1200 latches these address and control signals. For a write operation, the address, control, and data signals are sent from FPGA logic devices 1201-1204 to the respective SRAM memory devices 1205 and 1206. For a read operation, the address and control signals are sent from FPGA logic devices 1201-1204 to the respective SRAM memory devices 1205 and 1206, and the data signals are passed from SRAM memory devices 1205 and 1206 to the respective FPGA logic devices 1201-1204. On the FPGA logic device side, the FD bus driver places the address and control signals of a memory block on the FPGA data bus (FD bus). For a write operation, the write data for that memory block are placed on the FD bus. For a read operation, the double buffer latches the data for the memory block from the SRAM memory device on the FD bus. This operation is performed in order for each memory block of each FPGA logic device, one memory block at a time. When all the required memory blocks on an FPGA logic device have been accessed, the memory simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks of that device. After all the required memory blocks on all FPGA logic devices 1201-1204 have been accessed, the memory simulation read/write cycle is complete, and the memory simulation system remains idle until the next memory simulation read/write cycle begins.
Figure 57 is a more detailed block diagram of the memory simulation aspect of the present invention, including a more detailed block diagram of CTRL_FPGA unit 1200 and of the logic in each logic device related to memory simulation. Figure 57 shows CTRL_FPGA unit 1200 and part of logic device 1203 (whose structure is similar to the corresponding parts of the other logic devices 1201, 1202, and 1204). CTRL_FPGA unit 1200 includes memory finite state machine (MEMFSM) 1240, AND gate 1241, evaluation (EVAL) counter 1242, low-bank memory address/control latch 1243, low-bank address/control multiplexer 1244, address counter 1245, high-bank memory address/control latch 1247, and high-bank address/control multiplexer 1246. Each logic device, such as logic device 1203 shown in Figure 57, includes an evaluation finite state machine (EVALFSMx) 1248 and a data bus multiplexer 1249 (FDO-MUXx; for logic device FPGA0 1203, this is FDO-MUX0). The "x" appended to EVALFSM denotes the particular logic device with which it is associated (FPGA0, FPGA1, FPGA2, FPGA3); in this example, x ranges from 0 to 3. Thus, EVALFSM0 is associated with FPGA0, logic device 1203. In general, each logic device is associated with a number x; for N logic devices, x ranges from 0 to N-1.
In each of logic devices 1201-1204, a number of memory blocks are associated with the configured and mapped user design. Memory block interface 1253 in the user logic therefore provides the path by which the computing system reaches the required memory blocks in the FPGA logic device array. Memory block interface 1253 also supplies memory write data on bus 1295 to FPGA data bus multiplexer (FDO-MUXx) 1249, and receives memory read data on bus 1297 from memory read data double buffer 1251.
Each FPGA logic device has a memory block data/logic interface 1298. Each memory block data/logic interface 1298 is connected to FPGA data bus multiplexer (FDO-MUXx) 1249, evaluation finite state machine (EVALFSMx) 1248, and FPGA bus FD[63:0]. Memory block data/logic interface 1298 includes, for each memory block N, a memory read data double buffer 1251, an address offset unit 1250, a memory model 1252, and a memory block interface 1253. These components are replicated for each memory block N in any given FPGA logic device 1201-1204. Thus, if there are 5 memory blocks, there are 5 sets of memory block data/logic interfaces 1298; that is, each memory block N (mem_block_N) has its own memory read data double buffer 1251, address offset unit 1250, memory model 1252, and memory block interface 1253.
As with EVALFSMx, the "x" in FDO_MUXx identifies the associated logic device (FPGA0, FPGA1, FPGA2, FPGA3), with x ranging from 0 to 3. The output of FDO_MUXx 1249 is placed on bus 1282, which is connected to either the high bank bus FD[63:32] or the low bank bus FD[31:0], depending on which chip (FPGA0, FPGA1, FPGA2, FPGA3) the FDO_MUXx 1249 belongs to. In Figure 57, FDO_MUXx is FDO_MUX0, which belongs to the low bank logic device FPGA0 1203, so its output on bus 1282 is driven onto the low bank bus FD[31:0]. Read bus 1283 carries read data from the high bank bus FD[63:32] or the low bank bus FD[31:0] into the memory read data double buffer 1251. Thus, write data travels from the memory blocks in each logic device 1201-1204 through FDO_MUXx 1249 onto the high bank bus FD[63:32] or the low bank bus FD[31:0], while read data travels from FD[63:32] or FD[31:0] through read bus 1283 into the memory read data double buffer 1251. The double buffering mechanism first latches data into a first buffer and then clocks it into a second buffer, releasing all of the latched data at the same time so as to minimize skew. The memory read data double buffer 1251 is discussed in more detail below.
Returning to the memory model 1252, it converts the user's memory type into the SRAM type used by the memory simulation system. Because the memory types in user designs vary, the memory block interface 1253 may also be unique to each user design. For example, the user memory type may be DRAM, flash memory, or EEPROM. Every memory block interface 1253, however, has memory address and control signals (for example, read, write, chip select, and mem_clk). In one embodiment of the present invention, memory simulation converts the user memory type into the SRAM type used in the memory simulation system. If the user memory type is already SRAM, the conversion to the SRAM memory model is trivial. The memory address and control signals are placed on bus 1296 and transferred to the memory model 1252, which performs the conversion.
The memory model 1252 provides memory block address information on bus 1293 and control information on bus 1292. The address offset unit 1250 receives the address information for each memory block and, based on the original address on bus 1293, provides a modified, offset address on bus 1291. The offset is necessary because the address ranges of different memory blocks may overlap. For example, one memory block may occupy the space 0-2K while another occupies the space 0-3K. Because the two blocks overlap in the space 0-2K, reading and writing individual addresses would be difficult without an address offset mechanism. With offsets applied, the first memory block can occupy the space 0-2K and the second the space from 2K up to 5K. The offset address from the address offset unit 1250 is combined with the control signals on bus 1292, placed on bus 1299, and transferred to the FPGA data bus multiplexer (FDO_MUXx) 1249.
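The offset scheme can be illustrated with a short sketch. It assumes that each memory block is given a fixed base offset by packing the blocks back to back, in the order given, in one flat word-addressed SRAM space. The function names `assign_base_offsets` and `compensate` are hypothetical, and the way the actual address offset unit 1250 assigns offsets may differ.

```python
from typing import List


def assign_base_offsets(block_sizes: List[int]) -> List[int]:
    """Hypothetical offset assignment: pack blocks back to back.

    Block i starts where block i-1 ends, so no two blocks overlap.
    """
    offsets = []
    next_base = 0
    for size in block_sizes:
        offsets.append(next_base)
        next_base += size
    return offsets


def compensate(offsets: List[int], sizes: List[int], block: int, addr: int) -> int:
    """Translate a block-local address into the shared SRAM address space."""
    if not 0 <= addr < sizes[block]:
        raise ValueError(f"address {addr} is outside memory block {block}")
    return offsets[block] + addr


# The two blocks from the example: one of 2K words, one of 3K words.
K = 1024
sizes = [2 * K, 3 * K]
offsets = assign_base_offsets(sizes)
print(offsets)                               # [0, 2048]
print(compensate(offsets, sizes, 1, 0))      # 2048: block 1 now starts at 2K
print(compensate(offsets, sizes, 1, 3 * K - 1))  # 5119: the last word below 5K
```

Packing in declaration order is the simplest choice that removes the overlap. A real implementation might instead align each block to a power of two so that the offset can be applied by concatenating high-order address bits rather than by adding.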
The FPGA data bus multiplexer FDO_MUXx receives SPACE2 data on bus 1289, SPACE3 data on bus 1290, address/control data on bus 1299, and memory write data on bus 1295. As described above, SPACE2 and SPACE3 are particular SPACE indices. The SPACE index generated by the FPGA I/O controller (327 in Figure 10; Figure 22) selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within that address space, the system of the present invention selects, in sequence, the particular words to be accessed. SPACE2 refers to the memory space dedicated to DMA read transfers of hardware-to-software (H2S) data. SPACE3 refers to the memory space dedicated to DMA read transfers of REGISTER_READ data. See Table G above.
FDO_MUXx 1249 drives its output onto the low bank or high bank bus via bus 1282. Its control inputs are the output enable (output_en) signal on line 1284 and the select signal on line 1285, both from EVALFSMx unit 1248. The output enable signal on line 1284 enables (or disables) FDO_MUXx 1249; for data accesses on the FPGA bus it is asserted so that FDO_MUXx can drive the bus. The select signal on line 1285, generated by EVALFSMx unit 1248, chooses among the inputs of FDO_MUXx: the SPACE2 data on bus 1289, the SPACE3 data on bus 1290, the address/control signals on bus 1299, and the memory write data on bus 1295. The generation of the select signal by EVALFSMx unit 1248 is discussed further below.
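A behavioral model makes the role of the two control inputs concrete. In this sketch, `output_en` gates the driver and the select value picks one of the four inputs named above. The `FdoSel` encoding and the 32-bit masking are assumptions made for illustration; the source does not specify how the select signal on line 1285 is encoded. Returning `None` stands in for a released, high-impedance bus.

```python
from enum import Enum
from typing import Optional


class FdoSel(Enum):
    """Hypothetical encoding of the select signal on line 1285."""
    SPACE2 = 0      # bus 1289: H2S DMA read data
    SPACE3 = 1      # bus 1290: REGISTER_READ DMA read data
    ADDR_CTRL = 2   # bus 1299: offset address plus control
    WRITE_DATA = 3  # bus 1295: memory write data


def fdo_mux(output_en: bool, sel: FdoSel, space2: int, space3: int,
            addr_ctrl: int, write_data: int) -> Optional[int]:
    """Return the word driven onto bus 1282, or None when the driver is off."""
    if not output_en:
        return None  # driver disabled: this FPGA leaves the FD bus released
    inputs = {
        FdoSel.SPACE2: space2,
        FdoSel.SPACE3: space3,
        FdoSel.ADDR_CTRL: addr_ctrl,
        FdoSel.WRITE_DATA: write_data,
    }
    # Each FPGA drives one 32-bit half of the bus: FD[31:0] or FD[63:32].
    return inputs[sel] & 0xFFFF_FFFF


print(fdo_mux(True, FdoSel.WRITE_DATA, 1, 2, 3, 4))  # 4
print(fdo_mux(False, FdoSel.SPACE2, 1, 2, 3, 4))     # None
```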
The EVALFSMx unit 1248 is the operational core of each logic device 1201-1204 with respect to the memory simulation system. EVALFSMx unit 1248 receives the following inputs: the SHIFTIN signal on line 1279, the EVAL signal from CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. It produces the following outputs: the SHIFTOUT signal on line 1280, the read latch signal rd_latx to the memory read data double buffer 1251 on line 1286, the output enable signal to FDO_MUXx 1249 on line 1284, the select signal to FDO_MUXx 1249 on line 1285, and three signals (input_en, mux_en, and clk_en) to the user logic on line 1281.
According to one embodiment of the present invention, the FPGA logic devices 1201-1204 generally operate as follows for the memory simulation system. When the EVAL signal is at logic 1, data evaluation within FPGA logic devices 1201-1204 proceeds; otherwise, the simulation system is performing a DMA data transfer or a memory access. When EVAL=1, EVALFSMx unit 1248 generates the clk_en, input_en, and mux_en signals, which respectively allow the user logic to evaluate data, to latch relevant data, and to multiplex signals between logic devices. The clk_en signal enables the second flip-flop (see Figure 19) of every clock-edge register flip-flop in the user's design logic, and it also serves as the software clock. If the user's memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. EVALFSMx unit 1248 generates the input_en signal so that the user's design logic latches the input signals that the CPU sends to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop (see Figure 19) in the primary clock register. Finally, EVALFSMx unit 1248 generates the mux_en signal to turn on the multiplexed channels in each FPGA logic device so that it can begin communicating with the other FPGA logic devices in the array.
Thereafter, if an FPGA logic device 1201-1204 contains at least one memory block, the memory simulation system waits for the select data to be shifted into the selected FPGA logic device, and then generates output_en and the select signal for the FPGA data bus driver so that the address and control signals of memory block interface 1253 (mem_block_N) are placed on the FD bus.
If the write signal wrx on line 1287 is asserted (logic 1), the select and output_en signals are also asserted, and write data is placed on the low bank or high bank bus, depending on which bank the FPGA chip is connected to. In Figure 57, logic device 1203 is FPGA0 and is connected to the low bank bus FD[31:0]. If wrx on line 1287 is deasserted (logic 0), the select and output_en signals are deasserted, and the read latch signal rd_latx on line 1286 causes the memory read data double buffer 1251 to latch and double-buffer the selected data arriving from the SRAM over the low bank or high bank bus (depending on which bank the FPGA chip is connected to). The wrx signal is the memory write signal originating from the memory interface of the user's design logic; specifically, wrx on line 1287 comes from memory model 1252 via control bus 1292.
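The per-device memory-access behavior described in the last two paragraphs can be sketched as a sequence of per-cycle EVALFSMx outputs. The sketch makes several assumptions that the source does not state: each phase takes exactly one cycle, wait cycles are omitted, and the select values are the plain strings `"ADDR_CTRL"` and `"WRITE_DATA"`. The helper names are hypothetical, and the actual EVALFSMx (Figure 59) has more states than this.

```python
from typing import Dict, List, Optional


def _cycle(output_en: bool, sel: Optional[str], rd_lat: bool,
           shiftout: bool = False) -> Dict[str, object]:
    """One clock cycle's worth of (hypothetical) EVALFSMx outputs."""
    return {"output_en": output_en, "sel": sel, "rd_lat": rd_lat, "shiftout": shiftout}


def memory_access_sequence(wrx_per_block: List[bool]) -> List[Dict[str, object]]:
    """Per-cycle EVALFSMx outputs while this FPGA owns the FD bus.

    wrx_per_block has one entry per memory block access in this FPGA:
    True means wrx = 1 (write), False means wrx = 0 (read).
    """
    cycles = []
    for wrx in wrx_per_block:
        # Drive mem_block_N's address/control (bus 1299) onto the FD bus.
        cycles.append(_cycle(True, "ADDR_CTRL", False))
        if wrx:
            # Write: keep driving, now selecting the write data (bus 1295).
            cycles.append(_cycle(True, "WRITE_DATA", False))
        else:
            # Read: release the bus and assert rd_latx so the double buffer
            # captures the data that the SRAM drives onto the FD bus.
            cycles.append(_cycle(False, None, True))
    # All blocks serviced: raise SHIFTOUT so the next FPGA in the chain starts.
    cycles.append(_cycle(False, None, False, shiftout=True))
    return cycles


for c in memory_access_sequence([True, False]):
    print(c)
```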
This read or write process takes place in each FPGA logic device. After all of a device's memory blocks have been serviced through SRAM accesses, EVALFSMx unit 1248 generates the SHIFTOUT signal so that the next FPGA logic device in the chain can perform its SRAM accesses. Note that memory accesses for the devices in the high bank and in the low bank occur in parallel, so the accesses for one bank may finish before those for the other. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when it is ready and the data is available.
On the CTRL_FPGA unit 1200 side, MEMFSM 1240 is at the core of the memory simulation aspect of the present invention. It sends and receives many control signals in order to control the activation of the simulation write/read cycle and the various operations that the cycle supports. MEMFSM 1240 receives the DATAXSFR signal from line 1260 via line 1258. This signal is also provided to each logic device on line 1273. When DATAXSFR goes low (logic 0), the DMA data transfer cycle ends and the evaluation and memory access cycles begin.
MEMFSM 1240 also receives the LASTH signal on line 1254 and the LASTL signal on line 1255, which indicate that the selected words associated with the selected address space have been accessed between the computing system and the simulation system via the PCI bus and the FPGA bus. The MOVE signal associated with this shift-out process propagates through each logic device (e.g., logic devices 1201-1204) until the required words have been accessed, and at the end of the chain the MOVE signal finally becomes the LAST signal (LASTH for the high bank, LASTL for the low bank). In EVALFSM 1248 (shown in Figure 57 as EVALFSM0 for logic device FPGA0 1203), the corresponding LAST signal is the SHIFTOUT signal on line 1280. Because logic device 1203 is not the last logic device in the low bank chain shown in Figure 56 (logic device 1204 is the last), the SHIFTOUT signal of EVALFSM0 is not a LAST signal. If EVALFSM 1248 corresponded to EVALFSM2 in Figure 56, the SHIFTOUT signal on line 1280 would be the LASTL signal provided to MEMFSM on line 1255. Otherwise, the SHIFTOUT signal on line 1280 is provided to logic device 1204 (see Figure 56). Similarly, the SHIFTIN signal on line 1279 of FPGA0 logic device 1203 is tied to Vcc (see Figure 56).
The LASTL and LASTH signals are inputs to AND gate 1241 via lines 1256 and 1257, respectively. AND gate 1241 has an open-drain output. The output of AND gate 1241 is the DONE signal on line 1259, which is provided to the computing system and to MEMFSM 1240. Thus, the AND gate outputs a logic high only when both LASTL and LASTH are logic high, indicating that the shift-out chain process has finished in both banks.
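The SHIFTIN/SHIFTOUT chain and AND gate 1241 can be modeled at the cycle level as a token passing down each bank. In this sketch, FPGA 0's SHIFTIN is tied high, each FPGA keeps the bus for a given number of cycles, its SHIFTOUT hands the bus to the next FPGA, and the final SHIFTOUT is the bank's LAST signal. The per-FPGA cycle counts are assumed inputs rather than values from the source, and the function names are hypothetical.

```python
from typing import List


def last_trace(work: List[int], n_cycles: int) -> List[bool]:
    """Per-cycle value of one bank's LAST signal (SHIFTOUT of its final FPGA).

    work[i] is the number of FD-bus cycles FPGA i needs for its memory blocks.
    FPGA 0's SHIFTIN is tied to Vcc, so it owns the bus first; each FPGA's
    SHIFTOUT feeds the next FPGA's SHIFTIN.
    """
    remaining = list(work)
    holder = 0  # index of the FPGA whose SHIFTIN is currently high
    trace = []
    for _ in range(n_cycles):
        # An FPGA with nothing left raises SHIFTOUT, handing the bus onward.
        while holder < len(remaining) and remaining[holder] == 0:
            holder += 1
        # LAST is high once the token has passed the final FPGA in the chain.
        trace.append(holder == len(remaining))
        if holder < len(remaining):
            remaining[holder] -= 1  # this FPGA uses the bus for one cycle
    return trace


def done_trace(low_work: List[int], high_work: List[int], n_cycles: int) -> List[bool]:
    """DONE = LASTL AND LASTH, as produced by AND gate 1241."""
    lastl = last_trace(low_work, n_cycles)
    lasth = last_trace(high_work, n_cycles)
    return [lo and hi for lo, hi in zip(lastl, lasth)]


# The low bank needs 2 + 2 cycles and the high bank 3 + 4 cycles. The banks
# run in parallel, so DONE waits for the slower (high) bank.
print(last_trace([2, 2], 10).index(True))          # 4
print(done_trace([2, 2], [3, 4], 10).index(True))  # 7
```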
MEMFSM 1240 sends a START signal to EVAL counter 1242 on line 1261. As its name implies, the START signal triggers EVAL counter 1242. It is issued after the DMA data transfer cycle is complete, and is generated when MEMFSM detects the high-to-low (1 to 0) transition of the DATAXSFR signal. EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles; its programmed count determines the length of the evaluation cycle. The output of EVAL counter 1242 on line 1274 is at logic 1 or logic 0, depending on whether the counter is counting. While EVAL counter 1242 is counting, the output on line 1274 is at logic 1, and this output is provided to each FPGA logic device 1201-1204 via EVALFSMx 1248. When EVAL=1, the FPGA logic devices 1201-1204 carry out inter-FPGA communication in order to evaluate data in the user design. The output of EVAL counter 1242 is also fed back to MEMFSM unit 1240 on line 1262 so that MEMFSM can track it. When the programmed count is finished, EVAL counter 1242 drives logic 0 on lines 1274 and 1262, indicating the end of the evaluation cycle.
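The behavior of EVAL counter 1242 can be sketched as a cycle-by-cycle function of the DATAXSFR signal. This is a simplification under stated assumptions: counting starts in the same cycle that the 1-to-0 edge is seen, whereas the real MEMFSM first passes through a START state (1302), which adds latency. The function name `eval_trace` and its `eval_cycles` parameter, which stands for the programmed count, are hypothetical.

```python
from typing import List


def eval_trace(dataxsfr: List[int], eval_cycles: int) -> List[int]:
    """EVAL output of a programmable counter started on DATAXSFR's 1->0 edge."""
    out = []
    prev = 0
    count = 0
    for d in dataxsfr:
        if prev == 1 and d == 0:   # falling edge: the DMA transfer just ended
            count = eval_cycles    # load the programmed evaluation length
        out.append(1 if count > 0 else 0)  # EVAL = 1 while counting
        if count > 0:
            count -= 1
        prev = d
    return out


# A DMA transfer occupies cycles 1-2; evaluation then lasts 3 cycles.
print(eval_trace([0, 1, 1, 0, 0, 0, 0, 0], 3))  # [0, 0, 0, 1, 1, 1, 0, 0]
```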
If no memory access is needed, the MEM_EN signal on line 1272, which is provided to MEMFSM unit 1240, is driven to logic 0; in that case the memory simulation system waits for another DMA data transfer cycle. If a memory access is needed, MEM_EN on line 1272 is driven to logic 1. In effect, MEM_EN is a control signal from the CPU that initiates access to the SRAM memory devices on the FPGA logic device board. MEMFSM unit 1240 then waits for FPGA logic devices 1201-1204 to place address and control signals on the FPGA bus, i.e., on FD[63:32] and FD[31:0].
The remaining functional units, together with their associated control signals and lines, provide address/control information to the SRAM memory devices for writing and reading data. These units are the low bank memory address/control latch 1243, the low bank address/control multiplexer 1244, the high bank memory address/control latch 1247, the high bank address/control multiplexer 1246, and the address counter 1245.
The low bank memory address/control latch 1243 receives address and control signals from FPGA bus FD[31:0] 1275 and bus 1213 in accordance with the latch signal on line 1263. Latch 1243 produces the mem_wr_L signal on line 1264 and, via bus 1266, provides the address/control signals captured from FPGA bus FD[31:0] to address/control multiplexer 1244. The mem_wr signal is the same as the chip select write signal.
Address/control multiplexer 1244 receives as inputs the address and control information on bus 1266 and, via bus 1268, the address information from address counter 1245. As its output, it sends address/control information to low bank SRAM memory device 1205 on bus 1276. The select signal on line 1265 from MEMFSM unit 1240 selects the appropriate input. The address/control information on bus 1276 corresponds to MA[18:2] on buses 1229 and 1216 in Figure 56 and to the chip select read/write signals.
Address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. SPACE4 contains DMA write transfer information, and SPACE5 contains DMA read transfer information. These DMA transfers take place over the PCI bus between the computing system (via the cache/main memory of the workstation CPU) and the simulation system (SRAM memory devices 1205, 1206). Address counter 1245 provides its output on buses 1288 and 1268 to address/control multiplexers 1244 and 1246. Depending on the low bank select signal on line 1265, address/control multiplexer 1244 either places the address/control information on bus 1266 onto bus 1276, for write/read memory accesses between SRAM device 1205 and FPGA logic devices 1203 and 1204, or places the DMA write/read transfer information from SPACE4 or SPACE5 on bus 1267 onto bus 1276.
During a memory access cycle, MEMFSM unit 1240 supplies the latch signal on line 1263 to memory address/control latch 1243 to capture the input from FPGA bus FD[31:0]. MEMFSM unit 1240 extracts the mem_wr_L control information from the address/control signals on FD[31:0] for further control. If the mem_wr_L signal on line 1264 is logic 1, a write operation is requested, and MEMFSM unit 1240 sends the appropriate select signal on line 1265 to address/control multiplexer 1244 so that the address and control signals on bus 1266 are sent to the low bank SRAM on bus 1276. The write data is then transferred from the FPGA logic device to the SRAM memory device. If the mem_wr_L signal on line 1264 is logic 0, a read operation is requested, and the simulation system waits for the SRAM memory device to place data on FPGA bus FD[31:0]. Once the data is ready, the read data is transferred from the SRAM memory device to the FPGA logic device.
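As a sketch of the latch-decode-route step, the code below splits a latched address/control word into an operation and an SRAM word address, then models the address/control multiplexer choosing between that address and the DMA address counter. The bit layout is purely hypothetical. The source states only that the SRAM address is MA[18:2], which is 17 bits, and that mem_wr is part of the address/control word; bit 31 for mem_wr is an assumption, as are the function names.

```python
from typing import Tuple

# Hypothetical layout of the 32-bit address/control word on FD[31:0].
ADDR_MASK = (1 << 17) - 1  # 17 address bits, matching the width of MA[18:2]
MEM_WR_BIT = 31            # assumed position of the mem_wr flag


def decode_addr_ctrl(word: int) -> Tuple[str, int]:
    """Split a latched address/control word into (operation, SRAM word address)."""
    mem_wr = (word >> MEM_WR_BIT) & 1
    addr = word & ADDR_MASK
    return ("write" if mem_wr else "read", addr)


def addr_ctrl_mux(select_dma: bool, latched_word: int, dma_counter: int) -> int:
    """Address output of the multiplexer: DMA counter or FPGA-requested address."""
    if select_dma:
        return dma_counter & ADDR_MASK   # SPACE4/SPACE5 DMA transfer
    return latched_word & ADDR_MASK      # memory access requested by an FPGA


word = (1 << MEM_WR_BIT) | 0x123
print(decode_addr_ctrl(word))          # ('write', 291)
print(addr_ctrl_mux(False, word, 64))  # 291
print(addr_ctrl_mux(True, word, 64))   # 64
```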
The high bank is configured and operates similarly. The high bank memory address/control latch 1247 receives address and control signals from FPGA bus FD[63:32] 1278 and bus 1212 in accordance with the latch signal on line 1270. Latch 1247 produces the mem_wr_H signal on line 1271 and, via bus 1239, provides the address/control signals captured from FPGA bus FD[63:32] to address/control multiplexer 1246.
Address/control multiplexer 1246 receives as inputs the address and control information on bus 1239 and, via bus 1268, the address information from address counter 1245. As its output, it sends the address/control information on bus 1277 to high bank SRAM memory device 1206. The select signal on line 1269 from MEMFSM unit 1240 selects the appropriate input. The address/control information on bus 1277 corresponds to MA[18:2] on buses 1214 and 1215 in Figure 56 and to the chip select read/write signals.
As described above, address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267 for DMA write and read transfers, and provides its output on buses 1288 and 1268 to address/control multiplexers 1244 and 1246. Depending on the high bank select signal on line 1269, address/control multiplexer 1246 either places the address/control information on bus 1239 onto bus 1277, for write/read memory accesses between SRAM device 1206 and FPGA logic devices 1201 and 1202, or alternatively places the DMA write/read transfer information from SPACE4 or SPACE5 on bus 1267 onto bus 1277.
During a memory access cycle, MEMFSM unit 1240 supplies the latch signal on line 1270 to memory address/control latch 1247 to capture the input from FPGA bus FD[63:32]. MEMFSM unit 1240 extracts the mem_wr_H control information from the address/control signals on FD[63:32] for further control. If the mem_wr_H signal on line 1271 is logic 1, a write operation is requested, and MEMFSM unit 1240 sends the appropriate select signal on line 1269 to address/control multiplexer 1246 so that the address and control signals on bus 1239 are sent to the high bank SRAM on bus 1277. The write data is then transferred from the FPGA logic device to the SRAM memory device. If the mem_wr_H signal on line 1271 is logic 0, a read operation is requested, and the simulation system waits for the SRAM memory device to place data on FPGA bus FD[63:32]. Once the data is ready, the read data is transferred from the SRAM memory device to the FPGA logic device.
As shown in Figure 57, the address and control signals are provided to the low bank SRAM memory device and the high bank SRAM memory device via buses 1276 and 1277, respectively. Bus 1276 for the low bank corresponds to the combination of buses 1229 and 1216 in Figure 56. Similarly, bus 1277 for the high bank corresponds to the combination of buses 1214 and 1215 in Figure 56.
According to one embodiment of the present invention, the CTRL_FPGA unit 1200 of the memory simulation system generally operates as follows. The DONE signal on line 1259 is provided to the computing system and to MEMFSM unit 1240 in CTRL_FPGA unit 1200 to indicate completion of the simulation write/read cycle. The DATAXSFR signal on line 1260 indicates that the DMA data transfer cycle of the simulation write/read cycle is in progress. The memory address/control signals on FPGA buses FD[31:0] and FD[63:32] are provided to the low bank and high bank memory address/control latches 1243 and 1247, respectively. For either bank, MEMFSM unit 1240 generates a latch signal (on line 1263 or 1270) to latch the address and control information, which is then provided to the SRAM memory device. The mem_wr signal determines whether a write operation or a read operation is needed. If a write operation is needed, data is transferred from FPGA logic devices 1201-1204 to the SRAM memory device via the FPGA bus. If a read operation is needed, the simulation system waits for the SRAM memory device to place the requested data on the FPGA bus so that it can be transferred from the SRAM memory device to the FPGA logic device. For DMA data transfers of SPACE4 and SPACE5, the select signals on lines 1265 and 1269 select the output of address counter 1245 for the data transferred between the host computer system and the SRAM memory devices in the simulation system. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when it is ready and the data is available.
Figure 60 shows a more detailed view of the memory read data double buffer 1251 (Figure 57). Each memory block N in each FPGA logic device has a double buffer, which latches related data that may arrive at different times and then releases all of the latched data at the same time. In Figure 60, the double buffer 1391 for memory block 0 includes two D flip-flops 1340 and 1341. The output 1343 of the first D flip-flop 1340 is connected to the input of the second D flip-flop 1341. The output 1344 of the second D flip-flop 1341 is the output of the double buffer and is provided to the memory block N interface in the user's design logic. The global clock is provided to the first flip-flop 1340 on line 1393 and to the second flip-flop 1341 on line 1394.
The first D flip-flop 1340 receives its data input from the SRAM memory device on line 1342, via bus 1283 and either FPGA bus FD[63:32] for the high bank or FD[31:0] for the low bank. Its enable input is connected to line 1345, which carries the rd_latx signal (for example, rd_lat0) from the EVALFSMx unit of each FPGA logic device. Thus, for a read operation (wrx=0), the EVALFSMx unit generates the rd_latx signal to latch the data on line 1342 onto line 1343. The input data for the double buffers of the various memory blocks may arrive at different times, and the double buffers ensure that all of the data is latched first. Once all of the data has been latched into the first D flip-flops 1340, the clk_en signal (the software clock) is provided on line 1346 to clock the second D flip-flop 1341. When the clk_en signal is asserted, the latched data on line 1343 is buffered into D flip-flop 1341 and appears on line 1344.
The next memory block, memory block 1, has another double buffer 1392 that is essentially identical to double buffer 1391. Data from the SRAM memory device is input on line 1396, the global clock signal is input on line 1397, and the clk_en (software clock) signal is input on line 1398 to the second flip-flop (not shown) in double buffer 1392. These lines are connected to the corresponding signal lines of the first double buffer 1391 for memory block 0 and of all the double buffers for the other memory blocks N. The double-buffered output data appears on line 1399.
The rd_latx signal for the second double buffer 1392 (for example, rd_lat1) is provided on line 1395, independently of the rd_latx signals of the other double buffers. Additional double buffers are provided for the other memory blocks N.
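The double buffer of Figure 60 can be modeled as two enabled D flip-flops that share the global clock. In this model, rd_latx enables the first stage and clk_en enables the second. The example shows two memory blocks whose read data arrives in different cycles but becomes visible to the user design in the same cycle. The class name `DoubleBuffer` and its `clock` method are hypothetical, and the model is behavioral rather than a description of the actual circuit.

```python
class DoubleBuffer:
    """Memory read data double buffer: two D flip-flops in series (Figure 60)."""

    def __init__(self) -> None:
        self.stage1 = 0  # first flip-flop (1340); output on line 1343
        self.stage2 = 0  # second flip-flop (1341); output on line 1344

    def clock(self, sram_data: int, rd_lat: bool, clk_en: bool) -> None:
        """Apply one global clock edge; both flip-flops sample simultaneously."""
        next1 = sram_data if rd_lat else self.stage1    # rd_latx enables stage 1
        next2 = self.stage1 if clk_en else self.stage2  # clk_en enables stage 2
        self.stage1, self.stage2 = next1, next2

    @property
    def out(self) -> int:
        """Value seen by the memory block N interface in the user design."""
        return self.stage2


# Block 0's read data arrives in cycle 0 and block 1's in cycle 1; the user
# design sees neither until clk_en releases both together in cycle 2.
buf0, buf1 = DoubleBuffer(), DoubleBuffer()
buf0.clock(0xA, rd_lat=True, clk_en=False)
buf1.clock(0x0, rd_lat=False, clk_en=False)
buf0.clock(0x0, rd_lat=False, clk_en=False)
buf1.clock(0xB, rd_lat=True, clk_en=False)
print(buf0.out, buf1.out)  # 0 0
buf0.clock(0x0, rd_lat=False, clk_en=True)
buf1.clock(0x0, rd_lat=False, clk_en=True)
print(buf0.out, buf1.out)  # 10 11
```

Sampling both stages from their old values before updating either is what makes the model behave like edge-triggered hardware: data latched in a given cycle cannot pass straight through to the output in that same cycle.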
The state diagram of MEMFSM unit 1240 will now be discussed for one embodiment of the present invention. Figure 58 shows a state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit. The state diagram of Figure 58 also marks the three cycles of the simulation write/read cycle with their corresponding states: states 1300-1301 correspond to the DMA data transfer cycle, states 1302-1304 correspond to the evaluation cycle, and states 1305-1314 correspond to the memory access cycle. The following discussion refers to Figure 58 in conjunction with Figure 57.
In general, the order of DMA transfer, evaluation, and memory access is fixed. In one embodiment, the order is as follows. DATA_XSFR triggers a DMA data transfer (if there is one). When the DMA data transfer is complete, the LAST signals for the high and low banks are generated and trigger the DONE signal, indicating that the DMA data transfer cycle has finished. The XSFR_DONE signal is then generated, and the evaluation (EVAL) cycle begins. When EVAL finishes, memory reads and writes can begin.
Beginning at the top of Figure 58, the machine remains in idle state 1300 while the DATAXSFR signal is at logic 0, which indicates that no DMA data transfer is taking place. When DATAXSFR is at logic 1, MEMFSM unit 1240 proceeds to state 1301. Here the computing system requires a DMA data transfer between the computing system (the main memory in Figures 1, 45, and 46) and the simulation system (FPGA logic devices 1201-1204 or SRAM memory devices 1205, 1206 in Figure 56). Appropriate wait cycles are inserted until the DMA data transfer is complete. When the DMA transfer is complete, the DATAXSFR signal returns to logic 0.
When the DATAXSFR signal returns to logic 0, MEMFSM unit 1240 is triggered to generate the START signal in state 1302. The START signal starts EVAL counter 1242, a programmable counter whose programmed count equals the duration of the evaluation cycle. As long as the EVAL counter is counting in state 1303, the EVAL signal is asserted at logic 1 and provided to the EVALFSMx of each FPGA logic device and to MEMFSM unit 1240. When counting ends, the EVAL counter drives the EVAL signal to logic 0 at the EVALFSMx of each FPGA logic device and at MEMFSM unit 1240. When MEMFSM unit 1240 receives EVAL at logic 0, it sets the EVAL_DONE flag in state 1304. MEMFSM uses the EVAL_DONE flag to indicate that the evaluation cycle has finished and that, if needed, the memory access cycle can now proceed. Before beginning the next DMA transfer, the CPU reads the XSFR_EVAL register (see Table K below) and checks EVAL_DONE and XSFR_DONE to confirm that the DMA transfer and EVAL completed successfully.
In some cases, however, the simulation system may not need a memory access at the moment. The simulation system then holds the memory enable signal MEM_EN at logic 0. This deasserted (logic 0) MEM_EN signal returns MEMFSM unit 1240 to idle state 1300, where it waits for a DMA data transfer or for data evaluation by the FPGA logic devices. If, on the other hand, MEM_EN is at logic 1, the simulation system needs to perform a memory access.
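The upper part of Figure 58 (states 1300-1304) can be expressed as a next-state function, which makes the DMA, evaluation, and memory access ordering explicit. This is a sketch with assumptions: the inputs are sampled once per cycle, and state 1305 is used to stand for entry into the memory access cycle, which in the diagram splits into parallel low bank (1305) and high bank (1311) branches. The function name is hypothetical.

```python
def memfsm_next(state: int, dataxsfr: int, eval_sig: int, mem_en: int) -> int:
    """Next MEMFSM state from idle through the evaluation cycle (Figure 58)."""
    if state == 1300:                    # idle
        return 1301 if dataxsfr else 1300
    if state == 1301:                    # DMA data transfer in progress
        return 1301 if dataxsfr else 1302
    if state == 1302:                    # issue START to EVAL counter 1242
        return 1303
    if state == 1303:                    # evaluation: remain while EVAL = 1
        return 1303 if eval_sig else 1304
    if state == 1304:                    # EVAL_DONE set; memory access needed?
        # 1305 stands for the memory access cycle (low bank 1305 and
        # high bank 1311 run in parallel); MEM_EN = 0 returns to idle.
        return 1305 if mem_en else 1300
    raise ValueError(f"state {state} is not modelled in this sketch")


# Walk through one DMA transfer, a two-cycle evaluation, and a memory access.
state = 1300
for dataxsfr, eval_sig, mem_en in [(1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0),
                                   (0, 1, 0), (0, 0, 1), (0, 0, 1)]:
    state = memfsm_next(state, dataxsfr, eval_sig, mem_en)
    print(state)
```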
Below state 1304 in Figure 58, the state diagram splits into two sections that proceed in parallel. One section comprises states 1305, 1306, 1307, 1308, and 1309 for memory access to the low bank. The other comprises states 1311, 1312, 1313, 1314, and 1309 for memory access to the high bank.
In state 1305, the simulation system waits one cycle for the currently selected FPGA logic device to place address and control signals on FPGA bus FD[31:0]. In state 1306, MEMFSM sends the latch signal on line 1263 to memory address/control latch 1243 to capture the input from FD[31:0]. The data corresponding to this captured address and control information is either read from or written to the SRAM memory device. To determine whether the simulation system needs a read or a write, the low bank memory write signal mem_wr_L is extracted from the address and control signals. If mem_wr_L=0, a read operation is requested; if mem_wr_L=1, a write operation is requested. As noted above, the mem_wr signal is equivalent to the chip select write signal.
In state 1307, the appropriate select signal is generated for address/control multiplexer 1244 so that the address and control signals are sent to the low bank SRAM. The MEMFSM unit checks the mem_wr and LASTL signals. If mem_wr_L=1 and LASTL=0, a write operation is requested, but the last data has not yet been shifted out of the FPGA logic device chain. The simulation system therefore returns to state 1305 and waits one cycle for an FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted out of the FPGA logic devices. If mem_wr_L=1 and LASTL=1, however, the last data has been shifted out of the FPGA logic devices.
Similarly, if mem_wr_L=0, indicating a read operation, MEMFSM proceeds to state 1308. In state 1308, the simulation system waits one cycle for the SRAM memory device to place data on FPGA bus FD[31:0]. If LASTL=0, the last data in the FPGA logic device chain has not yet been shifted out, so the simulation system returns to state 1305 and waits one cycle for an FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted out of the FPGA logic devices. Note that write operations (mem_wr_L=1) and read operations (mem_wr_L=0) can be interleaved or can alternate with each other until LASTL=1.
When LASTL=1, MEMFSM proceeds to state 1309 and waits there while DONE=0. When DONE=1, both LASTL and LASTH are at logic 1, so the simulation write/read cycle is finished. The simulation system then proceeds to state 1300, where it remains idle as long as DATAXSFR=0.
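The low bank branch (states 1305-1309) can be sketched as another next-state function. This sketch assumes that a read passes from state 1307 to state 1308, which is consistent with the alternative embodiment in which MEMFSM leaves the low bank branch after 1308 (read) or 1307 (write). The high bank branch (1311-1314) has the same shape, with mem_wr_H and LASTH in place of mem_wr_L and LASTL. The function name is hypothetical.

```python
def low_bank_next(state: int, mem_wr_l: int, lastl: int, done: int) -> int:
    """Next state of the MEMFSM low bank memory-access branch (Figure 58)."""
    if state == 1305:   # wait one cycle for an FPGA to drive FD[31:0]
        return 1306
    if state == 1306:   # latch address/control (line 1263), extract mem_wr_L
        return 1307
    if state == 1307:   # select signal routes address/control to the low bank SRAM
        if mem_wr_l:                        # write request
            return 1309 if lastl else 1305  # more FPGAs to service?
        return 1308                         # read: wait for SRAM data
    if state == 1308:   # wait one cycle for the SRAM to drive FD[31:0]
        return 1309 if lastl else 1305
    if state == 1309:   # wait for DONE = LASTL AND LASTH
        return 1300 if done else 1309
    raise ValueError(f"state {state} is not in the low bank branch")


print(low_bank_next(1307, mem_wr_l=1, lastl=0, done=0))  # 1305
print(low_bank_next(1307, mem_wr_l=0, lastl=0, done=0))  # 1308
print(low_bank_next(1309, mem_wr_l=0, lastl=1, done=1))  # 1300
```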
The same process applies to the high bank. In state 1311, the simulation system waits one cycle for the currently selected FPGA logic device to place address and control signals on FPGA bus FD[63:32]. In state 1312, MEMFSM sends the latch signal on line 1270 to memory address/control latch 1247 to capture the input from FD[63:32]. The data corresponding to this captured address and control information is either read from or written to the SRAM memory device. To determine whether the simulation system needs a read or a write, the high bank memory write signal mem_wr_H is extracted from the address and control signals. If mem_wr_H=0, a read operation is requested; if mem_wr_H=1, a write operation is requested.
In state 1313, the appropriate select signal is generated for address/control multiplexer 1246 so that the address and control signals are sent to the high bank SRAM. The MEMFSM unit checks the mem_wr and LASTH signals. If mem_wr_H=1 and LASTH=0, a write operation is requested, but the last data has not yet been shifted out of the FPGA logic device chain. The simulation system therefore returns to state 1311 and waits one cycle for an FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted out of the FPGA logic devices. If mem_wr_H=1 and LASTH=1, however, the last data has been shifted out of the FPGA logic devices.
Similarly, if read operation of mem_wr_H=0 indication, MEMFSM just proceeds to state 1314.At state 1314, simulation system is waited for one-period, so that the SRAM storage arrangement is placed on FPGA bus FD[63:32 with data] on.If LASTH=0, last data also is not moved out of in the fpga logic device link so.Therefore, simulation system is got back to state 1311, waits for one-period, so that the fpga logic device is placed on FD[63:32 with more address and control signal] on.This process is proceeded, up to being moved out of the fpga logic device to last data.Note write operation (mem_wr_H=1) and read operation (mem_wr_H=0) energy interleaving access or carry out alternately mutually, up to LASTH=1.
When LASTH=1, MEMFSM proceeds to state 1309, keeps waiting for when DONE=0.When DONE=1, LASTL and LASTH are in logical one, therefore simulate the Writing/Reading cycle and finish.Simulation system proceeds to state 1300 then, and is at state 1300, idle as long as DATAXSFR=0 just keeps.
Perhaps, according to a further embodiment of the invention, to high-end group and low side group, state 1309 and 1310 does not all have to carry out.Therefore, in the low side group, MEMFSM will be directly to state 1300 afterwards by state 1308 (LASTL=1) or 1307 (MEM_WR_L=1 and LASTL=1).In high-end group, MEMFSM will be directly to state 1300 afterwards by state 1314 (LASTH=1) or 1313 (MEM_WR_H=1 and LASTH=1).
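The high-bank control flow described above can be traced with a short behavioral sketch. It is not the patent's hardware: it assumes each numbered state lasts exactly one clock, summarizes each access by its (mem_wr_H, LASTH) pair, and uses a hypothetical function name. The wait_for_done flag models the alternative embodiment that skips the DONE wait.

```python
from typing import List, Tuple


def memfsm_high_trace(accesses: List[Tuple[int, int]],
                      done_low_cycles: int = 0,
                      wait_for_done: bool = True) -> List[int]:
    """Return the MEMFSM states visited while serving the high bank.

    accesses: (mem_wr_h, lasth) for each address/control word that the
    FPGA logic devices place on FD[63:32], in order.
    done_low_cycles: cycles for which DONE stays 0 in state 1309.
    wait_for_done: False models the embodiment that skips the DONE wait.
    """
    trace: List[int] = []
    for mem_wr_h, lasth in accesses:
        trace += [1311, 1312, 1313]  # wait for FD, latch addr/ctrl, drive SRAM
        if mem_wr_h == 0:
            trace.append(1314)       # read: wait for SRAM data on FD[63:32]
        if lasth == 1:
            break                    # last data has left the FPGA chain
    else:
        raise ValueError("no access with LASTH=1; the cycle never ends")
    if wait_for_done:
        trace += [1309] * (done_low_cycles + 1)  # hold while DONE=0
    trace.append(1300)               # idle while DATAXSFR=0
    return trace


# A write followed by a final read:
print(memfsm_high_trace([(1, 0), (0, 1)]))
```

The low bank would follow the same pattern with its own states and LASTL in place of LASTH.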
The state diagram of the EVALFSM unit 1248 will now be discussed in connection with one embodiment of the present invention. Figure 59 shows the state diagram of the EVALFSMx finite state machine in each FPGA chip. As in Figure 58, the two cycles within the simulation write/read cycle are shown in the state diagram of Figure 59 together with their corresponding states: states 1320 to 1326A correspond to the evaluation cycle, and states 1326B to 1336 correspond to the memory access cycle. The following discussion refers to Figure 57 in conjunction with Figure 59.
The EVALFSMx unit 1248 receives the EVAL signal on line 1274 from the CTRL_FPGA unit 1200 (see Figure 57). When EVAL=0, no data evaluation by the FPGA logic devices takes place, so at state 1320 the EVALFSMx remains idle. When EVAL=1, the EVALFSMx proceeds to state 1321.
States 1321, 1322, and 1323 relate to inter-FPGA communication, during which data is evaluated by the user design via the FPGA logic devices. Here, the EVALFSMx generates the signals input_en, mux_en, and clk_en (1281 in Figure 57) for the user logic. At state 1321, the EVALFSMx generates the clk_en signal, which in this cycle enables the second flip-flop of every clock-edge register in the user design logic (see Figure 19). The clk_en signal is also referred to as the software clock. If the user memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. In this cycle, the SRAM data for each memory block is output to the user design logic.
At state 1322, the EVALFSMx generates the input_en signal for the user design logic to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input for the second flip-flop in the primary clock register (see Figure 19).
At state 1323, the EVALFSMx generates the mux_en signal to turn on the multiplexing circuitry in each FPGA logic device so that it can begin communicating with the other FPGA logic devices in the array. As explained previously, inter-FPGA wires are often multiplexed to make efficient use of the limited pin resources on each FPGA logic device chip.
At state 1324, the EVALFSMx waits as long as EVAL=1. When EVAL=0, the evaluation cycle ends, and at state 1325 the EVALFSMx deasserts the mux_en signal.
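The evaluation half of Figure 59 can be summarized as an ordered list of (state, action) pairs. The sketch below is a hypothetical behavioral model, not the patent's logic: it assumes EVAL has just risen, takes the number of cycles EVAL stays high in state 1324 as a parameter, and uses the memory block count M described next to choose the exit state.

```python
from typing import List, Tuple


def evalfsm_eval_trace(eval_high_cycles: int, m: int) -> List[Tuple[str, str]]:
    """(state, action) pairs for the evaluation half of Figure 59.

    eval_high_cycles: cycles spent in state 1324 while EVAL stays 1.
    m: number of memory blocks M mapped into this FPGA logic device.
    Assumes EVAL has just risen, so the FSM is leaving idle state 1320.
    """
    trace = [
        ("1321", "assert clk_en"),    # software clock for clock-edge registers
        ("1322", "assert input_en"),  # latch inputs delivered by DMA
        ("1323", "assert mux_en"),    # enable inter-FPGA multiplexing
    ]
    trace += [("1324", "wait")] * eval_high_cycles
    trace.append(("1325", "deassert mux_en"))  # EVAL has returned to 0
    if m == 0:
        trace.append(("1320", "idle"))         # no memory-access half configured
    else:
        trace.append(("1326A", "start memory access"))
    return trace


print([state for state, _ in evalfsm_eval_trace(2, 0)])
```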
If the number of memory blocks M (where M is a non-negative integer) is zero, the EVALFSMx returns to state 1320, where it remains idle while EVAL=0. In most cases M>0, so the EVALFSMx proceeds to state 1326A/1326B. M is the number of memory blocks in the FPGA logic device; it is a constant determined when the user design is mapped and configured into the FPGA logic device, and it is not decremented. If M>0, the right-hand portion of Figure 59 (the memory access cycle) is configured in the FPGA logic device. If M=0, only the left-hand portion (the EVAL cycle) is configured.
At state 1327, the EVALFSMx remains in a wait state as long as SHIFTIN=0. When SHIFTIN=1, the previous FPGA logic device has completed its memory access, and the current FPGA logic device is ready to perform its memory access tasks. Alternatively, SHIFTIN=1 may indicate that the current FPGA logic device is the first logic device in the bank, whose SHIFTIN input line is tied to Vcc. In either case, receiving SHIFTIN=1 indicates that the current FPGA logic device is ready to perform memory access. At state 1328, the memory block number N is set to N=1. N is incremented on each pass through the loop so that memory access can be performed for each memory block N in turn. Initially N=1, so the EVALFSMx proceeds to access memory for memory block 1.
At state 1329, the EVALFSMx generates the select signal on line 1285 and the output_en signal on line 1284 for FPGA bus driver FDO_MUXx 1249, so that the address and control signals of the Mem_Block_N interface 1253 are placed on FPGA bus FD[63:32] or FD[31:0]. If a write operation is required, wr=1; if a read operation is required, wr=0. The EVALFSMx receives the wr signal on line 1287 as one of its inputs, and based on this wr signal it asserts the correct select signal on line 1285.
When wr=1, the EVALFSMx proceeds to state 1330, where it generates the select and output_en signals for the FD bus driver so that the write data of Mem_Block_N 1253 is placed on FPGA bus FD[63:32] or FD[31:0]. The EVALFSMx then waits one cycle to allow the SRAM memory device to complete the write cycle. It then enters state 1335, where the memory block number N is incremented; that is, N=N+1.
If, however, wr=0 at state 1329, a read operation is requested: the EVALFSMx enters state 1332 and waits one cycle, then enters state 1333 and waits another cycle. At state 1334, the EVALFSMx generates the rd_latch signal on line 1286 so that the memory read data double buffer 1251 of memory block N latches the SRAM data from the FD bus. The EVALFSMx then proceeds to state 1335, where the memory block number N is incremented; that is, N=N+1. Thus, if N=1 before state 1335, N is now 2, and the next memory access is directed to memory block 2.
If the current memory block number N is less than or equal to the total number of memory blocks M in the user design (that is, N≤M), the EVALFSMx returns to state 1329 and generates the specific select and output_en signals for the FD bus driver according to whether the operation is a write or a read. The write or read operation for the next memory block N then takes place.
If, however, the current memory block number N is greater than the total number of memory blocks M in the user design (that is, N>M), the EVALFSMx proceeds to state 1336 and asserts the SHIFTOUT output signal, allowing the next FPGA logic device in the bank to access the SRAM memory device. The EVALFSMx then proceeds to state 1320, where it remains idle until the simulation system again needs data evaluation among the FPGA logic devices (that is, EVAL=1).
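The memory-access loop over N can be traced the same way. This is a hypothetical sketch under stated assumptions: each block's wr input is given up front (M is the length of that list), SHIFTIN is already 1 when state 1327 is reached, and the unnumbered one-cycle wait after state 1330 is omitted because the text gives it no state number.

```python
from typing import List, Tuple


def evalfsm_memory_trace(wr: List[int]) -> List[Tuple[int, int]]:
    """Return (state, N) pairs visited in the memory-access half of Figure 59.

    wr[k] is the wr input (1 = write, 0 = read) presented by memory block
    k + 1, so M = len(wr). N is reported as 0 before state 1328 sets it.
    """
    m = len(wr)
    if m == 0:
        return [(1320, 0)]                   # only the EVAL half is configured
    trace = [(1327, 0)]                      # SHIFTIN assumed already 1
    n = 1
    trace.append((1328, n))                  # start with memory block 1
    while n <= m:
        trace.append((1329, n))              # drive address/control of block N
        if wr[n - 1] == 1:
            trace.append((1330, n))          # drive write data
        else:
            trace += [(1332, n), (1333, n), (1334, n)]  # two waits, then rd_latch
        trace.append((1335, n))              # N = N + 1 happens here
        n += 1
    trace.append((1336, n))                  # N > M: assert SHIFTOUT
    trace.append((1320, n))                  # idle until EVAL=1 again
    return trace


print(evalfsm_memory_trace([1, 0]))
```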
Figure 61 shows the simulation write/read cycle according to one embodiment of the invention. At reference numeral 1366, Figure 61 shows the three cycles within the simulation write/read cycle: the DMA data transfer cycle, the evaluation cycle, and the memory access cycle. Although not shown, it is implied that a previous DMA transfer, evaluation, and memory access may have occurred earlier. Furthermore, the timing of data transfers to and from the low-bank SRAM may differ from that of the high-bank SRAM. For brevity, Figure 61 shows an example in which the access timing of the low and high banks is identical. Global clock GCLK 1350 provides the clock signal for all components in the system.
The DATAXSFR signal 1351 indicates the occurrence of the DMA data transfer cycle. When DATAXSFR=1 on trace 1367, a DMA data transfer takes place between the host computer system and the FPGA logic devices or SRAM memory devices. Accordingly, data is provided on the FPGA high-bank bus FD[63:32] 1359 at trace 1369 and on the FPGA low-bank bus FD[31:0] 1358 at trace 1368. The DONE signal 1364 indicates completion of the memory access cycle with a logic 0-to-1 transition (for example, trace 1390), and indicates the duration of the simulation write/read cycle with a logic 0 (for example, the span between the edge at trace 1370 and the edge at trace 1390). During the DMA transfer cycle, the DONE signal is at logic 0.
When the DMA transfer cycle ends, the DATAXSFR signal changes from logic 1 to logic 0, triggering the start of the evaluation cycle. Thus, as indicated by trace 1371, EVAL 1352 is at logic 1. The duration for which the EVAL signal stays at logic 1 is predetermined and programmable. During this evaluation cycle, the data in the user design logic is evaluated using the clk_en signal 1353, which is at logic 1 as shown by trace 1372. The input_en signal 1354 is also at logic 1, as shown by trace 1373, and the mux_en signal 1355 is also at logic 1, as shown by trace 1374, but for a longer duration than clk_en and input_en. The evaluation time depends on the signal path lengths in this particular FPGA logic device. When the mux_en signal 1355 goes from logic 1 to 0 at trace 1374 and at least one memory block is present in the FPGA logic device, the evaluation cycle ends and the memory access cycle begins.
The SHIFTIN signal 1356 is asserted to logic 1 at trace 1375. This indicates that the preceding FPGA has finished its evaluation and that all required data has been transferred to or from that FPGA logic device. The next FPGA logic device in the bank is now ready to begin memory access.
Tracks 1377 to 1386 use the following notation. ACj_k denotes the address and control signals associated with FPGAj and memory block k, where j and k are non-negative integers. WDj_k denotes the write data for FPGAj and memory block k. RDj_k denotes the read data for FPGAj and memory block k. Thus, AC3_1 denotes the address and control signals associated with FPGA3 and memory block 1. Low-bank SRAM access and high-bank SRAM access 1361 are shown as track 1387.
The next several tracks, 1377 to 1387, show how memory access is accomplished. A write or read operation is performed based on the logic level of the wrx signal sent to the EVALFSMx and, correspondingly, the level of the mem_wr signal sent to the MEMFSM. If a write operation is required, the memory model interfaces with user memory block N (Mem_Block_N interface 1253 in Figure 57) and provides wrx as one of its control signals. This control signal wrx is provided to the FD bus driver and the EVALFSMx unit. If wrx is at logic 1, the appropriate select signal and output_en signal are provided to the FD bus driver so that the memory write data is placed on the FD bus. The same control signals now on the FD bus can be latched by the memory address/control latch in the CTRL_FPGA unit. The memory address/control latch sends the address and control signals to the SRAM via the MA[18:2]/control bus. The wrx control signal at logic 1 is extracted from the FD bus, and because a write operation is requested, the data associated with the address and control signals on the FD bus is sent to the SRAM memory device.
Thus, as shown in Figure 61, the next FPGA logic device, namely logic device FPGA0 in the low bank, places AC0_0 on FD[31:0], as shown on track 1377. The simulation system performs a write operation with WD0_0. AC0_1 is then placed on the FD[31:0] bus. If a read operation is requested, however, some time delay occurs after AC0_1 is placed on FD bus FD[31:0] before the SRAM memory device places the read data on the FD bus, unlike the write case, in which WD0_0 immediately follows AC0_0.
Note that, as indicated by track 1383, placing AC0_0 on the MA[18:2]/control bus is slightly delayed compared with placing the address, control, and data on the FD bus. This is because the MEMFSM unit needs time to latch the address/control signals from the FD bus, extract the mem_wr signal, and generate the appropriate select signal for the address/control multiplexer so that the address/control signals can be placed on the MA[18:2]/control bus. In addition, after placing the address/control signals for the SRAM memory device on the MA[18:2]/control bus, the simulation system must wait for the corresponding data from the SRAM memory device to be placed on the FD bus. An example is the time offset between track 1384 and track 1381, where RD1_1 is placed on the FD bus only after AC1_1 has been placed on the MA[18:2]/control bus.
In the high bank, FPGA1 places AC1_0 on bus FD[63:32], followed by WD1_0. AC1_1 is then placed on bus FD[63:32]. This is shown by track 1380. When AC1_1 is placed on the FD bus, the control signals in this example indicate a read operation. Accordingly, as described above, when AC1_1 is placed on the MA[18:2]/control bus as shown by track 1384, the correct wrx and mem_wr signals at logic 0 are present in the address/control signals and are sent to the EVALFSMx and MEMFSM units. Because the simulation system knows this is a read operation, no write data is transferred to the SRAM memory device; instead, the SRAM memory device places the read data associated with AC1_1 on the FD bus so that the user design logic can subsequently read it through the simulated memory block interface. This is represented by track 1381 in the high bank. In the low bank, as shown by track 1378, RD0_1 is placed on the FD bus after AC0_1 (not shown) is placed on the MA[18:2]/control bus.
When the EVALFSMx generates the rd_lat0 signal 1362 for the memory read data double buffer in the simulated memory block interface, as shown by track 1388, the user design logic has completed its read operation through the simulated memory block interface. This rd_lat0 signal is provided to low-bank FPGA0 and high-bank FPGA1.
Thereafter, address/control signals for the next memory blocks are placed on the FD buses: AC2_0 (FPGA2, memory block 0) on the low-bank FD bus and AC3_0 (FPGA3, memory block 0) on the high-bank FD bus. If a write operation is required, WD2_0 is placed on the low-bank FD bus and WD3_0 on the high-bank FD bus. As shown by track 1385, AC3_0 is placed on the high-bank MA[18:2]/control bus. This process continues so that write and read operations are performed for the following memory blocks. Note that write and read operations in the low bank and the high bank may occur at different times and rates; Figure 61 shows a specific example in which the low-bank and high-bank timing is identical. In addition, in this example the write operations of the low and high banks occur together, followed by the read operations on both banks, but this is not always the case. Separate low and high banks allow the devices connected to them to operate in parallel; that is, activity on the low bank is independent of activity on the high bank. Other scenarios are also possible, such as the low bank running a series of write operations while the high bank runs a series of read operations in parallel.
When the last data in the last FPGA logic device of each bank is reached, the SHIFTOUT signal 1357 is asserted, as shown by track 1376. For read operations, as shown by track 1389, the rd_lat1 signal 1363, corresponding to FPGA2 in the low bank and FPGA3 in the high bank, is asserted to read RD2_1 on track 1382 and RD3_1 on track 1379. Because the last data of the last FPGA unit has been accessed, the DONE signal 1364 indicates completion of the simulation write/read cycle, as shown by track 1390.
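Using the ACj_k, WDj_k, and RDj_k notation, the order of words on one bank's FD bus in the Figure 61 example can be generated mechanically. The sketch below is a hypothetical helper that captures ordering only, not the delays described above; it assumes each FPGA in the chain serves its memory blocks in order, with each address/control word followed by its write or read data.

```python
from typing import List, Tuple


def bank_bus_sequence(bank: List[Tuple[int, str]]) -> List[str]:
    """Order of words on one bank's FD bus during a memory access cycle.

    bank: (j, ops) for each FPGA j in SHIFTIN/SHIFTOUT chain order, where
    ops[k] is 'w' (write) or 'r' (read) for memory block k.
    """
    words: List[str] = []
    for j, ops in bank:
        for k, op in enumerate(ops):
            words.append(f"AC{j}_{k}")      # address/control first
            if op == "w":
                words.append(f"WD{j}_{k}")  # write data driven by the FPGA
            elif op == "r":
                words.append(f"RD{j}_{k}")  # read data returned by the SRAM
            else:
                raise ValueError(f"unknown operation {op!r}")
    return words


# Low bank of Figure 61: FPGA0 then FPGA2, each writing block 0 and reading block 1.
print(bank_bus_sequence([(0, "wr"), (2, "wr")]))
```

Calling it with [(1, "wr"), (3, "wr")] gives the corresponding high-bank order (AC1_0, WD1_0, AC1_1, RD1_1, AC3_0, and so on).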
Table H below lists and describes the registers and memory of the various components on the simulation system circuit board, together with their corresponding PCI memory addresses and local addresses.
Table H: Memory map
Table J below shows the data format of the configuration file according to one embodiment of the invention. Over the PCI bus, the CPU sends one word at a time, and each word configures one bit of every FPGA on the board in parallel.
Table J: Configuration data format
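Because the contents of Table J are not reproduced here, the following is only a sketch of one plausible reading of "one word configures one bit of every FPGA in parallel": bit i of word t carries bit t of FPGA i's configuration stream. That bit-to-FPGA mapping, the 32-FPGA limit taken from a 32-bit word, and the function name are all assumptions.

```python
from typing import List


def pack_config_words(bitstreams: List[List[int]]) -> List[int]:
    """Bit-slice per-FPGA configuration streams into parallel words.

    bitstreams[i] is FPGA i's configuration bits (0 or 1), all of equal
    length. Hypothetical layout: bit i of word t is bit t of FPGA i.
    """
    if not bitstreams or len(bitstreams) > 32:
        raise ValueError("expected between 1 and 32 FPGA bitstreams")
    length = len(bitstreams[0])
    if any(len(bits) != length for bits in bitstreams):
        raise ValueError("all bitstreams must have the same length")
    words: List[int] = []
    for t in range(length):
        word = 0
        for i, bits in enumerate(bitstreams):
            word |= (bits[t] & 1) << i  # FPGA i listens on data bit i
        words.append(word)
    return words


print(pack_config_words([[1, 0, 1], [0, 1, 1]]))
```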
Table K below lists the XSFR_EVAL register, which is present on every circuit board. The host computer system uses the XSFR_EVAL register to program the EVAL cycle, to control DMA reads and writes, and to read the status of the EVAL_DONE and XSFR_DONE fields. The host computer system also uses this register to start memory access. The operation of the simulation system with respect to this register is described below in connection with Figures 62 and 63.
Table K: XSFR_EVAL register (local address: 0h) on all 6 circuit boards
Table L below lists the contents of the CONFIG_JTAG[6:1] registers. Through these registers, the CPU configures the FPGA logic devices and runs boundary-scan tests on them. Each circuit board has its own dedicated register.
Table L: CONFIG_JTAG[6:1] registers
Figures 62 and 63 show timing diagrams of another embodiment of the present invention. These two figures illustrate the operation of the simulation system with respect to the XSFR_EVAL register. The host computer system uses the XSFR_EVAL register to program the EVAL cycle, control DMA reads and writes, and read the status of the EVAL_DONE and XSFR_DONE fields. The host computer system also uses this register to start memory access. One of the main differences between the two figures is the state of the WAIT_EVAL field. When the WAIT_EVAL field is set to "0", as in Figure 62, the DMA read transfer begins after CLK_EN. When the WAIT_EVAL field is set to "1", as in Figure 63, the DMA read transfer begins after EVAL_DONE.
In Figure 62, WR_XSFR_EN and RD_XSFR_EN are both set to "1". These two fields enable the DMA write and read transfers and are cleared by XSFR_DONE. Because both fields are set to "1", the CTRL_FPGA unit automatically runs the DMA write transfer first and then the DMA read transfer. The WAIT_EVAL field, however, is set to "0", indicating that the DMA read transfer begins once CLK_EN is asserted (and after the DMA write operation has completed). Thus, in Figure 62, as soon as the CLK_EN signal (software clock) is detected, the DMA read operation starts almost immediately after the DMA write operation completes. The DMA read transfer does not wait for the EVAL cycle to complete.
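The effect of the WAIT_EVAL field on the order of events can be captured in a few lines. This is a coarse, hypothetical model that follows the field descriptions above rather than cycle-accurate timing; it also assumes that an EVAL cycle (starting with CLK_EN) follows the DMA write, and the event names are illustrative strings, not register or signal encodings.

```python
from typing import List


def dma_event_order(wr_xsfr_en: bool, rd_xsfr_en: bool, wait_eval: bool) -> List[str]:
    """Order in which the main events start under the XSFR_EVAL settings."""
    events: List[str] = []
    if wr_xsfr_en:
        events.append("DMA_WRITE")      # the write always runs first
    events.append("CLK_EN")             # evaluation begins with the software clock
    if rd_xsfr_en and not wait_eval:
        events.append("DMA_READ")       # WAIT_EVAL=0: read overlaps the EVAL cycle
    events.append("EVAL_DONE")
    if rd_xsfr_en and wait_eval:
        events.append("DMA_READ")       # WAIT_EVAL=1: read follows EVAL_DONE
    return events


print(dma_event_order(True, True, False))  # Figure 62 settings
print(dma_event_order(True, True, True))   # Figure 63 settings
```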
At the beginning of the timing diagram, the EVAL_REQ_N signal may be subject to contention among multiple FPGA logic devices. As explained previously, the EVAL_REQ_N (or EVAL_REQ#) signal is used to start an evaluation cycle if any FPGA logic device asserts it. The evaluation cycle includes address pointer initialization and the software clock operation to facilitate the evaluation process.
The DONE signal, which is generated when the DMA data transfer cycle completes, may also be subject to contention when multiple LAST signals (derived from the SHIFTIN and SHIFTOUT signals output by each FPGA logic device) are generated and provided to the CTRL_FPGA unit. When all LAST signals have been received and processed, the DONE signal is generated and a new DMA data transfer operation can begin. The EVAL_REQ_N signal and the DONE signal share the same line on a time-shared basis, as discussed below.
The system automatically starts the DMA write transfer first, as shown by the WR_XSFR signal at time 1409. The initial part of the WR_XSFR signal includes some overhead associated with the PCI controller, which in one embodiment is a PCI9080 or 9060. The host computer system then performs a DMA write operation, via local bus LD[31:0] and FPGA bus FD[63:0], to the FPGA logic devices connected to FPGA bus FD[63:0].
At time 1412, the WR_XSFR signal is deasserted, indicating completion of the DMA write operation. The EVAL signal is asserted for a predetermined time, from time 1412 to time 1410. The EVALTIME duration is programmable and is initially set to 8+X, where X is derived from the longest signal trace path. The XSFR_DONE signal is also asserted briefly, indicating completion of this DMA transfer operation, which in this case is a DMA write.
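The initial EVALTIME setting is simple arithmetic once X is known. The text says only that X is "derived from" the longest signal trace path, so the sketch below assumes X equals that path's delay expressed in clock cycles; the function name and the per-path delay list are hypothetical inputs.

```python
from typing import Sequence


def initial_evaltime(path_delays_clk: Sequence[int]) -> int:
    """Initial EVALTIME = 8 + X, taking X as the longest path delay in clocks."""
    if not path_delays_clk:
        raise ValueError("at least one signal trace path is required")
    return 8 + max(path_delays_clk)


# With hypothetical path delays of 3, 12, and 7 clocks, X = 12.
print(initial_evaltime([3, 12, 7]))
```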
Also at time 1412, the contention among the EVAL_REQ_N signals stops, but the line that carries the DONE signal now carries the EVAL_REQ_N signal to the CTRL_FPGA unit. For 3 clock cycles, the EVAL_REQ_N signal is processed over the line that carries the DONE signal. After these 3 clock cycles, the FPGA logic devices no longer generate the EVAL_REQ_N signal, but EVAL_REQ_N signals already sent to the CTRL_FPGA unit are still processed. The maximum time after which the FPGA logic devices no longer generate the gated-clock EVAL_REQ_N signal is approximately 23 clock cycles; EVAL_REQ_N signals longer than this period are ignored.
At time 1413, about 2 clock cycles after time 1412 (the end of the DMA write operation), the CTRL_FPGA unit sends a write address strobe signal WPLX_ADS_N to the PCI controller (for example, a PLX PCI9080) to begin the DMA read transfer. About 24 clock cycles after time 1413, the PCI controller starts the DMA read transfer process, and the DONE signal is generated at the same time. At time 1414, before the PCI controller begins the DMA read process, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data is transferred and processed first. At time 1415, while this overhead data is being processed, the DMA read data is placed on FPGA bus FD[63:0] and local bus LD[31:0]. When the 24 clock cycles from time 1413 have elapsed, and the DONE signal from the FPGA logic devices and the EVAL_REQ_N signal are generated, the PCI controller processes the DMA read data by transferring the data on FPGA bus FD[63:0] and local bus LD[31:0] to the host computer system.
At time 1410, while the DMA read data continues to be processed, the EVAL signal is deasserted and the EVAL_DONE signal is asserted, indicating completion of the EVAL cycle. Contention also begins among the FPGA logic devices as they generate EVAL_REQ_N signals.
At time 1417, just before the DMA read cycle completes at time 1416, the host computer system polls the PLX interrupt register to determine whether the DMA cycle is nearly finished. The PCI controller knows how many cycles are needed to complete the DMA data transfer process, and after a predetermined number of cycles it sets a particular bit in its interrupt register. The CPU in the host computer system polls this interrupt register in the PCI controller; if the bit is set, the CPU knows that the DMA cycle is almost complete. The CPU does not poll the interrupt register continuously, because each poll ties up the PCI bus with a read cycle. Therefore, in one embodiment of the invention, the CPU in the host computer system is programmed to wait a certain number of cycles before polling the interrupt register.
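The host-side "wait, then poll sparingly" strategy can be sketched as follows. This is an illustrative model, not driver code for the PLX part: read_intr_reg is a hypothetical callable standing in for one PCI read of the controller's interrupt register, done_mask is a hypothetical mask for the bit the controller sets, and the delays are expressed in seconds rather than the bus cycles the text refers to.

```python
import time
from typing import Callable


def poll_dma_nearly_done(read_intr_reg: Callable[[], int],
                         done_mask: int,
                         initial_wait_s: float,
                         poll_interval_s: float,
                         timeout_s: float) -> bool:
    """Return True once the controller sets the 'DMA almost done' bit.

    read_intr_reg: hypothetical callable performing one PCI read of the
    interrupt register. done_mask: hypothetical mask for the status bit.
    """
    time.sleep(initial_wait_s)           # stay off the PCI bus at first
    deadline = time.monotonic() + timeout_s
    while True:
        if read_intr_reg() & done_mask:  # each call costs one PCI read cycle
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)      # back off between polls
```

The initial wait is the important design choice: every poll occupies the PCI bus, so polling too early slows the very DMA transfer being waited on.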
After a short time, the DMA read cycle ends at time 1416: RD_XSFR is deasserted, and the DMA read data is no longer on FPGA bus FD[63:0] or local bus LD[31:0]. The XSFR_DONE signal is also asserted at time 1416, and contention among the LAST signals for generating the DONE signal begins.
Throughout the DMA cycle, from the WR_XSFR signal at time 1409 until time 1417, the CPU in the host computer system does not access the simulation hardware system. In one embodiment, the duration of this period is the sum of (1) twice the PCI controller overhead timing, (2) the number of words transferred by WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (for example, a Sun ULTRASparc). The first access after the DMA cycle occurs at time 1419, when the CPU polls the interrupt register in the PCI controller.
At time 1411, about 3 clock cycles after time 1416, the MEM_EN signal is asserted to enable the on-board SRAM memory devices, so that memory access between the FPGA logic devices and the SRAM memory devices can begin. Memory access continues until time 1419; in one embodiment, each access requires 5 clock cycles. If no DMA read transfer is needed, memory access can begin earlier, at time 1410 rather than at time 1411.
While memory access takes place between the FPGA logic devices and the SRAM memory devices on FPGA bus FD[63:0], the CPU in the host computer system can communicate with the PCI controller and the CTRL_FPGA unit via local bus LD[31:0] from time 1418 to time 1429. This occurs after the CPU finishes polling the interrupt register of the PCI controller. The CPU writes data to various registers to prepare for the next data transfer. This period lasts more than 4 microseconds. If the memory access is shorter than this period, FPGA bus FD[63:0] will not experience any conflict. At time 1429, the XSFR_DONE signal is deasserted.
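Whether the memory access phase fits inside the CPU's register-setup window is a quick calculation from the 5-clocks-per-access figure. In the sketch below, the global clock frequency and access count are hypothetical inputs, and the 4-microsecond lower bound from the text is used as a conservative window.

```python
from typing import Tuple


def memory_access_fits(num_accesses: int, gclk_hz: float,
                       cycles_per_access: int = 5,
                       cpu_window_s: float = 4e-6) -> Tuple[float, bool]:
    """Return (memory access time in seconds, whether it fits the CPU window)."""
    duration_s = num_accesses * cycles_per_access / gclk_hz
    return duration_s, duration_s < cpu_window_s


# At a hypothetical 33 MHz GCLK, 20 accesses take 100 clocks (about 3.03 us).
print(memory_access_fits(20, 33e6))
```

At that clock rate, 30 accesses (150 clocks, about 4.5 microseconds) would no longer be guaranteed to finish within the window.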
The timing diagram of Figure 63 differs from that of Figure 62 because the WAIT_EVAL field is set to "1". In other words, the DMA read transfer cycle begins when the EVAL cycle is nearly complete, around the time the EVAL_DONE signal is asserted, rather than immediately after the DMA write operation completes. The EVAL signal is asserted for a predetermined time, from time 1412 to time 1410. At time 1410, the EVAL_DONE signal is asserted, indicating completion of the EVAL cycle.
In Figure 63, after the DMA write operation at time 1412, the CTRL_FPGA unit does not generate the write address strobe signal WPLX_ADS_N for the PCI controller until time 1420, about 16 clock cycles before the end of the EVAL cycle. The XSFR_DONE signal is also extended to time 1423. At time 1423, the XSFR_DONE field is set, and the WPLX_ADS_N signal is then generated to start the DMA read process.
At time 1420, about 16 clock cycles before the EVAL_DONE signal is asserted, the CTRL_FPGA unit sends the write address strobe signal WPLX_ADS_N to the PCI controller (for example, a PLX PCI9080) to begin the DMA read transfer. About 24 clock cycles after time 1420, the PCI controller starts the DMA read transfer process, and the DONE signal is also generated. At time 1421, before the PCI controller begins DMA read processing, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data is transferred and processed first. At time 1422, while this overhead data is being processed, the DMA read data is placed on FPGA bus FD[63:0] and local bus LD[31:0]. When the 24 clock cycles end at time 1424, the PCI controller processes the DMA read data by transferring the data on FPGA bus FD[63:0] and local bus LD[31:0] to the host computer system. The remainder of the timing diagram is the same as that of Figure 62.
Thus, the RD_XSFR signal is asserted later in Figure 63 than in Figure 62. In Figure 63, the RD_XSFR signal follows the near-completion of the EVAL cycle, delaying the DMA read operation. In Figure 62, the RD_XSFR signal follows detection of the CLK_EN signal after the DMA write transfer completes.
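The approximate clock offsets given for Figures 62 and 63 can be combined into a small worked calculation of when the PCI controller starts the DMA read. The sketch assumes EVAL starts at the end of the DMA write (time 1412) and lasts a programmed number of clocks, and it treats the "about 2", "about 16", and "about 24" clock figures as exact; the function name is hypothetical.

```python
def dma_read_start_clock(write_end: int, eval_cycles: int, wait_eval: bool) -> int:
    """Approximate clock at which the PCI controller starts the DMA read.

    write_end: clock at which WR_XSFR drops and EVAL rises (time 1412).
    eval_cycles: programmed EVAL length, so EVAL ends at write_end + eval_cycles.
    """
    if wait_eval:
        # Keep WPLX_ADS_N no earlier than in the WAIT_EVAL=0 case.
        if eval_cycles < 18:
            raise ValueError("sketch assumes EVAL outlasts the 16-clock lead")
        ads = write_end + eval_cycles - 16  # Figure 63: ~16 clocks before EVAL ends
    else:
        ads = write_end + 2                 # Figure 62: ~2 clocks after the write
    return ads + 24                         # controller starts ~24 clocks later


# With a hypothetical 100-clock EVAL starting at clock 0:
print(dma_read_start_clock(0, 100, False), dma_read_start_clock(0, 100, True))
```

With these assumptions, WAIT_EVAL=1 moves the start of the DMA read from clock 26 to clock 108, just after the 100-clock EVAL cycle ends.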
IX. Co-Verification System
The co-verification system of the present invention can shorten the design/development cycle by giving designers the flexibility of software simulation together with the speed of a hardware model. Both the hardware and software parts of a design can be verified before the ASIC is fabricated, without the limitations of emulator-based co-verification tools. Debugging capability is enhanced, and overall debug time may be reduced significantly.
Traditional Co-Verification Tool with an ASIC as the Device Under Test
Figure 64 has shown the final design of a typical PCI additional card (add-on card), for example video, multimedia, Ethernet or SCSI card.This card 2000 comprises the direct interface connector 2002 that a permission is communicated by letter with other peripherals.Connector 2002 is connected to bus 2001, so that send the vision signal from video recorder, camera or TV tuner; Output video and audio frequency are to display or loudspeaker; And transfer signals to communication or disk drive interface.According to user's design, the person skilled in art can predict other interface requirements.A large amount of functions of the design are present in via bus 2003 and are connected in the chip 2004 of interface connector 2002, and are used to produce the local oscillator 2005 of a local clock signal and via the storer 2006 of bus 2008 via bus 2007.Additional card 2000 also comprises a PCI connector 2009, is used for being connected with pci bus 2010.
Before the design of implementing as an additional card as shown in Figure 64, the design is reduced to the ASIC form, and this is the purpose in order to test.Shown that at Figure 65 a traditional hardware/software works in coordination with the verification instrument.User's design is embodied as the form of ASIC---in Figure 65, be denoted as Devices to test (or " DUT ") 2024.In order to obtain from the excitation that is designed to connected multiple resource, Devices to test 2024 is placed within the goal systems 2020, and this system is the central computer system 2021 on the mainboard and the combination of some peripheral hardwares.Goal systems 2020 comprises a central computer system 2021,2021 comprise a CPU and storer, and operate to move some application programs under certain operations system (as the Solaris of Microsoft Windows or Sun Microsystem company).As known to persons skilled in the art, the Solaris of Sun MicroSystem company is an operating environment and software product combination, supports Internet (the Internet), Intranet (in-house network) and enterprise calculation.The Solaris operating environment is based on industrial standard unix system V edition 4, and be designed in a distributed network environment to carry out client computer one server application, for less working group provides suitable resource, and provide ecommerce needed WebTone.
The device driver 2022 of Devices to test 2024 is in central computer system 2021, to realize the communication between operating system (and Any Application) and the Devices to test 2024.As known to persons skilled in the art, special software that device driver is control computer system hardware parts or peripheral hardware.A device driver is responsible for the hardware register of access means, and generally includes the interruption that an interrupt handling routine produces with treatment facility.Device driver constitutes other part of lowermost level of operating system nucleus often, and by these parts, when kernel was built, they were coupled.There is the driver that can be written in some systems more recently, and can install from file after operating system.
DUT 2024 and central computer system 2021 are coupled to PCI bus 2023. Other peripherals in target system 2020 include an Ethernet PCI add-in card 2025 for coupling the target system to a network 2030 via bus 2034; a SCSI PCI add-in card 2026 coupled to SCSI drives 2027 and 2031 via buses 2036 and 2035; a VCR 2028 coupled to DUT 2024 via bus 2032 (if required by the design of DUT 2024); and a monitor and/or speaker 2029 coupled to DUT 2024 via bus 2033 (if required by the design of DUT 2024). As known to those skilled in the art, SCSI stands for "Small Computer System Interface," a processor-independent standard for system-level interfacing between a computer and intelligent devices (e.g., hard disks, floppy disks, CD-ROMs, printers, scanners, and many more).
In this target system environment, DUT 2024 can be tested with a variety of stimuli from the central computer system (i.e., the operating system and applications) and the peripherals. If time is not a concern and the designer seeks only a simple pass/fail test, this coverification tool should adequately meet the designer's needs. In most cases, however, a design project has a tight budget and a strict schedule before product release. As explained above, this particular ASIC-based coverification tool is unsatisfactory because it lacks debugging capability: without sophisticated techniques, the designer cannot isolate the cause of a "failed" test, and the project cannot predict at the outset the number of "fixes" needed for each bug found, and therefore cannot predict its schedule and budget.
Conventional coverification tool with an emulator as the device under test
Figure 66 illustrates a conventional coverification tool with an emulator. Unlike the arrangement illustrated above in Figure 64, the DUT is programmed into emulator 2048, which is coupled to target system 2040, some peripherals, and a test workstation 2052. Emulator 2048 includes an emulation clock 2066 and the DUT programmed into the emulator.
Emulator 2048 is coupled to target system 2040 via PCI bus bridge 2044, PCI bus 2057, and control line 2056. Target system 2040 is a combination of a central computer system 2041 on a motherboard and several peripherals. Target system 2040 includes central computer system 2041, which includes a processor and memory and runs a number of application programs under some operating system (such as Microsoft Windows or Sun Microsystems' Solaris). A device driver 2042 for DUT 2024 resides in central computer system 2041 to enable communication between the operating system (and any applications) and the DUT in emulator 2048. To communicate with emulator 2048 and the other devices that form part of this computing environment, central computer system 2041 is coupled to PCI bus 2043. Other peripherals in target system 2040 include an Ethernet PCI add-in card 2045 for coupling the target system to a network 2049 via bus 2058, and a SCSI PCI add-in card 2046 coupled to SCSI drives 2047 and 2050 via buses 2060 and 2059.
Emulator 2048 is also coupled to test workstation 2052 via bus 2062. Test workstation 2052 includes a CPU and memory to perform its functions. Test workstation 2052 may also contain test cases 2067 and device models 2068 for other devices that are modeled but not physically coupled to emulator 2048.
Finally, emulator 2048 is coupled via bus 2061 to some other peripherals, such as a frame buffer or data stream record/playback system 2051. This frame buffer or data stream record/playback system 2051 may also be coupled to a communication device or channel 2053 via bus 2063, to a VCR 2054 via bus 2064, and to a monitor and/or speaker 2055 via bus 2065.
As known to those skilled in the art, the emulation clock runs much more slowly than the actual target system. Thus, the shaded portion of Figure 66 runs at emulation speed, while the unshaded portions run at the actual target system speed.
As noted above, this emulator-based coverification tool has several limitations. When using a logic analyzer or sample-and-hold device to obtain internal state information of the DUT, the designer must compile the design so that the relevant signals to be examined for debugging are provided on output pins for sampling. If the designer wants to debug a different part of the design, the designer must either ensure that this part has output signals that can be sampled by the logic analyzer or sample-and-hold device, or recompile the design in emulator 2048 so that these signals appear on output pins for sampling. Such recompilation may take days or weeks, which may be too long a delay for a tight design/development schedule. Furthermore, because this coverification tool works with signals, complex circuitry must be provided to convert these signals into data or to provide signal-to-signal timing control. Moreover, each signal to be sampled requires many wires on buses 2061 and 2062, which adds to the burden and time of debug setup.
Simulation with a reconfigurable computing array
As a brief overview, Figure 67 illustrates a high-level configuration of the single-engine reconfigurable computing (RCC) array system of the present invention, which was described above in this patent specification. This single-engine RCC system is incorporated into the coverification system according to one embodiment of the invention.
In Figure 67, RCC array system 2080 includes an RCC computing system 2081, a reconfigurable computing (RCC) hardware array 2084, and a PCI bus 2089 that couples them together. Importantly, RCC computing system 2081 contains the entire model of the user design in software, while RCC hardware array 2084 contains a hardware model of the user design. RCC computing system 2081 includes a CPU, memory, an operating system, and the software needed to run single-engine RCC system 2080. A software clock 2082 is provided to enable tight control of the software model in RCC computing system 2081 and the hardware model in RCC hardware array 2084. Testbench data 2083 are also stored in RCC computing system 2081.
RCC hardware array system 2084 includes a PCI interface 2085, a set of RCC hardware array boards 2086, and various buses for interfacing. RCC hardware array board set 2086 includes at least a portion of the user design modeled in hardware (i.e., hardware model 2087) and memory 2088 for testbench data. In one embodiment, during configuration, portions of this hardware model are distributed among multiple reconfigurable logic elements (for instance, FPGA chips). The more reconfigurable logic elements or chips are used, the more boards are needed. In one embodiment, a single board carries four reconfigurable logic elements; in other embodiments, a single board carries eight. The capacity and capability of the reconfigurable logic elements on a four-chip board can differ significantly from those on an eight-chip board.
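The relationship between partition count and board count can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each partition of the hardware model fits in exactly one reconfigurable logic element and that every board carries the same number of elements; real placement also depends on per-chip capacity and inter-chip interconnect, which it ignores. The function name is hypothetical.

```python
def boards_needed(num_partitions: int, chips_per_board: int) -> int:
    """Boards needed if each hardware-model partition occupies one
    reconfigurable logic element (an assumption made for illustration)."""
    if num_partitions < 0 or chips_per_board <= 0:
        raise ValueError("need num_partitions >= 0 and chips_per_board > 0")
    # Integer ceiling division: a partly filled board still counts as a board.
    return -(-num_partitions // chips_per_board)


# Hypothetical design split into 20 FPGA-sized partitions.
print(boards_needed(20, 4))  # four-chip boards
print(boards_needed(20, 8))  # eight-chip boards
```

With these assumptions, the four-chip layout needs five boards and the eight-chip layout needs three.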
Bus 2090 carries various clocks from PCI interface 2085 to hardware model 2087. Bus 2091 carries other I/O data between PCI interface 2085 and hardware model 2087 via connector 2093 and internal bus 2094. Bus 2092 functions as a PCI bus between PCI interface 2085 and hardware model 2087. Testbench data can also be stored in memory in hardware model 2087. As noted above, hardware model 2087 includes, besides the hardware model of the user design, the other structures and functions needed to interface the hardware model with RCC computing system 2081.
RCC system 2080 may reside in a single workstation or be coupled to a network of workstations, where each workstation accesses RCC system 2080 on a time-shared basis. In effect, RCC array system 2080 serves as a simulation server with a simulation scheduler and a state swapping mechanism. The server allows each user at a workstation to access RCC hardware array 2084 for high-speed acceleration and hardware state swapping. After acceleration and state swapping, each user can simulate the user design locally in software while releasing control of RCC hardware array 2084 to users at other workstations. This network model is also used for the coverification system described below.
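A minimal sketch of the time-sharing idea, under stated assumptions: the hardware array's state is represented by a plain dictionary, "acceleration" merely advances a cycle counter, and the class and method names (`SimulationServer`, `UserSession`, `request`, `run`) are hypothetical rather than the system's actual interface. It shows only the ordering of swap-out, swap-in, and release.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UserSession:
    name: str
    # Hypothetical stand-in for this user's register/memory state.
    hw_state: dict = field(default_factory=dict)


class SimulationServer:
    """Hypothetical sketch: one hardware array time-shared by several users."""

    def __init__(self):
        self.queue = deque()   # pending (session, cycles) acceleration requests
        self.hardware = {}     # state currently loaded in the hardware array
        self.owner = None      # session whose state is loaded
        self.log = []

    def request(self, session, cycles):
        self.queue.append((session, cycles))

    def _swap_in(self, session):
        if self.owner is session:
            return
        if self.owner is not None:
            self.owner.hw_state = dict(self.hardware)  # save previous user's state
            self.log.append(f"swap out {self.owner.name}")
        self.hardware = dict(session.hw_state)         # restore requester's state
        self.owner = session
        self.log.append(f"swap in {session.name}")

    def run(self):
        while self.queue:
            session, cycles = self.queue.popleft()
            self._swap_in(session)
            # Stand-in for hardware acceleration: advance a cycle counter.
            self.hardware["cycle"] = self.hardware.get("cycle", 0) + cycles
            self.log.append(f"accelerate {session.name} {cycles}")
        if self.owner is not None:
            # Release the array; the last user continues in software locally.
            self.owner.hw_state = dict(self.hardware)
            self.owner = None


alice, bob = UserSession("alice"), UserSession("bob")
server = SimulationServer()
server.request(alice, 100)
server.request(bob, 50)
server.request(alice, 25)
server.run()
print(alice.hw_state, bob.hw_state)
```

Each user's state survives the other user's turn because it is saved before the array is reassigned, which is the essential property of the state swapping mechanism.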
RCC array system 2080 gives the designer the power and flexibility to simulate the entire design, to accelerate parts of it during selected cycles at test points via the hardware model in the reconfigurable computing array, and to obtain internal state information for any part of the design at any time. Indeed, the single-engine reconfigurable computing array (RCC) system can be loosely described as a hardware-accelerated simulator that can perform the following tasks in a single debug session: (1) simulation alone; (2) simulation with hardware acceleration, in which the user can start, stop, assert values, and inspect the internal state of the design at any time; (3) post-simulation analysis; and (4) in-circuit emulation. Because both the software model and the hardware model are under the tight control of a single engine via a software clock, the hardware model in the reconfigurable computing array is tightly coupled to the software simulation model. This allows the designer to debug cycle by cycle, and to accelerate and decelerate the hardware model through any number of cycles to obtain valuable internal state information. Moreover, because this simulation system processes data rather than signals, no complex signal-to-data conversion or timing circuitry is needed. Furthermore, unlike a typical emulation system, the hardware model in the reconfigurable computing array need not be recompiled if the designer wishes to examine a different set of nodes. For further details, please review the description above.
Coverification system without external I/O
One embodiment of the invention is a coverification system that does not use actual, physical external I/O devices and target applications. Thus, a coverification system according to one embodiment of the present invention can incorporate the RCC system together with other functionality to debug the software portion and the hardware portion of a user's design without using any actual target system or I/O devices. Instead, the target system and external I/O devices are modeled in software in the RCC computing system.
Referring to Figure 68, coverification system 2100 includes an RCC computing system 2101, an RCC hardware array 2108, and a PCI bus 2114 that couples them together. Importantly, RCC computing system 2101 contains the entire model of the user design in software, while reconfigurable computing array 2108 contains a hardware model of the user design. RCC computing system 2101 includes a CPU, memory, an operating system, and the software needed to run single-engine coverification system 2100. A software clock 2104 is provided to enable tight control of the software model in RCC computing system 2101 and the hardware model in reconfigurable computing array 2108. Test cases 2103 are also stored in RCC computing system 2101.
According to one embodiment of the invention, RCC computing system 2101 also contains target applications 2102, a driver 2105 for the hardware model of the user design, a model of a device (for instance, a video card) and its driver in software, labeled 2106, and a model of another device (for instance, a monitor) and its driver in software, labeled 2107. In essence, RCC computing system 2101 contains as many device models and drivers as necessary to convey to the software model and hardware model of the user design that an actual target system and other I/O devices are part of this computing environment.
RCC hardware array 2108 includes a PCI interface 2109, a set of RCC hardware array boards 2110, and various buses for interfacing. RCC hardware array board set 2110 includes at least the portion of the user design 2112 that is modeled in hardware and memory 2113 for testbench data. As described above, each board contains multiple reconfigurable logic elements or chips.
Bus 2115 carries various clocks from PCI interface 2109 to hardware model 2112. Bus 2116 carries other I/O data between PCI interface 2109 and hardware model 2112 via connector 2111 and internal bus 2118. Bus 2117 functions as a PCI bus between PCI interface 2109 and hardware model 2112. Testbench data can also be stored in memory 2113. As described above, the hardware model includes, besides the hardware model of the user design, the other structures and functions needed to interface the hardware model with RCC computing system 2101.
To compare the coverification system of Figure 68 with a conventional emulator-based coverification system, recall that Figure 66 shows emulator 2048 coupled to target system 2040, some I/O devices (for instance, frame buffer or data stream record/playback system 2051), and workstation 2052. This emulator configuration presents the designer with many problems and setup hassles. The emulator needs a logic analyzer or sample-and-hold device to measure the internal state of the user design modeled in the emulator. Because the logic analyzer and sample-and-hold device require signals, complex signal-to-data conversion circuitry is needed, as is complex signal-to-signal timing control circuitry. The many wires needed for each signal used to measure the emulator's internal state further add to the user's setup burden. During debugging, each time the user wants to examine a different set of internal logic, the user must recompile the design in the emulator so that the appropriate signals from that logic are provided as outputs for measurement and recording by the logic analyzer or sample-and-hold device. The long recompilation time is too costly.
In the coverification system of the present invention without external I/O devices, the target system and other I/O devices are modeled in software, so no actual physical target system or I/O devices need be present. Because RCC computing system 2101 processes data, no signal-to-data conversion circuitry or signal-to-signal timing control circuitry is needed. The number of wires is not tied to the number of signals, so setup is relatively simple. Furthermore, because the coverification system processes data rather than signals, debugging a different part of the logic in the hardware model of the user design requires no recompilation. Because the RCC computing system controls the clocks of the RCC hardware array under software control (i.e., via the software clock and the clock edge detection circuitry), starting and stopping the hardware model is easy. Because the model of the entire user design is in software and the software clock provides synchronization, reading data from the hardware model is also easy. Therefore, the user can debug with software simulation alone, accelerate part or all of the user design in hardware, and step cycle by cycle through each desired test point, inspecting the internal state (i.e., register and combinational logic states) of the software and hardware models. For instance, the user can simulate the design with some testbench data, download internal state information to the hardware model, accelerate the design with various testbench data using the hardware model, inspect the resulting internal state values of the hardware model by loading register values from the hardware model into the software model and regenerating the combinational logic values, and finally simulate other parts of the user design in software using the results of the hardware model acceleration.
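The debug sequence described above can be sketched with a toy design. In this illustration, which is not the system's implementation, the user design is assumed to be an 8-bit accumulator register with a combinational parity output; the hardware model exposes only register values, so combinational values are regenerated in software after read-back. All class and function names are hypothetical.

```python
def next_state(regs, stimulus):
    """Toy user design (hypothetical): an 8-bit accumulator register."""
    return {"acc": (regs["acc"] + stimulus) & 0xFF}


def regenerate_comb(regs):
    """Combinational logic (parity) recomputed in software from register values."""
    return {"parity": bin(regs["acc"]).count("1") % 2}


class SoftwareModel:
    def __init__(self):
        self.regs = {"acc": 0}

    def step(self, stimulus):
        self.regs = next_state(self.regs, stimulus)


class HardwareModel:
    """Stand-in for the FPGA-resident model; only registers are readable."""

    def __init__(self):
        self.regs = {"acc": 0}

    def load(self, regs):
        self.regs = dict(regs)

    def accelerate(self, stimuli):
        for s in stimuli:
            self.regs = next_state(self.regs, s)

    def read_registers(self):
        return dict(self.regs)


sw, hw = SoftwareModel(), HardwareModel()
for s in (1, 2, 3):                  # 1. simulate in software
    sw.step(s)
hw.load(sw.regs)                     # 2. download internal state to hardware
hw.accelerate([10] * 30)             # 3. accelerate with more testbench data
sw.regs = hw.read_registers()        # 4. load register values back into software
view = {**sw.regs, **regenerate_comb(sw.regs)}  # 5. regenerate combinational values
print(view)
sw.step(5)                           # 6. continue simulating in software
print(sw.regs)
```

Here the accumulator reaches 6 in software, 50 after acceleration (306 modulo 256), and 55 after one more software step.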
Yet,,, still need a workstation for debug procedures control according to description above.In a network configuration, a workstation may be connected in the collaborative check system remote access tune-up data by long-range.In a non-network configuration, a workstation may be connected to collaborative check system in this locality, and perhaps in some other embodiment, workstation may be incorporated in inside with working in coordination with check system, so that can be at this accessing tune-up data.
Coverification system with external I/O
In Figure 68, various I/O devices and target applications are modeled in RCC computing system 2101. However, when too many I/O devices and target applications run in RCC computing system 2101, overall speed drops. If RCC computing system 2101 has only a single processor, more time is needed to process the data from all the device models and target applications. To increase data throughput, actual I/O devices and target applications (rather than software models of them) may be physically coupled to the coverification system.
One embodiment of the invention is a coverification system that uses actual, physical external I/O devices and target applications. Thus, while using a real target system and I/O devices, a coverification system can incorporate the RCC system together with other functionality to debug the software portion and the hardware portion of a user's design. For testing, the coverification system can use testbench data from software as well as stimulus from an external interface (for instance, the target system and external I/O devices). Testbench data can provide test data not only to the pin-outs of the user design but also to internal nodes within the user design. Real I/O signals from external I/O devices (or the target system) can be delivered only to the pin-outs of the user design. Thus, one key difference between test data from an external interface (for instance, the target system or external I/O devices) and the testbench in software is that testbench data can test the user design with stimuli applied to both pin-outs and internal nodes, whereas real data from the target system or external I/O devices can be applied to the user design only via its pin-outs (or the nodes in the user design that represent pin-outs). The following discussion presents the coverification system structure with a target system and external I/O devices, and its configuration.
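The pin-out versus internal-node distinction can be expressed as a simple access rule. The sketch below is hypothetical: the node names, the two source labels, and the `can_drive` method are illustrative assumptions, not part of the described system.

```python
class DesignUnderTest:
    """Hypothetical access rule: which stimulus sources may drive which nodes."""

    def __init__(self, pins, internal_nodes):
        self.pins = set(pins)
        self.internal = set(internal_nodes)
        self.values = {}

    def can_drive(self, node, source):
        if source == "testbench":   # testbench data: pin-outs and internal nodes
            return node in self.pins or node in self.internal
        if source == "external":    # target system / external I/O: pin-outs only
            return node in self.pins
        raise ValueError(f"unknown source {source!r}")

    def apply(self, node, value, source):
        if not self.can_drive(node, source):
            raise PermissionError(f"{source} stimulus cannot drive {node!r}")
        self.values[node] = value


dut = DesignUnderTest(pins=["clk", "rst", "data_in"], internal_nodes=["fsm.state"])
dut.apply("fsm.state", 3, "testbench")   # allowed: internal node from the testbench
dut.apply("data_in", 0xA5, "external")   # allowed: pin-out from external I/O
print(dut.can_drive("fsm.state", "external"))
```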
Compared with the system configuration of Figure 66, the coverification system according to one embodiment of the present invention replaces the structures and functions within dashed box 2070. In other words, whereas Figure 66 shows the emulator and workstation within dashed line 2070, one embodiment of the present invention places coverification system 2140 (with its associated workstation) at dashed line 2070, as shown in Figure 69.
Referring to Figure 69, a coverification system configuration according to one embodiment of the present invention includes a target system 2120, a coverification system 2140, some optional I/O devices, and control/data buses 2131 and 2132 that couple them together. Target system 2120 includes a central computer system 2121, which includes a CPU and memory and operates under some operating system, such as Microsoft Windows or Sun Microsystems' Solaris, to run a number of applications 2122 and test cases 2123. A device driver 2124 for the hardware model of the user design resides in central computer system 2121 to enable communication between the operating system (and any applications) and the user design. To communicate with the coverification system and the other devices that form part of this computing environment, central computer system 2121 is coupled to PCI bus 2129. Other peripherals in target system 2120 include an Ethernet PCI add-in card 2125 for coupling the target system to a network, a SCSI PCI add-in card 2126 coupled to SCSI drive 2128 via bus 2130, and a PCI bus bridge 2127.
Coverification system 2140 includes an RCC computing system 2141, an RCC hardware array 2190, an external interface 2139 in the form of an external I/O expander, and a PCI bus 2171 that couples RCC computing system 2141 to RCC hardware array 2190. RCC computing system 2141 includes a CPU, memory, an operating system, and the software needed to run single-engine coverification system 2140. Importantly, RCC computing system 2141 contains the entire model of the user design in software, while RCC hardware array 2190 contains a hardware model of the user design.
As discussed above, the single engine of the coverification system derives its power and flexibility from the main software kernel, which resides in the main memory of RCC computing system 2141 and controls the overall operation and execution of coverification system 2140. As long as any testbench process is active or any signal from the outside world is presented to the coverification system, the kernel evaluates active testbench components, evaluates clock components, detects clock edges to update registers and memories and to propagate combinational logic data, and advances simulation time. This main software kernel provides the tightly coupled nature of RCC computing system 2141 and RCC hardware array 2190.
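The order of operations in the kernel loop can be illustrated with a deliberately tiny model. This sketch assumes a design consisting of one D flip-flop feeding an inverter, a clock that toggles every time unit, a testbench that is treated as active until a fixed end time, and external stimuli supplied in time order; the class name and these parameters are hypothetical. Memory updates are omitted because the toy design has no memory.

```python
class Kernel:
    """Hypothetical sketch of a single-engine simulation kernel loop.

    Toy design: one D flip-flop (d -> q on a rising clock edge) feeding an
    inverter (y = not q). The software clock toggles every time unit."""

    def __init__(self, testbench, tb_end, external=None):
        self.testbench = testbench            # {time: value driven onto d}
        self.tb_end = tb_end                  # testbench treated as active until here
        self.external = list(external or [])  # time-ordered [(time, value)] from outside
        self.time = 0
        self.clk = 0
        self.d = self.q = 0
        self.y = 1
        self.trace = []

    def active(self):
        return self.time < self.tb_end or bool(self.external)

    def step(self):
        # 1. Evaluate active testbench components and pending external stimuli.
        if self.time in self.testbench:
            self.d = self.testbench[self.time]
        while self.external and self.external[0][0] <= self.time:
            self.d = self.external.pop(0)[1]
        # 2. Evaluate clock components (the software clock).
        prev_clk, self.clk = self.clk, self.time % 2
        # 3. Detect clock edges and update registers.
        if prev_clk == 0 and self.clk == 1:
            self.q = self.d
        # 4. Propagate combinational logic.
        self.y = 1 - self.q
        self.trace.append((self.time, self.clk, self.q, self.y))
        # 5. Advance simulation time.
        self.time += 1

    def run(self):
        while self.active():
            self.step()
        return self.trace


kernel = Kernel(testbench={0: 1, 2: 0}, tb_end=4, external=[(5, 1)])
for row in kernel.run():
    print(row)  # (time, clk, q, y)
```

The loop keeps running past the testbench end time only because an external stimulus is still pending, mirroring the "testbench active or outside-world signal present" condition.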
The software kernel generates software clock signals from a software clock source 2142; these signals are provided to RCC hardware array 2190 and to the outside world. Clock source 2142 can generate multiple clocks at different frequencies, depending on the destination of each software clock. In general, the software clock ensures that the registers in the hardware model of the user design evaluate in synchrony with the system clock and without any hold-time violations. The software model can detect, in software, the clock edges that affect hardware model register values. Accordingly, a clock detection mechanism ensures that a clock edge detected in the main software model can be translated into a clock detection in the hardware model. For a more detailed discussion of software clocks and clock edge detection logic, refer to Figures 17-19 and the accompanying text in this patent specification.
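A software clock source that derives several clocks from one time base, and detects their edges in software, might look like the sketch below. It assumes integer, even clock periods measured in simulation time units and simply returns the names of clocks with a rising edge; in the described system those detected edges would instead be forwarded to the hardware model. The class name and period values are hypothetical.

```python
class SoftwareClockSource:
    """Hypothetical sketch: several clocks derived from one simulation time base,
    with rising-edge detection performed in software."""

    def __init__(self, periods):
        for name, p in periods.items():
            if p < 2 or p % 2:
                raise ValueError(f"{name}: period must be an even integer >= 2")
        self.periods = periods
        self.prev = {name: 0 for name in periods}   # last sampled level per clock

    def level(self, name, t):
        p = self.periods[name]
        # Low for the first half of each period, high for the second half.
        return 1 if (t % p) >= p // 2 else 0

    def tick(self, t):
        """Return the sorted names of clocks that have a rising edge at time t."""
        edges = []
        for name in self.periods:
            lvl = self.level(name, t)
            if self.prev[name] == 0 and lvl == 1:
                edges.append(name)   # here the edge would be sent to the hardware model
            self.prev[name] = lvl
        return sorted(edges)


src = SoftwareClockSource({"sys": 2, "slow": 4})
edges = {t: src.tick(t) for t in range(8)}
print(edges)
```

With these periods, "sys" rises at every odd time step and "slow" rises at times 2 and 6.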
According to one embodiment of the invention, RCC computing system 2141 may also contain models of one or more I/O devices, even though other actual physical I/O devices may be coupled to the coverification system. For instance, RCC computing system 2141 may contain the model of a device labeled 2143 (for instance, a speaker) together with its driver and testbench data in software, and the model of another device labeled 2144 (for instance, a graphics accelerator) together with its driver and testbench data in software. The user decides which devices (and their respective drivers and testbench data) are modeled and incorporated into RCC computing system 2141, and which devices are actually coupled to the coverification system.
The coverification system includes control logic that controls traffic (1) between RCC computing system 2141 and RCC hardware array 2190, and (2) between the external interface (coupled to the target system and external I/O devices) and RCC hardware array 2190. Because some I/O devices may be modeled in the RCC computing system, some data passes between RCC hardware array 2190 and RCC computing system 2141. In addition, RCC computing system 2141 holds the model of the entire design in software, including the portion of the user design modeled in RCC hardware array 2190. As a result, RCC computing system 2141 must also have access to all data passing between the external interface and RCC hardware array 2190. The control logic ensures that RCC computing system 2141 can access this data. The control logic is described in more detail below.
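The requirement that the RCC computing system see all traffic between the external interface and the hardware array can be sketched as a router that forwards each word and records a copy. The queues, method names, and direction labels below are hypothetical; the sketch shows only the routing rule, not bus timing or handshaking.

```python
from collections import deque


class ControlLogic:
    """Hypothetical routing rule: traffic between the external interface and the
    hardware array is forwarded and also made visible to the RCC computing system."""

    def __init__(self):
        self.to_hardware = deque()
        self.to_external = deque()
        self.computing_system_view = []   # everything the software model must see

    def from_external(self, word):
        self.to_hardware.append(word)
        self.computing_system_view.append(("ext->hw", word))

    def from_hardware(self, word):
        self.to_external.append(word)
        self.computing_system_view.append(("hw->ext", word))

    def from_computing_system(self, word):
        # Traffic for I/O devices modeled in software originates in the computing
        # system, so it needs no mirrored copy.
        self.to_hardware.append(word)


ctl = ControlLogic()
ctl.from_external(0x12)
ctl.from_hardware(0x34)
ctl.from_computing_system(0x56)
print(list(ctl.to_hardware), ctl.computing_system_view)
```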
RCC hardware array 2190 includes a number of array boards. In the particular embodiment shown in Figure 69, hardware array 2190 includes boards 2145-2149. Boards 2146-2149 contain most of the configured hardware model. Board 2145 (or board m1) contains a reconfigurable computing element (for instance, an FPGA chip) 2153, which the coverification system can use to configure at least a portion of the hardware model, and an external I/O controller 2152 that directs traffic and data between the external interface (target system and I/O devices) and coverification system 2140. Via the external I/O controller, board 2145 allows RCC computing system 2141 to access all data transferred between the outside world (i.e., the target system and I/O devices) and RCC hardware array 2190. This access is important because RCC computing system 2141 in the coverification system contains a model of the entire user design in software and also controls the functions of RCC hardware array 2190.
If stimulus from an external I/O device is provided to the hardware model, the software model must also have access to this stimulus, so that the user of the coverification system can selectively control the next debug step, which may include inspecting the design's internal state values that result from applying this stimulus. As discussed above with respect to the board layout and interconnect scheme, the first and last boards are included in hardware array 2190. Thus, board 1 (labeled board 2146) and board 8 (labeled board 2149) are included in a hardware array of eight boards (excluding board m1). In addition to boards 2145-2149, there may also be a board m2 with chip m2 (not shown in Figure 69, but see Figure 74). Board m2 is similar to board m1, except that board m2 has no external interface; it can be used for expansion if additional boards are needed.
The contents of these boards will now be discussed. Board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, a data chip (m1) 2153, memory 2154, and a multiplexer 2155. In one embodiment, the PCI controller is a PLX 9080. PCI controller 2151 is coupled to RCC computing system 2141 via bus 2171 and to tri-state buffer 2179 via bus 2172.
The main traffic coordinator in the coverification system between the outside world (target system 2120 and I/O devices) and RCC computing system 2141 is external I/O controller 2152 (also called "CTRLXM" in Figures 69, 71, and 73), which is coupled to RCC computing system 2141, to the other boards 2146-2149 in the RCC hardware array, to target system 2120, and to the actual external I/O devices. Of course, as described above, the main traffic coordinator between RCC computing system 2141 and RCC hardware array 2190 remains the combination of PCI controller 2151 and the individual internal I/O controller in each array board 2146-2149 (for instance, I/O controllers 2156 and 2158). In one embodiment, these individual internal I/O controllers, such as controllers 2156 and 2158, are the FPGA I/O controllers described and illustrated above in exemplary figures such as Figure 22 (unit 700) and Figure 56 (unit 1200).
External I/O controller 2152 is coupled to tri-state buffer 2179 so that the external I/O controller can interface with RCC computing system 2141. In one embodiment, tri-state buffer 2179 in some cases allows data from RCC computing system 2141 to pass to local bus 2180 while preventing data on the local bus from passing to RCC computing system 2141, and in other cases allows data to pass from local bus 2180 to RCC computing system 2141.
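The two cases handled by tri-state buffer 2179 can be summarized as a direction-controlled transfer. In this hypothetical model, `None` stands for a blocked (high-impedance) path, the direction labels are assumptions, and a third `OFF` state is added for completeness even though the text describes only the two driving cases.

```python
from typing import Optional, Tuple


class TriStateBuffer:
    """Hypothetical model of a bidirectional tri-state buffer between the
    RCC computing system (host side) and local bus 2180."""

    TO_LOCAL = "to_local"   # host -> local bus; local-bus data blocked toward host
    TO_HOST = "to_host"     # local bus -> host
    OFF = "off"             # both directions in high impedance (added assumption)

    def __init__(self):
        self.direction = self.OFF

    def transfer(self, host_data: Optional[int],
                 local_data: Optional[int]) -> Tuple[Optional[int], Optional[int]]:
        """Return (value delivered to host, value delivered to local bus)."""
        if self.direction == self.TO_LOCAL:
            return None, host_data
        if self.direction == self.TO_HOST:
            return local_data, None
        return None, None


buf = TriStateBuffer()
buf.direction = TriStateBuffer.TO_LOCAL
print(buf.transfer(0xAA, 0x55))
```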
External I/O controller 2152 is also coupled to chip (m1) 2153 and memory/external buffer 2154 via data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, which can be used to configure at least a portion of the hardware model of the user design (or the entire hardware model, if the user design is small enough). In one embodiment, external buffer 2154 is a DRAM DIMM and can be used by chip 2153 for a variety of purposes. External buffer 2154 provides much more memory capacity than the individual SRAM memory devices coupled locally to each reconfigurable logic element (for instance, reconfigurable logic element 2157). This large capacity allows the RCC computing system to store large amounts of data, such as testbench data, embedded code for a microcontroller (if the user design is a microcontroller), and large lookup tables (LUTs), in one memory device. As described above, external buffer 2154 can also be used to store data needed for hardware modeling. In effect, external buffer 2154 can partly serve the same function as the high-bank or low-bank SRAM memory devices described and illustrated above, such as SRAM 1205 and 1206 in Figure 56, but with much more memory. External buffer 2154 can also be used by the coverification system to store data received from target system 2120 and external I/O devices, so that this data can later be retrieved by RCC computing system 2141. Chip (m1) 2153 and external buffer 2154 also contain the memory mapping logic described in the "Memory Simulation" section of this patent specification.
To access the required data in external buffer 2154, both chip 2153 and RCC computing system 2141 (via external I/O controller 2152) can supply the address of the needed data. Chip 2153 provides its address on address bus 2182, and external I/O controller 2152 provides its address on address bus 2177. Address buses 2182 and 2177 are the inputs of multiplexer 2155, which places the selected address on output line 2178, connected to external buffer 2154. The select signal for multiplexer 2155 is provided by external I/O controller 2152 via line 2181.
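The address path into external buffer 2154 can be modeled as a two-input multiplexer. The following is a minimal behavioral sketch, assuming (hypothetically) that a select value of 0 routes the chip 2153 address and a value of 1 routes the external I/O controller address; the text does not specify the polarity of the select signal on line 2181.

```python
def ext_buffer_address(chip_addr: int, ctrl_addr: int, select: int) -> int:
    """Behavioral model of multiplexer 2155 driving output line 2178.

    Assumption (not stated in the text): select == 0 picks the address from
    chip 2153 (bus 2182); select == 1 picks the address from external I/O
    controller 2152 (bus 2177).
    """
    if select not in (0, 1):
        raise ValueError("select line 2181 must be 0 or 1")
    return ctrl_addr if select else chip_addr


# Chip 2153 and the controller request different locations; the select line decides.
print(hex(ext_buffer_address(0x1000, 0x2000, select=0)))  # 0x1000
print(hex(ext_buffer_address(0x1000, 0x2000, select=1)))  # 0x2000
```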
External I/O controller 2152 is also connected to the other boards 2146-2149 via bus 2180. In one embodiment, bus 2180 is the local bus described and illustrated above in Figure 22 (local bus 708) and Figure 56 (local bus 1210). Although only five boards (including board 2145 (board m1)) are used in this embodiment, the actual number of boards depends on the complexity and size of the user design modeled in hardware. The hardware model of a user design of moderate complexity requires fewer boards than that of a more complex user design.
For scalability, boards 2146-2149 are essentially identical except for certain inter-board interconnect lines. These interconnect lines allow the portion of the hardware model of the user design in one chip (for instance, chip 2157 on board 2146) to communicate with another portion of the hardware model of the same user design that is physically placed in a different chip (for instance, chip 2161 on board 2148). For a brief overview of the interconnect structure of this co-verification system, refer to Figure 74, together with Figures 8 and 36-44 and their corresponding descriptions in this patent specification.
Board 2148 is a representative board. It is the third board in this arrangement of four boards (excluding board 2145 (board m1)); it is therefore not an end board requiring appropriate interconnect termination. Board 2148 contains an internal I/O controller 2158, several reconfigurable logic elements 2159-2166 (for instance, FPGA chips), a high-bank FD bus 2167, a low-bank FD bus 2168, a high-bank memory 2169, and a low-bank memory 2170. In one embodiment, internal I/O controller 2158 is the FPGA I/O controller described and illustrated above in Figure 22 (unit 700) and Figure 56 (unit 1200). Similarly, high-bank and low-bank memory devices 2169 and 2170 are the SRAM memory devices described and illustrated above, for instance, in Figure 56 (SRAM 1205 and 1206). In one embodiment, high-bank and low-bank FD buses 2167 and 2168 are the FD buses or FPGA buses described and illustrated above in Figure 22 (FPGA buses 718 and 719), Figure 56 (FD buses 1212 and 1213), and Figure 57 (FD bus 1282).
To connect co-verification system 2140 to target system 2120 and other I/O devices, an external interface 2139 is provided in the form of an external I/O expander. On the target-system side, external I/O expander 2139 is connected to PCI bridge 2127 via secondary PCI bus 2132 and via a control line 2131 that carries the software clock. On the I/O-device side, external I/O expander 2139 is connected to the various I/O devices via buses 2136-2138, which carry pin-out data, and control lines 2133-2135, which carry the software clock. The number of I/O devices that can be connected to I/O expander 2139 is determined by the user. In any case, as many data buses and software clock control lines as needed are provided in external I/O expander 2139 so that as many I/O devices as necessary can be connected to co-verification system 2140 to run the debug session successfully.
On the co-verification system 2140 side, external I/O expander 2139 is connected to external I/O controller 2152 via data bus 2175, software clock control line 2174, and scan control line 2173. Data bus 2175 is used to transfer pin-out data between the external world (target system 2120 and the external I/O devices) and co-verification system 2140. Software clock control line 2174 is used to transfer software clock data from RCC computing system 2141 to the external world.
The software clock on control lines 2174 and 2131 is generated by the main software kernel in RCC computing system 2141. RCC computing system 2141 delivers the software clock to external I/O expander 2139 via PCI bus 2171, PCI controller 2151, bus 2171, tri-state buffer 2179, local bus 2180, external I/O controller 2152, and control line 2174. From external I/O expander 2139, the software clock is supplied as the clock input to target system 2120 (via PCI bridge 2127) and to the other external I/O devices via control lines 2133-2135. Because the software clock serves as the master clock source, target system 2120 and the I/O devices run at a slower speed. However, the data provided to target system 2120 and the external I/O devices are synchronized to the software clock rate, as are the software model in RCC computing system 2141 and the hardware model in RCC hardware array 2190. Similarly, data from target system 2120 and the external I/O devices are transferred to co-verification system 2140 in synchronization with the software clock.
Thus, the I/O data transferred between the external interface and the co-verification system are synchronized to the software clock. In essence, as data are transferred between them, the software clock keeps the operation of the external I/O devices and the target system synchronized with the operation of the co-verification system (in the RCC computing system and the RCC hardware array). The software clock is used in both data input and data output operations. For data input operations, while one pointer (discussed below) latches the software clock from RCC computing system 2141 to the external interface, other pointers latch the I/O data from the external interface to selected internal nodes of the hardware model in RCC hardware array 2190. During the cycle in which the software clock is delivered to the external interface, the pointers latch the I/O data one after another. When all the data have been latched, the RCC computing system can generate another software clock, if needed, to latch more data in another software clock cycle. For data output operations, the RCC computing system can deliver the software clock to the external interface and then, with the help of the pointers, control the gating of data from the internal nodes of the hardware model in RCC hardware array 2190 to the external interface. Again, the pointers gate data from the internal nodes to the external interface one after another. If more data need to be transferred to the external interface, the RCC computing system can generate another software clock cycle and then activate selected pointers to gate the data to the external interface. Generation of the software clock is strictly controlled, which allows the co-verification system to keep data transfer and data evaluation synchronized between itself and any external I/O device connected to the external interface.
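The input latching order described above can be sketched as a schedule of events. This is a behavioral illustration only, assuming a hypothetical fixed number of pointers that can fire per software clock cycle (`pointers_per_clock`) and placeholder pointer names `PTR0`, `PTR1`, and so on; the real pointer count and names depend on the user design.

```python
from typing import List, Tuple

Event = Tuple[str, ...]


def schedule_input(io_values: List[int], pointers_per_clock: int) -> List[Event]:
    """Order in which I/O input data are latched under software-clock control.

    Each software clock cycle begins with the clock being delivered to the
    external interface; pointers then latch values one after another. If data
    remain, another software clock cycle is generated.
    Pointer names PTR0, PTR1, ... are hypothetical placeholders.
    """
    if pointers_per_clock < 1:
        raise ValueError("need at least one pointer per software clock cycle")
    events: List[Event] = []
    for start in range(0, len(io_values), pointers_per_clock):
        events.append(("SW_CLK",))
        for i, value in enumerate(io_values[start:start + pointers_per_clock]):
            events.append(("LATCH", f"PTR{i}", str(value)))
    return events


for event in schedule_input([7, 8, 9], pointers_per_clock=2):
    print(event)
```

With three values and two pointers per cycle, a second software clock cycle is issued to latch the third value.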
Scan control line 2173 allows co-verification system 2140 to scan data buses 2132, 2136, 2137, and 2138 for any data that may be present. The logic in external I/O controller 2152 that supports the scan signal is a pointer logic circuit, in which each of the various inputs is provided as the output for a particular time period before the logic moves on to the next input via a MOVE signal line. This logic is similar to the scheme in Figure 11. In effect, the scan signal functions like the select signal of a multiplexer, except that it selects the various inputs to the multiplexer in rotating order. Thus, in one clock cycle, the scan signal on scan control line 2173 samples data bus 2132 for data from target system 2120. In the next clock cycle, the scan signal on scan control line 2173 samples data bus 2136 for data from any external I/O device that may be connected there. In the next clock cycle, data bus 2137 is sampled, and so on, so that co-verification system 2140 can receive and process all pin-out data from target system 2120 or the external I/O devices during this debug session. Any data that co-verification system 2140 receives while sampling data buses 2132, 2136, 2137, and 2138 are transferred to external buffer 2154 via external I/O controller 2152.
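The rotating scan can be modeled with a round-robin iterator. In this sketch the bus values are a hypothetical frozen snapshot, and the visiting order 2132, 2136, 2137, 2138 follows the order given in the text; real hardware would sample whatever is present on each bus during its own cycle.

```python
from itertools import cycle, islice
from typing import Dict, Iterator, List, Tuple


def scan_buses(bus_values: Dict[int, int]) -> Iterator[Tuple[int, int]]:
    """Visit one data bus per clock cycle in a fixed rotating order, the way
    a multiplexer select that advances on each MOVE pulse would."""
    for bus_id in cycle(sorted(bus_values)):
        yield bus_id, bus_values[bus_id]


# Hypothetical snapshot of the values present on the scanned buses.
buses = {2132: 0xA5, 2136: 0x01, 2137: 0x02, 2138: 0x03}
external_buffer: List[Tuple[int, int]] = []
for sample in islice(scan_buses(buses), 6):  # six clock cycles
    external_buffer.append(sample)  # forwarded to external buffer 2154
print(external_buffer)  # [(2132, 165), (2136, 1), (2137, 2), (2138, 3), (2132, 165), (2136, 1)]
```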
Note that the configuration in Figure 69 assumes that target system 2120 contains the host CPU and that the user design is a peripheral, such as a video controller, network adapter, graphics adapter, mouse, or some other support device, card, or logic circuit. Thus, target system 2120 contains the target applications (including the operating system) and is connected to primary PCI bus 2129, while co-verification system 2140 contains the user design and is connected to secondary PCI bus 2132. Depending on the nature of the user design, this configuration may be quite different. For instance, if the user design is a CPU, the target applications could run in RCC computing system 2141 of co-verification system 2140, and target system 2120 would no longer contain central computer system 2121. In that case, bus 2132 could be the primary PCI bus and bus 2129 the secondary PCI bus. In effect, rather than being a peripheral that supports central computer system 2121, the user design becomes the host computing center, and all the other peripherals support the user design.
The control logic for transferring data between the external interface (external I/O expander 2139) and co-verification system 2140 is located in each of boards 2145-2149. The major portion of the control logic is in external I/O controller 2152, but other portions are in the various internal I/O controllers (for instance, 2156 and 2158) and in the reconfigurable logic elements (for instance, FPGA chips 2159 and 2165). For purposes of illustration, only some portions of this control logic need be shown, rather than the identical, repeated logic structures for all chips on all boards. The portion of co-verification system 2140 within dashed line 2150 in Figure 69 contains one subset of the control logic. This control logic will now be discussed in more detail with reference to Figures 70-73.
The components in this particular subset of the control logic include external I/O controller 2152, tri-state buffer 2179, internal I/O controller 2156 (CTRL1), reconfigurable logic element 2157 (chip 0_1, denoting chip 0 on board 1), and some of the buses and control lines connected to these components. Specifically, Figure 70 illustrates the portion of the control logic used for the data input cycle, in which data from the external interface (external I/O expander 2139) and RCC computing system 2141 are transferred to RCC hardware array 2190. Figure 72 illustrates the timing diagram of the data input cycle. Figure 71 illustrates the portion of the control logic used for the data output cycle, in which data from RCC hardware array 2190 are transferred to RCC computing system 2141 and the external interface (external I/O expander 2139). Figure 73 illustrates the timing diagram of the data output cycle.
Data input
According to one embodiment of the present invention, the data input control logic is responsible for handling the transfer of data from the RCC computing system or the external interface to the RCC hardware array. A particular subset 2150 (see Figure 69) of the data input control logic is shown in Figure 70; it includes external I/O controller 2200, tri-state buffer 2202, internal I/O controller 2203, reconfigurable logic element 2204, and the various buses and control lines over which data are transferred. External buffer 2201 is also shown in this data input embodiment. This subset illustrates the logic required for data input operations, in which data from the external interface and the RCC computing system are transferred to the RCC hardware array. The data input control logic of Figure 70 and the data input timing diagram of Figure 72 will be discussed together.
Two types of data cycles are used in this data input embodiment of the present invention: a global cycle and a software-to-hardware (S2H) cycle. The global cycle is used for data directed to all chips in the RCC hardware array, such as clocks, resets, and certain other S2H data directed to many different nodes in the RCC hardware array. For this latter, "global" S2H data, it is more practical to send the data out via the global cycle than via sequential S2H data transfers.
The software-to-hardware cycle is used to send testbench data from the RCC computing system to the RCC hardware array sequentially, from one chip to the next across all boards. Because the hardware model of the user design is distributed across several boards, the testbench data must be provided to every chip for data evaluation. Data are therefore transferred sequentially to each internal node in each chip, one internal node at a time. Sequential transfer allows a particular datum destined for a particular internal node to be handled by all chips in the RCC hardware array, because the hardware model is distributed among multiple chips.
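The difference between the two cycle types can be sketched by counting bus transfers. This is an illustrative model with hypothetical node names (`reset`, `tb_in`) and a three-chip array: data bound for every chip go out once in the global cycle, while per-chip testbench data go out chip by chip in the S2H cycle.

```python
from typing import Dict

ChipState = Dict[str, Dict[str, int]]


def global_transfer(node: str, value: int, state: ChipState) -> int:
    """Global cycle: one datum (e.g. a clock or reset) reaches every chip at
    once. Returns the number of bus transfers used."""
    for chip_nodes in state.values():
        chip_nodes[node] = value
    return 1


def s2h_transfer(node: str, per_chip: Dict[str, int], state: ChipState) -> int:
    """S2H cycle: each chip receives its own testbench datum, one chip per
    transfer, in the order given. Returns the number of bus transfers used."""
    transfers = 0
    for chip, value in per_chip.items():
        state[chip][node] = value
        transfers += 1
    return transfers


state: ChipState = {chip: {} for chip in ["m1", "0_1", "1_1"]}
print(global_transfer("reset", 1, state))                            # 1
print(s2h_transfer("tb_in", {"m1": 5, "0_1": 6, "1_1": 7}, state))  # 3
print(state["0_1"])                                                  # {'reset': 1, 'tb_in': 6}
```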
For this data evaluation, the co-verification system provides two address spaces: S2H and CLK. As described above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model essentially holds all the register components and combinational components of the user's circuit design. In addition, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active testbench components, and evaluates clock components. When the kernel detects any clock edge, the registers are updated and values are propagated through the combinational components. Thus, if hardware acceleration mode is selected, any change of value in these spaces triggers the hardware model to change logic state.
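The kernel behavior described above (advance time, detect a clock edge, update registers, then propagate through combinational logic) can be sketched as a toy cycle-based loop. The model class, the 4-bit counter, and the `en` input are hypothetical, and treating only rising edges as active is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class ToyHardwareModel:
    """Hypothetical stand-in for a hardware model: register components plus
    combinational logic recomputed from the register values."""
    regs: State
    next_state: Callable[[State, State], State]
    comb: Callable[[State], State]
    nodes: State = field(default_factory=dict)

    def on_clock_edge(self, s2h_inputs: State) -> None:
        self.regs = self.next_state(self.regs, s2h_inputs)  # registers update
        self.nodes = self.comb(self.regs)  # values propagate through combinational logic


def run_kernel(model: ToyHardwareModel, clk_trace: List[int], s2h_trace: List[State]) -> None:
    """Advance simulation time step by step; evaluate the model only on a
    rising edge of the software clock seen in the CLK space."""
    prev_clk = 0
    for clk, inputs in zip(clk_trace, s2h_trace):
        if prev_clk == 0 and clk == 1:
            model.on_clock_edge(inputs)
        prev_clk = clk


counter = ToyHardwareModel(
    regs={"count": 0},
    next_state=lambda r, i: {"count": (r["count"] + i["en"]) % 16},
    comb=lambda r: {"is_zero": int(r["count"] == 0)},
)
run_kernel(counter, [0, 1, 0, 1, 0, 1], [{"en": 1}] * 6)
print(counter.regs, counter.nodes)  # {'count': 3} {'is_zero': 0}
```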
During data transfer, the DATA_XSFR signal is at logic "1". During this time, the co-verification system uses local buses 2222-2230 to transfer data in the following data cycles:
(1) global data in the CLK space, from the RCC computing system to the RCC hardware array; (2) global data from the external interface to the RCC hardware array and the external buffer; (3) S2H data from the RCC computing system to the RCC hardware array, one chip at a time on each board. Thus, the first two data cycles are part of the global cycle, and the last data cycle is part of the S2H cycle.
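The three data cycles and their direction-control signals can be summarized as a table. The CPU_IN and EXT_IN levels follow the Figure 72 walkthrough later in this section; EXT_IN being at logic "0" during the first phase is an assumption, as is the representation itself.

```python
from typing import List, NamedTuple


class InputPhase(NamedTuple):
    label: str
    cycle: str   # "global" or "S2H"
    cpu_in: int  # enables tri-state buffer 2202 (RCC computing system side)
    ext_in: int  # enables tri-state buffer 2206 (external interface side)


# DATA_XSFR stays at logic 1 through all three phases.
DATA_INPUT_PHASES: List[InputPhase] = [
    InputPhase("global data, RCC computing system -> RCC hardware array (CLK space)", "global", 1, 0),
    InputPhase("global data, external interface -> RCC hardware array and external buffer", "global", 0, 1),
    InputPhase("S2H data, RCC computing system -> RCC hardware array, one chip at a time", "S2H", 1, 0),
]


def active_source(phase: InputPhase) -> str:
    """Exactly one side may drive the local bus in any phase."""
    if phase.cpu_in == phase.ext_in:
        raise ValueError(f"CPU_IN and EXT_IN must differ in phase: {phase.label}")
    return "RCC computing system" if phase.cpu_in else "external interface"


for p in DATA_INPUT_PHASES:
    print(p.cycle, "<-", active_source(p))
```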
For the first part of the data input global cycle (in which global data from the RCC computing system are sent to the RCC hardware array), external I/O controller 2200 drives a CPU_IN signal on line 2255 to logic "1". Line 2255 is connected to an enable input of tri-state buffer 2202. With logic "1" on line 2255, tri-state buffer 2202 allows data on local bus 2222 to pass to local buses 2223-2230 on the other side of tri-state buffer 2202. In this specific example, local buses 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond respectively to LD3, LD4 (from external I/O controller 2200), LD6 (from external I/O controller 2200), LD1, LD6, LD4, LD5, and LD7.
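A tri-state buffer either drives its input onto the bus or leaves the bus floating. The sketch below models high impedance as `None` and, as a simplification, treats tri-state buffer 2202 (enabled by CPU_IN) and tri-state buffer 2206 in the external I/O controller (enabled by EXT_IN) as if they drove one shared bus, which shows why the two enables must not be asserted together.

```python
from typing import List, Optional, Sequence

Bus = List[Optional[int]]  # None models a high-impedance (undriven) bit


def tristate(data: Sequence[int], enable: int) -> Bus:
    """Drive the input through when enabled; otherwise float every output."""
    return list(data) if enable else [None] * len(data)


def resolve(*drivers: Bus) -> Bus:
    """Combine several tri-state outputs sharing one bus. A bit driven by
    more than one enabled buffer is a bus conflict."""
    result: Bus = []
    for bits in zip(*drivers):
        driven = [b for b in bits if b is not None]
        if len(driven) > 1:
            raise RuntimeError("bus conflict: more than one driver enabled")
        result.append(driven[0] if driven else None)
    return result


cpu_in, ext_in = 1, 0
from_rcc = tristate([1, 0, 1, 1], enable=cpu_in)  # e.g. buffer 2202, enabled by CPU_IN
from_ext = tristate([0, 0, 0, 1], enable=ext_in)  # e.g. buffer 2206, enabled by EXT_IN
print(resolve(from_rcc, from_ext))  # [1, 0, 1, 1]
```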
From these local buses, the global data are transferred to bus lines 2231-2235 in internal I/O controller 2203, and then to FD bus lines 2236-2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond respectively to FD bus lines FD1, FD6, FD4, FD5, and FD7.
FD bus lines 2236-2240 are connected to the inputs of latches 2208-2213 in reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip 0_1 (that is, chip 0 on board 1). Specifically, FD bus line 2236 is connected to latch 2208; FD bus line 2237 is connected to latches 2209 and 2211; FD bus line 2238 is connected to latch 2210; FD bus line 2239 is connected to latch 2212; and FD bus line 2240 is connected to latch 2213.
The enable input of each of latches 2208-2213 is connected to one of several global pointers and software-to-hardware (S2H) pointers. The enable inputs of latches 2208-2211 are connected to global pointers, and the enable inputs of latches 2212-2213 are connected to S2H pointers. Exemplary global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Exemplary S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs of the latches are connected to these pointers, a latch cannot latch data into its intended destination node in the hardware model of the user design until it receives its proper pointer signal.
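The pointer-gated latching can be expressed as a wiring table plus a single update step. The FD-line-to-latch connections come from the description above; the pairing of each latch with a specific pointer and output line (for example, latch 2209 with GLB_PTR1 and line 2248) is inferred from the Figure 72 walkthrough and should be treated as an assumption.

```python
from typing import Dict, NamedTuple


class Latch(NamedTuple):
    fd_line: str   # FD bus line feeding the latch's data input
    enable: str    # pointer wired to the latch's enable input
    out_line: int  # line to the internal node of the hardware model


# Wiring in chip 0_1, partly inferred from the Figure 70/72 description.
LATCHES: Dict[int, Latch] = {
    2208: Latch("FD1", "GLB_PTR0", 2247),
    2209: Latch("FD6", "GLB_PTR1", 2248),
    2210: Latch("FD4", "GLB_PTR2", 2249),
    2211: Latch("FD6", "GLB_PTR3", 2250),
    2212: Latch("FD5", "S2H_PTR0", 2251),
    2213: Latch("FD7", "S2H_PTR1", 2252),
}


def latch_step(fd_bus: Dict[str, int], active_ptr: str, nodes: Dict[int, int]) -> Dict[int, int]:
    """Only latches whose enable pointer is active capture their FD line;
    every other internal node keeps its previous value."""
    updated = dict(nodes)
    for latch in LATCHES.values():
        if latch.enable == active_ptr:
            updated[latch.out_line] = fd_bus[latch.fd_line]
    return updated


# FD6 feeds both latch 2209 and latch 2211, but only the enabled one captures it.
nodes = latch_step({"FD6": 0x5A}, "GLB_PTR1", {})
print(nodes)  # {2248: 90}
```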
These global and S2H pointer signals are generated on output 2254 by a data input pointer state machine 2214. Data input pointer state machine 2214 is controlled by the DATA_XSFR and F_WR signals on line 2253, which are generated by internal I/O controller 2203. Whenever a data transfer is needed between the RCC hardware array and either the RCC computing system or the external interface, DATA_XSFR is at logic "1". The F_WR signal, as opposed to the F_RD signal, is at logic "1" when a write operation to the RCC hardware array is needed. A read operation, signaled via F_RD, requires a data transfer from the RCC hardware array to the RCC computing system or the external interface. If both DATA_XSFR and F_WR are at logic "1", the data input pointer state machine generates the appropriate global or S2H pointer signals in the proper programmed order.
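A minimal sketch of the data input pointer state machine follows. The pointer order is taken from the Figure 72 walkthrough (GLB_PTR0 to GLB_PTR3 in the global cycle, then S2H_PTR0 and S2H_PTR1 in the S2H cycle); advancing one pointer per F_WR strobe and rearming when DATA_XSFR falls are assumptions, not details given in the text.

```python
from typing import Optional, Sequence

# Programmed order for the data input cycle described for Figure 72.
INPUT_POINTER_ORDER = ("GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3",
                       "S2H_PTR0", "S2H_PTR1")


class DataInputPointerFSM:
    """Sketch of data input pointer state machine 2214: advances one pointer
    per F_WR strobe, but only while DATA_XSFR is asserted."""

    def __init__(self, order: Sequence[str] = INPUT_POINTER_ORDER) -> None:
        self.order = tuple(order)
        self.index = 0

    def step(self, data_xsfr: int, f_wr: int) -> Optional[str]:
        if not data_xsfr:
            self.index = 0  # transfer finished: rearm for the next one
            return None
        if not f_wr or self.index >= len(self.order):
            return None  # no write strobe, or every pointer already fired
        ptr = self.order[self.index]
        self.index += 1
        return ptr


fsm = DataInputPointerFSM()
print([fsm.step(data_xsfr=1, f_wr=1) for _ in range(3)])  # ['GLB_PTR0', 'GLB_PTR1', 'GLB_PTR2']
```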
Outputs 2247-2252 of these latches are connected to various internal nodes in the hardware model of the user design. Some of these internal nodes correspond to input pin-outs of the user design. The user design also has other internal nodes that normally cannot be accessed via the pin-outs; these non-pin-out internal nodes serve other debugging purposes, giving flexibility to designers who need to apply stimulus to various internal nodes in the user design, whether or not those nodes are input pin-outs. For stimulus applied by the external interface to the hardware model of the user design, the data input logic is associated with the internal nodes that correspond to the input pin-outs. For instance, if the user design is a CRTC 6845 video controller, some of the input pin-outs might be as follows:
LPSTB - light pen strobe pin
~RESET - active-low signal to reset the 6845 controller
RS - register select
E - enable
CLK - clock
~CS - chip select
Other input pin-outs are also available in this video controller. The number of input pin-outs that interface with the external world determines the number of nodes, and therefore the number of latches and the number of pointers. Some hardware models configured in the RCC hardware array may have, for instance, 30 separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 × 6). Other designs can use more global pointers, such as GLB_PTR4 through GLB_PTR30, as needed. Similarly, more S2H pointers, such as S2H_PTR2 through S2H_PTR30, can be used as needed. The pointers and their corresponding latches are based on the requirements of the hardware model of each user design.
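The sizing rule can be written as simple arithmetic. This sketch assumes one latch per driven internal node and the same number of latches behind every pointer, as in the 30-latch example; with 180 nodes and 30 latches per pointer it reproduces the six pointers listed above.

```python
import math
from typing import Tuple


def size_input_logic(num_nodes: int, latches_per_pointer: int) -> Tuple[int, int]:
    """Return (latches, pointers), assuming one latch per driven internal node
    and a fixed number of latches enabled by each pointer."""
    if latches_per_pointer < 1:
        raise ValueError("latches_per_pointer must be positive")
    pointers = math.ceil(num_nodes / latches_per_pointer)
    return num_nodes, pointers


print(size_input_logic(180, 30))  # (180, 6): GLB_PTR0-GLB_PTR3 plus S2H_PTR0-S2H_PTR1
print(size_input_logic(185, 30))  # (185, 7): one more pointer for the remainder
```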
Returning to Figures 70 and 72, data on the FD bus lines reach these internal nodes only when a latch is enabled by the appropriate global pointer or S2H pointer signal. Otherwise, these internal nodes are not driven by any data on the FD bus. In the first half of the CPU_IN=1 clock period, when F_WR is at logic "1", GLB_PTR0 is at logic "1", driving the data on FD1 to the corresponding internal node via line 2247. If other latches are enabled by GLB_PTR0, those latches also latch data into their corresponding internal nodes. In the second half of the CPU_IN=1 clock period, F_WR goes to logic "1" again, causing GLB_PTR1 to rise to logic "1". This drives the data on FD6 to the internal node connected to line 2248. It also passes the software clock signal on line 2223 to line 2216, where it is latched by latch 2205, which is enabled by the GLB_PTR1 signal on line 2215. This software clock is delivered to the external clock inputs of the target system and the other external I/O devices. Because GLB_PTR0 and GLB_PTR1 are used only for the first part of the data input global cycle, CPU_IN then returns to logic "0", which completes the transfer of global data from the RCC computing system to the RCC hardware array.
The second part of the data input global cycle, in which global data from the external interface are transferred to the RCC hardware array and the external buffer, will now be discussed. The various input pin-out signals from the target system or external I/O devices that are directed to the user design must be provided to both the hardware model and the software model of the user design. Using the appropriate pointers, these data can be transferred to the hardware model and latched to drive the internal nodes. These data are also transferred to the software model by first storing them in external buffer 2201, so that the RCC computing system can later retrieve them and update the internal state of the software model.
CPU_IN is now at logic "0", and EXT_IN is at logic "1". As a result, tri-state buffer 2206 in external I/O controller 2200 is enabled, placing data on the PCI bus lines (for example, bus lines 2217 and 2218). These PCI bus lines are also connected to FD bus 2219 so that the data can be stored in external buffer 2201. In the first half of the clock period in which EXT_IN is at logic "1", GLB_PTR2 is at logic "1". This latches the data on FD4 (arriving via bus lines 2217 and 2224 and local bus line 2228 (LD4)) into the internal node of the hardware model connected to line 2249.
In the second half of the clock period in which EXT_IN is at logic "1", GLB_PTR3 is at logic "1". This latches the data on FD6 (arriving via bus lines 2218 and 2225 and local bus line 2227 (LD6)) into the internal node of the hardware model connected to line 2250.
As described above, these data from the target system or other external I/O devices are also transferred to the software model by first storing them in external buffer 2201, so that the RCC computing system can later retrieve them and update the internal state of the software model. The data on bus lines 2217 and 2218 are provided to external buffer 2201 on FD bus FD[63:0] 2219. The memory address at which each datum is stored in external buffer 2201 is supplied by memory address counter 2207 via bus 2220. To perform these stores, the WR_EXT_BUF signal is provided to external buffer 2201 via line 2221. Before external buffer 2201 fills up, the RCC computing system reads its contents so that the software model can be updated appropriately. Any data transferred to the various internal nodes of the hardware model in the RCC hardware array may change some internal states of the hardware model. Because the RCC computing system holds a model of the entire user design in software, these internal state changes in the hardware model should also be reflected in the software model. This completes the data input global cycle.
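The store-then-read-back path through external buffer 2201 can be sketched as a small class. The address counter, the WR_EXT_BUF gate, and the 64-bit FD[63:0] width come from the text; the buffer capacity and the "drain when one slot remains" policy are hypothetical choices for illustration.

```python
from typing import List


class ExternalBuffer:
    """Toy model of external buffer 2201 written through memory address counter 2207."""

    def __init__(self, capacity: int) -> None:
        self.mem: List[int] = [0] * capacity
        self.addr = 0  # memory address counter 2207

    def write(self, word: int, wr_ext_buf: int = 1) -> None:
        if not wr_ext_buf:
            return  # no store without WR_EXT_BUF
        if self.addr >= len(self.mem):
            raise OverflowError("external buffer full; drain it first")
        self.mem[self.addr] = word & 0xFFFF_FFFF_FFFF_FFFF  # FD[63:0] is 64 bits wide
        self.addr += 1

    def nearly_full(self, margin: int = 1) -> bool:
        return self.addr >= len(self.mem) - margin

    def drain(self) -> List[int]:
        """RCC computing system reads back captured data and rearms the counter."""
        data = self.mem[: self.addr]
        self.addr = 0
        return data


buf = ExternalBuffer(capacity=4)
software_model_updates: List[int] = []
for word in [1, 2, 3, 4, 5]:
    buf.write(word)  # WR_EXT_BUF asserted
    if buf.nearly_full():
        software_model_updates.extend(buf.drain())
software_model_updates.extend(buf.drain())
print(software_model_updates)  # [1, 2, 3, 4, 5]
```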
The S2H cycle will now be discussed. The S2H cycle is used to transfer testbench data from the RCC computing system to the RCC hardware array sequentially, board by board and chip by chip. The CPU_IN signal goes to logic "1" and the EXT_IN signal goes to logic "0", indicating that the data transfer takes place between the RCC computing system and the RCC hardware array. The external interface is not involved. The CPU_IN signal also enables tri-state buffer 2202, allowing data to pass from local bus 2222 to internal I/O controller 2203.
At the beginning of the CPU_IN=1 clock period, S2H_PTR0 goes to logic "1", latching the data on FD5 (arriving via local bus 2222, local bus 2229, bus line 2234, and FD bus 2239) into the internal node of the hardware model connected to line 2251. In the second part of the CPU_IN=1 clock period, S2H_PTR1 goes to logic "1", latching the data on FD7 (arriving via local bus 2222, local bus 2230, bus line 2235, and FD bus 2240) into the internal node of the hardware model connected to line 2252. During sequential data evaluation, data from the RCC computing system are first transferred to chip m1, then to chip 0_1 (that is, chip 0 on board 1), chip 1_1 (chip 1 on board 1), and so on, up to the last chip on the last board, chip 7_8 (chip 7 on board 8). If chip m2 is available, data are also shifted into that chip.
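The chip visiting order for the S2H cycle can be generated programmatically. This sketch assumes eight boards with eight chips each (consistent with chip 7_8 being the last chip and with board 2148 carrying chips 2159-2166), names chip c on board b as "c_b", and appends the optional chip m2 at the end.

```python
from typing import List


def s2h_chip_order(boards: int = 8, chips_per_board: int = 8,
                   has_m2: bool = False) -> List[str]:
    """Sequence in which S2H data visit chips: m1, then chip c on board b as
    "c_b" (0_1, 1_1, ..., 7_1, 0_2, ...), and finally m2 if present."""
    order = ["m1"]
    for board in range(1, boards + 1):
        for chip in range(chips_per_board):
            order.append(f"{chip}_{board}")
    if has_m2:
        order.append("m2")
    return order


seq = s2h_chip_order()
print(seq[:4], seq[-1], len(seq))  # ['m1', '0_1', '1_1', '2_1'] 7_8 65
```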
At the end of this data transfer, DATA_XSFR returns to logic "0". Note that I/O data from the external interface are treated as global data and are handled during the global cycle. This completes the discussion of the data input control logic and the data input cycle.
Data output
An embodiment of the data output control logic of the present invention will now be discussed. According to an embodiment of the invention, the data output control logic is responsible for handling data transferred from the RCC hardware array to the RCC computing system and the external interface. While processing data in response to stimulus (external or otherwise), the hardware model generates data that a target application or an I/O device may need. These output data may be actual data, addresses, control information, or other information that the applications or devices need in their own processing. The output data destined for the RCC computing system (which may have models of other external I/O devices in software), the target system, or external I/O devices are provided at various internal nodes. As discussed above for the data input logic, some of these internal nodes correspond to output pin-outs of the user design. The user design also has other internal nodes that normally cannot be accessed via the pin-outs; these non-pin-out internal nodes serve other debugging purposes, giving flexibility to designers who want to read and analyze the response to stimulus at various internal nodes in the user design, whether or not those nodes are output pin-outs. For data from the hardware model of the user design that must be delivered to the external interface or to the RCC computing system (which may have models of other I/O devices in software), the data output logic is associated with the internal nodes that correspond to the output pin-outs.
For instance, if the user design is a CRTC 6845 video controller, some of the output pin-outs might be as follows:
MA0-MA13 - memory address
D0-D7 - data bus
DE - display enable
CURSOR - cursor position
VS - vertical synchronization
HS - horizontal synchronization
Other output pin-outs are also available in this video controller. Based on the number of output pin-outs that interface with the external world, the number of nodes, and hence the amount of gating logic and the number of pointers, can readily be determined. For example, output pin-outs MA0-MA13 on the video controller provide memory addresses for the video RAM. The VS output pin-out provides the vertical synchronization signal, which causes a vertical retrace on the display. Output pin-outs D0-D7 are eight terminals that form a bidirectional data bus used by the target system CPU to access the internal 6845 registers. These output pin-outs correspond to certain internal nodes in the hardware model. Of course, the number and nature of these internal nodes vary with the user design.
Data from these output pin-out internal nodes must be provided to the RCC computing system, because the RCC computing system contains a model of the entire user design in software, and any event that occurs in the hardware model must be passed to the software model so that the corresponding change can be made. In this way, the software model holds information consistent with the hardware model. In addition, the RCC computing system may contain device models of I/O devices that the user or designer has chosen to model in software rather than attaching a real device to one of the ports on the external I/O expander. For instance, the user may decide that modeling a display or speaker in software is easier and more efficient than plugging a real display or speaker into one of the ports on the external I/O expander. Furthermore, data from these internal nodes in the hardware model must be provided to the target system and any other external I/O devices. To allow data at these output pin-out internal nodes to be transferred to the RCC computing system, the target system, and other external I/O devices, data output control logic is provided in the co-verification system according to an embodiment of the invention.
The data output control logic uses a data output cycle, which involves transferring data from RCC hardware array 2190 to RCC computing system 2141 and the external interface (external I/O expander 2139). In Figure 69, the control logic for transferring data between the external interface (external I/O expander 2139) and co-verification system 2140 resides in each of boards 2145-2149. The major portion of the control logic resides in external I/O controller 2152, but other portions reside in the various internal I/O controllers (for instance, 2156 and 2158) and in the reconfigurable logic elements (for instance, FPGA chips 2159 and 2165). Again, for purposes of illustration, only some portions of this control logic are shown, rather than the identical, repeated logic structures for all chips on all boards. The portion of co-verification system 2140 within dashed line 2150 in Figure 69 contains one subset of the control logic. This control logic will now be discussed in detail with reference to Figures 71 and 73. Figure 71 illustrates the portion of the control logic used for the data output cycle, and Figure 73 illustrates the timing diagram of the data output cycle.
A particular subset of the data output control logic is shown in Figure 71. It comprises an external I/O controller 2300, a tri-state buffer 2301, an internal I/O controller 2302, a reconfigurable logic element 2303, and the various buses and control lines that allow data to be transferred among them. This subset shows the logic required for the data output operation, in which data from the RCC hardware array are transferred to the RCC computing system and the external interface. The data output control logic of Figure 71 and the data output timing diagram of Figure 73 are discussed together.
In contrast to the data input cycle, which has two types of cycles, the data output cycle has only one type. The data output control logic requires that data from the RCC hardware model be delivered sequentially: (1) to the RCC computing system, and then (2) to both the RCC computing system and the external interface (for the target system and external I/O devices). Specifically, in the data output cycle, data from the internal nodes of the hardware model in the RCC hardware array are transferred first to the RCC computing system and second to the RCC computing system and the external interface, for each chip on each board, one chip at a time and one board at a time.
As with the data input control logic, pointers are used to select (gate) data from the internal nodes to the RCC computing system and the external interface. In the embodiment illustrated in Figures 71 and 73, a data output pointer state machine 2319 generates five pointers H2S_PTR[4:0] on bus 2359, which are used both for hardware-to-software data and for hardware-to-external-interface data. The data output pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358, which are generated by the internal I/O controller 2302. DATA_XSFR is at logic "1" whenever a data transfer is needed between the RCC hardware array and the RCC computing system or the external interface. The F_RD signal, in contrast to the F_WR signal, is at logic "1" whenever a read from the RCC hardware array is needed. When both DATA_XSFR and F_RD are at logic "1", the data output pointer state machine 2319 generates the appropriate H2S pointer signals in the proper programmed sequence. Other embodiments may use more (or fewer) pointers, depending on the needs of the user design.
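The pointer sequencing just described can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the patent's circuit: the pointers are assumed to be one-hot and to advance one position per F_RD strobe while DATA_XSFR is high, and the function name `h2s_pointers` and the slot-based sampling are hypothetical.

```python
from typing import List


def h2s_pointers(data_xsfr: List[int], f_rd: List[int], num_ptrs: int = 5) -> List[int]:
    """Behavioral sketch of data output pointer state machine 2319.

    Each list position is one sampling slot; f_rd[i] == 1 means an F_RD
    strobe in that slot. The result is the H2S_PTR bus value per slot,
    with bit n standing for H2S_PTR[n] (assumed one-hot).
    """
    bus_values = []
    next_ptr = 0
    for xsfr, rd in zip(data_xsfr, f_rd):
        # A pointer is asserted only while both DATA_XSFR and F_RD are "1".
        if xsfr == 1 and rd == 1 and next_ptr < num_ptrs:
            bus_values.append(1 << next_ptr)
            next_ptr += 1
        else:
            bus_values.append(0)
    return bus_values


# Five F_RD strobes during one transfer walk H2S_PTR0 through H2S_PTR4.
print(h2s_pointers([1] * 7, [1, 0, 1, 0, 1, 1, 1]))  # [1, 0, 2, 0, 4, 8, 16]
```

The `num_ptrs` parameter reflects the note that other embodiments may use more or fewer pointers.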
These H2S pointer signals are provided to gating logic. One set of gating-logic inputs, 2353-2357, goes to AND gates 2314-2318; another set, 2348-2352, is connected to internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the correct H2S_PTR pointer signal, an internal node cannot be driven to the RCC computing system or the external interface.
The respective outputs 2343-2347 of AND gates 2314-2318 are connected to OR gates 2310-2313. AND gate output 2343 is connected to an input of OR gate 2310; AND gate output 2344 to an input of OR gate 2311; AND gate output 2345 to an input of OR gate 2311; AND gate output 2346 to an input of OR gate 2312; and AND gate output 2347 to an input of OR gate 2313. Note that output 2344 of AND gate 2315 is not connected to an OR gate of its own; instead, it is connected to OR gate 2311, which also receives output 2345 of AND gate 2316. The other inputs 2360-2366 of OR gates 2310-2313 can be connected to the outputs of other AND gates (not shown), which are themselves connected to other internal nodes and H2S_PTR pointers. The use of these OR gates and their particular inputs depends on the user design and the hardware model configured from it. Thus, in another design, more pointers might be used, and output 2344 of AND gate 2315 might be connected to an OR gate other than OR gate 2311.
The outputs 2339-2342 of OR gates 2310-2313 are connected to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of a user design, only four output-pin signal lines carry data to the RCC computing system and the external interface. Thus FD0 is connected to the output of OR gate 2310, FD3 to the output of OR gate 2311, FD1 to the output of OR gate 2312, and FD4 to the output of OR gate 2313. These FD bus lines are connected to local bus lines 2330-2333 through internal wiring 2334-2338 in the internal I/O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.
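The AND-OR gating and the FD-to-LD wiring of Figure 71 can be expressed as a small combinational model. This is a sketch only: the tables restate the connections given in the text, while the function name `drive_local_bus` and the node values in the usage line are hypothetical.

```python
from typing import Dict, List, Tuple

# (internal-node input line, H2S_PTR bit, FD line), per the Figure 71 wiring.
GATING: List[Tuple[int, int, str]] = [
    (2348, 0, "FD0"),  # AND 2314 -> OR 2310
    (2349, 1, "FD3"),  # AND 2315 -> OR 2311
    (2350, 2, "FD3"),  # AND 2316 -> OR 2311 (shared with AND 2315)
    (2351, 3, "FD1"),  # AND 2317 -> OR 2312
    (2352, 4, "FD4"),  # AND 2318 -> OR 2313
]

# Internal wiring in internal I/O controller 2302: FD line -> local bus line.
FD_TO_LD: Dict[str, str] = {"FD0": "LD0", "FD3": "LD3", "FD1": "LD1", "FD4": "LD4"}


def drive_local_bus(h2s_ptr: int, nodes: Dict[int, int]) -> Dict[str, int]:
    """Return the LD line values for one H2S_PTR bus value.

    Each node bit is ANDed with its pointer bit; AND outputs that feed the
    same FD line are ORed together.
    """
    fd = {name: 0 for name in FD_TO_LD}
    for line, bit, fd_line in GATING:
        pointer = (h2s_ptr >> bit) & 1
        fd[fd_line] |= nodes[line] & pointer
    return {FD_TO_LD[name]: value for name, value in fd.items()}


node_values = {2348: 1, 2349: 0, 2350: 1, 2351: 1, 2352: 0}  # hypothetical
print(drive_local_bus(1 << 2, node_values))  # H2S_PTR2 asserted
```

With H2S_PTR2 asserted, only LD3 carries the value of node 2350; sharing OR gate 2311 is safe because H2S_PTR1 and H2S_PTR2 are never asserted together.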
To transfer the data on local bus lines 2330-2333 to the RCC computing system, these local bus lines are connected to the tri-state buffer 2301. In its normal state, the tri-state buffer 2301 passes data from local bus lines 2330-2333 onto local bus 2320. In contrast, during data input, data are allowed to pass from the RCC computing system to the RCC hardware array only when the CPU_IN signal is provided to the tri-state buffer 2301.
To transfer the data on local bus lines 2330-2333 to the external interface, lines 2321-2324 are provided. Line 2321 connects line 2330 to a latch (not shown) in the external I/O controller 2300; line 2322 connects line 2331 to a latch (not shown) in the external I/O controller 2300; line 2323 connects line 2332 to latch 2305 in the external I/O controller 2300; and line 2324 connects line 2333 to latch 2306 in the external I/O controller 2300.
The output of each of latches 2305 and 2306 is connected to a buffer, then to the external interface, and then to the appropriate output pin of the target system or an external I/O device. Thus, the output of latch 2305 is connected to buffer 2307 and line 2327. Likewise, the output of latch 2306 is connected to buffer 2308 and line 2328. The output of another latch (not shown) can be connected to line 2329. In this example, lines 2327, 2328, and 2329 correspond respectively to wire 1, wire 4, and wire 3 of the target system or of some external I/O device. For the transfer from the hardware model to the external interface, the hardware model of the user design is configured so that the internal node connected to line 2350 corresponds to wire 3 on line 2329, the internal node connected to line 2351 corresponds to wire 1 on line 2327, and the internal node connected to line 2352 corresponds to wire 4 on line 2328. Likewise, wire 3 corresponds to LD3 on line 2331, wire 1 corresponds to LD1 on line 2332, and wire 4 corresponds to LD4 on line 2333.
A LUT 2309 is connected to the enable inputs of latches 2305 and 2306. LUT 2309 is addressed by the LUT address counter 2304, which is triggered by the F_RD signal on line 2367. At each counter increment, the pointer selects a specific row of LUT 2309. If an entry (bit) in that row is at logic "1", the LUT output line connected to that entry enables its corresponding latch, which drives the data into the external interface and finally to the required destination in the target system or an external I/O device. For instance, LUT output line 2325 is connected to the enable input of latch 2305, and LUT output line 2326 is connected to the enable input of latch 2306.
In this example, rows 0-3 of LUT 2309 are programmed to enable the latches corresponding to the output-pin internal-node lines in chip m1. Likewise, rows 4-6 are programmed to enable the latches corresponding to the output-pin internal-node lines in chip 0_1 (chip 0 on board 1). In row 4, bit 3 is at logic "1"; in row 5, bit 1 is at logic "1"; in row 6, bit 4 is at logic "1". All other entries (bits) are at logic "0". Because a single output-pin line cannot drive multiple I/O devices, at most one bit in any given row of the LUT is at logic "1". In other words, an output-pin internal node in the hardware model can supply data to only one line connected to the external interface.
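The LUT programming and its row-by-row use can be sketched as follows. This is a simplified model assuming the counter reads the current row and then advances once per F_RD strobe; the class and function names are hypothetical, and the local-bus snapshots in the usage are made-up values keyed by LD index.

```python
from typing import Dict, List

# LUT 2309 for the Figure 71 example: one bitmask per row; bit n enables
# the latch that drives external wire n. Rows 0-3 hold no "1" bits.
LUT_2309: List[int] = [0, 0, 0, 0, 1 << 3, 1 << 1, 1 << 4]


def check_one_destination_per_row(lut: List[int]) -> None:
    """Reject rows with more than one bit set: a node drives at most one wire."""
    for row, mask in enumerate(lut):
        if mask & (mask - 1):
            raise ValueError(f"row {row} enables more than one latch")


class LutAddressCounter:
    """Sketch of LUT address counter 2304 stepping through LUT 2309 on F_RD."""

    def __init__(self, lut: List[int]) -> None:
        check_one_destination_per_row(lut)
        self.lut = lut
        self.address = 0

    def on_f_rd(self, local_bus: Dict[int, int], wires: Dict[int, int]) -> None:
        """Latch LDn onto external wire n for every bit n set in the current row."""
        mask = self.lut[self.address]
        for bit in range(mask.bit_length()):
            if (mask >> bit) & 1:
                wires[bit] = local_bus[bit]
        self.address += 1


idle = {0: 0, 1: 0, 3: 0, 4: 0}
snapshots = [idle] * 4 + [
    {0: 0, 1: 0, 3: 1, 4: 0},  # node 2350 on LD3
    {0: 0, 1: 1, 3: 0, 4: 0},  # node 2351 on LD1
    {0: 0, 1: 0, 3: 0, 4: 1},  # node 2352 on LD4
]
counter = LutAddressCounter(LUT_2309)
wires: Dict[int, int] = {}
for snapshot in snapshots:
    counter.on_f_rd(snapshot, wires)
print(wires)  # {3: 1, 1: 1, 4: 1}
```

Storing each row as a bitmask makes the one-destination rule a single `mask & (mask - 1)` test.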
As noted above, the data output control logic requires the data in each reconfigurable logic element (chip) of the RCC hardware model to be delivered sequentially: (1) to the RCC computing system, and then (2) to both the RCC computing system and the external interface (for the target system and external I/O devices). The RCC computing system needs these data because it models some I/O devices in software. Data not destined for one of these modeled I/O devices must still be monitored by the RCC computing system so that its internal state stays consistent with the state of the hardware model in the RCC hardware array. In the example illustrated in Figures 71 and 73, only seven internal nodes are driven for output to the RCC computing system and the external interface. Two of these internal nodes are in chip m1, and the other five are in chip 0_1 (chip 0 on board 1). Of course, this particular user design may have other internal nodes in these and other chips, but Figures 71 and 73 illustrate only these seven.
During the data transfer, the DATA_XSFR signal is at logic "1". During this time, the co-verification system uses local bus lines 2330-2333 to deliver data sequentially from each chip on each board of the RCC hardware array to the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data output pointer state machine so that it generates the appropriate pointer signals H2S_PTR[4:0] to gate the appropriate output-pin internal nodes. The F_RD signal also controls the LUT address counter 2304 so that internal-node data are transferred to the external interface.
The internal nodes in chip m1 are processed first. When F_RD rises to logic "1" at the start of the data transfer cycle, H2S_PTR0 in chip m1 goes to logic "1". This drives the data in those internal nodes of chip m1 that depend on H2S_PTR0 into the RCC computing system through the tri-state buffer 2301 and the local bus 2320. The LUT address counter 2304 counts and points to row 0 of LUT 2309 so that the appropriate data from chip m1 are latched into the external interface. When the F_RD signal goes to logic "1" again, the data in the internal nodes driven by H2S_PTR1 can be transferred to the RCC computing system and the external interface. H2S_PTR1 goes to logic "1" and, in response to this second F_RD signal, the LUT address counter 2304 counts and points to row 1 of LUT 2309 so that the appropriate data from chip m1 are latched into the external interface.
The five internal nodes of the reconfigurable logic element 2303 (chip 0_1, that is, chip 0 on board 1) are processed next. In this example, data from the two internal nodes associated with H2S_PTR0 and H2S_PTR1 are transferred only to the RCC computing system. Data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 are transferred to both the RCC computing system and the external interface.
When F_RD rises to logic "1", H2S_PTR0 in chip 2303 goes to logic "1". This drives the data in the internal nodes of chip 2303 that depend on H2S_PTR0 into the RCC computing system through the tri-state buffer 2301 and the local bus 2320. In this example, the internal node connected to line 2348 depends on H2S_PTR0 on line 2353. When the F_RD signal goes to logic "1" again, the data in the internal nodes driven by H2S_PTR1 can be transferred to the RCC computing system. Here, the internal node connected to line 2349 is affected, and its data are driven onto LD3 on lines 2331 and 2322.
When the F_RD signal goes to logic "1" again, H2S_PTR2 goes to logic "1", and the data in the internal node connected to line 2350 are placed on LD3. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 passes the data onto local bus 2320 and into the RCC computing system. For the external interface, asserting H2S_PTR2 drives these data onto LD3 on lines 2331 and 2322. In response to the F_RD signal, the LUT address counter 2304 counts and points to row 4 of LUT 2309, so that the data from the internal node connected to line 2350 are latched onto line 2329 (wire 3) at the external interface.
When the F_RD signal goes to logic "1" again, H2S_PTR3 goes to logic "1", and the data in the internal node connected to line 2351 are placed on LD1. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 passes the data onto local bus 2320 and into the RCC computing system. For the external interface, asserting H2S_PTR3 drives these data onto LD1 on lines 2332 and 2323. In response to the F_RD signal, the LUT address counter 2304 counts and points to row 5 of LUT 2309, so that the data from the internal node connected to line 2351 are latched onto line 2327 (wire 1) at the external interface.
When the F_RD signal goes to logic "1" again, H2S_PTR4 goes to logic "1", and the data in the internal node connected to line 2352 are placed on LD4. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 passes the data onto local bus 2320 and into the RCC computing system. For the external interface, asserting H2S_PTR4 drives these data onto LD4 on lines 2333 and 2324. In response to the F_RD signal, the LUT address counter 2304 counts and points to row 6 of LUT 2309, so that the data from the internal node connected to line 2352 are latched onto line 2328 (wire 4) at the external interface.
This process, in which the data in a chip's internal nodes are driven first to the RCC computing system and then to both the RCC computing system and the external interface, continues sequentially through the other chips. First, the internal nodes of chip m1 are driven. Next, the internal nodes of chip 0_1 (chip 2303) are driven. Then, if chip 1_1 has any such internal nodes, they are driven. The process continues until the last node in the last chip on the last board has been driven. Thus, if chip 7_8 has any such internal nodes, they are driven. Finally, if chip m2 has any such internal nodes, they are driven.
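The chip ordering described above can be sketched as a simple enumeration. This assumes boards 1 through 8 with chips 0 through 7 on each, as the chip name 7_8 suggests; the function name and the `active` parameter (used to skip chips with no output-pin internal nodes) are hypothetical.

```python
from typing import List, Optional, Set


def data_output_chip_order(num_boards: int = 8, chips_per_board: int = 8,
                           active: Optional[Set[str]] = None) -> List[str]:
    """Return the order in which chips are serviced in a data output cycle.

    "c_b" names chip c on board b; "m1" and "m2" are the chips on boards
    m1 and m2. If `active` is given, chips not in it (no output-pin
    internal nodes) are skipped.
    """
    order = ["m1"]
    for board in range(1, num_boards + 1):
        for chip in range(chips_per_board):
            order.append(f"{chip}_{board}")
    order.append("m2")
    return [name for name in order if active is None or name in active]


print(data_output_chip_order(active={"0_1", "m1"}))  # ['m1', '0_1']
```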
Although Figure 71 shows the data output control logic only for driving the internal nodes in chip 2303, other chips also have internal nodes that must be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data output logic drives data from one set of a chip's internal nodes into the RCC computing system and then, in other cycles, drives a different set of internal nodes in the same chip into both the RCC computing system and the external interface. The data output control logic then advances to the next chip and repeats the same two-step operation: it first drives the data destined only for the RCC computing system, and then drives the data destined for the external interface to both the RCC computing system and the external interface. Even data destined for the external interface must be known to the RCC computing system, because the RCC computing system holds a model of the entire user design in software, and that model must keep internal state information consistent with the hardware model in the RCC hardware array.
Board Layout
The board layout of the co-verification system according to an embodiment of the invention is now discussed with reference to Figure 74. The boards are installed in the RCC hardware array. The board layout is similar to the layout illustrated in Figures 8 and 36-44 and described in the associated text.
In one embodiment, the RCC hardware array comprises six boards. Board m1 is connected to board 1, and board m2 is connected to board 8. The connections and layout of board 1, board 2, board 3, and board 8 are described above with reference to Figures 8 and 36-44.
Board m1 contains chip m1. The interconnect structure between board m1 and the other boards connects chip m1 to the south-direction interconnects of chips 0, 2, 4, and 6 on board 1. Similarly, board m2 contains chip m2, and its interconnect structure connects chip m2 to the south-direction interconnects of chips 0, 2, 4, and 6 on board 8.
X. Example
To illustrate the operation of one embodiment of the invention, a hypothetical user circuit design will be used. In structural register transfer level (RTL) HDL code, the exemplary user circuit design is as follows:
```verilog
module register(clock, reset, d, q);
  input clock, d, reset;
  output q;
  reg q;
  // Asynchronous active-low reset. Nonblocking assignments avoid races
  // between the three register instances that share clk.
  always @(posedge clock or negedge reset)
    if (~reset)
      q <= 0;
    else
      q <= d;
endmodule

module example;
  wire d1, d2, d3;
  wire q1, q2, q3;
  reg sigin;
  wire sigout;
  reg clk, reset;

  register reg1(clk, reset, d1, q1);
  register reg2(clk, reset, d2, q2);
  register reg3(clk, reset, d3, q3);

  assign d1 = sigin ^ q3;
  assign d2 = q1 ^ q3;
  assign d3 = q2 ^ q3;
  assign sigout = q3;

  // a clock generator
  always begin
    clk = 0;
    #5;
    clk = 1;
    #5;
  end

  // a signal generator
  always begin
    #10;
    sigin = $random;
  end

  // initialization
  initial begin
    reset = 0;
    sigin = 0;
    #1;
    reset = 1;
    #5;
    $monitor($time, " %b, %b", sigin, sigout);
    #1000 $finish;
  end
endmodule
```
This code is reproduced in Figure 26. Understanding the present invention does not require understanding the particular functional details of this circuit design. However, the reader should understand that the user writes this HDL code in order to simulate a circuit under design. The circuit represented by this code performs some user-designed function in response to input signals and produces an output.
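Although the functional details are not needed to follow the invention, a cycle-level model makes the three-register XOR feedback concrete. This sketch assumes reset has already cleared q1-q3 and ignores the `#` delays: `sigin_at_edges[i]` is the value of sigin sampled at the i-th rising clock edge. The function name is hypothetical.

```python
from typing import List, Tuple


def simulate_example(sigin_at_edges: List[int]) -> List[Tuple[int, int, int]]:
    """Cycle-based sketch of module `example`.

    All three registers load their D inputs together at each rising edge.
    Returns (q1, q2, q3) after each edge; sigout is always q3.
    """
    q1 = q2 = q3 = 0
    trace = []
    for sigin in sigin_at_edges:
        d1 = sigin ^ q3  # assign d1 = sigin ^ q3;
        d2 = q1 ^ q3     # assign d2 = q1 ^ q3;
        d3 = q2 ^ q3     # assign d3 = q2 ^ q3;
        q1, q2, q3 = d1, d2, d3  # simultaneous update, like nonblocking "<="
        trace.append((q1, q2, q3))
    return trace


print(simulate_example([1, 0, 0, 0]))
# [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
```

The tuple assignment updates all three registers at once, which is the behavior the nonblocking assignments in `register` guarantee in the HDL.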
Figure 27 shows the circuit diagram for the HDL code discussed with reference to Figure 26. In most cases, the user may actually draw a circuit diagram of this kind before expressing it in HDL form. Some schematic capture tools accept a graphical circuit diagram as input and, after processing, produce usable code.
As shown in Figure 28, the SEmulation system performs component type analysis. The original HDL code of Figure 26, which represents the user's particular circuit design, is now analyzed. The first several lines of code, beginning with "module register (clock, reset, d, q);" and ending with "endmodule", are labeled 900 and identified as the register definition section.
The next few lines of code, labeled 907, represent wire interconnection information. Those skilled in the art will understand that wire variables in HDL represent physical connections between structural entities such as gates. Because HDL is used mainly to model digital circuits, wire variables are essential. Typically, "q" (for example, q1, q2, q3) denotes output wires, and "d" (for example, d1, d2, d3) denotes input wires.
Label 908 shows "sigin" as a test-bench input. Label 909 shows "sigout" as a test-bench output.
Label 901 shows register components S1, S2, and S3. Label 902 shows combinational components S4, S5, S6, and S7. Note that combinational components S4-S7 have output variables d1, d2, and d3, which are inputs to register components S1-S3. Label 903 shows clock component S8.
The next series of lines of code shows the test-bench components. Label 904 shows test-bench component (driver) S9. Label 905 shows test-bench components (initialization) S10 and S11. Label 906 shows test-bench component (monitor) S12.
The following table summarizes the component type analysis:
| Component | Type |
|---|---|
| S1 | Register |
| S2 | Register |
| S3 | Register |
| S4 | Combinational |
| S5 | Combinational |
| S6 | Combinational |
| S7 | Combinational |
| S8 | Clock |
| S9 | Test bench (driver) |
| S10 | Test bench (initialization) |
| S11 | Test bench (initialization) |
| S12 | Test bench (monitor) |
Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1-S3 are register components, and S4-S7 are combinational components. These components will be modeled in hardware, allowing the user of the SEmulation system either to simulate the entire circuit in software, or to simulate in software and selectively accelerate in hardware. In either case, the user controls the simulation and hardware-acceleration modes. In addition, the user can emulate the circuit with a target system while still retaining software control to start, stop, examine values, and assert values cycle by cycle.
Figure 29 shows a signal network analysis of the same structural RTL-level HDL code. As shown, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test-bench process that generates the sigin signal, and S12 is essentially the test-bench monitor process that receives the sigout signal. In this example, S9 generates a random sigin to stimulate the circuit. The registers S1-S3 and combinational components S4-S7, however, are modeled in both hardware and software.
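The partitioning rule above, with every component in the software model and only register and combinational components also in the hardware model, can be sketched as a filter over the component type table. The dictionary and function names are hypothetical; only the type assignments come from the table.

```python
from typing import Dict, Set, Tuple

# Component types from the component type analysis table.
COMPONENT_TYPES: Dict[str, str] = {
    "S1": "register", "S2": "register", "S3": "register",
    "S4": "combinational", "S5": "combinational",
    "S6": "combinational", "S7": "combinational",
    "S8": "clock",
    "S9": "test bench (driver)",
    "S10": "test bench (initialization)", "S11": "test bench (initialization)",
    "S12": "test bench (monitor)",
}

HARDWARE_TYPES = {"register", "combinational"}


def partition(types: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (software_model, hardware_model) component sets.

    Every component appears in the software model; only register and
    combinational components also receive a hardware model.
    """
    software = set(types)
    hardware = {name for name, kind in types.items() if kind in HARDWARE_TYPES}
    return software, hardware


sw, hw = partition(COMPONENT_TYPES)
print(sorted(hw, key=lambda n: int(n[1:])))
# ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7']
```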
For the software/hardware boundary, the system allocates memory space for the various signals (q1, q2, q3, CLK, sigin, and sigout) that connect the software model and the hardware model. The following table lists the memory space allocation:
| Signal | Memory address space |
|---|---|
| q1 | REG |
| q2 | REG |
| q3 | REG |
| clk | CLK |
| sigin | S2H |
| sigout | H2S |
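The allocation table can be captured as a lookup from boundary signal to address space. In this sketch, the descriptive strings for S2H ("software to hardware") and H2S ("hardware to software") are an interpretation of the names, and the `Space` enum and `signals_in` helper are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Space(Enum):
    """Boundary address spaces named in the allocation table."""
    REG = "register outputs"
    CLK = "software clock"
    S2H = "software to hardware"
    H2S = "hardware to software"


ALLOCATION: Dict[str, Space] = {
    "q1": Space.REG, "q2": Space.REG, "q3": Space.REG,
    "clk": Space.CLK,
    "sigin": Space.S2H,
    "sigout": Space.H2S,
}


def signals_in(space: Space) -> List[str]:
    """List the boundary signals allocated to one address space."""
    return sorted(sig for sig, sp in ALLOCATION.items() if sp is space)


for space in Space:
    print(space.name, signals_in(space))
```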
Figure 30 shows the software/hardware partition result for this illustrative circuit design, in a more realistic form. The software side 910 is connected to the hardware side 912 through the software/hardware boundary 911 and the PCI bus 913.
The software side 910 contains, and is controlled by, the software kernel. In general, the kernel is the main control loop that controls the overall operation of the SEmulation system. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges to update registers and memories, propagates combinational logic data, and advances simulation time. Even though the kernel resides on the software side, some of its operations or statements can run in hardware, because a hardware model exists for those statements and operations. Thus, the software controls both the software model and the hardware model.
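The kernel loop just described can be sketched as a loop over caller-supplied callbacks. This is a minimal sketch, not the SEmulation kernel itself: the callback names are hypothetical, and `evaluate_clock` is assumed to return True when it detects an active clock edge.

```python
from typing import Callable, List


def kernel_loop(testbench_active: Callable[[], bool],
                evaluate_testbench: Callable[[], None],
                evaluate_clock: Callable[[], bool],
                update_registers_and_memories: Callable[[], None],
                propagate_combinational: Callable[[], None],
                advance_time: Callable[[], None]) -> None:
    """Sketch of the kernel's main control loop.

    Registers and memories are updated only when evaluate_clock()
    reports a detected clock edge.
    """
    while testbench_active():
        evaluate_testbench()
        if evaluate_clock():
            update_registers_and_memories()
        propagate_combinational()
        advance_time()


# Toy run: a clock edge on every even time step, stopping at time 3.
log: List[str] = []
now = [0]
kernel_loop(lambda: now[0] < 3,
            lambda: log.append("testbench"),
            lambda: now[0] % 2 == 0,
            lambda: log.append("registers"),
            lambda: log.append("combinational"),
            lambda: now.__setitem__(0, now[0] + 1))
print(log)
```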
The software side 910 contains the entire model of the user circuit, including S1-S12. The software/hardware boundary portion on the software side includes the I/O buffers or address spaces S2H, CLK, H2S, and REG. Note that the driver test-bench process S9 is connected to the S2H address space, the monitor test-bench process S12 is connected to the H2S address space, and the clock generator S8 is connected to the CLK address space. The outputs q1-q3 of registers S1-S3 are assigned to the REG space.
The hardware model 912 contains a model of combinational components S4-S7, which reside purely on the hardware side. At the software/hardware boundary of the hardware model 912, sigout, sigin, the register outputs q1-q3, and the software clock 916 are implemented.
In addition to the model of the user's custom circuit design, the system generates software clocks and address pointers. The software clock provides signals to the enable inputs of registers S1-S3. As noted above, the software clock according to the present invention avoids race conditions and hold-time violations. When the main clock detects a clock edge in software, the detection logic triggers corresponding detection logic in hardware. The clock edge register 916 then generates an enable signal at the registers' enable inputs in time to gate in any data present at the register inputs.
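The enable-based clocking described above can be sketched at the step level. This sketch assumes the software clock is sampled once per step and is low before step 0, and that the clock edge register turns each detected rising edge into a one-step enable pulse; the function name is hypothetical.

```python
from typing import List


def enabled_register_trace(sw_clock: List[int], d: List[int]) -> List[int]:
    """Sketch of a register whose enable input is driven by a clock edge register.

    The register loads d[i] only in the step where a rising edge of the
    software clock is detected; otherwise it holds its value.
    """
    q = 0
    previous = 0
    trace = []
    for clk, data in zip(sw_clock, d):
        enable = previous == 0 and clk == 1  # edge detected -> enable pulse
        if enable:
            q = data
        trace.append(q)
        previous = clk
    return trace


print(enabled_register_trace([0, 1, 1, 0, 1], [1, 1, 0, 0, 0]))  # [0, 1, 1, 1, 0]
```

A change on `d` while the clock stays high (step 2) does not reach `q`, because the enable pulse lasts only one step.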
For purposes of description and familiarity, address pointer 914 is also shown. Address pointers are actually implemented in each FPGA chip and allow data to be transferred to their destinations selectively and sequentially.
The combinational components S4-S7 are also connected to register components S1-S3, sigin, and sigout. These signals are transferred to and from the PCI bus 913 over the I/O bus 915.
Figure 31 shows the complete hardware model, excluding the address pointers, before mapping, placement, and routing. The system has not yet mapped the model to particular chips. Registers S1-S3 are connected to the I/O bus and to combinational components S4-S6. Combinational component S7 is simply the output q3 of register S3. Sigin, sigout, and the software clock 920 are also modeled.
Once the hardware model has been determined, the system can map, place, and route the model into one or more chips. This particular example could actually be implemented on a single Altera FLEX 10K chip, but for instructional purposes the example assumes that two chips are needed to implement the hardware model. Figure 32 shows a particular hardware model-to-chip partition result for this example.
In Figure 32, the complete model (except for the I/O and clock edge register) is shown with chip boundaries represented by dashed lines. This result is produced by the compiler of the SEmulation system before the final configuration file is generated. The hardware model therefore needs at least three wires between the two chips, at wires 921, 922, and 923. To minimize the number of pins/wires needed between the two chips (chip 1 and chip 2), another model-to-chip partition can be generated, or a multiplexing scheme can be used.
Analyzing the particular partition result shown in Figure 32, the number of wires between the two chips can be reduced to two by moving the sigin wire 923 from chip 2 to chip 1. Figure 33 shows such a partition result. Although the partition of Figure 33 appears better than that of Figure 32 when only the number of wires is considered, this example assumes that the SEmulation system selected the partition of Figure 32 after performing mapping, placement, and routing. The partition result of Figure 32 will be used as the basis for generating the configuration file.
Figure 34 shows the logic patching operation for the same hypothetical example, including the final implementation in the two chips. The system uses the partition result of Figure 32 to generate the configuration file; for simplicity, however, the address pointers are not shown. Two FPGA chips, 930 and 940, are shown. Chip 930 contains, among other components, part of the partitioned user circuit design, a TDM unit 931 (receiver side), a software clock 932, and an I/O bus 933. Chip 940 contains, among other components, part of the partitioned user circuit design, a TDM unit 941 (transmitter side), a software clock 942, and an I/O bus 943. TDM units 931 and 941 were discussed with reference to Figures 9(A), 9(B), and 9(C).
Chips 930 and 940 have two interconnect wires, 944 and 945, that connect the two halves of the hardware model together. These two interconnect wires are part of the interconnect structure shown in Figure 8. Referring to Figure 8, one such interconnect is interconnect 611 between chips F32 and F33. In one embodiment, the maximum number of wires/pins per interconnect is 44. In Figure 34, the modeled circuit needs only two wires/pins between chips 930 and 940.
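Counting the wires a partition needs amounts to counting nets whose components land on more than one chip, then comparing that count with the 44-wire interconnect limit. This sketch assumes that S4, S5, and S6 compute d1, d2, and d3 in order and that S7 drives sigout from q3. It omits I/O nets such as sigin and sigout and uses a hypothetical chip assignment, not necessarily the one in Figure 32.

```python
from typing import Dict, List, Set

# Nets between hardware components of the example, from the assign statements.
NETS: Dict[str, Set[str]] = {
    "q1": {"S1", "S5"},
    "q2": {"S2", "S6"},
    "q3": {"S3", "S4", "S5", "S6", "S7"},
    "d1": {"S4", "S1"},
    "d2": {"S5", "S2"},
    "d3": {"S6", "S3"},
}

MAX_WIRES_PER_INTERCONNECT = 44  # per-interconnect limit in one embodiment


def cut_nets(nets: Dict[str, Set[str]], chip_of: Dict[str, int]) -> List[str]:
    """Nets whose components sit on more than one chip need inter-chip wires."""
    return sorted(net for net, comps in nets.items()
                  if len({chip_of[c] for c in comps}) > 1)


def fits_without_multiplexing(num_wires: int) -> bool:
    """True if the cut fits on one interconnect without time multiplexing."""
    return num_wires <= MAX_WIRES_PER_INTERCONNECT


# Hypothetical two-chip assignment of the hardware components.
chip_of = {"S1": 1, "S4": 1, "S2": 2, "S3": 2, "S5": 2, "S6": 2, "S7": 2}
wires = cut_nets(NETS, chip_of)
print(wires, fits_without_multiplexing(len(wires)))  # ['q1', 'q3'] True
```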
Chips 930 and 940 are connected to bank bus 950. Because only two chips are used, both chips can be placed in the same bank, or each chip can belong to a different bank. Preferably, one chip is connected to one bank bus and the other chip to another bank bus, so that the throughput at the FPGA interface equals the throughput at the PCI interface.
The foregoing description of a preferred embodiment of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art. One skilled in the art will readily recognize that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should be limited only by the claims.