US4594655A - (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions - Google Patents
- Publication number
- US4594655A (application US06/475,286)
- Authority
- US
- United States
- Prior art keywords
- instruction
- data flow
- inputs
- instructions
- facility
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
Definitions
- An object of the invention is to provide instruction pipeline capability and (k)-instruction-at-a-time processing capability for sequences of instructions so as to circumvent a limited set of interdependency lockouts and thereby take advantage of the capability of the (k)-instruction-at-a-time pipelined processor, not only for unrelated instructions but for certain groups of related instructions which might normally be subject to lockout.
- n is the number of inputs to the minimum facility, and k is the data flow facility number.
- An advantage is that with a limited increment of facility hardware it is possible to provide a relatively large increment of throughput for programs having significant occurrences of instructions having adjacent-instruction data dependencies of specific kinds.
- FIG. 1 is a schematic diagram showing data flow through the pipelined processor, which includes primary execution unit and secondary execution unit.
- FIG. 2 illustrates control information for the pipelined processor of FIG. 1.
- FIG. 3 compares a representative sequence of pipelined three-phase (Staging; Execution; Putaway) instruction processing according to normal sequencing with a representative sequence according to this invention.
- FIG. 4 (4A-4F) is a detail chart of data handling during representative sequences according to the invention.
- FIG. 5 is a diagram of instruction gating according to the preferred embodiment of the invention. This is an expansion of Instruction Register and Interlock Detection and Resolution mechanism. (Reference character 12, FIG. 1.)
- This invention provides a central processing unit the capability to decode, execute and resolve some interdependencies between two instructions in a single cycle.
- Operations require at most two source registers and one sink register, except STORE instructions which use three source registers (two for address generation and one for data).
- the only non-register-to-register operations are LOAD, STORE and BRANCH instructions.
- the processor comprises two related but non-identical data flows.
- the first data flow (items 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, and 28, FIG. 1) contains all the functional organs of a three-stage pipelined processor; the first adder (reference character 2, FIG. 1) is a two-input adder.
- the second data flow (items 21-26, FIG. 1) has only a three-input adder (reference character 22, FIG. 1). This increased adder capacity, the ability to add an additional operand input, is essential to the invention.
- Processor control mechanism includes a second instruction decoder. With the three-input binary adder 22 in the secondary processor 21 and with additional staging registers 25-26, the processor can decode and execute two instructions simultaneously under control of interlock detection and resolution logic.
- path from (23) to (9) and the path from (3) to (9) are for sending addresses to the caches.
- the path from (7) to (8) is the path for storing data into the Data cache.
- the path from (9) to (12) is for bringing instructions from the instruction cache.
- the path from (8) to (10,11) is for bringing data from the Data cache.
- the primary execution unit (right portion of the data path) consists of those functional organs required in a conventional processor, specifically a set of general purpose registers 4, a set of data staging registers 5-6, and an arithmetic and logical unit (ALU1) 2.
- ALU1 contains a shifter and other functional organs which may be desirable from a performance standpoint.
- the secondary execution unit (left portion of FIG. 1) comprises a limited set of functional organs, including a second set of general purpose registers 24, three-input adder 22, additional ports to and from general purpose registers 24, input multiplexers and input staging registers 25-26, and output staging register 23.
- the secondary execution unit enables the pipelined processor to process an additional instruction, in addition to the primary instruction undergoing execution in the "conventional facilities" of the primary execution unit. Certain logical interlocks presented by the first instruction may be circumvented by the action of the secondary execution unit, as determined and controlled by the control mechanism.
- Instruction unit 12 is any pipelined processor instruction unit appropriate for the computer system, replicated for each of the (k) data flow paths.
- the function of an instruction unit in a pipelined processor is to accept an instruction in a sequence of instructions, to decode the instruction, to identify instruction interdependencies, and to identify to an execution control unit the execution actions appropriate to the instruction.
- the instruction unit detects different instruction groups and the execution control means issues different control signals upon the detection of the different instruction groups.
- execution is straightforward excluding provision for the interdependency.
- the instruction code is tested against the instruction code for the previous instruction to determine whether in fact an interdependency occurs.
- the instruction is in a group for which an interdependency always occurs, execution including provision for the interdependency is straightforward.
- Pipelined processor instruction units are shown in the prior United States Patents and publications described above. This prior art shows techniques for identifying and dealing with instruction interdependencies in pipelined computer systems. The patents show means for determining which instructions require special handling as a function of a previous instruction, and show control means for carrying out the special handling. FIGS. 2-5 show the requirements of instruction group identification to carry out the invention.
- the second instruction is not dependent upon the execution of the first instruction, and is capable of being executed in the limited facility provided by ALU2 with its three-input adder (such as LOAD REGISTER, STORE REGISTER, and ADD REGISTERS) then it may be executed concurrently with the first instruction. This is concurrent independent instruction execution.
- K set of instructions which require only an adder for their execution.
- J Set of instructions which update a register and require only the execution capability of ALU2.
- the instructions may be executed simultaneously. This is concurrent instruction execution of the simplest kind.
- Example 0 takes place in the preferred embodiment without taking advantage of the special capability of the invention.
- ALU1's operation can be duplicated in two of the three ports of ALU2, and when possible combined with another operand from the second instruction, and executed concurrently with the first instruction. This is concurrent instruction execution of a greater complexity, where the second instruction is dependent on replication of the completion of the first instruction.
- Control unit 12 decodes two instructions simultaneously and resolves the interlocks between them and any previous instructions in the processor pipeline which have not completed execution. Control unit 12 must first determine if the first instruction can execute. This consists of determining whether facilities required to execute the first instruction are available.
- Control unit 12 (FIG. 1) must then determine whether or not the second instruction can execute, by determining whether facilities required to execute the second instruction are available, after allocating facilities for use by the first instruction and any previous instruction still requiring facilities, and whether there is an instruction dependency on the first instruction for its results. Further details will follow, in connection with discussion of FIGS. 4A-4F.
- Execution control means are shown, for example, in U.S. Pat. Nos. 3,689,895, Kitamura, MICRO-PROGRAM CONTROL SYSTEM, Sept. 5, 1972 and 3,787,673, Watson et al, PIPELINED HIGH SPEED ARITHMETIC UNIT, Jan. 22, 1974.
- The choice of instruction unit and execution control means and the detailing of these organs of the computer are made by the system designer. Once these organs are defined, and the repertoire of instructions is determined, then the system designer determines which instructions are so subject to conflict that interlocks, data dependencies or facility lockouts are to be identified by the instruction unit and controlled by the control means. The system designer then provides hardware or software decoding to carry out the desired identification and control.
- FIGS. 2-4 illustrate the sequences followed.
- FIG. 2 shows the control information for a representative set of instructions.
- FIG. 3 shows the control information for a representative set of instructions.
- FIG. 3 illustrates the overlap potential.
- a simple pipeline operation is carried out with overlap of the Staging step of I2 with the Execution step of I1, and so forth.
- I1 and I2 are fully overlapped; they are executed simultaneously.
- I3 and I4 are similarly overlapped; they are executed simultaneously.
- I5 and I6 are determined to be fraught with facility lockouts or address generation interlocks; they are not executed simultaneously; I6 is delayed.
- FIG. 4 is a detail chart of data handling in the preferred embodiment.
- STAGE--Access source GPRs (general purpose registers). Copy any required immediate fields from I-REG into Staging Reg.
- PUTAWAY--Copy result into sink GPR.
- the result is a data address which is sent to the data cache.
- the result is an instruction address which is sent to the instruction cache.
- FIGS. 4A-4C illustrate the controls for the Staging Cycle. A number of differing situations are accommodated, including the situation at the bottom of FIG. 4C in which the instruction 1 result is replicated internally in order to carry out instruction 2, which requires the instruction 1 result as an operand.
- FIGS. 4D-4F illustrate the controls for the Execution Cycle.
- FIG. 4F illustrates the controls for the Putaway Cycle.
- FIG. 4A includes control information for a portion of the previous PUTAWAY CYCLE along with control information for the early part of the STAGING CYCLE appropriate to the invention.
- the previous PUTAWAY CYCLE (PORTION) is enclosed in a box in FIG. 4A.
- the PUTAWAY CYCLE ends with instructions loaded into Instruction Registers 13 and 14 of FIG. 1.
- FIG. 5 illustrates the mechanism for the pipelined processor's instruction fetching in this embodiment.
- instruction cache 9 contains a sequence of instructions. These instructions are read out of instruction cache 9 two-at-a-time, via buffer 32, to instruction registers 13 and 14.
- the two instructions may be inherently independent, may be inherently sequential but susceptible to parallel execution by the pipelined processor, or may be subject to interlocks which require serial processing.
- Instruction registers 13-14 make the instructions available to instruction decoders, which provide instruction control signals to the various functional organs.
- the preferred embodiment is a pipelined processor with a primary ALU (ALU1) having a full range of functional organs, while the secondary ALU (ALU2) has only a single functional organ (adder 22) of increased capability.
- ALU1 primary ALU
- ALU2 secondary ALU
- a shifter for example, might be the functional organ of choice for emphasis.
- n is the number of inputs to the minimum facility
- k is the facility number in increasing order of complexity
- With proper care in housekeeping, it is possible through this invention to have a number of data flow facilities each having capability of accepting more than the minimum number of inputs.
- two data flow facilities each having a three-input adder for example, it is possible to provide for simultaneous execution of instructions calling for a reference capability two-input adder in the primary data flow facility, and a three-input adder in the secondary data flow facility.
- the primary three-input adder has applied to it the same two operands as would have been applied to a two-input adder.
- wasteful of adder capability may be advantageous in standardized design and in flexibility. Designation of primary data flow facility and secondary data flow facility is arbitrary in such a variation.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Equipping a secondary data flow facility with additional capability, to emulate for certain operations the simultaneous processing of the prerequisite instruction and the dependent instruction, significantly improves simultaneous pipeline processing of inherently sequential instructions (k)-at-a-time, by eliminating delays for calculating prerequisite operands. For example, Instruction A+B=Z1 followed by Instruction Z1+C=Z2 is inherently sequential, with A+B=Z1 the prerequisite instruction and Z1+C=Z2 the dependent instruction. The specially equipped secondary data flow facility does not wait for Z1, the apparent input operand from the prerequisite instruction; it simulates Z1 instead, performing A+B+C=Z2 in parallel with A+B=Z1. All data flow facilities need not be fully equipped for all instructions; the secondary data flow facility may be generally less massive than a primary data flow facility, but is more sophisticated in a critical organ, such as the adder. The three-input adder of the secondary data flow facility emulates the result of a two-input adder of a primary data flow facility, occurring simultaneously in the two-input primary data flow facility adder, adding the third operand to the emulated result, without delay. The instruction unit decodes the instruction sequence normally to control (k)-at-a-time execution where there are no instruction interlocks or dependencies; to delay execution of dependent instructions until operands become available; and to reinstate (k)-at-a-time execution in a limited number of cases by using the additional capability of the secondary data flow facility to emulate the prerequisite operands. A control unit performs housekeeping to execute the instructions.
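The A+B=Z1, Z1+C=Z2 example in the abstract can be illustrated with a small software sketch. This is only a behavioral model under stated assumptions, not the patented hardware: the 32-bit register width (`MASK`) and the function names `two_input_adder`, `three_input_adder`, `execute_pair` and `execute_serial` are hypothetical and do not appear in the patent.

```python
# Behavioral sketch only; the register width and all names are assumptions.
MASK = 0xFFFFFFFF  # assumed 32-bit registers


def two_input_adder(a, b):
    """Primary data flow facility: ordinary two-input add."""
    return (a + b) & MASK


def three_input_adder(a, b, c):
    """Secondary data flow facility: adds three operands in one pass."""
    return (a + b + c) & MASK


def execute_pair(a, b, c):
    """Model one cycle in which both facilities work at the same time.

    The secondary facility never reads z1; it re-creates A + B internally
    and adds C, so it does not have to wait for the prerequisite result.
    """
    z1 = two_input_adder(a, b)       # prerequisite instruction: A + B = Z1
    z2 = three_input_adder(a, b, c)  # dependent instruction:    Z1 + C = Z2
    return z1, z2


def execute_serial(a, b, c):
    """Reference model: the dependent add waits for Z1 (two steps)."""
    z1 = two_input_adder(a, b)
    z2 = two_input_adder(z1, c)
    return z1, z2
```

Both models return the same pair of results; the difference is that `execute_pair` has no data path from `z1` to `z2`, which is what allows the two additions to proceed in the same cycle.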
Description
1. Field of the Invention
This invention relates to stored program computers, and particularly to an improved (k)-instructions-at-a-time pipelined processor for parallel execution of inherently serial instructions by a specially equipped secondary data flow facility which optimizes the instruction processing capability of a pipelined processor by emulating the result of a prerequisite instruction.
2. Description of the Prior Art
Computer processor designs have traditionally incorporated many refinements to achieve increased throughput. All processors accomplish the same basic result by following the sequence of steps:
(1) FETCH INSTRUCTION,
(2) DECODE INSTRUCTION,
(3) FETCH OPERANDS,
(4) EXECUTE INSTRUCTION,
(5) STORE RESULTS.
Many different approaches have been taken to accomplish the above steps at the greatest possible rate. A common approach is the "pipeline." Multiple instructions pass through the phases of the above steps sequentially where required, but simultaneously where possible, insofar as there are no conflicts in demand for hardware and no instruction dependencies. Processing instructions two at a time is an obvious desire, but providing economically for hardware demand and instruction dependencies can require expensive replication of hardware and complex control supervision.
A cycle is the period of time required to complete one phase of the pipeline. Commonly, pipelines have three actions:
(1) Staging;
(2) Execution;
(3) Putaway.
Each of these three actions may take one or more phases.
The theoretical limitation on performance for serial pipelined processors is the completion of one instruction per cycle, overlapping Staging, Execution and Putaway for a set of instructions. This is seldom achieved due to instruction interdependencies. The most important interdependencies are:
(1) address generation interlocks;
(2) data dependencies; and
(3) facility lockouts.
An address generation interlock is the inability to compute the address of an operand needed by an instruction until the completion of a previous instruction. A data dependency is the inability to obtain an operand until the completion of a previous instruction. A facility lockout is the inability to use one or more organs of a processor until the completion of a previous instruction which requires the use of the critical organ or organs.
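The three interdependencies defined above can be made concrete with a short classification sketch for two adjacent instructions. It is a simplified model, not the interlock detection logic of any particular machine: the `Instr` record, its field names, the facility labels and the assumption that only one instance of each facility exists are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    sink: Optional[str]                    # register written, if any
    data_sources: Tuple[str, ...] = ()     # registers read as data operands
    address_sources: Tuple[str, ...] = ()  # registers read to form an address
    facility: str = "adder"                # organ needed during execution


def interdependencies(first: Instr, second: Instr) -> List[str]:
    """List the interdependencies of `second` on the preceding `first`.

    Assumes a single instance of each facility, so two instructions that
    need the same facility cannot use it in the same cycle.
    """
    found = []
    if first.sink is not None and first.sink in second.address_sources:
        found.append("address generation interlock")
    if first.sink is not None and first.sink in second.data_sources:
        found.append("data dependency")
    if first.facility == second.facility:
        found.append("facility lockout")
    return found


add = Instr("ADD", "R1", data_sources=("R2", "R3"))
load = Instr("LOAD", "R4", address_sources=("R1", "R5"), facility="cache_port")
add2 = Instr("ADD", "R6", data_sources=("R1", "R7"))
shift = Instr("SHIFT", "R8", data_sources=("R9",), facility="shifter")
```

In this model, `interdependencies(add, load)` reports an address generation interlock because the LOAD forms its address from R1, which the ADD has not yet written; `interdependencies(add, add2)` reports both a data dependency and a facility lockout; `interdependencies(add, shift)` reports nothing.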
The following representative Patents and Publications demonstrate the context of the prior art:
U.S. Pat. No. 3,689,895, Kitamura, MICRO-PROGRAM CONTROL SYSTEM, Sept. 5, 1972, shows a parallel pipeline architecture in which a single micro-program memory is time-division multiplexed among plural arithmetic units.
U.S. Pat. No. 3,787,673, Watson et al, PIPELINED HIGH SPEED ARITHMETIC UNIT, Jan. 22, 1974, shows an array of computational organs arrayed for individual access so as to have simultaneous execution of arithmetic steps within the arithmetic unit as well as simultaneous execution of instructions in the instruction processing pipeline.
U.S. Pat. No. 3,840,861, Amdahl et al, DATA PROCESSING SYSTEM HAVING AN INSTRUCTION PIPELINE FOR CONCURRENTLY PROCESSING A PLURALITY OF INSTRUCTIONS, Oct. 8, 1974, shows an architecture for a two-cycle, time-offset instruction pipeline to match instructions which use two storage access cycles per execution.
U.S. Pat. No. 3,928,857, Carter et al, INSTRUCTION FETCH APPARATUS WITH COMBINED LOOK-AHEAD AND LOOK-BEHIND CAPABILITY, Dec. 23, 1975, shows an instruction pipeline with a multi-word instruction buffer deployed in anticipation of programming loops.
U.S. Pat. No. 3,949,379, Ball, PIPELINE DATA PROCESSING APPARATUS WITH HIGH SPEED SLAVE STORE, Apr. 6, 1976, shows a pipeline processor with provision to hold an address until data becomes available to store in that address.
U.S. Pat. No. 3,969,702, Tessera, ELECTRONIC COMPUTER WITH INDEPENDENT FUNCTIONAL NETWORKS FOR SIMULTANEOUSLY CARRYING OUT DIFFERENT OPERATIONS ON THE SAME DATA, July 13, 1976, shows a computer architecture with a group of differing functional units arrayed along a bus.
U.S. Pat. No. 4,057,846, Cockerill et al, BUS STEERING STRUCTURE FOR LOW COST PIPELINED PROCESSOR SYSTEM, Nov. 8, 1977, shows housekeeping for steering data along unidirectional busses with overlap of input and output functions.
U.S. Pat. No. 4,062,058, Haynes, NEXT ADDRESS SUBPROCESSOR, Dec. 6, 1977, shows a method for processing a special class of programs wherein determination of the next instruction occurs simultaneously with execution of a preceding set of instructions without the delay inherent in performance of the intervening branch conditions. A special subprocessor reviews the registers in the main processor for branch conditions and obtains the next (branch) address while the main processor is finishing routine processing.
U.S. Pat. No. 3,932,845, Beriot, shows plural execution units having differing speeds, similar to Shimoi, with the difference that Beriot places the fast execution unit and the slow execution unit in parallel, and tries to fit in several short operations during the period taken by one long operation.
U.S. Pat. No. 4,085,450, Tulpule, PERFORMANCE INVARIENT EXECUTION UNIT FOR NON-COMMUNICATIVE INSTRUCTIONS, Apr. 18, 1978, shows a pipeline technique for multiplexing, to three execution units, instructions which are subjected to a mode change if the sequence fits a criterion. Tulpule does not disclose any provision for handling dependent instructions, but rather discloses a standard pipeline in which operands of one instruction are read in parallel with the execution of the previous instruction. Tulpule identifies certain sequences of instructions which can benefit from a mode change from "forward operations" to "reverse operations," a sort of factoring operation to simplify the processing by restating the instruction in a different mode, and implements procedures to convert from forward to reverse mode by manipulating addresses. Tulpule makes special provision for handling reverse register to register instructions by exchanging address pairs within a given execution unit.
U.S. Pat. No. 4,152,763, Shimoi, CONTROL SYSTEM FOR CENTRAL PROCESSING UNIT WITH PLURAL EXECUTION UNITS, May 1, 1979, shows plural small, fast, special purpose execution units for certain common instructions, with a backup shaped execution unit for other instructions. This is a parallel pipeline for the favored instructions, with serial backup for other instructions not favored. Shimoi does not deal with inherently sequential instructions.
U.S. Pat. No. 4,365,311, Fukunaga et al, CONTROL OF INSTRUCTION PIPELINE IN DATA PROCESSING SYSTEM, Dec. 21, 1982, shows an architecture for performing instruction processing by segments of instructions in parallel, with individual clocks which vary depending upon conditions.
Agerwala et al, ELIMINATING THE OVERHEAD OF FLOATING POINT LOAD AND STORE INSTRUCTIONS BY DECODING TWO INSTRUCTIONS PER CYCLE IN THE FLOATING POINT UNIT, IBM Technical Disclosure Bulletin, Vol. 25, No. 1, June 1982, pp. 126-129, shows a floating point arithmetic unit in which two instructions are decoded simultaneously and, during the short loops of floating point executions, data flows along separate paths simultaneously. The goal is to overlap loads and stores with arithmetic operations. There is no parallel execution of inherently serial instructions.
Hardin, VARIABLE I-FETCH, IBM Technical Disclosure Bulletin, Vol. 20, No. 7, December 1977, pp. 2547-2548, shows a technique for fetching the next instruction at a variable time depending on the availability of storage cycles.
Irwin, "A Pipelined Processing Unit for On-Line Division," The 5th Annual Symposium on Computer Architecture, Apr. 3-5, 1978, pp. 24-30, 78CH1284-9C 1979 IEEE, describes a procedure for designing a pipelined computer.
Irwin and Heller, "Online Pipeline Systems for Recursive Numeric Computations," The 7th Annual Symposium on Computer Architecture, May 6-8, 1980, pp. 292-299, CH1494-4/80/0000-0292 1979 IEEE, describes a pipeline system for recursive numeric computations such as are required in double precision division, and uses a multi-input redundant adder in a segment processing function to build up a full precision result.
Lang et al, "A Modeling Approach and Design Tool for Pipelined Central Processors," The 6th Annual Symposium on Computer Architecture, Apr. 23-25, 1979, pp. 122-129, CH1394-6/79/0000-0122 1979 IEEE, describes a procedure for designing and implementing a control unit for a pipelined computer.
Liptay et al, LOAD BYPASS FOR ADDRESS ARITHMETIC, IBM Technical Disclosure Bulletin, Vol. 20, No. 9, February 1978, pp. 3606-3607, shows a pipelined computer in which the operand address generation process may be dependent upon the results of a subsequent instruction that has been decoded but not yet executed. A bypass mechanism provides that data can be bypassed to the address adder, permitting the address generation cycle to occur a cycle earlier. Initiation of the bypass function occurs when the register to be loaded is the same as required in the subsequent address generation. At the same time the returning data is being sent to the addressed general register, it will also be sent directly to the address adder for use. This bypass technique overcomes a facilities interlock and permits parallel execution of certain instructions otherwise requiring queuing because of facility needs--but there is no parallel execution of inherently serial instructions.
Owens et al, "On-Line Algorithms for the Design of Pipeline Architectures," The 6th Annual Symposium on Computer Architecture, Apr. 23-25, 1979, pp. 12-19, CH1394-6/79/0000-0012 1979 IEEE, describes a procedure for designing and implementing a control unit for a pipelined computer.
Patel, "Pipelines with Internal Buffers," The 5th Annual Symposium on Computer Architecture, Apr. 3-5, 1978, pp. 249-254, 78CH1284-9C 1979 IEEE, describes a pipelined computer with internal buffers and priority schemes to control queue lengths.
Pomerene et al, SEQUENTIAL I-FETCHING MECHANISMS, IBM Technical Disclosure Bulletin, Vol. 25, No. 1, June 1982, pp. 124-125, shows a two-cycle putaway technique which provides for a better overlap by sharing facilities between two operations which are not required simultaneously. If the putaway requires only one cycle, then the next sequential instruction fetch requires only the second putaway cycle while the store operation requires the first putaway cycle. This two-cycle putaway permits the minimization of conflicts of appropriate types, but does not resolve conflicts by parallel execution of inherently serial instructions.
Sofer et al, PARALLEL PIPELINE ORGANIZATION OF EXECUTION UNIT, IBM Technical Disclosure Bulletin, Vol. 14, No. 10, March 1972, pp. 2930-2933, uses a pre-shifter and a post-shifter with the main adder in the mainstream and has a multiplier in a bypass stream connecting at the input to the main adder. A time consuming multiply or divide operation can be carried out by the multiplier while other operations are passing through the main-stream. The mainstream execution loop requires only four cycles. Five cycles are required to complete the execution of mainstream instructions; results are available on the result bus one cycle earlier and may be "fast forwarded" for use as an operand in a subsequent instruction. This fast forward technique saves one cycle out of five.
This prior art establishes a context of pipelined computers, including parallel pipelined computers, but does not teach parallel execution in a parallel pipelined computer of inherently serial instructions.
This is a limited (k)-instruction-at-a-time data processor which is able to circumvent some of the interlocks implicit in attempting to execute plural instructions simultaneously. Performance improvement is achieved by decoding and executing a limited set of groups of instructions simultaneously, even though some of the members of the groups of instructions are inherently serial. Since these groups previously required separate cycles, but are now treated as groups and executed simultaneously, performance improvement is achieved with a small increment in hardware over the hardware required for a one-instruction-at-a-time pipelined machine.
An object of the invention is to provide instruction pipeline capability and (k)-instruction-at-a-time processing capability for sequences of instructions so as to circumvent a limited set of interdependency lockouts and thereby take advantage of the capability of the (k)-instruction-at-a-time pipelined processor, not only for unrelated instructions but for certain groups of related instructions which might normally be subject to lockout.
If execution requires n operands and a one-at-a-time machine has an n operator facility, then by the addition of an (n)+(n-1) execution facility two instructions may be executed simultaneously, even though the second instruction may require the results of executing the first instruction.
This can be generalized to doing k-at-a-time operations where successive facilities have [n], [(n)+(n-1)], [(n)+(n-1)+(n-1)], [(n)+(n-1)+(n-1)+(n-1)]. . . inputs to the facility.
This simplifies to:
n+(k-1)(n-1) where n is the number of inputs to the minimum facility, and k is the data flow facility number.
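As a quick check of the arithmetic, the input counts for successive facilities can be tabulated. The sketch below is illustrative only; the function name `facility_inputs` is an assumption and does not appear in the patent.

```python
def facility_inputs(n: int, k: int) -> int:
    """Inputs needed by the k-th data flow facility (k = 1 is the minimum facility).

    Evaluates n + (k - 1)(n - 1) as stated in the text; the name is hypothetical.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive integers")
    return n + (k - 1) * (n - 1)


# With a two-input minimum adder (n = 2), list the inputs for facilities 1 to 4.
counts = [facility_inputs(2, k) for k in range(1, 5)]
print(counts)
```

For n = 2 each added facility needs one more input than the one before it, which is why the two-at-a-time embodiment pairs a two-input adder with a three-input adder.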
An advantage is that with a limited increment of facility hardware it is possible to provide a relatively large increment of throughput for programs having significant occurrences of instructions having adjacent-instruction data dependencies of specific kinds.
FIG. 1 is a schematic diagram showing data flow through the pipelined processor, which includes a primary execution unit and a secondary execution unit.
FIG. 2 illustrates control information for the pipelined processor of FIG. 1.
FIG. 3 compares a representative sequence of pipelined three-phase (Staging; Execution; Putaway) instruction processing according to normal sequencing with a representative sequence according to this invention.
FIG. 4 (4A-4F) is a detail chart of data handling during representative sequences according to the invention.
FIG. 5 is a diagram of instruction gating according to the preferred embodiment of the invention. This is an expansion of the Instruction Register and Interlock Detection and Resolution mechanism (reference character 12, FIG. 1).
This invention provides a central processing unit with the capability to decode, execute and resolve some interdependencies between two instructions in a single cycle.
For the purpose of the preferred embodiment we make the following assumptions:
1. Operations require at most two source registers and one sink register, except STORE instructions which use three source registers (two for address generation and one for data).
2. All instructions are of one length with fixed fields and each instruction fetch brings two instructions.
3. The only non-register-to-register operations are LOAD, STORE and BRANCH instructions.
4. Independent and distinct instruction and data caches exist.
5. No instruction requires more than one cycle in an ALU.
The processor comprises two related but non-identical data flows. The first data flow (items 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, and 28, FIG. 1) contains all the functional organs of a three-stage pipelined processor; the first adder (reference character 2, FIG. 1) is a two-input adder. The second data flow (items 21-26, FIG. 1) has only a three-input adder (reference character 22, FIG. 1). This increased adder capacity, the ability to add an additional operand input, is essential to the invention. The processor control mechanism includes a second instruction decoder. With the three-input binary adder 22 in the secondary processor 21 and with additional staging registers 25-26, the processor can decode and execute two instructions simultaneously under control of interlock detection and resolution logic.
Note the path from (23) to (9) and the path from (3) to (9) are for sending addresses to the caches. The path from (7) to (8) is the path for storing data into the data cache. The path from (9) to (12) is for bringing instructions from the instruction cache. The path from (8) to (10,11) is for bringing data from the data cache.
Note that the contents of 24 and 4 are identical.
An overview of the pipelined processor data flows is shown in FIG. 1. The primary execution unit (right portion of the data path) consists of those functional organs required in a conventional processor, specifically a set of general purpose registers 4, a set of data staging registers 5-6, and an arithmetic and logical unit (ALU1) 2. ALU1 contains a shifter and other functional organs which may be desirable from a performance standpoint.
The secondary execution unit (left portion of FIG. 1) comprises a limited set of functional organs, including a second set of general purpose registers 24, three-input adder 22, additional ports to and from general purpose registers 24, input multiplexers and input staging registers 25-26, and output staging register 23. The secondary execution unit enables the pipelined processor to process an additional instruction, in addition to the primary instruction undergoing execution in the "conventional facilities" of the primary execution unit. Certain logical interlocks presented by the first instruction may be circumvented by the action of the secondary execution unit, as determined and controlled by the control mechanism.
Instruction unit 12 (FIG. 1) is any pipelined processor instruction unit appropriate for the computer system, replicated for each of the (k) data flow paths. The function of an instruction unit in a pipelined processor is to accept an instruction in a sequence of instructions, to decode the instruction, to identify instruction interdependencies, and to identify to an execution control unit the execution actions appropriate to the instruction.
The instruction unit detects different instruction groups and the execution control means issues different control signals upon the detection of the different instruction groups. Where the current instruction is in a group for which there can be no interdependency, execution is straightforward excluding provision for the interdependency. Where the instruction is in a group for which an interdependency might or might not occur, the instruction code is tested against the instruction code for the previous instruction to determine whether in fact an interdependency occurs. Where the instruction is in a group for which an interdependency always occurs, execution including provision for the interdependency is straightforward.
Pipelined processor instruction units are shown in the prior United States Patents and publications described above. This prior art shows techniques for identifying and dealing with instruction interdependencies in pipelined computer systems. The patents show means for determining which instructions require special handling as a function of a previous instruction, and show control means for carrying out the special handling. FIGS. 2-5 show the requirements of instruction group identification to carry out the invention.
If the second instruction is not dependent upon the execution of the first instruction, and is capable of being executed in the limited facility provided by ALU2 with its three-input adder (such as LOAD REGISTER, STORE REGISTER, and ADD REGISTERS) then it may be executed concurrently with the first instruction. This is concurrent independent instruction execution.
K=set of instructions which require only an adder for their execution.
In this embodiment this set includes:
LOADS (The adder is used for address generation.)
STORES (The adder is used for address generation.)
ADD REGISTERS
SUBTRACT REGISTERS
UPDATE ADDRESS AND LOAD
UPDATE ADDRESS AND STORE
J=Set of instructions which update a register and require only the execution capability of ALU2.
In this embodiment this set includes:
ADD REGISTERS
SUBTRACT REGISTERS
UPDATE ADDRESS AND LOAD
UPDATE ADDRESS AND STORE.
If the first instruction is an ADD REGISTER and the second instruction is also an ADD REGISTER, and the two instructions share no registers, then the instructions may be executed simultaneously. This is concurrent instruction execution of the simplest kind.
Ex. 0  Two independent ADD REGISTERS
  inst. 1          R3 ← R1 + R2
  inst. 2          R4 ← R5 + R6
  ALU1 (Inst 1)    Adder (2, FIG. 1)
  ALU2 (Inst 2)    Adder (22, FIG. 1)
Execution of Example 0 takes place in the preferred embodiment without taking advantage of the special capability of the invention.
If the first instruction is an ADD REGISTER, and the second instruction is dependent upon the result, then ALU1's operation can be duplicated in two of the three ports of ALU2, and when possible combined with another operand from the second instruction, and executed concurrently with the first instruction. This is concurrent instruction execution of a greater complexity, where the second instruction is dependent on replication of the completion of the first instruction.
Examples where simultaneous execution is allowed:
Ex. 1  Two dependent adds:
  inst 1           R3 ← R1 + R2
  inst 2           R5 ← R3 + R4
  ALU1 (Inst 1)    adder (2, FIG. 1)     R1 + R2
  ALU2 (Inst 2)    adder (22, FIG. 1)    R1 + R2 + R4
  (Overcomes Data Dependency)

Ex. 2  Add followed by dependent load:
  inst I1          R3 ← R1 + R2
  inst I2          Memory ← R3 + R4 (address generation)
  ALU1 (Inst 1)    R1 + R2
  ALU2 (Inst 2)    R1 + R2 + R4
  (Overcomes Address Generation Interlock)

Ex. 3  Two independent loads:
  inst I1          Memory ← R1 + R2
  inst I2          Memory ← R3 + R4
  I1 adder         R1 + R2
  I2 adder         R3 + R4
  (Independent Group)
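The operand substitution in Ex. 1 can be checked numerically. The following is a minimal sketch, not the patented circuit: the function names `alu1_add` and `alu2_add3`, the register values, and the 32-bit word width are all assumptions made for illustration.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit word; the patent does not fix a width


def alu1_add(a: int, b: int) -> int:
    """Two-input adder of the primary data flow (adder 2 in FIG. 1)."""
    return (a + b) & MASK


def alu2_add3(a: int, b: int, c: int) -> int:
    """Three-input adder of the secondary data flow (adder 22 in FIG. 1)."""
    return (a + b + c) & MASK


# Hypothetical register contents, chosen so that R1 + R2 wraps around.
gpr = {"R1": 0xFFFFFFF0, "R2": 0x20, "R4": 7}

# Same cycle: ALU1 computes inst 1 while ALU2 computes inst 2
# directly from inst 1's source operands plus R4.
r3 = alu1_add(gpr["R1"], gpr["R2"])              # inst 1: R3 <- R1 + R2
r5 = alu2_add3(gpr["R1"], gpr["R2"], gpr["R4"])  # inst 2: R5 <- R3 + R4

# Serial execution, where inst 2 waits for R3, yields the same value.
r5_serial = alu1_add(r3, gpr["R4"])
print(hex(r3), hex(r5), r5 == r5_serial)
```

Because addition modulo the word size is associative, replicating R1 + R2 inside the three-input adder gives the same R5 as waiting for R3, which is the property the dependent-add case relies on.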
Examples where simultaneous execution is not allowed:
Ex. 4  Shift followed by a dependent add:
  inst I1          R3 ← R1 (Shifted)
  inst I2          R4 ← R3 + R2
  I1 shifter       R1
  (Unresolvable Interlock)

Ex. 5  Add followed by a shift:
  inst I1          R3 ← R2 + R1
  inst I2          R5 ← R4 (Shifted)
  I1 adder         R2 + R1
  (Shifter Path Not Available to I2)
Control unit 12 (FIG. 1) decodes two instructions simultaneously and resolves the interlocks between them and any previous instructions in the processor pipeline which have not completed execution. Control unit 12 must first determine if the first instruction can execute. This consists of determining whether facilities required to execute the first instruction are available.
Control unit 12 (FIG. 1) must then determine whether or not the second instruction can execute, by determining whether facilities required to execute the second instruction are available, after allocating facilities for use by the first instruction and any previous instruction still requiring facilities, and whether there is an instruction dependency on the first instruction for its results. Further details will follow, in connection with discussion of FIGS. 4A-4F.
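The pairing decision described above can be sketched as a small classifier. This is an illustration under stated assumptions rather than the logic of FIGS. 4A-4F: the `Inst` record, the function `classify_pair`, and the operation name `SHIFT` are hypothetical, sets K and J are copied from the lists given earlier, and the sketch ignores facilities held by earlier instructions still in the pipeline. Treating any first instruction in set J as replicable is also an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Set K: instructions needing only an adder. Set J: those that also update a register.
K = {"LOAD", "STORE", "ADD REGISTERS", "SUBTRACT REGISTERS",
     "UPDATE ADDRESS AND LOAD", "UPDATE ADDRESS AND STORE"}
J = {"ADD REGISTERS", "SUBTRACT REGISTERS",
     "UPDATE ADDRESS AND LOAD", "UPDATE ADDRESS AND STORE"}
ALU2_INPUTS = 3  # three-input adder 22


@dataclass(frozen=True)
class Inst:
    op: str
    sink: str
    sources: Tuple[str, ...]  # registers feeding the adder


def classify_pair(first: Inst, second: Inst) -> str:
    """Classify an adjacent pair as independent, dependent-resolvable, or interlocked."""
    if second.op not in K:
        return "interlocked"            # ALU2 offers only an adder (Ex. 5)
    hits = sum(1 for s in second.sources if s == first.sink)
    if hits == 0:
        return "independent"            # Ex. 0 and Ex. 3
    # Replace the dependent operand by the first instruction's own operands.
    needed = len(first.sources) + len(second.sources) - 1
    if first.op in J and hits == 1 and needed <= ALU2_INPUTS:
        return "dependent-resolvable"   # Ex. 1 and Ex. 2
    return "interlocked"                # Ex. 4


ex1 = classify_pair(Inst("ADD REGISTERS", "R3", ("R1", "R2")),
                    Inst("ADD REGISTERS", "R5", ("R3", "R4")))
ex4 = classify_pair(Inst("SHIFT", "R3", ("R1",)),
                    Inst("ADD REGISTERS", "R4", ("R3", "R2")))
print(ex1, ex4)
```

A pair classified as interlocked would issue serially, with the earlier instruction taking precedence as in the preferred embodiment.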
While it would be possible to treat the first instruction and second instruction as co-equals vying for facilities, in practice the instruction stream is sequenced so that the earlier instruction normally takes precedence, and in this preferred embodiment this convention is followed.
Execution control means are shown, for example, in U.S. Pat. Nos. 3,689,895, Kitamura, MICRO-PROGRAM CONTROL SYSTEM, Sept. 5, 1972 and 3,787,873, Watson et al, PIPELINED HIGH SPEED ARITHMETIC UNIT, Jan. 22, 1974.
The choice of instruction unit and execution control means and the detailing of these organs of the computer are made by the system designer. Once these organs are defined, and the repertoire of instructions is determined, then the system designer determines which instructions are so subject to conflict that interlocks, data dependencies or facility lockouts are to be identified by the instruction unit and controlled by the control means. The system designer then provides hardware or software decoding to carry out the desired identification and control.
FIGS. 2-4 illustrate the sequences followed. FIG. 2 shows the control information for a representative set of instructions. In standard fashion for a pipelined processor, there are three steps to a representative instruction processing sequence, as follows:
1. Staging;
2. Execution; and
3. Putaway.
These three steps may be overlapped for sequentially adjacent instructions which do not present dependencies.
The following definitions in FIG. 2 apply also to the other Figures:
OP=operation code
SK1=sink register for instruction 1
SK2=sink register for instruction 2
S1A, S1B=source registers for instruction 1
S2A, S2B=source registers for instruction 2
$=signal used for control purposes
FIG. 3 illustrates the overlap potential. In the upper grouping, a simple pipeline operation is carried out with overlap of the Staging step of I2 with the Execution step of I1, and so forth.
In the lower grouping of FIG. 3, the more sophisticated overlaps take place as made available by this invention:
I1 and I2 are fully overlapped; they are executed simultaneously.
I3 and I4 are similarly overlapped; they are executed simultaneously.
I5 and I6 are determined to be fraught with facility lockouts or address generation interlocks; they are not executed simultaneously; I6 is delayed.
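The lower grouping of FIG. 3 can be approximated with a small issue-cycle calculation. This is a sketch under stated assumptions: each phase takes exactly one cycle, a delayed second instruction issues one cycle after its partner, and the function name `schedule` is hypothetical. The actual cycle counts in FIG. 3 may differ.

```python
from typing import List, Tuple


def schedule(pairable: List[bool]) -> List[Tuple[str, int, int, int]]:
    """Return (name, staging, execution, putaway) cycle numbers per instruction.

    pairable[i] is True when the i-th fetched pair may issue together.
    """
    rows = []
    cycle = 1
    number = 1
    for together in pairable:
        rows.append((f"I{number}", cycle, cycle + 1, cycle + 2))
        if not together:
            cycle += 1  # the second instruction of the pair is delayed
        rows.append((f"I{number + 1}", cycle, cycle + 1, cycle + 2))
        cycle += 1
        number += 2
    return rows


# I1/I2 and I3/I4 overlap fully; I6 is delayed behind I5.
paired = schedule([True, True, False])
# Conventional sequencing: no pair issues together.
serial = schedule([False, False, False])
print(paired[-1], serial[-1])
```

Under these assumptions the six instructions complete putaway two cycles earlier than with one-at-a-time issue.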
FIG. 4 is a detail chart of data handling in the preferred embodiment. There is a three-stage pipeline, with Staging Cycle, Execution Cycle and Putaway Cycle. Normally the actions taken during these cycles are the following:
STAGE--Access source GPRs (general purpose registers). Copy any required immediate fields from I-REG into Staging Reg.
EXECUTE--Execute operation on staged data and hold result in Result Reg. For a store instruction, access the GPR containing the data to be stored and copy the data into the Store Staging Reg.
PUTAWAY--Copy result into sink GPR. For load and store instructions the result is a data address which is sent to the data cache. For Branch instructions the result is an instruction address which is sent to the instruction cache.
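The three cycles can be modeled as three small functions acting on a register file. This is a simplified sketch for a register-to-register add only; the dictionary layout and the function names `stage`, `execute` and `putaway` are assumptions, and the cache paths used by load, store and branch instructions are not modeled.

```python
from typing import Dict, List


def stage(gprs: Dict[str, int], inst: dict) -> List[int]:
    """STAGE: read the source GPRs into the staging registers."""
    return [gprs[name] for name in inst["sources"]]


def execute(staged: List[int]) -> int:
    """EXECUTE: operate on the staged data and hold the result (an add is assumed)."""
    return sum(staged)


def putaway(gprs: Dict[str, int], inst: dict, result: int) -> None:
    """PUTAWAY: copy the held result into the sink GPR."""
    gprs[inst["sink"]] = result


# Hypothetical register contents and instruction encoding.
gprs = {"R1": 1, "R2": 2, "R3": 0}
inst = {"op": "ADD REGISTERS", "sink": "R3", "sources": ["R1", "R2"]}

staged = stage(gprs, inst)
result = execute(staged)
putaway(gprs, inst, result)
print(gprs)
```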
FIGS. 4A-4C illustrate the controls for the Staging Cycle. A number of differing situations are accommodated, including the situation at the bottom of FIG. 4C in which the instruction 1 result is replicated internally in order to carry out instruction 2, which requires the instruction 1 result as an operand.
FIGS. 4D-4F illustrate the controls for the Execution Cycle.
FIG. 4F illustrates the controls for the Putaway Cycle.
FIG. 4A includes control information for a portion of the previous PUTAWAY CYCLE along with control information for the early part of the STAGING CYCLE appropriate to the invention. The previous PUTAWAY CYCLE (PORTION) is enclosed in a box in FIG. 4A. The PUTAWAY CYCLE ends with instructions loaded into Instruction Registers 13 and 14 of FIG. 1.
As the STAGING CYCLE begins, controls call for "Access source regs for OP1" which accesses OP1 information from GPR's 4 in FIG. 1. Data is set into the staging registers S1 and S2 (5,6 in FIG. 1). In the appropriate situation (a Store instruction) store data is accessed from GPR's 24 (FIG. 1) for the secondary data flow.
FIG. 5 illustrates the mechanism for the pipelined processor's instruction fetching in this embodiment. During operation, instruction cache 9 contains a sequence of instructions. These instructions are read out of instruction cache 9 two-at-a-time, via buffer 32, to instruction registers 13 and 14.
The two instructions may be inherently independent, may be inherently sequential but susceptible to parallel execution by the pipelined processor, or may be subject to interlocks which require serial processing.
Instruction registers 13-14 make the instructions available to instruction decoders, which provide instruction control signals to the various functional organs.
The preferred embodiment is a pipelined processor with a primary ALU (ALU1) having a full range of functional organs, while the secondary ALU (ALU2) has only a single functional organ (adder 22) of increased capability.
Other choices of functional organ with increased capability can be made. A shifter, for example, might be the functional organ of choice for emphasis.
It is also possible to extend the invention to three-at-a-time or (k)-at-a-time processing, by providing an extrapolated control circuit and (k) data flow facilities, where successive facilities have [n], [(n)+(n-1)], [(n)+(n-1)+(n-1)], [(n)+(n-1)+(n-1)+(n-1)]. . . inputs to the facility. Simplified, the number of inputs to a given facility is:
[(n)+(k-1)(n-1)] where n is the number of inputs to the minimum facility, and k is the facility number in increasing order of complexity.
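To see where the growth in inputs comes from, a chain of dependent adds can be expanded back to its original source registers. The sketch below is illustrative only: the function name `expand_chain` and the three-instruction chain are assumptions, and a real control circuit would also have to handle cases the sketch ignores, such as an operand that depends on more than one earlier result.

```python
from typing import Dict, List, Sequence, Tuple


def expand_chain(chain: Sequence[Tuple[str, Tuple[str, ...]]]) -> List[List[str]]:
    """For each (sink, sources) in order, list the operands its facility must add.

    A source produced earlier in the same group is replaced by the operands
    that produced it, so no facility waits for another facility's result.
    """
    produced: Dict[str, List[str]] = {}
    per_facility = []
    for sink, sources in chain:
        operands: List[str] = []
        for src in sources:
            operands.extend(produced.get(src, [src]))
        produced[sink] = operands
        per_facility.append(operands)
    return per_facility


# Hypothetical three-at-a-time group with n = 2 inputs per instruction.
chain = [("R3", ("R1", "R2")),   # I1: R3 <- R1 + R2
         ("R5", ("R3", "R4")),   # I2: R5 <- R3 + R4
         ("R7", ("R5", "R6"))]   # I3: R7 <- R5 + R6
expanded = expand_chain(chain)
n = 2
for k, operands in enumerate(expanded, start=1):
    print(k, operands, len(operands) == n + (k - 1) * (n - 1))
```

Each facility reuses every operand of the facility before it and adds n-1 new ones, which matches the expression above.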
With proper care in housekeeping, it is possible through this invention to have a number of data flow facilities each having capability of accepting more than the minimum number of inputs. With two data flow facilities each having a three-input adder, for example, it is possible to provide for simultaneous execution of instructions calling for a reference capability two-input adder in the primary data flow facility, and a three-input adder in the secondary data flow facility. The primary three-input adder has applied to it the same two operands as would have been applied to a two-input adder. This arrangement, while wasteful of adder capability, may be advantageous in standardized design and in flexibility. Designation of primary data flow facility and secondary data flow facility is arbitrary in such a variation.
Claims (4)
1. A pipelined processor comprising
(a) a pipeline instruction control unit;
(b) at least one primary data flow facility having a plurality of inputs and having a subassembly processing a finite number (n) of said inputs in parallel, to process a stream of instructions defined by the instruction control unit;
(c) one or more secondary data flow facilities having a plurality of inputs and each said data flow facility having a subassembly processing a greater number (n)+(k-1)(n-1) of said inputs than the finite number of inputs being processed by said primary data flow facility, said secondary data flow facility comprising at least one comparable subassembly similar in function to said subassembly in said primary data flow facility but with a greater number of inputs, whereby said secondary data flow facility can process all the inputs required to emulate the apparent result of processing by said comparable subassembly of the primary data flow facility and process said apparent result together with an additional input;
(d) an instruction unit comprising means to detect adjacent instruction groups including:
interlocked instruction sequences where one adjacent instruction necessarily must be delayed pending completion of another;
adjacent instruction sequences in which one of a group of adjacent instructions requires operand inputs including an operand input which is the result of processing the operand inputs for another instruction of the group; and
independent adjacent instruction sequences which require unrelated operand inputs;
(e) control means connected to said instruction unit and said primary and secondary data flow facilities to control simultaneous processing of dependent adjacent instruction sequences and independent adjacent instruction sequences, and to delay processing of appropriate instructions when necessary in interlocked instruction sequences;
wherein said primary data flow facility is a minimum data flow facility equipped to process inputs, and said one or more secondary data flow facilities, in sequence by complexity, are equipped to process (n)+(k-1)(n-1) inputs, where (n) is the number of inputs to said minimum facility, (n) being an integer equal to or greater than 1, and (k) is the facility sequence position number.
2. A pipelined processor comprising:
(a) a pipeline instruction control unit,
(b) a primary data flow facility having inputs and having a reference capability to process a stream of instructions defined by said instruction control unit, said primary data flow facility comprising an (n)-input adder to produce for a given instruction a result;
(c) a secondary data flow facility having inputs and capability different from said reference capability of the primary data flow facility, comprising a (2n-1)-input adder so that the secondary data flow (2n-1)-input adder can process said inputs to said primary data flow facility n-input adder to emulate said result of processing by said (n)-input adder of said primary data flow facility as a subset of its input set;
(d) an instruction unit comprising means to detect adjacent instruction groups including:
interlocked instruction sequences where one adjacent instruction necessarily must be delayed pending completion of another;
adjacent instruction sequences in which one of a group of adjacent instructions requires operand inputs including an operand input which is the result of processing the operand inputs for another instruction of the group; and
independent adjacent instruction sequences which require unrelated operand inputs; and
(e) control means connected to said instruction unit and said primary and secondary data flow facilities to control simultaneous processing of dependent adjacent instruction sequences and independent adjacent instruction sequences, and to delay processing of appropriate instructions when necessary in interlocked instruction sequences.
3. A pipelined processor according to claim 2, in which said (2n-1)-input adder of said secondary data flow facility is a three-input adder and the said n-input adder of said primary data flow facility is a two-input adder.
4. A pipelined processor according to claim 3, further comprising means in said primary data flow facility to provide first and second operands to said two-input adder, to produce a third operand result, and means in said secondary data flow facility simultaneously to provide first, second and fourth operands to said three-input adder, whereby said three-input adder simultaneously computes a result emulating inputs including the third operand result of computation by said two-input adder and fourth operand.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US06/475,286 US4594655A (en) | 1983-03-14 | 1983-03-14 | (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions |
JP59024549A JPS59173850A (en) | 1983-03-14 | 1984-02-14 | Pipeline type processor |
EP84102104A EP0118830B1 (en) | 1983-03-14 | 1984-02-29 | Pipelined processor |
DE8484102104T DE3481233D1 (en) | 1983-03-14 | 1984-02-29 | PIPELINE PROCESSING UNIT. |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US06/475,286 US4594655A (en) | 1983-03-14 | 1983-03-14 | (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions |
Publications (1)
Publication Number | Publication Date |
---|---|
US4594655A true US4594655A (en) | 1986-06-10 |
Family
ID=23886933
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US06/475,286 Expired - Lifetime US4594655A (en) | 1983-03-14 | 1983-03-14 | (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions |
Country Status (4)
Country | Link |
---|---|
US (1) | US4594655A (en) |
EP (1) | EP0118830B1 (en) |
JP (1) | JPS59173850A (en) |
DE (1) | DE3481233D1 (en) |
Cited By (122)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4736288A (en) * | 1983-12-19 | 1988-04-05 | Hitachi, Ltd. | Data processing device |
US4752873A (en) * | 1985-05-22 | 1988-06-21 | Hitachi Vlsi Eng. Corp. | Data processor having a plurality of operating units, logical registers, and physical registers for parallel instructions execution |
US4758949A (en) * | 1985-11-15 | 1988-07-19 | Hitachi, Ltd. | Information processing apparatus |
US4760520A (en) * | 1984-10-31 | 1988-07-26 | Hitachi, Ltd. | Data processor capable of executing instructions under prediction |
US4766564A (en) * | 1984-08-13 | 1988-08-23 | International Business Machines Corporation | Dual putaway/bypass busses for multiple arithmetic units |
US4773041A (en) * | 1986-06-02 | 1988-09-20 | Unisys Corporation | System for executing a sequence of operation codes with some codes being executed out of order in a pipeline parallel processor |
WO1988007724A1 (en) * | 1987-04-02 | 1988-10-06 | Stellar Computer Inc. | Control of multiple processors executing in parallel regions |
US4783736A (en) * | 1985-07-22 | 1988-11-08 | Alliant Computer Systems Corporation | Digital computer with multisection cache |
US4794518A (en) * | 1979-07-28 | 1988-12-27 | Fujitsu Limited | Pipeline control system for an execution section of a pipeline computer with multiple selectable control registers in an address control stage |
US4794521A (en) * | 1985-07-22 | 1988-12-27 | Alliant Computer Systems Corporation | Digital computer with cache capable of concurrently handling multiple accesses from parallel processors |
US4800486A (en) * | 1983-09-29 | 1989-01-24 | Tandem Computers Incorporated | Multiple data patch CPU architecture |
US4811214A (en) * | 1986-11-14 | 1989-03-07 | Princeton University | Multinode reconfigurable pipeline computer |
US4825360A (en) * | 1986-07-30 | 1989-04-25 | Symbolics, Inc. | System and method for parallel processing with mostly functional languages |
US4855947A (en) * | 1987-05-27 | 1989-08-08 | Amdahl Corporation | Microprogrammable pipeline interlocks based on the validity of pipeline states |
US4858105A (en) * | 1986-03-26 | 1989-08-15 | Hitachi, Ltd. | Pipelined data processor capable of decoding and executing plural instructions in parallel |
US4872125A (en) * | 1987-06-26 | 1989-10-03 | Daisy Systems Corporation | Multiple processor accelerator for logic simulation |
US4873656A (en) * | 1987-06-26 | 1989-10-10 | Daisy Systems Corporation | Multiple processor accelerator for logic simulation |
US4884231A (en) * | 1986-09-26 | 1989-11-28 | Performance Semiconductor Corporation | Microprocessor system with extended arithmetic logic unit |
US4888689A (en) * | 1986-10-17 | 1989-12-19 | Amdahl Corporation | Apparatus and method for improving cache access throughput in pipelined processors |
US4916647A (en) * | 1987-06-26 | 1990-04-10 | Daisy Systems Corporation | Hardwired pipeline processor for logic simulation |
US4916652A (en) * | 1987-09-30 | 1990-04-10 | International Business Machines Corporation | Dynamic multiple instruction stream multiple data multiple pipeline apparatus for floating-point single instruction stream single data architectures |
US4924377A (en) * | 1983-12-28 | 1990-05-08 | Hitachi, Ltd. | Pipelined instruction processor capable of reading dependent operands in parallel |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
US4935849A (en) * | 1988-05-16 | 1990-06-19 | Stardent Computer, Inc. | Chaining and hazard apparatus and method |
US4937783A (en) * | 1987-08-17 | 1990-06-26 | Advanced Micro Devices, Inc. | Peripheral controller for executing multiple event-count instructions and nonevent-count instructions in a prescribed parallel sequence |
US4942525A (en) * | 1986-11-21 | 1990-07-17 | Hitachi, Ltd. | Data processor for concurrent executing of instructions by plural execution units |
US4943916A (en) * | 1985-05-31 | 1990-07-24 | Matsushita Electric Industrial Co., Ltd. | Information processing apparatus for a data flow computer |
US4943915A (en) * | 1987-09-29 | 1990-07-24 | Digital Equipment Corporation | Apparatus and method for synchronization of a coprocessor unit in a pipelined central processing unit |
US4954947A (en) * | 1985-05-07 | 1990-09-04 | Hitachi, Ltd. | Instruction processor for processing branch instruction at high speed |
US4967339A (en) * | 1987-04-10 | 1990-10-30 | Hitachi, Ltd. | Operation control apparatus for a processor having a plurality of arithmetic devices |
US4967343A (en) * | 1983-05-18 | 1990-10-30 | International Business Machines Corp. | Pipelined parallel vector processor including parallel configured element processors for processing vector elements in parallel fashion |
US4969117A (en) * | 1988-05-16 | 1990-11-06 | Ardent Computer Corporation | Chaining and hazard apparatus and method |
US4974146A (en) * | 1988-05-06 | 1990-11-27 | Science Applications International Corporation | Array processor |
US4980824A (en) * | 1986-10-29 | 1990-12-25 | United Technologies Corporation | Event driven executive |
US5019967A (en) * | 1988-07-20 | 1991-05-28 | Digital Equipment Corporation | Pipeline bubble compression in a computer system |
US5036454A (en) * | 1987-05-01 | 1991-07-30 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of a loop with overlapped code |
US5040107A (en) * | 1988-07-27 | 1991-08-13 | International Computers Limited | Pipelined processor with look-ahead mode of operation |
US5043868A (en) * | 1984-02-24 | 1991-08-27 | Fujitsu Limited | System for by-pass control in pipeline operation of computer |
US5050073A (en) * | 1987-01-09 | 1991-09-17 | Kabushiki Kaisha Toshiba | Microinstruction execution system for reducing execution time for calculating microinstruction |
WO1991017495A1 (en) * | 1990-05-04 | 1991-11-14 | International Business Machines Corporation | System for compounding instructions for handling instruction and data stream for processor with different attributes |
US5067068A (en) * | 1987-06-05 | 1991-11-19 | Hitachi, Ltd. | Method for converting an iterative loop of a source program into parellelly executable object program portions |
US5083267A (en) * | 1987-05-01 | 1992-01-21 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of an instruction loop with recurrance |
US5088030A (en) * | 1986-03-28 | 1992-02-11 | Kabushiki Kaisha Toshiba | Branch address calculating system for branch instructions |
US5088035A (en) * | 1988-12-09 | 1992-02-11 | Commodore Business Machines, Inc. | System for accelerating execution of program instructions by a microprocessor |
US5101341A (en) * | 1988-08-25 | 1992-03-31 | Edgcore Technology, Inc. | Pipelined system for reducing instruction access time by accumulating predecoded instruction bits a FIFO |
US5115510A (en) * | 1987-10-20 | 1992-05-19 | Sharp Kabushiki Kaisha | Multistage data flow processor with instruction packet, fetch, storage transmission and address generation controlled by destination information |
US5117490A (en) * | 1988-07-27 | 1992-05-26 | International Computers Limited | Pipelined data processor with parameter files for passing parameters between pipeline units |
US5121502A (en) * | 1989-12-20 | 1992-06-09 | Hewlett-Packard Company | System for selectively communicating instructions from memory locations simultaneously or from the same memory locations sequentially to plurality of processing |
US5121488A (en) * | 1986-06-12 | 1992-06-09 | International Business Machines Corporation | Sequence controller of an instruction processing unit for placing said unit in a ready, go, hold, or cancel state |
US5131086A (en) * | 1988-08-25 | 1992-07-14 | Edgcore Technology, Inc. | Method and system for executing pipelined three operand construct |
US5148528A (en) * | 1989-02-03 | 1992-09-15 | Digital Equipment Corporation | Method and apparatus for simultaneously decoding three operands in a variable length instruction when one of the operands is also of variable length |
US5148536A (en) * | 1988-07-25 | 1992-09-15 | Digital Equipment Corporation | Pipeline having an integral cache which processes cache misses and loads data in parallel |
US5163139A (en) * | 1990-08-29 | 1992-11-10 | Hitachi America, Ltd. | Instruction preprocessor for conditionally combining short memory instructions into virtual long instructions |
US5179531A (en) * | 1990-04-27 | 1993-01-12 | Pioneer Electronic Corporation | Accelerated digital signal processor |
US5179734A (en) * | 1984-03-02 | 1993-01-12 | Texas Instruments Incorporated | Threaded interpretive data processor |
US5179672A (en) * | 1990-06-19 | 1993-01-12 | International Business Machines Corporation | Apparatus and method for modeling parallel processing of instructions using sequential execution hardware |
US5203002A (en) * | 1989-12-27 | 1993-04-13 | Wetzel Glen F | System with a multiport memory and N processing units for concurrently/individually executing 2N-multi-instruction-words at first/second transitions of a single clock cycle |
WO1993008526A1 (en) * | 1991-10-21 | 1993-04-29 | Intel Corporation | Cross coupling mechanisms for microprocessor instructions using pipelining systems |
US5214763A (en) * | 1990-05-10 | 1993-05-25 | International Business Machines Corporation | Digital computer system capable of processing two or more instructions in parallel and having a coche and instruction compounding mechanism |
US5226126A (en) * | 1989-02-24 | 1993-07-06 | Nexgen Microsystems | Processor having plurality of functional units for orderly retiring outstanding operations based upon its associated tags |
US5226131A (en) * | 1989-12-27 | 1993-07-06 | The United States Of America As Represented By The United States Department Of Energy | Sequencing and fan-out mechanism for causing a set of at least two sequential instructions to be performed in a dataflow processing computer |
US5226128A (en) * | 1987-05-01 | 1993-07-06 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of a loop with a branch |
US5233694A (en) * | 1988-11-11 | 1993-08-03 | Hitachi, Ltd. | Pipelined data processor capable of performing instruction fetch stages of a plurality of instructions simultaneously |
US5247628A (en) * | 1987-11-30 | 1993-09-21 | International Business Machines Corporation | Parallel processor instruction dispatch apparatus with interrupt handler |
US5261071A (en) * | 1991-03-21 | 1993-11-09 | Control Data System, Inc. | Dual pipe cache memory with out-of-order issue capability |
US5276819A (en) * | 1987-05-01 | 1994-01-04 | Hewlett-Packard Company | Horizontal computer having register multiconnect for operand address generation during execution of iterations of a loop of program code |
US5293500A (en) * | 1989-02-10 | 1994-03-08 | Mitsubishi Denki K.K. | Parallel processing method and apparatus |
US5303356A (en) * | 1990-05-04 | 1994-04-12 | International Business Machines Corporation | System for issuing instructions for parallel execution subsequent to branch into a group of member instructions with compoundability in dictation tag |
US5333297A (en) * | 1989-11-09 | 1994-07-26 | International Business Machines Corporation | Multiprocessor system having multiple classes of instructions for purposes of mutual interruptibility |
US5355460A (en) * | 1990-06-26 | 1994-10-11 | International Business Machines Corporation | In-memory preprocessor for compounding a sequence of instructions for parallel computer system execution |
US5357617A (en) * | 1991-11-22 | 1994-10-18 | International Business Machines Corporation | Method and apparatus for substantially concurrent multiple instruction thread processing by a single pipeline processor |
US5359718A (en) * | 1991-03-29 | 1994-10-25 | International Business Machines Corporation | Early scalable instruction set machine alu status prediction apparatus |
US5363495A (en) * | 1991-08-26 | 1994-11-08 | International Business Machines Corporation | Data processing system with multiple execution units capable of executing instructions out of sequence |
US5390355A (en) * | 1989-05-24 | 1995-02-14 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions |
US5398321A (en) * | 1991-02-08 | 1995-03-14 | International Business Machines Corporation | Microcode generation for a scalable compound instruction set machine |
US5404466A (en) * | 1989-06-13 | 1995-04-04 | Nec Corporation | Apparatus and method to set and reset a pipeline instruction execution control unit for sequential execution of an instruction interval |
US5408626A (en) * | 1989-08-04 | 1995-04-18 | Intel Corporation | One clock address pipelining in segmentation unit |
US5428756A (en) * | 1989-09-25 | 1995-06-27 | Matsushita Electric Industrial Co., Ltd. | Pipelined computer with control of instruction advance |
US5428810A (en) * | 1991-03-15 | 1995-06-27 | Hewlett-Packard Company | Allocation of resources of a pipelined processor by clock phase for parallel execution of dependent processes |
US5432724A (en) * | 1992-12-04 | 1995-07-11 | U.S. Philips Corporation | Processor for uniform operations on respective series of successive data in respective parallel data streams |
US5448746A (en) * | 1990-05-04 | 1995-09-05 | International Business Machines Corporation | System for comounding instructions in a byte stream prior to fetching and identifying the instructions for execution |
US5471593A (en) * | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
US5488729A (en) * | 1991-05-15 | 1996-01-30 | Ross Technology, Inc. | Central processing unit architecture with symmetric instruction scheduling to achieve multiple instruction launch and execution |
US5502826A (en) * | 1990-05-04 | 1996-03-26 | International Business Machines Corporation | System and method for obtaining parallel existing instructions in a particular data processing configuration by compounding instructions |
US5506974A (en) * | 1990-03-23 | 1996-04-09 | Unisys Corporation | Method and means for concatenating multiple instructions |
US5522052A (en) * | 1991-07-04 | 1996-05-28 | Matsushita Electric Industrial Co. Ltd. | Pipeline processor for processing instructions having a data dependence relationship |
US5526500A (en) * | 1990-02-13 | 1996-06-11 | Hewlett-Packard Company | System for operand bypassing to allow a one and one-half cycle cache memory access time for sequential load and branch instructions |
US5561775A (en) * | 1989-07-07 | 1996-10-01 | Hitachi, Ltd. | Parallel processing apparatus and method capable of processing plural instructions in parallel or successively |
US5590368A (en) * | 1993-03-31 | 1996-12-31 | Intel Corporation | Method and apparatus for dynamically expanding the pipeline of a microprocessor |
US5617561A (en) * | 1994-12-22 | 1997-04-01 | International Business Machines Corporation | Message sequence number control in a virtual time system |
US5636353A (en) * | 1991-06-17 | 1997-06-03 | Mitsubishi Denki Kabushiki Kaisha | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing |
US5691920A (en) * | 1995-10-02 | 1997-11-25 | International Business Machines Corporation | Method and system for performance monitoring of dispatch unit efficiency in a processing system |
US5729726A (en) * | 1995-10-02 | 1998-03-17 | International Business Machines Corporation | Method and system for performance monitoring efficiency of branch unit operation in a processing system |
US5748855A (en) * | 1995-10-02 | 1998-05-05 | International Business Machines Corporation | Method and system for performance monitoring of misaligned memory accesses in a processing system |
US5752062A (en) * | 1995-10-02 | 1998-05-12 | International Business Machines Corporation | Method and system for performance monitoring through monitoring an order of processor events during execution in a processing system |
US5751945A (en) * | 1995-10-02 | 1998-05-12 | International Business Machines Corporation | Method and system for performance monitoring stalls to identify pipeline bottlenecks and stalls in a processing system |
US5797019A (en) * | 1995-10-02 | 1998-08-18 | International Business Machines Corporation | Method and system for performance monitoring time lengths of disabled interrupts in a processing system |
US5870577A (en) * | 1996-11-27 | 1999-02-09 | International Business Machines, Corp. | System and method for dispatching two instructions to the same execution unit in a single cycle |
US5949971A (en) * | 1995-10-02 | 1999-09-07 | International Business Machines Corporation | Method and system for performance monitoring through identification of frequency and length of time of execution of serialization instructions in a processing system |
US5963723A (en) * | 1997-03-26 | 1999-10-05 | International Business Machines Corporation | System for pairing dependent instructions having non-contiguous addresses during dispatch |
US6212629B1 (en) | 1989-02-24 | 2001-04-03 | Advanced Micro Devices, Inc. | Method and apparatus for executing string instructions |
US6289439B1 (en) * | 1999-01-08 | 2001-09-11 | Rise Technology, Inc. | Method, device and microprocessor for performing an XOR clear without executing an XOR instruction |
US20030126414A1 (en) * | 2002-01-02 | 2003-07-03 | Grochowski Edward T. | Processing partial register writes in an out-of order processor |
US6598153B1 (en) * | 1999-12-10 | 2003-07-22 | International Business Machines Corporation | Processor and method that accelerate evaluation of pairs of condition-setting and branch instructions |
US6609189B1 (en) * | 1998-03-12 | 2003-08-19 | Yale University | Cycle segmented prefix circuits |
US20030159021A1 (en) * | 1999-09-03 | 2003-08-21 | Darren Kerr | Selected register decode values for pipeline stage register addressing |
US20030208674A1 (en) * | 1998-03-18 | 2003-11-06 | Sih Gilbert C. | Digital signal processor with variable length instruction set |
WO2004027647A1 (en) * | 2002-09-18 | 2004-04-01 | Netezza Corporation | Field oriented pipeline architecture for a programmable data streaming processor |
US20040093485A1 (en) * | 1991-07-08 | 2004-05-13 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US20040128475A1 (en) * | 2002-12-31 | 2004-07-01 | Gad Sheaffer | Widely accessible processor register file and method for use |
US20040128483A1 (en) * | 2002-12-31 | 2004-07-01 | Intel Corporation | Fuser renamer apparatus, systems, and methods |
US20040255099A1 (en) * | 2003-06-12 | 2004-12-16 | Kromer Stephen Charles | Method and data processor with reduced stalling due to operand dependencies |
US20050132170A1 (en) * | 2002-04-18 | 2005-06-16 | Koninklijke Philips Electronics N.V. | Multi-issue processor |
US20050198472A1 (en) * | 2004-03-04 | 2005-09-08 | Sih Gilbert C. | Digital signal processors with configurable dual-MAC and dual-ALU |
US7320065B2 (en) | 2001-04-26 | 2008-01-15 | Eleven Engineering Incorporated | Multithread embedded processor with input/output capability |
US7516305B2 (en) | 1992-05-01 | 2009-04-07 | Seiko Epson Corporation | System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor |
US7558945B2 (en) | 1992-12-31 | 2009-07-07 | Seiko Epson Corporation | System and method for register renaming |
US20090204796A1 (en) * | 2008-02-08 | 2009-08-13 | International Business Machines Corporation | Method, system and computer program product for verifying address generation, interlocks and bypasses |
US20090307470A1 (en) * | 2005-11-25 | 2009-12-10 | Masaki Maeda | Multi thread processor having dynamic reconfiguration logic circuit |
US7685402B2 (en) | 1991-07-08 | 2010-03-23 | Sanjiv Garg | RISC microprocessor architecture implementing multiple typed register sets |
US7802074B2 (en) | 1992-03-31 | 2010-09-21 | Sanjiv Garg | Superscalar RISC instruction scheduling |
US8074052B2 (en) | 1992-12-31 | 2011-12-06 | Seiko Epson Corporation | System and method for assigning tags to control instruction processing in a superscalar processor |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5051940A (en) * | 1990-04-04 | 1991-09-24 | International Business Machines Corporation | Data dependency collapsing hardware apparatus |
JP2657947B2 (en) * | 1986-08-08 | 1997-09-30 | 株式会社日立製作所 | Data processing device |
JPH01312633A (en) * | 1988-06-10 | 1989-12-18 | Matsushita Electric Ind Co Ltd | Parallel processing type information processor |
US5202967A (en) * | 1988-08-09 | 1993-04-13 | Matsushita Electric Industrial Co., Ltd. | Data processing apparatus for performing parallel decoding and parallel execution of a variable word length instruction |
EP0354740B1 (en) * | 1988-08-09 | 1996-06-19 | Matsushita Electric Industrial Co., Ltd. | Data processing apparatus for performing parallel decoding and parallel execution of a variable word length instruction |
EP0478745A4 (en) * | 1990-04-04 | 1993-09-01 | International Business Machines Corporation | High performance interlock collapsing scism alu apparatus |
US5301341A (en) * | 1990-11-28 | 1994-04-05 | International Business Machines Corporation | Overflow determination for three-operand alus in a scalable compound instruction set machine which compounds two arithmetic instructions |
GB2263565B (en) * | 1992-01-23 | 1995-08-30 | Intel Corp | Microprocessor with apparatus for parallel execution of instructions |
CA2123442A1 (en) * | 1993-09-20 | 1995-03-21 | David S. Ray | Multiple execution unit dispatch with instruction dependency |
US20030096339A1 (en) | 2000-06-26 | 2003-05-22 | Sprecher Cindy A. | Cytokine receptor zcytor17 |
CN101699392B (en) * | 2009-11-12 | 2012-05-09 | 中国人民解放军国防科学技术大学 | Method for multiplexing IO units in stream processor |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3689895A (en) * | 1969-11-24 | 1972-09-05 | Nippon Electric Co | Micro-program control system |
US3787673A (en) * | 1972-04-28 | 1974-01-22 | Texas Instruments Inc | Pipelined high speed arithmetic unit |
US3840861A (en) * | 1972-10-30 | 1974-10-08 | Amdahl Corp | Data processing system having an instruction pipeline for concurrently processing a plurality of instructions |
US3928857A (en) * | 1973-08-30 | 1975-12-23 | Ibm | Instruction fetch apparatus with combined look-ahead and look-behind capability |
US3932845A (en) * | 1973-01-26 | 1976-01-13 | Thomson-Csf | Specialized digital computer with divided memory and arithmetic units |
US3949379A (en) * | 1973-07-19 | 1976-04-06 | International Computers Limited | Pipeline data processing apparatus with high speed slave store |
US3969702A (en) * | 1973-07-10 | 1976-07-13 | Honeywell Information Systems, Inc. | Electronic computer with independent functional networks for simultaneously carrying out different operations on the same data |
US4057846A (en) * | 1976-06-07 | 1977-11-08 | International Business Machines Corporation | Bus steering structure for low cost pipelined processor system |
US4062058A (en) * | 1976-02-13 | 1977-12-06 | The United States Of America As Represented By The Secretary Of The Navy | Next address subprocessor |
US4085450A (en) * | 1976-12-29 | 1978-04-18 | Burroughs Corporation | Performance invarient execution unit for non-communicative instructions |
US4152763A (en) * | 1975-02-19 | 1979-05-01 | Hitachi, Ltd. | Control system for central processing unit with plural execution units |
US4365311A (en) * | 1977-09-07 | 1982-12-21 | Hitachi, Ltd. | Control of instruction pipeline in data processing system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4439828A (en) * | 1981-07-27 | 1984-03-27 | International Business Machines Corp. | Instruction substitution mechanism in an instruction handling unit of a data processing system |
JPS5896346A (en) * | 1981-12-02 | 1983-06-08 | Hitachi Ltd | Hierarchical calculation method |
JPS58142447A (en) * | 1982-02-19 | 1983-08-24 | Hitachi Ltd | data processing equipment |
1983
- 1983-03-14 US US06/475,286 patent/US4594655A/en not_active Expired - Lifetime
1984
- 1984-02-14 JP JP59024549A patent/JPS59173850A/en active Granted
- 1984-02-29 EP EP84102104A patent/EP0118830B1/en not_active Expired - Lifetime
- 1984-02-29 DE DE8484102104T patent/DE3481233D1/en not_active Expired - Lifetime
Non-Patent Citations (20)
Title |
---|
"Eliminating the Overhead of Floating Point Load and Store Instructions by Decoding Two Instructions Per Cycle in the Floating Point Unit", T. K. M. Agerwala et al., IBM TDB, vol. 25, No. 1, Jun. 1982, pp. 126-129. * |
"Load Bypass for Address Arithmetic", J. S. Liptay and J. W. Rymarczyk, IBM TDB, vol. 20, No. 9, 02/78, pp. 3606-3607. * |
"Parallel Pipeline Organization of Execution Unit", D. Sofer and W. W. Sproul, III, IBM TDB, vol. 14, No. 10, 03/72, pp. 2930-2933. * |
"Sequential I-Fetching Mechanisms," J. H. Pomerene et al., IBM TDB, vol. 25, No. 1, Jun. 1982, pp. 124-125. * |
"Variable I-Fetch", D. K. Hardin, IBM TDB, vol. 20, No. 7, 12/77, pp. 2547-2548. * |
Irwin and Heller, "Online Pipeline Systems for Recursive Numeric Computations", The 7th Annual Symposium on Computer Architecture, May 6-8, 1980, pp. 292-299, CH1494-4/80/0000-0292 1979 IEEE. * |
Irwin, "A Pipelined Processing Unit for On-Line Division," The 5th Annual Symposium on Computer Architecture, Apr. 3-5, 1978, pp. 24-30, 78CH1284-9C 1979, IEEE. * |
Lang et al., "A Modeling Approach and Design Tool for Pipelined Central Processors," The 6th Annual Symposium on Computer Architecture, Apr. 23-25, 1979, pp. 122-129, CH1394-6/79-0000-0122 1979 IEEE. * |
Owens et al., "On-Line Algorithms for the Design of Pipeline Architectures", The 6th Annual Symposium on Computer Architecture, Apr. 23-25, 1979, pp. 12-19, CH1394-6/79/0000-0012 1979, IEEE. * |
Patel, "Pipelines with Internal Buffers", The 5th Annual Symposium on Computer Architecture, Apr. 3-5, 1978, pp. 249-254, 78CH1284-9C 1979, IEEE. * |
Cited By (174)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4794518A (en) * | 1979-07-28 | 1988-12-27 | Fujitsu Limited | Pipeline control system for an execution section of a pipeline computer with multiple selectable control registers in an address control stage |
US4967343A (en) * | 1983-05-18 | 1990-10-30 | International Business Machines Corp. | Pipelined parallel vector processor including parallel configured element processors for processing vector elements in parallel fashion |
US4800486A (en) * | 1983-09-29 | 1989-01-24 | Tandem Computers Incorporated | Multiple data patch CPU architecture |
US4736288A (en) * | 1983-12-19 | 1988-04-05 | Hitachi, Ltd. | Data processing device |
US4924377A (en) * | 1983-12-28 | 1990-05-08 | Hitachi, Ltd. | Pipelined instruction processor capable of reading dependent operands in parallel |
US5043868A (en) * | 1984-02-24 | 1991-08-27 | Fujitsu Limited | System for by-pass control in pipeline operation of computer |
US5179734A (en) * | 1984-03-02 | 1993-01-12 | Texas Instruments Incorporated | Threaded interpretive data processor |
US4766564A (en) * | 1984-08-13 | 1988-08-23 | International Business Machines Corporation | Dual putaway/bypass busses for multiple arithmetic units |
US4760520A (en) * | 1984-10-31 | 1988-07-26 | Hitachi, Ltd. | Data processor capable of executing instructions under prediction |
US4954947A (en) * | 1985-05-07 | 1990-09-04 | Hitachi, Ltd. | Instruction processor for processing branch instruction at high speed |
US4752873A (en) * | 1985-05-22 | 1988-06-21 | Hitachi Vlsi Eng. Corp. | Data processor having a plurality of operating units, logical registers, and physical registers for parallel instructions execution |
US4943916A (en) * | 1985-05-31 | 1990-07-24 | Matsushita Electric Industrial Co., Ltd. | Information processing apparatus for a data flow computer |
US4794521A (en) * | 1985-07-22 | 1988-12-27 | Alliant Computer Systems Corporation | Digital computer with cache capable of concurrently handling multiple accesses from parallel processors |
US4783736A (en) * | 1985-07-22 | 1988-11-08 | Alliant Computer Systems Corporation | Digital computer with multisection cache |
US4758949A (en) * | 1985-11-15 | 1988-07-19 | Hitachi, Ltd. | Information processing apparatus |
US4858105A (en) * | 1986-03-26 | 1989-08-15 | Hitachi, Ltd. | Pipelined data processor capable of decoding and executing plural instructions in parallel |
US5088030A (en) * | 1986-03-28 | 1992-02-11 | Kabushiki Kaisha Toshiba | Branch address calculating system for branch instructions |
US4773041A (en) * | 1986-06-02 | 1988-09-20 | Unisys Corporation | System for executing a sequence of operation codes with some codes being executed out of order in a pipeline parallel processor |
US5121488A (en) * | 1986-06-12 | 1992-06-09 | International Business Machines Corporation | Sequence controller of an instruction processing unit for placing said unit in a ready, go, hold, or cancel state |
US4825360A (en) * | 1986-07-30 | 1989-04-25 | Symbolics, Inc. | System and method for parallel processing with mostly functional languages |
US4884231A (en) * | 1986-09-26 | 1989-11-28 | Performance Semiconductor Corporation | Microprocessor system with extended arithmetic logic unit |
US4888689A (en) * | 1986-10-17 | 1989-12-19 | Amdahl Corporation | Apparatus and method for improving cache access throughput in pipelined processors |
US4980824A (en) * | 1986-10-29 | 1990-12-25 | United Technologies Corporation | Event driven executive |
US4811214A (en) * | 1986-11-14 | 1989-03-07 | Princeton University | Multinode reconfigurable pipeline computer |
US5671382A (en) * | 1986-11-21 | 1997-09-23 | Hitachi, Ltd. | Information processing system and information processing method for executing instructions in parallel |
US5922068A (en) * | 1986-11-21 | 1999-07-13 | Hitachi Ltd. | Information processing system and information processing method for executing instructions in parallel |
US4942525A (en) * | 1986-11-21 | 1990-07-17 | Hitachi, Ltd. | Data processor for concurrent executing of instructions by plural execution units |
US5050073A (en) * | 1987-01-09 | 1991-09-17 | Kabushiki Kaisha Toshiba | Microinstruction execution system for reducing execution time for calculating microinstruction |
US4829422A (en) * | 1987-04-02 | 1989-05-09 | Stellar Computer, Inc. | Control of multiple processors executing in parallel regions |
WO1988007724A1 (en) * | 1987-04-02 | 1988-10-06 | Stellar Computer Inc. | Control of multiple processors executing in parallel regions |
US4967339A (en) * | 1987-04-10 | 1990-10-30 | Hitachi, Ltd. | Operation control apparatus for a processor having a plurality of arithmetic devices |
US5276819A (en) * | 1987-05-01 | 1994-01-04 | Hewlett-Packard Company | Horizontal computer having register multiconnect for operand address generation during execution of iterations of a loop of program code |
US5083267A (en) * | 1987-05-01 | 1992-01-21 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of an instruction loop with recurrance |
US5226128A (en) * | 1987-05-01 | 1993-07-06 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of a loop with a branch |
US5036454A (en) * | 1987-05-01 | 1991-07-30 | Hewlett-Packard Company | Horizontal computer having register multiconnect for execution of a loop with overlapped code |
US4855947A (en) * | 1987-05-27 | 1989-08-08 | Amdahl Corporation | Microprogrammable pipeline interlocks based on the validity of pipeline states |
US5067068A (en) * | 1987-06-05 | 1991-11-19 | Hitachi, Ltd. | Method for converting an iterative loop of a source program into parellelly executable object program portions |
US4873656A (en) * | 1987-06-26 | 1989-10-10 | Daisy Systems Corporation | Multiple processor accelerator for logic simulation |
US4916647A (en) * | 1987-06-26 | 1990-04-10 | Daisy Systems Corporation | Hardwired pipeline processor for logic simulation |
US4872125A (en) * | 1987-06-26 | 1989-10-03 | Daisy Systems Corporation | Multiple processor accelerator for logic simulation |
US4937783A (en) * | 1987-08-17 | 1990-06-26 | Advanced Micro Devices, Inc. | Peripheral controller for executing multiple event-count instructions and nonevent-count instructions in a prescribed parallel sequence |
US4943915A (en) * | 1987-09-29 | 1990-07-24 | Digital Equipment Corporation | Apparatus and method for synchronization of a coprocessor unit in a pipelined central processing unit |
US4916652A (en) * | 1987-09-30 | 1990-04-10 | International Business Machines Corporation | Dynamic multiple instruction stream multiple data multiple pipeline apparatus for floating-point single instruction stream single data architectures |
US5115510A (en) * | 1987-10-20 | 1992-05-19 | Sharp Kabushiki Kaisha | Multistage data flow processor with instruction packet, fetch, storage transmission and address generation controlled by destination information |
US5247628A (en) * | 1987-11-30 | 1993-09-21 | International Business Machines Corporation | Parallel processor instruction dispatch apparatus with interrupt handler |
US4926323A (en) * | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
US4974146A (en) * | 1988-05-06 | 1990-11-27 | Science Applications International Corporation | Array processor |
US4969117A (en) * | 1988-05-16 | 1990-11-06 | Ardent Computer Corporation | Chaining and hazard apparatus and method |
US4935849A (en) * | 1988-05-16 | 1990-06-19 | Stardent Computer, Inc. | Chaining and hazard apparatus and method |
US5019967A (en) * | 1988-07-20 | 1991-05-28 | Digital Equipment Corporation | Pipeline bubble compression in a computer system |
US5148536A (en) * | 1988-07-25 | 1992-09-15 | Digital Equipment Corporation | Pipeline having an integral cache which processes cache misses and loads data in parallel |
US5430888A (en) * | 1988-07-25 | 1995-07-04 | Digital Equipment Corporation | Pipeline utilizing an integral cache for transferring data to and from a register |
US5040107A (en) * | 1988-07-27 | 1991-08-13 | International Computers Limited | Pipelined processor with look-ahead mode of operation |
US5117490A (en) * | 1988-07-27 | 1992-05-26 | International Computers Limited | Pipelined data processor with parameter files for passing parameters between pipeline units |
US5101341A (en) * | 1988-08-25 | 1992-03-31 | Edgcore Technology, Inc. | Pipelined system for reducing instruction access time by accumulating predecoded instruction bits a FIFO |
US5131086A (en) * | 1988-08-25 | 1992-07-14 | Edgcore Technology, Inc. | Method and system for executing pipelined three operand construct |
US6256726B1 (en) | 1988-11-11 | 2001-07-03 | Hitachi, Ltd. | Data processor for the parallel processing of a plurality of instructions |
US20010021970A1 (en) * | 1988-11-11 | 2001-09-13 | Takashi Hotta | Data processor |
US7424598B2 (en) | 1988-11-11 | 2008-09-09 | Renesas Technology Corp. | Data processor |
US5233694A (en) * | 1988-11-11 | 1993-08-03 | Hitachi, Ltd. | Pipelined data processor capable of performing instruction fetch stages of a plurality of instructions simultaneously |
US5088035A (en) * | 1988-12-09 | 1992-02-11 | Commodore Business Machines, Inc. | System for accelerating execution of program instructions by a microprocessor |
US5148528A (en) * | 1989-02-03 | 1992-09-15 | Digital Equipment Corporation | Method and apparatus for simultaneously decoding three operands in a variable length instruction when one of the operands is also of variable length |
US5293500A (en) * | 1989-02-10 | 1994-03-08 | Mitsubishi Denki K.K. | Parallel processing method and apparatus |
US6499123B1 (en) | 1989-02-24 | 2002-12-24 | Advanced Micro Devices, Inc. | Method and apparatus for debugging an integrated circuit |
US6212629B1 (en) | 1989-02-24 | 2001-04-03 | Advanced Micro Devices, Inc. | Method and apparatus for executing string instructions |
US5226126A (en) * | 1989-02-24 | 1993-07-06 | Nexgen Microsystems | Processor having plurality of functional units for orderly retiring outstanding operations based upon its associated tags |
US5442757A (en) * | 1989-02-24 | 1995-08-15 | Nexgen, Inc. | Computer processor with distributed pipeline control that allows functional units to complete operations out of order while maintaining precise interrupts |
US6009506A (en) * | 1989-05-24 | 1999-12-28 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions |
US5574941A (en) * | 1989-05-24 | 1996-11-12 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instruction |
US5628024A (en) * | 1989-05-24 | 1997-05-06 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions |
US5752064A (en) * | 1989-05-24 | 1998-05-12 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions |
US6092177A (en) * | 1989-05-24 | 2000-07-18 | Tandem Computers Incorporated | Computer architecture capable of execution of general purpose multiple instructions |
US5390355A (en) * | 1989-05-24 | 1995-02-14 | Tandem Computers Incorporated | Computer architecture capable of concurrent issuance and execution of general purpose multiple instructions |
US5404466A (en) * | 1989-06-13 | 1995-04-04 | Nec Corporation | Apparatus and method to set and reset a pipeline instruction execution control unit for sequential execution of an instruction interval |
US5561775A (en) * | 1989-07-07 | 1996-10-01 | Hitachi, Ltd. | Parallel processing apparatus and method capable of processing plural instructions in parallel or successively |
US5408626A (en) * | 1989-08-04 | 1995-04-18 | Intel Corporation | One clock address pipelining in segmentation unit |
US5428756A (en) * | 1989-09-25 | 1995-06-27 | Matsushita Electric Industrial Co., Ltd. | Pipelined computer with control of instruction advance |
US5333297A (en) * | 1989-11-09 | 1994-07-26 | International Business Machines Corporation | Multiprocessor system having multiple classes of instructions for purposes of mutual interruptibility |
US5471593A (en) * | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
US5121502A (en) * | 1989-12-20 | 1992-06-09 | Hewlett-Packard Company | System for selectively communicating instructions from memory locations simultaneously or from the same memory locations sequentially to plurality of processing |
US5203002A (en) * | 1989-12-27 | 1993-04-13 | Wetzel Glen F | System with a multiport memory and N processing units for concurrently/individually executing 2N-multi-instruction-words at first/second transitions of a single clock cycle |
US5226131A (en) * | 1989-12-27 | 1993-07-06 | The United States Of America As Represented By The United States Department Of Energy | Sequencing and fan-out mechanism for causing a set of at least two sequential instructions to be performed in a dataflow processing computer |
US5526500A (en) * | 1990-02-13 | 1996-06-11 | Hewlett-Packard Company | System for operand bypassing to allow a one and one-half cycle cache memory access time for sequential load and branch instructions |
US5506974A (en) * | 1990-03-23 | 1996-04-09 | Unisys Corporation | Method and means for concatenating multiple instructions |
US5179531A (en) * | 1990-04-27 | 1993-01-12 | Pioneer Electronic Corporation | Accelerated digital signal processor |
US5303356A (en) * | 1990-05-04 | 1994-04-12 | International Business Machines Corporation | System for issuing instructions for parallel execution subsequent to branch into a group of member instructions with compoundability in dictation tag |
US5502826A (en) * | 1990-05-04 | 1996-03-26 | International Business Machines Corporation | System and method for obtaining parallel existing instructions in a particular data processing configuration by compounding instructions |
WO1991017495A1 (en) * | 1990-05-04 | 1991-11-14 | International Business Machines Corporation | System for compounding instructions for handling instruction and data stream for processor with different attributes |
WO1991017496A1 (en) * | 1990-05-04 | 1991-11-14 | International Business Machines Corporation | System for preparing instructions for instruction parallel processor and system with mechanism for branching in the middle of a compound instruction |
US5448746A (en) * | 1990-05-04 | 1995-09-05 | International Business Machines Corporation | System for compounding instructions in a byte stream prior to fetching and identifying the instructions for execution |
EP0481031A1 (en) * | 1990-05-04 | 1992-04-22 | International Business Machines Corporation | System for compounding instructions for handling instruction and data stream for processor with different attributes |
EP0481031A4 (en) * | 1990-05-04 | 1993-01-27 | International Business Machines Corporation | System for compounding instructions for handling instruction and data stream for processor with different attributes |
US5214763A (en) * | 1990-05-10 | 1993-05-25 | International Business Machines Corporation | Digital computer system capable of processing two or more instructions in parallel and having a cache and instruction compounding mechanism |
US5179672A (en) * | 1990-06-19 | 1993-01-12 | International Business Machines Corporation | Apparatus and method for modeling parallel processing of instructions using sequential execution hardware |
US5459844A (en) * | 1990-06-26 | 1995-10-17 | International Business Machines Corporation | Predecode instruction compounding |
US5355460A (en) * | 1990-06-26 | 1994-10-11 | International Business Machines Corporation | In-memory preprocessor for compounding a sequence of instructions for parallel computer system execution |
US5163139A (en) * | 1990-08-29 | 1992-11-10 | Hitachi America, Ltd. | Instruction preprocessor for conditionally combining short memory instructions into virtual long instructions |
US5398321A (en) * | 1991-02-08 | 1995-03-14 | International Business Machines Corporation | Microcode generation for a scalable compound instruction set machine |
US5428810A (en) * | 1991-03-15 | 1995-06-27 | Hewlett-Packard Company | Allocation of resources of a pipelined processor by clock phase for parallel execution of dependent processes |
US5261071A (en) * | 1991-03-21 | 1993-11-09 | Control Data System, Inc. | Dual pipe cache memory with out-of-order issue capability |
US5359718A (en) * | 1991-03-29 | 1994-10-25 | International Business Machines Corporation | Early scalable instruction set machine alu status prediction apparatus |
US5488729A (en) * | 1991-05-15 | 1996-01-30 | Ross Technology, Inc. | Central processing unit architecture with symmetric instruction scheduling to achieve multiple instruction launch and execution |
US5636353A (en) * | 1991-06-17 | 1997-06-03 | Mitsubishi Denki Kabushiki Kaisha | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating results bypassing |
US6233670B1 (en) | 1991-06-17 | 2001-05-15 | Mitsubishi Denki Kabushiki Kaisha | Superscalar processor with direct result bypass between execution units having comparators in execution units for comparing operand and result addresses and activating result bypassing |
US5522052A (en) * | 1991-07-04 | 1996-05-28 | Matsushita Electric Industrial Co. Ltd. | Pipeline processor for processing instructions having a data dependence relationship |
US20040093485A1 (en) * | 1991-07-08 | 2004-05-13 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US7739482B2 (en) | 1991-07-08 | 2010-06-15 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US7685402B2 (en) | 1991-07-08 | 2010-03-23 | Sanjiv Garg | RISC microprocessor architecture implementing multiple typed register sets |
US7941636B2 (en) | 1991-07-08 | 2011-05-10 | Intellectual Venture Funding Llc | RISC microprocessor architecture implementing multiple typed register sets |
US7721070B2 (en) | 1991-07-08 | 2010-05-18 | Le Trong Nguyen | High-performance, superscalar-based computer system with out-of-order instruction execution |
US7487333B2 (en) * | 1991-07-08 | 2009-02-03 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5363495A (en) * | 1991-08-26 | 1994-11-08 | International Business Machines Corporation | Data processing system with multiple execution units capable of executing instructions out of sequence |
GB2275551B (en) * | 1991-10-21 | 1995-06-28 | Intel Corp | Cross coupling mechanisms for microprocessor instructions using pipelining systems |
WO1993008526A1 (en) * | 1991-10-21 | 1993-04-29 | Intel Corporation | Cross coupling mechanisms for microprocessor instructions using pipelining systems |
US5283874A (en) * | 1991-10-21 | 1994-02-01 | Intel Corporation | Cross coupling mechanisms for simultaneously completing consecutive pipeline instructions even if they begin to process at the same microprocessor of the issue fee |
GB2275551A (en) * | 1991-10-21 | 1994-08-31 | Intel Corp | Cross coupling mechanisms for microprocessor instructions using pipelining systems. |
US5357617A (en) * | 1991-11-22 | 1994-10-18 | International Business Machines Corporation | Method and apparatus for substantially concurrent multiple instruction thread processing by a single pipeline processor |
US7802074B2 (en) | 1992-03-31 | 2010-09-21 | Sanjiv Garg | Superscalar RISC instruction scheduling |
US7523296B2 (en) | 1992-05-01 | 2009-04-21 | Seiko Epson Corporation | System and method for handling exceptions and branch mispredictions in a superscalar microprocessor |
US7516305B2 (en) | 1992-05-01 | 2009-04-07 | Seiko Epson Corporation | System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor |
US7934078B2 (en) | 1992-05-01 | 2011-04-26 | Seiko Epson Corporation | System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor |
US7958337B2 (en) | 1992-05-01 | 2011-06-07 | Seiko Epson Corporation | System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor |
US5432724A (en) * | 1992-12-04 | 1995-07-11 | U.S. Philips Corporation | Processor for uniform operations on respective series of successive data in respective parallel data streams |
US7558945B2 (en) | 1992-12-31 | 2009-07-07 | Seiko Epson Corporation | System and method for register renaming |
US7979678B2 (en) | 1992-12-31 | 2011-07-12 | Seiko Epson Corporation | System and method for register renaming |
US8074052B2 (en) | 1992-12-31 | 2011-12-06 | Seiko Epson Corporation | System and method for assigning tags to control instruction processing in a superscalar processor |
US5590368A (en) * | 1993-03-31 | 1996-12-31 | Intel Corporation | Method and apparatus for dynamically expanding the pipeline of a microprocessor |
US5617561A (en) * | 1994-12-22 | 1997-04-01 | International Business Machines Corporation | Message sequence number control in a virtual time system |
US5752062A (en) * | 1995-10-02 | 1998-05-12 | International Business Machines Corporation | Method and system for performance monitoring through monitoring an order of processor events during execution in a processing system |
US5797019A (en) * | 1995-10-02 | 1998-08-18 | International Business Machines Corporation | Method and system for performance monitoring time lengths of disabled interrupts in a processing system |
US5949971A (en) * | 1995-10-02 | 1999-09-07 | International Business Machines Corporation | Method and system for performance monitoring through identification of frequency and length of time of execution of serialization instructions in a processing system |
US5751945A (en) * | 1995-10-02 | 1998-05-12 | International Business Machines Corporation | Method and system for performance monitoring stalls to identify pipeline bottlenecks and stalls in a processing system |
US5691920A (en) * | 1995-10-02 | 1997-11-25 | International Business Machines Corporation | Method and system for performance monitoring of dispatch unit efficiency in a processing system |
US5729726A (en) * | 1995-10-02 | 1998-03-17 | International Business Machines Corporation | Method and system for performance monitoring efficiency of branch unit operation in a processing system |
US5748855A (en) * | 1995-10-02 | 1998-05-05 | International Business Machines Corporation | Method and system for performance monitoring of misaligned memory accesses in a processing system |
US5870577A (en) * | 1996-11-27 | 1999-02-09 | International Business Machines, Corp. | System and method for dispatching two instructions to the same execution unit in a single cycle |
US5963723A (en) * | 1997-03-26 | 1999-10-05 | International Business Machines Corporation | System for pairing dependent instructions having non-contiguous addresses during dispatch |
US6609189B1 (en) * | 1998-03-12 | 2003-08-19 | Yale University | Cycle segmented prefix circuits |
US20030208674A1 (en) * | 1998-03-18 | 2003-11-06 | Sih Gilbert C. | Digital signal processor with variable length instruction set |
US20070186079A1 (en) * | 1998-03-18 | 2007-08-09 | Qualcomm Incorporated | Digital signal processor with variable length instruction set |
US7502911B2 (en) | 1998-03-18 | 2009-03-10 | Qualcomm Incorporated | Variable length instruction fetching that retrieves second instruction in dependence upon first instruction length |
US6289439B1 (en) * | 1999-01-08 | 2001-09-11 | Rise Technology, Inc. | Method, device and microprocessor for performing an XOR clear without executing an XOR instruction |
US20030159021A1 (en) * | 1999-09-03 | 2003-08-21 | Darren Kerr | Selected register decode values for pipeline stage register addressing |
US7139899B2 (en) * | 1999-09-03 | 2006-11-21 | Cisco Technology, Inc. | Selected register decode values for pipeline stage register addressing |
US6598153B1 (en) * | 1999-12-10 | 2003-07-22 | International Business Machines Corporation | Processor and method that accelerate evaluation of pairs of condition-setting and branch instructions |
US7320065B2 (en) | 2001-04-26 | 2008-01-15 | Eleven Engineering Incorporated | Multithread embedded processor with input/output capability |
US7380111B2 (en) | 2002-01-02 | 2008-05-27 | Intel Corporation | Out-of-order processing with predicate prediction and validation with correct RMW partial write new predicate register values |
US20040243791A1 (en) * | 2002-01-02 | 2004-12-02 | Grochowski Edward T. | Processing partial register writes in an out-of-order processor |
US20030126414A1 (en) * | 2002-01-02 | 2003-07-03 | Grochowski Edward T. | Processing partial register writes in an out-of order processor |
US8095780B2 (en) * | 2002-04-18 | 2012-01-10 | Nytell Software LLC | Register systems and methods for a multi-issue processor |
US20050132170A1 (en) * | 2002-04-18 | 2005-06-16 | Koninklijke Philips Electronics N.V. | Multi-issue processor |
US7730077B2 (en) | 2002-09-18 | 2010-06-01 | Netezza Corporation | Intelligent storage device controller |
US20100257537A1 (en) * | 2002-09-18 | 2010-10-07 | Netezza Corporation | Field Oriented Pipeline Architecture For A Programmable Data Streaming Processor |
US7577667B2 (en) | 2002-09-18 | 2009-08-18 | Netezza Corporation | Programmable streaming data processor for database appliance having multiple processing unit groups |
US8880551B2 (en) | 2002-09-18 | 2014-11-04 | Ibm International Group B.V. | Field oriented pipeline architecture for a programmable data streaming processor |
US7634477B2 (en) | 2002-09-18 | 2009-12-15 | Netezza Corporation | Asymmetric data streaming architecture having autonomous and asynchronous job processing unit |
US7529752B2 (en) | 2002-09-18 | 2009-05-05 | Netezza Corporation | Asymmetric streaming record data processor method and apparatus |
US7698338B2 (en) | 2002-09-18 | 2010-04-13 | Netezza Corporation | Field oriented pipeline architecture for a programmable data streaming processor |
US20040205110A1 (en) * | 2002-09-18 | 2004-10-14 | Netezza Corporation | Asymmetric data streaming architecture having autonomous and asynchronous job processing unit |
WO2004027647A1 (en) * | 2002-09-18 | 2004-04-01 | Netezza Corporation | Field oriented pipeline architecture for a programmable data streaming processor |
US20040148420A1 (en) * | 2002-09-18 | 2004-07-29 | Netezza Corporation | Programmable streaming data processor for database appliance having multiple processing unit groups |
US20040133565A1 (en) * | 2002-09-18 | 2004-07-08 | Netezza Corporation | Intelligent storage device controller |
US20040117037A1 (en) * | 2002-09-18 | 2004-06-17 | Netezza Corporation | Asymmetric streaming record data processor method and apparatus |
US20040128483A1 (en) * | 2002-12-31 | 2004-07-01 | Intel Corporation | Fuser renamer apparatus, systems, and methods |
US20040128475A1 (en) * | 2002-12-31 | 2004-07-01 | Gad Sheaffer | Widely accessible processor register file and method for use |
US7290121B2 (en) * | 2003-06-12 | 2007-10-30 | Advanced Micro Devices, Inc. | Method and data processor with reduced stalling due to operand dependencies |
US20040255099A1 (en) * | 2003-06-12 | 2004-12-16 | Kromer Stephen Charles | Method and data processor with reduced stalling due to operand dependencies |
DE112004001040B4 (en) * | 2003-06-12 | 2012-05-31 | Advanced Micro Devices, Inc. | Method and data processor with reduced operation interruption due to operand dependencies |
US7873815B2 (en) * | 2004-03-04 | 2011-01-18 | Qualcomm Incorporated | Digital signal processors with configurable dual-MAC and dual-ALU |
US20050198472A1 (en) * | 2004-03-04 | 2005-09-08 | Sih Gilbert C. | Digital signal processors with configurable dual-MAC and dual-ALU |
US7949860B2 (en) * | 2005-11-25 | 2011-05-24 | Panasonic Corporation | Multi thread processor having dynamic reconfiguration logic circuit |
US20090307470A1 (en) * | 2005-11-25 | 2009-12-10 | Masaki Maeda | Multi thread processor having dynamic reconfiguration logic circuit |
US20090204796A1 (en) * | 2008-02-08 | 2009-08-13 | International Business Machines Corporation | Method, system and computer program product for verifying address generation, interlocks and bypasses |
US8165864B2 (en) * | 2008-02-08 | 2012-04-24 | International Business Machines Corporation | Method, system and computer program product for verifying address generation, interlocks and bypasses |
Also Published As
Publication number | Publication date |
---|---|
DE3481233D1 (en) | 1990-03-08 |
EP0118830A3 (en) | 1986-10-08 |
EP0118830A2 (en) | 1984-09-19 |
JPS59173850A (en) | 1984-10-02 |
JPS6217253B2 (en) | 1987-04-16 |
EP0118830B1 (en) | 1990-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4594655A (en) | (k)-Instructions-at-a-time pipelined processor for parallel execution of inherently sequential instructions | |
EP0378830B1 (en) | Method and apparatus for handling multiple condition codes as for a parallel pipeline computer | |
US5261113A (en) | Apparatus and method for single operand register array for vector and scalar data processing operations | |
US5872987A (en) | Massively parallel computer including auxiliary vector processor | |
US5640524A (en) | Method and apparatus for chaining vector instructions | |
US4740893A (en) | Method for reducing the time for switching between programs | |
EP0377991B1 (en) | Data processing systems | |
Kuehn et al. | The Horizon supercomputing system: architecture and software | |
US5203002A (en) | System with a multiport memory and N processing units for concurrently/individually executing 2N-multi-instruction-words at first/second transitions of a single clock cycle | |
EP1124181B1 (en) | Data processing apparatus | |
KR0133238B1 (en) | Computer processing system and instruction execution method | |
EP0619557A2 (en) | A data processing system and method thereof | |
JPH03138759A (en) | Signal processor | |
US4683547A (en) | Special accumulate instruction for multiple floating point arithmetic units which use a putaway bus to enhance performance | |
JPS635775B2 (en) | ||
CA2366830A1 (en) | Register file indexing methods and apparatus for providing indirect control of register addressing in a vliw processor | |
US5544337A (en) | Vector processor having registers for control by vector resisters | |
Padegs et al. | The IBM System/370 vector architecture: Design considerations | |
US5590351A (en) | Superscalar execution unit for sequential instruction pointer updates and segment limit checks | |
US5623650A (en) | Method of processing a sequence of conditional vector IF statements | |
US4853890A (en) | Vector processor | |
US5274777A (en) | Digital data processor executing a conditional instruction within a single machine cycle | |
US5812845A (en) | Method for generating an object code for a pipeline computer process to reduce swapping instruction set | |
US5093775A (en) | Microcode control system for digital data processing system | |
JP3737573B2 (en) | VLIW processor | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION ARMONK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST.;ASSIGNORS:HAO, HSIEH T.;LING, HUEI;SACHAR, HOWARD E.;AND OTHERS;REEL/FRAME:004108/0436;SIGNING DATES FROM |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | FPAY | Fee payment | Year of fee payment: 12 |