CN118152013B - Pipeline module interaction circuit, system on chip and electronic equipment - Google Patents
- Publication number
- CN118152013B, CN202410585054.2A
- Authority
- CN
- China
- Prior art keywords
- selector
- instruction receiving
- switching unit
- pipeline module
- read pointer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Advance Control (AREA)
Abstract
The application provides a pipeline module interaction circuit, a system on chip and an electronic device. A first pipeline module comprises a cache queue and a path switching unit, and a second pipeline module comprises an instruction receiving computing device, a function processing device, an M-select-K selector and K first registers. The nth input end of the path switching unit is connected to the nth subunit of the cache queue; the mth output end of the path switching unit is connected to the mth input end of the M-select-K selector, and the kth output end of the M-select-K selector is connected to the first end of the kth first register. The second ends of the K first registers are each connected to the instruction receiving computing device, which is also connected to the path switching unit and the function processing device. In this way, the signal loop between two pipeline modules that are physically far apart (with a large line delay) can be spread over different clock cycles, which addresses the timing closure problem.
Description
Technical Field
The application relates to the field of chip design, in particular to a pipeline module interaction circuit, a system-on-chip and electronic equipment.
Background
In recent years, in both high-end consumer electronics and servers, more powerful processor cores have become a core competitive advantage of large chip companies. High-performance processor cores need to be superscalar. In such cores, however, a higher operating frequency and a wider superscalar width (which means more complex control logic) constrain each other, and timing closure of the logic between two complex modules that are physically far apart is particularly difficult. This is a problem of concern to those skilled in the art.
Disclosure of Invention
The present application aims to provide a pipeline module interaction circuit, a system-on-chip and an electronic device, so as to at least partially improve the above problems.
In order to achieve the above object, the technical scheme adopted by the embodiment of the application is as follows:
In a first aspect, an embodiment of the present application provides a pipeline module interaction circuit, where the pipeline module interaction circuit includes a first pipeline module and a second pipeline module, where the first pipeline module includes a cache queue and a path switching unit, and the second pipeline module includes an instruction receiving computing device, a function processing device, an M-select-K selector, and K first registers, where M is greater than or equal to 2×K, and K is greater than or equal to 1;
The path switching unit comprises N input ends and M output ends, the cache queue comprises N subunits, and the nth input end of the path switching unit is connected with the nth subunit;
the mth output end of the path switching unit is connected with the mth input end of the M-select-K selector, and the kth output end of the M-select-K selector is connected with the first end of the kth first register;
the second ends of the K first registers are respectively connected with the instruction receiving computing device, and the instruction receiving computing device is also connected with the access switching unit and the function processing device.
In a second aspect, an embodiment of the present application provides a system on a chip, where the system on a chip includes the pipeline module interaction circuit described above.
In a third aspect, an embodiment of the present application provides an electronic device, including the above-mentioned system-on-chip.
Compared with the prior art, in the pipeline module interaction circuit, the system on chip and the electronic equipment provided by the embodiments of the application, the pipeline module interaction circuit comprises a first pipeline module and a second pipeline module. The first pipeline module comprises a cache queue and a path switching unit, and the second pipeline module comprises an instruction receiving computing device, a function processing device, an M-select-K selector and K first registers, where M is greater than or equal to 2×K and K is greater than or equal to 1. The path switching unit comprises N input ends and M output ends, the cache queue comprises N subunits, and the nth input end of the path switching unit is connected with the nth subunit. The mth output end of the path switching unit is connected with the mth input end of the M-select-K selector, and the kth output end of the M-select-K selector is connected with the first end of the kth first register. The second ends of the K first registers are each connected with the instruction receiving computing device, which is also connected with the path switching unit and the function processing device. As a result, the signal loop between two pipeline modules that are physically far apart (with a large line delay) can be spread over different clock cycles, which addresses the timing closure problem.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a processor core according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a pipeline module interaction circuit according to an embodiment of the present application;
FIG. 3 is a second schematic diagram of an interaction circuit of pipeline modules according to an embodiment of the present application;
FIG. 4 is a third schematic diagram of an interaction circuit of a pipeline module according to an embodiment of the present application;
FIG. 5 is a fourth schematic diagram of a pipeline module interaction circuit according to an embodiment of the present application.
In the figures: 10: first pipeline module; 20: second pipeline module; 110: cache queue; 120: path switching unit; 121: N-select-M selector; 122: read pointer computing device; 123: second register; 201: M-select-K selector; 202: first register; 203: instruction receiving computing device; 204: function processing device.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the description of the present application, it should be noted that, directions or positional relationships indicated by terms such as "upper", "lower", "inner", "outer", etc., are directions or positional relationships based on those shown in the drawings, or those conventionally put in use in the application, are merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed" and "connected" are to be construed broadly. A connection may be, for example, fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium, and it may be communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to FIG. 1, FIG. 1 is a schematic diagram of the structure of a processor core according to an embodiment of the application. As shown in FIG. 1, the processor core includes a branch prediction module, an instruction fetch module, a decode module, a rename module, a dispatch module, a transmitting module, an operation module, a memory access module and a reorder module. The operation module includes units for operation instructions, branch instructions, system instructions and the like, and the memory access module includes units for load instructions and store instructions. Optionally, the above modules may be organized into a plurality of pipelines according to the directions indicated by the arrows in FIG. 1; the modules in each pipeline are not limited herein.
Some high-performance processor cores may also include floating-point and vector execution modules. Optionally, a high-performance processor operates at a frequency above 2 GHz.
It should be noted that upgrading the manufacturing process of a processor core only reduces the logic gate delay; it does not reduce the line delay of signal transmission between pipeline modules.
In a high-performance processor core, the interaction logic between two complex pipeline modules that are physically far apart is difficult to bring to timing closure. To address this, the embodiment of the application provides a pipeline module interaction circuit. Physically far apart may mean that the trace length between the two pipeline modules is greater than a first preset length. Referring to FIG. 2, FIG. 2 is a first schematic diagram of a pipeline module interaction circuit according to an embodiment of the application.
As shown in FIG. 2, the pipeline module interaction circuit comprises a first pipeline module 10 and a second pipeline module 20. The first pipeline module 10 comprises a cache queue 110 and a path switching unit 120, and the second pipeline module 20 comprises an instruction receiving computing device 203, a function processing device 204, an M-select-K selector 201 and K first registers 202, where M is greater than or equal to 2×K and K is greater than or equal to 1. The M-select-K selector 201 may be the 4-select-2 selector shown in the figure, or may be a 6-select-3 selector, an 8-select-4 selector or the like, which are not shown. The cache queue 110 is a circular queue and may be, but is not limited to, an instruction cache queue.
The path switching unit 120 includes N input ends and M output ends, the cache queue 110 includes N subunits, and the nth input end of the path switching unit 120 is connected to the nth subunit, where N is greater than or equal to M.
In an alternative embodiment, each subunit may be configured to cache the message information or instruction information acquired (received) by the first pipeline module 10. The first pipeline module 10 may manage the various sub-units in the cache queue 110 according to a first-in-first-out rule.
It should be noted that, the first pipeline module 10 may cache the acquired (received) message information or instruction information in N subunits of the cache queue 110, so as to implement a function of data flow decoupling between the first pipeline module 10 and the second pipeline module 20.
The mth output terminal of the path switching unit 120 is connected to the mth input terminal of the M-select-K selector 201, and the kth output terminal of the M-select-K selector 201 is connected to the first terminal of the kth first register 202, where 1 ≤ m ≤ M and 1 ≤ k ≤ K.
The second ends of the K first registers 202 are respectively connected to the instruction receiving computing device 203, and the instruction receiving computing device 203 is also connected to the path switching unit 120 and the function processing device 204.
For the operation of the pipeline module interaction circuit, an alternative implementation manner is also provided in the embodiment of the present application, please refer to the following.
The path switching unit 120 is configured to connect M consecutive subunits in the cache queue 110, in order and in one-to-one correspondence, to the M input terminals of the M-select-K selector 201.
As shown in FIG. 2, when M is 4, the M consecutive subunits may be A1, A2, A3 and A4; A2, A3, A4 and A5; A3, A4, A5 and A6; or A5, A6, A7 and A8. Because the queue is circular, they may also be A7, A8, A1 and A2. The M consecutive subunits are connected in order, in one-to-one correspondence, to the M inputs (B1, B2, B3 and B4) of the M-select-K selector 201.
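As an illustration of the windowing described above, the following minimal Python sketch lists the M consecutive subunits presented to inputs B1 to BM for a given starting subunit, wrapping around the circular queue. The function name consecutive_subunits and its 1-based start parameter are assumptions made for this example; they do not come from the application.

```python
def consecutive_subunits(start, n=8, m=4):
    """Return labels of the m consecutive subunits beginning at subunit `start`.

    `start` is 1-based (1 means A1). The queue is circular, so the window
    wraps from A{n} back to A1. Hypothetical helper for illustration only.
    """
    if not (1 <= start <= n and 1 <= m <= n):
        raise ValueError("require 1 <= start <= n and 1 <= m <= n")
    return [f"A{(start - 1 + j) % n + 1}" for j in range(m)]


print(consecutive_subunits(1))  # ['A1', 'A2', 'A3', 'A4']
print(consecutive_subunits(7))  # ['A7', 'A8', 'A1', 'A2']
```

The returned list is ordered so that its first element corresponds to input B1 and its last element to input BM.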
The M-select-K selector 201 is configured to switch the connection relationship between its M input terminals and its K output terminals, so that K consecutive subunits are connected in order to the K first registers 202.
With continued reference to FIG. 2, K consecutive inputs of the M-select-K selector 201 are connected in order, in one-to-one correspondence, to its K outputs. For example: B1-C1 and B2-C2 are connected; or B2-C1 and B3-C2 are connected; or B3-C1 and B4-C2 are connected. In this way, K consecutive subunits are connected in order to the K first registers 202.
It should be noted that, through the path switching unit 120 and the M-select-K selector 201, K consecutive subunits in the cache queue 110 can be connected in order to the K first registers 202, so that the K first registers 202 can cache the data in the K consecutive subunits.
Referring to FIG. 2, taking N = 8 and K = 2 as an example, the subunits are A1, A2, A3, A4, A5, A6, A7 and A8, and the first registers 202 are C1 and C2. Connecting K consecutive subunits in the cache queue 110 in order to the K first registers 202 may mean: A1-C1 and A2-C2 connected; A2-C1 and A3-C2 connected; A5-C1 and A6-C2 connected; A8-C1 and A1-C2 connected; and so on.
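The pairings listed above can be reproduced by composing the two selection stages. The sketch below is a hedged illustration: the function register_connections, its 1-based start argument and the offset h (how many leading selector inputs are skipped) are names assumed for this example only.

```python
def register_connections(start, h, n=8, m=4, k=2):
    """Pair each first register C1..Ck with the subunit it is connected to.

    start: 1-based subunit presented at input B1 of the M-select-K selector.
    h:     number of leading selector inputs skipped (0 <= h <= m - k).
    Hypothetical helper for illustration only.
    """
    if not 0 <= h <= m - k:
        raise ValueError("h must satisfy 0 <= h <= m - k")
    # Subunit numbers seen at inputs B1..Bm, wrapping around the circular queue.
    window = [(start - 1 + j) % n + 1 for j in range(m)]
    return [(f"A{window[h + j]}", f"C{j + 1}") for j in range(k)]


print(register_connections(1, 0))  # [('A1', 'C1'), ('A2', 'C2')]
print(register_connections(7, 1))  # [('A8', 'C1'), ('A1', 'C2')]
```

The second call shows the wrap-around case A8-C1, A1-C2 when the window starts at A7 and the selector skips one input.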
The K first registers 202 are used to cache the data of K consecutive subunits.
It should be noted that, in the i-th clock cycle, the K first registers 202 are connected to K consecutive subunits and cache the data in those subunits; the data cached by the K first registers 202 in the i-th clock cycle is read by the instruction receiving computing device 203 in the (i+1)-th clock cycle. The subunits cached by the K first registers 202 may change from one clock cycle to the next.
The instruction receiving computing device 203 is configured to read the cached data from the K first registers 202, and transmit the read data to the function processing device 204.
After the (i+1)-th clock cycle begins, the instruction receiving computing device 203 reads the data cached in the K first registers 202 in the i-th clock cycle. Because the instruction receiving computing device 203 and the K first registers 202 are in the same pipeline module (the second pipeline module 20), this read takes less time than a change in the first pipeline module 10 takes to propagate to the second pipeline module 20, so the instruction receiving computing device 203 can finish reading the data cached in the i-th clock cycle first. After the read operation is complete, the K first registers 202 cache the data of the K consecutive subunits corresponding to the (i+1)-th clock cycle.
It should be noted that, the instruction receiving computing device 203 may also input the read cache data to the function processing device 204 to complete the corresponding function instruction, for example, decoding or transmitting.
The function processing device 204 is configured to perform corresponding function processing on the buffered data.
The pipeline module interaction circuit provided by the embodiment of the application can spread the signal loop between two pipeline modules that are physically far apart (with a large line delay) over different clock cycles, which addresses the timing closure problem.
In an alternative embodiment, the instruction receiving computing device 203 is configured to feed back to the path switching unit 120, in the (i+1)-th clock cycle, the instruction receiving number corresponding to the i-th clock cycle.
As described above, the K first registers 202 are respectively turned on with K consecutive sub-units in the ith clock period, and buffer the data in the K consecutive sub-units, that is, K groups of buffered data are all stored. The instruction receiving computing device 203 may read the buffered data of the K first registers 202 in the i+1th clock cycle, and the instruction receiving computing device 203 may not read all the first registers 202 due to the possible presence of the backpressure signal. The number of the first registers 202 read by the instruction receiving computing device 203 in the (i+1) th clock cycle is the number of instruction receiving corresponding to the (i) th clock cycle, and the number of instruction receiving corresponding to the (i) th clock cycle may be any number from 0 to K.
Alternatively, the backpressure signal may be determined based on the amount of available resources of the various modules on the pipeline.
The path switching unit 120 is configured to determine, in the (i+2)-th clock cycle, a read pointer target position based on the read pointer history position of the cache queue 110 and the instruction receiving number corresponding to the i-th clock cycle, and to connect the M subunits starting from the read pointer target position, in order, to the M output ends of the path switching unit 120.
The path switching unit 120 receives, in the (i+1)-th clock cycle, the instruction receiving number corresponding to the i-th clock cycle fed back by the instruction receiving computing device 203. In the (i+2)-th clock cycle, it determines the read pointer target position from the read pointer history position of the cache queue 110 and that instruction receiving number, and connects the M subunits starting from the read pointer target position, in order, to its M output ends.
The read pointer history position is the position in the cache queue 110 of the first of the M subunits that are connected to the M inputs of the M-select-K selector 201 in the (i+1)-th clock cycle. For example, if the M consecutive subunits are A1, A2, A3 and A4, the read pointer history position is the position of A1. The read pointer target position is the position reached by moving the read pointer history position backward (toward later subunits) by the instruction receiving number.
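The pointer update described above amounts to a modular addition. The following minimal sketch assumes a 0-based position (0 means A1) and wraps at N because the cache queue is described as circular; the function name read_pointer_target is hypothetical.

```python
def read_pointer_target(history, accepted, n=8):
    """Advance a 0-based read pointer by the instruction receiving number.

    history:  read pointer history position (0 means subunit A1).
    accepted: instruction receiving number fed back by the second module.
    The result wraps modulo n because the queue is circular.
    """
    if not 0 <= history < n or accepted < 0:
        raise ValueError("require 0 <= history < n and accepted >= 0")
    return (history + accepted) % n


print(read_pointer_target(0, 2))  # 2, i.e. the window moves from A1 to A3
print(read_pointer_target(7, 2))  # 1, i.e. the window moves from A8 to A2
```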
In the scheme of the application, the timing closure problem is addressed by spreading the signal loop between two pipeline modules that are physically far apart (with a large line delay) over different clock cycles.
The instruction receiving computing device 203 is further configured to feed back to the M-select-K selector 201, in the (i+1)-th clock cycle, the instruction receiving number corresponding to the i-th clock cycle.
The M-select-K selector 201 is configured to switch, in the (i+1)-th clock cycle, the connection relationship between its M input ends and its K output ends based on the instruction receiving number corresponding to the i-th clock cycle, thereby changing which subunits the K first registers 202 are connected to, so that the K first registers 202 can cache K new subunits in the (i+1)-th clock cycle.
In the scheme of the application, i is more than or equal to 1.
Based on the foregoing, the embodiment of the present application also provides an alternative implementation of how the M-select-K selector 201 switches its internal connection relationship, as follows.
The M-select-K selector 201 is configured to connect, in the (i+1)-th clock cycle, its (h+1)-th through (h+K)-th input terminals, in order and in one-to-one correspondence, to its K output terminals.
Wherein h represents the instruction receiving number corresponding to the ith clock cycle.
Referring to fig. 3, fig. 4 and fig. 5, fig. 3 is a second schematic diagram of a pipeline module interaction circuit provided in an embodiment of the present application, fig. 4 is a third schematic diagram of a pipeline module interaction circuit provided in an embodiment of the present application, and fig. 5 is a fourth schematic diagram of a pipeline module interaction circuit provided in an embodiment of the present application.
In the (i+1)-th clock cycle, the M-select-K selector 201 switches its (h+k)-th input end to connect with its k-th output end, where 1 ≤ k ≤ K. That is, the (h+1)-th through (h+K)-th input ends of the M-select-K selector 201 are connected in order, in one-to-one correspondence, to its K output ends, so that the (h+k)-th input end is connected to the k-th first register 202. As shown in FIGS. 3, 4 and 5, the internal connection relationship of the M-select-K selector 201 is switched in the (i+1)-th clock cycle as the instruction receiving number corresponding to the i-th clock cycle changes.
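The input-to-output rule above can be written down directly. The sketch below is illustrative only; selector_mapping is a hypothetical name, and the range check reflects one reading of the stated constraints: since h is at most K and M is at least 2×K, the index h+K never exceeds M.

```python
def selector_mapping(h, m=4, k=2):
    """Map each 1-based output index to the 1-based input index connected to it.

    h is the instruction receiving number of the previous clock cycle (0..k).
    Output index `out` is driven by input index h + out.
    """
    if m < 2 * k:
        raise ValueError("require m >= 2 * k")
    if not 0 <= h <= k:
        raise ValueError("require 0 <= h <= k")
    return {out: h + out for out in range(1, k + 1)}


for h in range(3):
    pairs = [f"B{inp}-C{out}" for out, inp in selector_mapping(h).items()]
    print(h, pairs)
# 0 ['B1-C1', 'B2-C2']
# 1 ['B2-C1', 'B3-C2']
# 2 ['B3-C1', 'B4-C2']
```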
In the scheme of the present application, the path switching unit 120 starts switching its internal connections in the 3rd clock cycle at the earliest. Therefore, in the 1st and 2nd clock cycles, the path switching unit 120 is configured to take the starting position of the first subunit in the cache queue 110 as the read pointer target position and to connect the M subunits starting from the read pointer target position, in order, to the M output ends of the path switching unit 120.
In the 1st clock cycle, the instruction receiving computing device 203 does not receive data, so its instruction receiving number is 0.
The M-select-K selector 201 is configured to connect, in the 1st clock cycle, its 1st through K-th input terminals, in order and in one-to-one correspondence, to its K output terminals. That is, the M-select-K selector 201 connects its k-th input end to its k-th output end, where 1 ≤ k ≤ K, so that the k-th input end is connected to the k-th first register 202.
With continued reference to fig. 2 to 5, in an alternative embodiment, the path switching unit 120 includes an N-select M selector 121, a read pointer calculation device 122, and a second register 123.
The n-th input terminal of the N-select-M selector 121 is connected to the n-th subunit, and the m-th output terminal of the N-select-M selector 121 is connected to the m-th input terminal of the M-select-K selector 201, where 1 ≤ n ≤ N, 1 ≤ m ≤ M, and M ≤ N.
The input end of the read pointer computing device 122 is connected to the output end of the second register 123, the output end of the read pointer computing device 122 is connected to the control end of the N-select M selector 121, and the instruction receiving computing device 203 is also connected to the input end of the second register 123.
The instruction receiving computing device 203 is configured to write, in the (i+1) th clock cycle, the instruction receiving number corresponding to the (i) th clock cycle into the second register 123.
The read pointer computing device 122 is configured to read, in the (i+2)-th clock cycle, the instruction receiving number corresponding to the i-th clock cycle from the second register 123, determine the read pointer target position based on the read pointer history position of the cache queue 110 and that instruction receiving number, and transmit the read pointer target position to the N-select-M selector 121.
The N-select-M selector 121 is configured to connect the M subunits starting from the read pointer target position, in order, to the M output ends of the path switching unit 120.
It should be noted that the second register 123 registers the instruction receiving number corresponding to the i-th clock cycle transmitted by the instruction receiving computing device 203, and its output serves as the input of the read pointer computing device 122 in the (i+2)-th clock cycle. This breaks the critical timing path: the signal loop between two pipeline modules that are physically far apart (with a large line delay) is spread over different clock cycles, which addresses the timing closure problem.
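To see how the one-cycle-late pointer update and the immediate selector offset fit together, the following cycle-level Python sketch may help. It is a simplified behavioral model, not the circuit of the application: it assumes every subunit always holds valid data (no empty or full handling), uses 0-based queue indices, and the names simulate, h_prev and h_prev2 are invented for this example.

```python
def simulate(accept_counts, n=8, m=4, k=2):
    """Return the 0-based queue indices consumed by the receiver, in order.

    accept_counts[t] is the number of entries accepted from the data latched
    into the first registers in cycle t + 1; each value must lie in 0..k.
    Assumption: the queue never runs empty.
    """
    if m < 2 * k or n < m:
        raise ValueError("require n >= m >= 2 * k")
    read_ptr = 0   # start of the window driven by the N-select-M selector
    h_prev = 0     # previous cycle's count, used at once by the M-select-K selector
    h_prev2 = 0    # count from two cycles ago, seen through the second register
    consumed = []
    for h in accept_counts:
        if not 0 <= h <= k:
            raise ValueError("each accept count must lie in 0..k")
        # First module: the pointer moves by the count that is two cycles old.
        read_ptr = (read_ptr + h_prev2) % n
        window = [(read_ptr + j) % n for j in range(m)]
        # Second module: skip the entries accepted from the previous latch.
        first_regs = window[h_prev:h_prev + k]
        # The receiver accepts the first h of the latched entries.
        consumed.extend(first_regs[:h])
        h_prev2, h_prev = h_prev, h
    return consumed


print(simulate([2, 1, 0, 2, 2, 2]))  # [0, 1, 2, 3, 4, 5, 6, 7, 0]
```

In this model the consumed indices come out in queue order with no entry skipped or repeated, even though the read pointer lags the receiver by one cycle; the selector offset covers the gap, which is where the window width of at least 2×K is used.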
Optionally, the first pipeline module 10 is an instruction fetch module, the cache queue 110 is an instruction cache queue, the read pointer computing device 122 is an instruction cache queue read pointer computing device, the second pipeline module 20 is a decoding module, and the function processing device 204 is a decoding device.
Alternatively, the first pipeline module 10 is a reorder module, the second pipeline module 20 is a transmitting module, the read pointer computing device 122 is a microcode cache queue commit pointer computing device, and the function processing device 204 is a microcode speculative wake-up module.
An embodiment of the present application also provides a system on chip that includes the pipeline module interaction circuit described above.
An embodiment of the present application also provides an electronic device that includes the system on chip described above.
In summary, the embodiments of the present application provide a pipeline module interaction circuit, a system on chip, and an electronic device. The circuit includes a first pipeline module and a second pipeline module. The first pipeline module includes a cache queue and a path switching unit; the second pipeline module includes an instruction receiving computing device, a function processing device, an M-select-K selector, and K first registers, where M ≥ 2×K and K ≥ 1. The path switching unit has N input terminals and M output terminals, the cache queue has N subunits, and the nth input terminal of the path switching unit is connected to the nth subunit. The mth output terminal of the path switching unit is connected to the mth input terminal of the M-select-K selector, and the kth output terminal of the M-select-K selector is connected to the first end of the kth first register. The second ends of the K first registers are each connected to the instruction receiving computing device, which is also connected to the path switching unit and the function processing device. With this arrangement, the signal loop between two pipeline modules that are physically far apart (and therefore have a large wire delay) can be spread over different clock cycles, which solves the timing closure problem.
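The interplay of the two selectors can be checked with a simplified simulation. The sketch below is an illustration only: the function `simulate` and its parameters are hypothetical, the latency of the K first registers is ignored, the cache queue is not refilled, and wrap-around modulo N is assumed. One plausible reading of the M ≥ 2×K condition, which this passage does not spell out, is that the window of M subunits lags by one cycle, so the M-select-K selector must be able to skip up to K already-accepted entries and still present K candidates.

```python
from typing import List, Sequence


def simulate(queue: Sequence[int], m: int, k: int,
             accepts: Sequence[int]) -> List[int]:
    """Return the entries accepted by the second pipeline module, in order.

    accepts[t] is the instruction receiving number of cycle t + 1
    (0 <= accepts[t] <= k). All names are assumptions of this sketch.
    """
    if m < 2 * k or k < 1:
        raise ValueError("requires M >= 2*K and K >= 1")
    n = len(queue)
    read_ptr = 0         # read pointer of the path switching unit
    second_register = 0  # count registered in the first pipeline module
    prev_accept = 0      # count fed back to the M-select-K selector
    delivered: List[int] = []
    for h in accepts:
        if not 0 <= h <= k:
            raise ValueError("cannot accept more than K entries per cycle")
        # Path switching unit: the pointer moves by the count from two
        # cycles ago, then M consecutive subunits are placed on its outputs.
        read_ptr = (read_ptr + second_register) % n
        window = [queue[(read_ptr + j) % n] for j in range(m)]
        # M-select-K selector: inputs h_prev+1 .. h_prev+K are connected to
        # the K outputs, skipping what was accepted in the previous cycle.
        candidates = window[prev_accept:prev_accept + k]
        delivered.extend(candidates[:h])
        # Register updates at the end of the cycle.
        second_register = prev_accept
        prev_accept = h
    return delivered


if __name__ == "__main__":
    print(simulate(list(range(16)), m=4, k=2, accepts=[2, 1, 0, 2, 2]))
```

In this model the accepted entries form one gap-free, duplicate-free sequence even though the read pointer reacts two cycles late, because the M-select-K selector compensates for the one count that the read pointer has not yet absorbed.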
The above describes only preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its scope of protection.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (10)
1. A pipeline module interaction circuit, characterized by comprising a first pipeline module and a second pipeline module, wherein the first pipeline module comprises a cache queue and a path switching unit, and the second pipeline module comprises an instruction receiving computing device, a function processing device, an M-select-K selector and K first registers, wherein M ≥ 2×K and K ≥ 1;
the path switching unit comprises N input terminals and M output terminals, the cache queue comprises N subunits, and the nth input terminal of the path switching unit is connected to the nth subunit;
the mth output terminal of the path switching unit is connected to the mth input terminal of the M-select-K selector, and the kth output terminal of the M-select-K selector is connected to the first end of the kth first register;
the second ends of the K first registers are each connected to the instruction receiving computing device, and the instruction receiving computing device is also connected to the path switching unit and the function processing device.
2. The pipeline module interaction circuit of claim 1, wherein:
the path switching unit is configured to connect M consecutive subunits in the cache queue, in sequence, to the M input terminals of the M-select-K selector;
the M-select-K selector is configured to switch the conduction relation between its M input terminals and its K output terminals so that K consecutive subunits are connected, in sequence, to the K first registers;
the K first registers are configured to cache the data of the K consecutive subunits;
the instruction receiving computing device is configured to read the cached data from the K first registers and transmit the read data to the function processing device;
the function processing device is configured to perform the corresponding function processing on the cached data.
3. The pipeline module interaction circuit of claim 2, wherein:
the instruction receiving computing device is configured to feed back, in the (i+1)th clock cycle, the instruction receiving number corresponding to the ith clock cycle to the path switching unit;
the path switching unit is configured to determine, in the (i+2)th clock cycle, a read pointer target position based on the read pointer history position of the cache queue and the instruction receiving number corresponding to the ith clock cycle, and to connect the M subunits starting at the read pointer target position, in sequence, to the M output terminals of the path switching unit;
the instruction receiving computing device is further configured to feed back, in the (i+1)th clock cycle, the instruction receiving number corresponding to the ith clock cycle to the M-select-K selector;
the M-select-K selector is configured to switch, in the (i+1)th clock cycle, the conduction relation between its M input terminals and its K output terminals based on the instruction receiving number corresponding to the ith clock cycle;
wherein i ≥ 1.
4. The pipeline module interaction circuit of claim 3, wherein:
the M-select-K selector is configured to connect, in the (i+1)th clock cycle, its (h+1)th to (h+K)th input terminals to its K output terminals in sequence and in one-to-one correspondence;
wherein h denotes the instruction receiving number corresponding to the ith clock cycle.
5. The pipeline module interaction circuit of claim 3, wherein the path switching unit is configured to, in the 1st clock cycle and the 2nd clock cycle, take the initial position of the first subunit in the cache queue as the read pointer target position, and connect the M subunits starting at the read pointer target position, in sequence, to the M output terminals of the path switching unit;
the M-select-K selector is configured to connect, in the 1st clock cycle, its 1st to Kth input terminals to its K output terminals in sequence and in one-to-one correspondence.
6. The pipeline module interaction circuit of claim 3, wherein the path switching unit comprises an N-select-M selector, a read pointer computing device, and a second register;
the nth input terminal of the N-select-M selector is connected to the nth subunit, and the mth output terminal of the N-select-M selector is connected to the mth input terminal of the M-select-K selector;
the input terminal of the read pointer computing device is connected to the output terminal of the second register, the output terminal of the read pointer computing device is connected to the control terminal of the N-select-M selector, and the instruction receiving computing device is also connected to the input terminal of the second register.
7. The pipeline module interaction circuit of claim 6, wherein:
the instruction receiving computing device is configured to write, in the (i+1)th clock cycle, the instruction receiving number corresponding to the ith clock cycle into the second register;
the read pointer computing device is configured to read, in the (i+2)th clock cycle, the instruction receiving number corresponding to the ith clock cycle from the second register, determine a read pointer target position based on the read pointer history position of the cache queue and that instruction receiving number, and transmit the read pointer target position to the N-select-M selector;
the N-select-M selector is configured to connect the M subunits starting at the read pointer target position, in sequence, to the M output terminals of the path switching unit.
8. The pipeline module interaction circuit of claim 1, wherein the first pipeline module is an instruction fetch module and the second pipeline module is a decoding module;
or the first pipeline module is a reorder module and the second pipeline module is an issue module.
9. A system on a chip, characterized in that the system on a chip comprises the pipeline module interaction circuit of any of claims 1-8.
10. An electronic device comprising the system-on-chip of claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410585054.2A CN118152013B (en) | 2024-05-13 | 2024-05-13 | Assembly line module interaction circuit, system on chip and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410585054.2A CN118152013B (en) | 2024-05-13 | 2024-05-13 | Assembly line module interaction circuit, system on chip and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118152013A | 2024-06-07 |
CN118152013B | 2024-08-02 |
Family
ID=91299278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410585054.2A Active CN118152013B (en) | 2024-05-13 | 2024-05-13 | Assembly line module interaction circuit, system on chip and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118152013B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111766505A (en) * | 2020-06-30 | 2020-10-13 | 山东云海国创云计算装备产业创新中心有限公司 | Scanning test device for integrated circuit |
CN115579036A (en) * | 2022-10-12 | 2023-01-06 | 成都维德青云电子有限公司 | DDR (double data Rate) continuous storage circuit based on FPGA (field programmable Gate array) and implementation method thereof |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9354884B2 (en) * | 2013-03-13 | 2016-05-31 | International Business Machines Corporation | Processor with hybrid pipeline capable of operating in out-of-order and in-order modes |
CN116991480A (en) * | 2022-08-30 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Instruction processing method, device, circuit, transmitter, chip, medium and product |
Also Published As
Publication number | Publication date |
---|---|
CN118152013A (en) | 2024-06-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10042641B2 (en) | Method and apparatus for asynchronous processor with auxiliary asynchronous vector processor | |
EP2289003B1 (en) | Method & apparatus for real-time data processing | |
CN107066408B (en) | Method, system and apparatus for digital signal processing | |
CN112230992B (en) | Instruction processing device, processor and processing method thereof comprising branch prediction loop | |
CN116048627B (en) | Instruction buffering method, apparatus, processor, electronic device and readable storage medium | |
US6018796A (en) | Data processing having a variable number of pipeline stages | |
US5689694A (en) | Data processing apparatus providing bus attribute information for system debugging | |
CN116661703B (en) | Memory access circuit and memory access method, integrated circuit and electronic device | |
CN118508933A (en) | Time sequence adjusting circuit, delay path determining method and terminal equipment | |
CN116521096B (en) | Memory access circuit and memory access method, integrated circuit and electronic device | |
CN118152013B (en) | Assembly line module interaction circuit, system on chip and electronic equipment | |
EP4202661A1 (en) | Device, method, and system to facilitate improved bandwidth of a branch prediction unit | |
CN113703715B (en) | Regular expression matching method and device, FPGA and medium | |
US6226706B1 (en) | Rotation bus interface coupling processor buses to memory buses for interprocessor communication via exclusive memory access | |
JP2013545211A (en) | Architecture and method for eliminating storage buffers in a DSP / processor with multiple memory accesses | |
CN119149111A (en) | RISC-V CPU architecture supporting integrated memory and calculation buffer | |
CN117713799B (en) | Pipeline back-pressure logic circuit and electronic equipment | |
CN116149733A (en) | Instruction branch prediction system, method, device, computer equipment and storage medium | |
CN107077381B (en) | Asynchronous instruction execution apparatus and method | |
KR20230121629A (en) | Data pipeline circuit and related method supporting increased data transmission interface frequency with reduced power consumption | |
CN114924792A (en) | Instruction decoding unit, instruction execution unit, and related devices and methods | |
Brackenbury | An instruction buffer for a low-power DSP | |
JP2006285724A (en) | Information processing apparatus and information processing method | |
CN117667222B (en) | Two-stage branch prediction system, method and related equipment with optimized time sequence | |
US11327763B2 (en) | Opportunistic consumer instruction steering based on producer instruction value prediction in a multi-cluster processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||