
CN118643001B - Memory data transmission bandwidth improving method based on parallel data channel architecture - Google Patents

Memory data transmission bandwidth improving method based on parallel data channel architecture

Info

Publication number
CN118643001B
Authority
CN
China
Prior art keywords
module
signal
memory
data
mux
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411118561.1A
Other languages
Chinese (zh)
Other versions
CN118643001A (en
Inventor
毕津慈
陈月峰
李有山
王佳乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongyin Microelectronics Nanjing Co ltd
Original Assignee
Zhongyin Microelectronics Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongyin Microelectronics Nanjing Co ltd filed Critical Zhongyin Microelectronics Nanjing Co ltd
Priority to CN202411118561.1A priority Critical patent/CN118643001B/en
Publication of CN118643001A publication Critical patent/CN118643001A/en
Application granted granted Critical
Publication of CN118643001B publication Critical patent/CN118643001B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4234Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a memory bus
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a method for improving memory data transmission bandwidth based on a parallel data channel architecture. The architecture comprises a bus master module, a combinational logic module, an FSM module, a FIFO module, a MUX module, a crossbar_queue module and a standard memory subsystem module. The bus master module sends out data of a write address channel, a write data channel and a read address channel through a bus protocol, and the data is processed by these modules and then transmitted to the standard memory subsystem module in parallel. At the same time, the standard memory subsystem module sends out data of a write response channel and a read data channel through the bus protocol, and the data is spliced by these modules and then transmitted to the bus master module, which improves the data transmission bandwidth of the storage structure. The invention is structurally simple and easy to implement, requires no modification of the memory subsystem, is highly reusable because it adopts a standardized interface, and is applicable to a flexible number of memory subsystems.

Description

Memory data transmission bandwidth improving method based on parallel data channel architecture
Technical Field
The invention relates to the technical field of storage structures, in particular to a data transmission flow control design in the process of data transmission of the storage structures, and more particularly relates to a memory data transmission bandwidth improving method based on a parallel data channel architecture.
Background
With the continuous progress of technology, modern computing places ever higher demands on high-speed data access, and the need for memory bandwidth and performance keeps driving the development of data storage technology and, with it, the performance of computer systems. However, under the pressure of ever higher bandwidth demands, data storage technologies must keep breaking through their own limitations and finding new solutions.
First, raising the clock frequency is a common way to increase the data transmission bandwidth of a storage structure, and the development of memory technology has largely been a history of successive frequency increases; the highest supported data rate has now reached 8533 Mbps. The advantage of this method is that bandwidth rises without changing the data bus or the channels. However, as the frequency rises, the voltage and current requirements of the storage structure controller and the memory module also rise, which may cause power consumption and heat dissipation problems. High-frequency operation requires more capable power supply and heat dissipation solutions, so system designers must trade off performance against power consumption. Second, increasing the number of channels is another way to improve the transmission bandwidth of a storage structure. A channel is the physical connection between the storage structure controller and the memory module; adding channels allows more data to be transferred in parallel and markedly improves transmission speed and bandwidth. However, this approach requires changing the design of the storage structure controller and the memory module, needs additional physical connections and more complex circuitry, and requires the channels to be coordinated while still satisfying the bus transfer protocol, all of which increase design complexity and hardware cost.
Another way to improve the data transmission bandwidth of a storage structure is to widen the data bus. Memory technology transmits data in parallel, and the width of the data bus determines how much data is transferred at one time, so widening the bus can greatly improve transmission speed and bandwidth. However, this method also requires a more complex storage structure controller circuit, and a wider data bus needs more pins and more complex wiring, which makes the design and physical layout of the storage structure controller and the memory modules harder. In addition, transmission bandwidth can be improved by optimizing the design and algorithms of the storage structure controller and by improving the memory access mechanism. For example, a better command scheduling algorithm in the controller can improve the efficiency and concurrency of data access and thereby the overall bandwidth, while better memory access mechanisms, caching algorithms and prefetching algorithms can reduce memory access latency and further improve transfer efficiency and bandwidth.
In summary, the above methods for improving the data transmission bandwidth of a storage structure all require modifying and optimizing the existing storage technology, and they involve both the hardware design and the software algorithms of the system. In engineering practice, implementing, designing, verifying and testing them consumes a great deal of manpower and material resources. Nevertheless, as technology advances and computing demands grow, the data transmission bandwidth of storage structures must keep increasing to meet computer systems' demand for high-speed data access.
Disclosure of Invention
The invention aims to provide a memory data transmission bandwidth improving method based on a parallel data channel architecture, which can improve the data transmission bandwidth of a memory structure without modifying the existing memory structure design.
The technical scheme for realizing the aim of the invention is that the memory data transmission bandwidth improving method based on the parallel data channel architecture comprises the following steps:
The bus master module sends out data of a write address channel, a write data channel and a read address channel through a bus protocol, and the data is processed by the combinational logic module, the FSM module, the FIFO module and the MUX module and then is transmitted to the standard memory subsystem module in parallel;
the standard memory subsystem module sends out data of a write response channel and a read data channel through a bus protocol, and the data is transmitted to the bus master module after being spliced and processed by the FSM module, the FIFO module, the crossbar_queue module, the MUX module and the combinational logic module;
The combinational logic module comprises a combinational logic_AW module, a combinational logic_W module, a combinational logic_B module, a combinational logic_AR module and a combinational logic_R module;
The FSM module comprises an FSM_AW module, an FSM_W module, an FSM_B0 module, an FSM_B1 module, an FSM_AR module, an FSM_R0 module and an FSM_R1 module;
the FIFO module comprises a FIFO_AW module, a FIFO_W module and a FIFO_AR module;
the MUX module comprises a MUX_AW module, a MUX_W module, a MUX_B module, a MUX_AR module and a MUX_R module;
The crossbar_queue module comprises a crossbar_queue_B module and a crossbar_queue_R module;
The standard memory subsystem module comprises a memory 1 module and a memory 2 module;
The standard memory subsystem is a system module that integrates a storage structure controller and a memory module. It has a standard AXI (Advanced eXtensible Interface) interface, a standard APB (Advanced Peripheral Bus) interface, and clock and reset pins. Its bus data width, namely DQ (the DDR data input/output signals), is configurable; specific values can be determined by referring to the JEDEC (Joint Electron Device Engineering Council) DDR specifications, and data widths of 16, 32 and 64 are supported. The DQ values of the memory 1 module and the memory 2 module are equal.
In this scheme, the bus master module sends out data of a write address channel, a write data channel and a read address channel through the bus protocol, and the data is processed by the combinational logic module, the FSM module, the FIFO module and the MUX module and then transmitted to the standard memory subsystem module in parallel, specifically:
In the write address channel (AW channel) of the bus protocol, the bus master module sends the write address channel data as the aw_m signal, and the combinational logic_AW module splits the transfer size and length information to obtain the aw_c signal;
When the FIFO_AW module holds no stored data and the awready signals of the memory 1 module and the memory 2 module are both pulled high, the aw_c signal is transmitted directly to the input of the MUX_AW module; otherwise, the aw_c signal is stored in the FIFO_AW module;
The MUX_AW module then selects: when the FIFO_AW module holds no stored data and the awready signals of the memory 1 module and the memory 2 module are both pulled high, the aw_s signal equals the aw_c signal; otherwise, the aw_s signal equals the FIFO_AW output aw_f. The aw_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously;
In the write data channel (W channel) of the bus protocol, the bus master module sends the write data channel data as the w_m signal. When the FIFO_W module holds no stored data and the wready signals of the memory 1 module and the memory 2 module are both pulled high, the w_m signal is transmitted directly to the input of the MUX_W module; otherwise, the w_m signal is stored in the FIFO_W module;
The MUX_W module then selects: when the FIFO_W module holds no stored data and the wready signals of the memory 1 module and the memory 2 module are both pulled high, the w_c signal equals the w_m signal; otherwise, the w_c signal equals the FIFO_W output w_f. The combinational logic_W module splits the data to obtain the w_s0 and w_s1 signals, which are transmitted to the memory 1 module and the memory 2 module respectively;
In the read address channel (AR channel) of the bus protocol, the bus master module sends the read address channel data as the ar_m signal, and the combinational logic_AR module splits the transfer size and length information to obtain the ar_c signal;
When the FIFO_AR module holds no stored data and the arready signals of the memory 1 module and the memory 2 module are both pulled high, the ar_c signal is transmitted directly to the input of the MUX_AR module; otherwise, the ar_c signal is stored in the FIFO_AR module;
The MUX_AR module then selects: when the FIFO_AR module holds no stored data and the arready signals of the memory 1 module and the memory 2 module are both pulled high, the ar_s signal equals the ar_c signal; otherwise, the ar_s signal equals the FIFO_AR output ar_f. The ar_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously.
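The bypass-or-queue rule shared by the AW and AR channels can be sketched as a small cycle-level behavioral model. This is only an illustration under stated assumptions: the class and method names are hypothetical, the FIFO is unbounded, and the FSM modules and the master-side handshake are not modelled.

```python
from collections import deque


class ForwardChannel:
    """Behavioral model of one forward address channel (AW or AR)."""

    def __init__(self):
        # Stands in for FIFO_AW / FIFO_AR; unbounded here for simplicity.
        self.fifo = deque()

    def cycle(self, incoming, ready0, ready1):
        """Advance one cycle.

        incoming: the aw_c / ar_c request presented this cycle, or None.
        ready0, ready1: awready / arready of memory 1 and memory 2.
        Returns the request broadcast to both memories (aw_s / ar_s), or None.
        """
        both_ready = ready0 and ready1
        if not self.fifo and both_ready:
            # FIFO empty and both memories ready: the request bypasses the FIFO.
            return incoming
        if incoming is not None:
            # Otherwise the request is stored behind any older ones.
            self.fifo.append(incoming)
        if both_ready:
            # The MUX selects the FIFO output (aw_f / ar_f), oldest first.
            return self.fifo.popleft()
        return None


ch = ForwardChannel()
first = ch.cycle("A", True, True)     # bypass path
stalled = ch.cycle("B", True, False)  # memory 2 not ready, "B" is queued
drained = ch.cycle("C", True, True)   # FIFO head "B" goes out, "C" waits
```

Because the same request is sent to both memories at once, the model only forwards when both ready signals are high, which keeps the two subsystems in step.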
In this scheme, the standard memory subsystem module sends out write response channel and read data channel data through bus protocol, and the data are transmitted to the bus master module after being spliced by the FSM module, the FIFO module, the crossbar_queue module, the MUX module and the combinational logic module, specifically:
In the write response channel (B channel) of the bus protocol, the memory 1 module and the memory 2 module send the write response channel data as the b_s0 signal and the b_s1 signal respectively. The b_s0 signal passes through the FSM_B0 module; when the crossbar_queue_B module holds no stored data and the bready signal of the bus master module is pulled high, the b_s0 signal is transmitted directly to the input of the MUX_B module; otherwise, the b_s0 signal is stored in the crossbar_queue_B module;
The b_s1 signal passes through the FSM_B1 module; when the crossbar_queue_B module holds no stored data and the bready signal of the bus master module is pulled high, the b_s1 signal is transmitted directly to the input of the MUX_B module; otherwise, the b_s1 signal is stored in the crossbar_queue_B module;
The MUX_B module selects, from the b_s0 signal, the b_s1 signal and the crossbar_queue_B output b_q, two signals b_m0 and b_m1 that satisfy the matching condition, namely that their bid values in the bus protocol are the same and that they originate from the memory 1 module and the memory 2 module respectively; the combinational logic_B module then combines them into the final b_m signal, which is transmitted to the bus master module;
In the read data channel (R channel) of the bus protocol, the memory 1 module and the memory 2 module send the read data channel data as the r_s0 signal and the r_s1 signal respectively. The r_s0 signal passes through the FSM_R0 module; when the crossbar_queue_R module holds no stored data and the rready signal of the bus master module is pulled high, the r_s0 signal is transmitted directly to the input of the MUX_R module; otherwise, the r_s0 signal is stored in the crossbar_queue_R module;
The r_s1 signal passes through the FSM_R1 module; when the crossbar_queue_R module holds no stored data and the rready signal of the bus master module is pulled high, the r_s1 signal is transmitted directly to the input of the MUX_R module; otherwise, the r_s1 signal is stored in the crossbar_queue_R module;
The MUX_R module selects, from the r_s0 signal, the r_s1 signal and the crossbar_queue_R output r_q, two signals r_m0 and r_m1 that satisfy the matching condition, namely that their rid values in the bus protocol are the same and that they originate from the memory 1 module and the memory 2 module respectively; the combinational logic_R module then combines them into the final r_m signal, which is transmitted to the bus master module.
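The matching condition applied by MUX_B and MUX_R (same ID, one signal from each memory) can be illustrated with a stateless selection function. This is a sketch only: `select_pair` and the tuple layout are hypothetical, and the candidate list stands for whatever is visible at the MUX inputs in a given cycle (the two live signals plus the queue output).

```python
def select_pair(candidates):
    """Return the first (from memory 1, from memory 2) pair sharing an ID, or None.

    candidates: iterable of (source, txn_id, payload) tuples, where source is
    0 for the memory 1 module and 1 for the memory 2 module.
    """
    pending = {0: {}, 1: {}}  # per source: first payload seen for each ID
    for source, txn_id, payload in candidates:
        other = 1 - source
        if txn_id in pending[other]:
            partner = pending[other][txn_id]
            # Always return the memory 1 half first, the memory 2 half second.
            return (payload, partner) if source == 0 else (partner, payload)
        pending[source].setdefault(txn_id, payload)
    return None


# Memory 1 has answered IDs 5 and 7; memory 2 has so far only answered ID 7.
pair = select_pair([(0, 5, "r0_id5"), (0, 7, "r0_id7"), (1, 7, "r1_id7")])
```

Two signals with the same ID from the same memory never form a pair, which mirrors the requirement that the halves come from different subsystems.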
In this scheme, the w_s0 and w_s1 signals are obtained by the combinational logic_W module splitting the data, and the width of each split piece of data is the DQ value of the standard memory subsystem.
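A minimal sketch of such a split follows, assuming the master-side beat is exactly twice DQ wide, that the low half goes to the memory 1 module and the high half to the memory 2 module, and that the byte strobes are split the same way. The ordering of the halves and the strobe handling are assumptions, since Fig. 2 is not reproduced here, and `split_write_beat` is a hypothetical name.

```python
def split_write_beat(data, strobe, dq_bits):
    """Split one 2*dq_bits-wide write beat into two dq_bits-wide halves.

    data:   integer holding the full beat.
    strobe: byte-enable mask with one bit per byte of the full beat.
    Returns ((data0, strobe0), (data1, strobe1)) for memory 1 and memory 2.
    """
    dq_bytes = dq_bits // 8
    data_mask = (1 << dq_bits) - 1
    strobe_mask = (1 << dq_bytes) - 1
    low = (data & data_mask, strobe & strobe_mask)
    high = ((data >> dq_bits) & data_mask, (strobe >> dq_bytes) & strobe_mask)
    return low, high


# A 64-bit beat whose upper four bytes are enabled, split for DQ = 32.
w_s0, w_s1 = split_write_beat(0x1122334455667788, 0xF0, 32)
```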
In this scheme, the crossbar_queue module is implemented by using a plurality of FIFO modules with ID as a matching condition.
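One way to picture a queue built from several FIFOs with the ID as the matching condition is a table of per-source, per-ID FIFOs. The sketch below is a software analogy rather than the hardware structure: the class and method names are hypothetical, and the FIFOs are unbounded.

```python
from collections import defaultdict, deque


class CrossbarQueue:
    """Responses held in one FIFO per (source memory, transaction ID)."""

    def __init__(self):
        # queues[source][txn_id] is the FIFO for that memory and ID.
        self.queues = (defaultdict(deque), defaultdict(deque))

    def push(self, source, txn_id, payload):
        """Store a response from memory 1 (source 0) or memory 2 (source 1)."""
        self.queues[source][txn_id].append(payload)

    def has_pair(self, txn_id):
        """True when both memories have a response waiting for this ID."""
        q0 = self.queues[0].get(txn_id)
        q1 = self.queues[1].get(txn_id)
        return bool(q0) and bool(q1)

    def pop_pair(self, txn_id):
        """Remove and return the oldest waiting halves for this ID."""
        if not self.has_pair(txn_id):
            raise LookupError(f"no complete pair for ID {txn_id}")
        return self.queues[0][txn_id].popleft(), self.queues[1][txn_id].popleft()


q = CrossbarQueue()
q.push(0, 3, "first half, ID 3")
q.push(1, 4, "second half, ID 4")
incomplete = q.has_pair(3)   # memory 2 has not yet answered ID 3
q.push(1, 3, "second half, ID 3")
halves = q.pop_pair(3)
```

Keeping a separate FIFO per ID preserves the order of responses within an ID while letting a later ID overtake an earlier one that is still waiting for its other half.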
In this scheme, the r_m0 signal and the r_m1 signal are spliced by the combinational logic_R module to obtain the r_m signal, where the width of the spliced data is the bus data bandwidth value of the standard memory subsystem.
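The splice can be pictured as a bit concatenation of two equal-width halves. The sketch assumes r_m0 occupies the low bits and r_m1 the high bits, which the text does not state; `splice_read_beat` and `half_bits` are hypothetical names.

```python
def splice_read_beat(r_m0, r_m1, half_bits):
    """Concatenate two half_bits-wide read beats, with r_m0 in the low bits."""
    mask = (1 << half_bits) - 1
    return ((r_m1 & mask) << half_bits) | (r_m0 & mask)


# Two 32-bit halves returned by the two memories for the same read ID.
r_m = splice_read_beat(0x55667788, 0x11223344, 32)
```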
Compared with the prior art, the invention has the remarkable advantages that:
(1) The invention adopts a simple and easy-to-implement design architecture, which not only can reduce hardware cost, but also is completely independent of the memory subsystem module. Because the memory subsystem is not required to be modified, a large amount of manpower and material resources are saved, and meanwhile, a user can complete development work in a short time, so that the data transmission bandwidth of the memory structure is improved.
(2) The invention adopts standardized interfaces, has high reusability, is not limited by the configuration of a specific memory subsystem, and can be suitable for various memory subsystems. This provides convenience for the user's quick application in existing memory subsystem designs.
(3) The invention is suitable for the flexible number of memory subsystems, and can be customized according to the user requirements so as to improve the data transmission bandwidth of the storage structure.
Drawings
Fig. 1 is a schematic diagram of a memory data transmission bandwidth improving method based on a parallel data channel structure according to the present invention.
FIG. 2 is a diagram illustrating data splitting in a write data channel according to the present invention.
FIG. 3 is a schematic diagram of data splicing in a read data channel according to the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Fig. 1 shows a flow chart of a memory data transmission bandwidth improving method based on a parallel data channel architecture according to the present invention.
The configuration of the memory 1 and the memory 2 shown in fig. 1 is the same.
As shown in fig. 1, a first aspect of the present invention provides a memory data transmission bandwidth improving method based on a parallel data channel architecture, including:
The bus master module sends out data of a write address channel, a write data channel and a read address channel through a bus protocol, and the data is processed by the combinational logic module, the FSM module, the FIFO module and the MUX module and then is transmitted to the standard memory subsystem module in parallel;
the standard memory subsystem module sends out data of a write response channel and a read data channel through a bus protocol, and the data is transmitted to the bus master module after being spliced and processed by the FSM module, the FIFO module, the crossbar_queue module, the MUX module and the combinational logic module;
The combinational logic module comprises a combinational logic_AW module, a combinational logic_W module, a combinational logic_B module, a combinational logic_AR module and a combinational logic_R module;
The FSM module comprises an FSM_AW module, an FSM_W module, an FSM_B0 module, an FSM_B1 module, an FSM_AR module, an FSM_R0 module and an FSM_R1 module;
the FIFO module comprises a FIFO_AW module, a FIFO_W module and a FIFO_AR module;
the MUX module comprises a MUX_AW module, a MUX_W module, a MUX_B module, a MUX_AR module and a MUX_R module;
The crossbar_queue module comprises a crossbar_queue_B module and a crossbar_queue_R module;
The standard memory subsystem module comprises a memory 1 module and a memory 2 module;
The standard memory subsystem is a system module that integrates a storage structure controller and a memory module. It has a standard AXI (Advanced eXtensible Interface) interface, a standard APB (Advanced Peripheral Bus) interface, and clock and reset pins. Its bus data width, namely DQ (the DDR data input/output signals), is configurable; specific values can be determined by referring to the JEDEC (Joint Electron Device Engineering Council) DDR specifications, and data widths of 16, 32 and 64 are supported. The DQ values of the memory 1 module and the memory 2 module are equal.
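As a rough illustration of why two equal-width subsystems working in parallel raise bandwidth, theoretical peak bandwidth scales with the per-pin data rate, the DQ width and the number of subsystems. The sketch assumes the 8533 Mbps figure quoted in the Background is a per-pin rate and ignores protocol overhead; the function and parameter names are illustrative only.

```python
def peak_bandwidth_gbytes(rate_mbps_per_pin, dq_bits, num_subsystems=1):
    """Theoretical peak bandwidth in decimal gigabytes per second."""
    bits_per_second = rate_mbps_per_pin * 1_000_000 * dq_bits * num_subsystems
    return bits_per_second / 8 / 1e9


single = peak_bandwidth_gbytes(8533, 32)                    # one subsystem, DQ = 32
doubled = peak_bandwidth_gbytes(8533, 32, num_subsystems=2)  # memory 1 plus memory 2
```

Under these assumptions the second subsystem doubles the peak figure without changing the per-pin rate or the DQ width of either subsystem.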
According to the embodiment of the invention, the bus master module sends out data of a write address channel, a write data channel and a read address channel through a bus protocol, and the data is processed by the combinational logic module, the FSM module, the FIFO module and the MUX module and then is transmitted to the standard memory subsystem module in parallel, wherein the data is specifically as follows:
In the write address channel (AW channel) of the bus protocol, the bus master module sends the write address channel data as the aw_m signal, and the combinational logic_AW module splits the transfer size and length information to obtain the aw_c signal;
When the FIFO_AW module holds no stored data and the awready signals of the memory 1 module and the memory 2 module are both pulled high, the aw_c signal is transmitted directly to the input of the MUX_AW module; otherwise, the aw_c signal is stored in the FIFO_AW module;
The MUX_AW module then selects: when the FIFO_AW module holds no stored data and the awready signals of the memory 1 module and the memory 2 module are both pulled high, the aw_s signal equals the aw_c signal; otherwise, the aw_s signal equals the FIFO_AW output aw_f. The aw_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously;
In the write data channel (W channel) of the bus protocol, the bus master module sends the write data channel data as the w_m signal. When the FIFO_W module holds no stored data and the wready signals of the memory 1 module and the memory 2 module are both pulled high, the w_m signal is transmitted directly to the input of the MUX_W module; otherwise, the w_m signal is stored in the FIFO_W module;
The MUX_W module then selects: when the FIFO_W module holds no stored data and the wready signals of the memory 1 module and the memory 2 module are both pulled high, the w_c signal equals the w_m signal; otherwise, the w_c signal equals the FIFO_W output w_f. The combinational logic_W module splits the data to obtain the w_s0 and w_s1 signals, which are transmitted to the memory 1 module and the memory 2 module respectively;
In the read address channel (AR channel) of the bus protocol, the bus master module sends the read address channel data as the ar_m signal, and the combinational logic_AR module splits the transfer size and length information to obtain the ar_c signal;
When the FIFO_AR module holds no stored data and the arready signals of the memory 1 module and the memory 2 module are both pulled high, the ar_c signal is transmitted directly to the input of the MUX_AR module; otherwise, the ar_c signal is stored in the FIFO_AR module;
The MUX_AR module then selects: when the FIFO_AR module holds no stored data and the arready signals of the memory 1 module and the memory 2 module are both pulled high, the ar_s signal equals the ar_c signal; otherwise, the ar_s signal equals the FIFO_AR output ar_f. The ar_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously.
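The text states that the combinational logic_AW and combinational logic_AR modules split the transfer size and length information but does not give the rule. One plausible reading, assumed here purely for illustration, is that each memory handles half of every beat. In AXI, AxSIZE encodes the bytes per beat as a power of two and AxLEN is the number of beats minus one, so halving the beat width would decrement AxSIZE and leave AxLEN unchanged. The function name is hypothetical, and address mapping and narrow transfers are not modelled.

```python
def split_burst_fields(axsize, axlen):
    """Derive per-memory burst fields when each memory carries half of each beat.

    axsize: AXI AxSIZE value (bytes per beat = 2 ** axsize).
    axlen:  AXI AxLEN value (number of beats - 1).
    Returns the (axsize, axlen) pair sent to each memory under this assumption.
    """
    if axsize == 0:
        raise ValueError("a 1-byte beat cannot be halved")
    return axsize - 1, axlen


# An 8-beat burst of 8-byte beats becomes an 8-beat burst of 4-byte beats per memory.
per_memory = split_burst_fields(axsize=3, axlen=7)
```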
According to the embodiment of the invention, the standard memory subsystem module sends out data of a write response channel and a read data channel through a bus protocol, and the data is transmitted to the bus master module after being spliced and processed by the FSM module, the FIFO module, the crossbar_queue module, the MUX module and the combinational logic module, specifically:
In the write response channel (B channel) of the bus protocol, the memory 1 module and the memory 2 module send the write response channel data as the b_s0 signal and the b_s1 signal respectively. The b_s0 signal passes through the FSM_B0 module; when the crossbar_queue_B module holds no stored data and the bready signal of the bus master module is pulled high, the b_s0 signal is transmitted directly to the input of the MUX_B module; otherwise, the b_s0 signal is stored in the crossbar_queue_B module;
The b_s1 signal passes through the FSM_B1 module; when the crossbar_queue_B module holds no stored data and the bready signal of the bus master module is pulled high, the b_s1 signal is transmitted directly to the input of the MUX_B module; otherwise, the b_s1 signal is stored in the crossbar_queue_B module;
The MUX_B module selects, from the b_s0 signal, the b_s1 signal and the crossbar_queue_B output b_q, two signals b_m0 and b_m1 that satisfy the matching condition, namely that their bid values in the bus protocol are the same and that they originate from the memory 1 module and the memory 2 module respectively; the combinational logic_B module then combines them into the final b_m signal, which is transmitted to the bus master module;
In the read data channel (R channel) of the bus protocol, the memory 1 module and the memory 2 module send the read data channel data as the r_s0 signal and the r_s1 signal respectively. The r_s0 signal passes through the FSM_R0 module; when the crossbar_queue_R module holds no stored data and the rready signal of the bus master module is pulled high, the r_s0 signal is transmitted directly to the input of the MUX_R module; otherwise, the r_s0 signal is stored in the crossbar_queue_R module;
The r_s1 signal passes through the FSM_R1 module; when the crossbar_queue_R module holds no stored data and the rready signal of the bus master module is pulled high, the r_s1 signal is transmitted directly to the input of the MUX_R module; otherwise, the r_s1 signal is stored in the crossbar_queue_R module;
The MUX_R module selects, from the r_s0 signal, the r_s1 signal and the crossbar_queue_R output r_q, two signals r_m0 and r_m1 that satisfy the matching condition, namely that their rid values in the bus protocol are the same and that they originate from the memory 1 module and the memory 2 module respectively; the combinational logic_R module then combines them into the final r_m signal, which is transmitted to the bus master module.
According to the embodiment of the invention, the w_s0 and w_s1 signals are obtained by the combinational logic_W module splitting the data, and the width of each split piece of data is the DQ value of the standard memory subsystem.
According to the embodiment of the invention, the crossbar_queue module is realized by using a plurality of FIFO modules by taking the ID as a matching condition.
According to the embodiment of the invention, the r_m0 signal and the r_m1 signal are spliced by the combinational logic_R module to obtain the r_m signal, and the width of the spliced data is the bus data bandwidth value of the standard memory subsystem.
It should be noted that the storage structure is a memory, including random access memories and storage devices such as DRAM, SRAM, PSRAM, HBM and flash, and that the bus protocol includes buffer transmission protocols such as AMBA, OCP and CHI.
Compared with the prior art, the invention adopts a simple and easy-to-implement design architecture, not only can reduce the hardware cost, but also is completely independent of the memory subsystem module. Because the memory subsystem is not required to be modified, a large amount of manpower and material resources are saved, and a user can complete development work in a short time, so that the data transmission bandwidth of the memory structure is improved; in addition, the invention adopts standardized interfaces, has high reusability, is not limited by specific memory subsystem configuration, and can be suitable for various memory subsystems. The invention is suitable for flexible quantity of memory subsystems, and can be customized according to the user requirement so as to improve the data transmission bandwidth of the memory structure.
The invention discloses a memory data transmission bandwidth improving method based on a parallel data channel architecture, which comprises a bus master module, a combinational logic module, an FSM module, a FIFO module, a MUX module, a crossbar_queue module and a standard memory subsystem module. The bus master sends out data of a write address channel, a write data channel and a read address channel through a bus protocol, and the data are processed by each module and then transmitted to the memory subsystem module in parallel, and meanwhile, the memory subsystem module sends out data of a write response channel and a read data channel through the bus protocol, and the data are transmitted to the master module after being spliced by each module, so that the improvement of the transmission bandwidth of the data of the storage structure is realized. The invention has simple structure, easy realization, no need of modifying the memory subsystem, high reusability by adopting the standardized interface, and convenience for users to rapidly apply the existing memory subsystem design, and is applicable to flexible number of memory subsystems to improve the data transmission bandwidth of the memory structure.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be made through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in each embodiment of the present invention may all be integrated in one processing unit, each unit may serve as a separate unit, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The foregoing is merely a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive of shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for improving memory data transmission bandwidth based on a parallel data channel architecture, characterized in that it comprises the following steps:
a bus master module sends write address channel, write data channel, and read address channel data via a bus protocol, and the data are processed by a combinational logic module, an FSM module, a FIFO module, and a MUX module and then transmitted in parallel to a standard memory subsystem module;
the standard memory subsystem module sends write response channel and read data channel data via the bus protocol, and the data are spliced by the FSM module, the FIFO module, a crossbar_queue module, the MUX module, and the combinational logic module and then transmitted to the bus master module;
the combinational logic module comprises a combinational logic_AW module, a combinational logic_W module, a combinational logic_B module, a combinational logic_AR module, and a combinational logic_R module;
the FSM module comprises an FSM_AW module, an FSM_W module, an FSM_B0 module, an FSM_B1 module, an FSM_AR module, an FSM_R0 module, and an FSM_R1 module;
the FIFO module comprises a FIFO_AW module, a FIFO_W module, and a FIFO_AR module;
the MUX module comprises a MUX_AW module, a MUX_W module, a MUX_B module, a MUX_AR module, and a MUX_R module;
the crossbar_queue module comprises a crossbar_queue_B module and a crossbar_queue_R module;
the standard memory subsystem module comprises a memory 1 module and a memory 2 module;
the step in which the bus master module sends write address channel, write data channel, and read address channel data via the bus protocol, and the data are processed by the combinational logic module, the FSM module, the FIFO module, and the MUX module and then transmitted in parallel to the standard memory subsystem module, is specifically:
in the write address channel (AW channel) of the bus protocol, the write address channel data, namely the aw*_m signal, is sent by the bus master module, and the aw*_c signal is obtained after the combinational logic_AW module splits the transfer size and length information;
when the FIFO_AW module currently holds no stored data and the awready signals of both the memory 1 module and the memory 2 module are pulled high, the aw*_c signal is transmitted directly to the input of the MUX_AW module; otherwise the aw*_c signal is stored in the FIFO_AW module;
the MUX_AW module makes the selection: when the FIFO_AW module currently holds no stored data and the awready signals of both the memory 1 module and the memory 2 module are pulled high, the aw*_s signal equals the aw*_c signal; otherwise the aw*_s signal equals the output aw*_f signal of the FIFO_AW module; the aw*_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously;
in the write data channel (W channel) of the bus protocol, the write data channel data, namely the w*_m signal, is sent by the bus master module and passes through the FSM_W module; when the FIFO_W module currently holds no stored data and the wready signals of both the memory 1 module and the memory 2 module are pulled high, the w*_m signal is transmitted directly to the input of the MUX_W module; otherwise the w*_m signal is stored in the FIFO_W module;
the MUX_W module makes the selection: when the FIFO_W module currently holds no stored data and the wready signals of both the memory 1 module and the memory 2 module are pulled high, the w*_c signal equals the w*_m signal; otherwise the w*_c signal equals the output w*_f signal of the FIFO_W module; the w*_c signal is split by the combinational logic_W module into the w*_s0 and w*_s1 signals, which are transmitted to the memory 1 module and the memory 2 module respectively;
in the read address channel (AR channel) of the bus protocol, the read address channel data, namely the ar*_m signal, is sent by the bus master module, and the ar*_c signal is obtained after the combinational logic_AR module splits the transfer size and length information;
when the FIFO_AR module is empty and the arready signals of both the memory 1 module and the memory 2 module are pulled high, the ar*_c signal is transmitted directly to the input of the MUX_AR module; otherwise the ar*_c signal is stored in the FIFO_AR module;
the MUX_AR module makes the selection: when the FIFO_AR module currently holds no stored data and the arready signals of both the memory 1 module and the memory 2 module are pulled high, the ar*_s signal equals the ar*_c signal; otherwise the ar*_s signal equals the output ar*_f signal of the FIFO_AR module; the ar*_s signal is transmitted to the memory 1 module and the memory 2 module simultaneously;
the step in which the standard memory subsystem module sends write response channel and read data channel data via the bus protocol, and the data are spliced by the FSM module, the FIFO module, the crossbar_queue module, the MUX module, and the combinational logic module and then transmitted to the bus master module, is specifically:
in the write response channel (B channel) of the bus protocol, the write response channel data, namely the b*_s0 signal and the b*_s1 signal, are sent by the memory 1 module and the memory 2 module respectively; the b*_s0 signal passes through the FSM_B0 module, and when the crossbar_queue_B module currently holds no stored information and the bready signal of the bus master module is pulled high, the b*_s0 signal is transmitted directly to the input of the MUX_B module; otherwise the b*_s0 signal is stored in the crossbar_queue_B module;
the b*_s1 signal passes through the FSM_B1 module, and when the crossbar_queue_B module currently holds no stored data and the bready signal of the bus master module is pulled high, the b*_s1 signal is transmitted directly to the input of the MUX_B module; otherwise the b*_s1 signal is stored in the crossbar_queue_B module;
when the two bid_* signals of the bus protocol are identical and the b*_m0 and b*_m1 signals originate from the memory 1 module and the memory 2 module respectively, the MUX_B module selects, from among the b*_s0 signal, the b*_s1 signal, and the output b*_q signal of the crossbar_queue_B module, two signals b*_m0 and b*_m1 that satisfy the matching condition, and these pass through the combinational logic_B module to yield the final b*_m signal, which is transmitted to the bus master module;
in the read data channel (R channel) of the bus protocol, the read data channel data, namely the r*_s0 signal and the r*_s1 signal, are sent by the memory 1 module and the memory 2 module respectively; the r*_s0 signal passes through the FSM_R0 module, and when the crossbar_queue_R module currently holds no stored information and the rready signal of the bus master module is pulled high, the r*_s0 signal is transmitted directly to the input of the MUX_R module; otherwise the r*_s0 signal is stored in the crossbar_queue_R module;
the r*_s1 signal passes through the FSM_R1 module, and when the crossbar_queue_R module currently holds no stored data and the rready signal of the bus master module is pulled high, the r*_s1 signal is transmitted directly to the input of the MUX_R module; otherwise the r*_s1 signal is stored in the crossbar_queue_R module;
when the two rid_* signals of the bus protocol are identical and the r*_m0 and r*_m1 signals originate from the memory 1 module and the memory 2 module respectively, the MUX_R module selects, from among the r*_s0 signal, the r*_s1 signal, and the output r*_q signal of the crossbar_queue_R module, two signals r*_m0 and r*_m1 that satisfy the matching condition, and these pass through the combinational logic_R module to yield the final r*_m signal, which is transmitted to the bus master module.

2. The method for improving memory data transmission bandwidth based on a parallel data channel architecture according to claim 1, characterized in that the standard memory subsystem module is a system module in which a storage structure controller and a memory module are integrated, and has a standard AXI interface, a standard APB interface, and clock and reset pins.

3. The method for improving memory data transmission bandwidth based on a parallel data channel architecture according to claim 1, characterized in that the w*_c signal is split by the combinational logic_W module into the w*_s0 and w*_s1 signals, and the width of the split data is the DQ value of the standard memory subsystem.

4. The method for improving memory data transmission bandwidth based on a parallel data channel architecture according to claim 1, characterized in that the crossbar_queue module is implemented using multiple FIFO modules, with the ID as the matching condition.

5. The method for improving memory data transmission bandwidth based on a parallel data channel architecture according to claim 1, characterized in that the r*_m0 signal and the r*_m1 signal are spliced by the combinational logic_R module to obtain the r*_m signal, and the width of the spliced data is the bus data bandwidth value of the standard memory subsystem.

6. The method for improving memory data transmission bandwidth based on a parallel data channel architecture according to claim 1, characterized in that the number of standard memory subsystem modules is expandable.
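For illustration only, the ID-matched pairing of claim 4 and the data splicing of claim 5 can be sketched as a behavioral model in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `CrossbarQueue`, `push`, `pop_pair`, and `splice_read_data` are hypothetical, every response is routed through the queue (the direct bypass into the MUX is omitted for brevity), and placing the memory 1 data in the low half of the spliced word is an assumption that the claims do not state. The same pairing would apply to write responses matched on bid.

```python
from collections import defaultdict, deque


class CrossbarQueue:
    """Per-ID FIFOs for each memory, used to pair responses with equal IDs."""

    def __init__(self):
        # queues[source][rid] is a FIFO of payloads; source 0 is memory 1,
        # source 1 is memory 2 (hypothetical indexing).
        self.queues = (defaultdict(deque), defaultdict(deque))

    def push(self, source, rid, payload):
        """Store a response from the given memory under its ID."""
        self.queues[source][rid].append(payload)

    def pop_pair(self):
        """Return (rid, payload_from_memory_1, payload_from_memory_2) for the
        first ID that has a pending response from both memories, else None."""
        for rid, q0 in self.queues[0].items():
            q1 = self.queues[1].get(rid)  # get() does not create a new entry
            if q0 and q1:
                return rid, q0.popleft(), q1.popleft()
        return None


def splice_read_data(low, high, half_bits):
    """Concatenate two half-width read beats into one full-width beat.

    Assumption: memory 1 supplies the low half and memory 2 the high half.
    """
    return (high << half_bits) | (low & ((1 << half_bits) - 1))
```

In this sketch a response is held until its counterpart with the same ID arrives from the other memory, so the master only ever sees complete, full-width beats.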

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411118561.1A CN118643001B (en) 2024-08-15 2024-08-15 Memory data transmission bandwidth improving method based on parallel data channel architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411118561.1A CN118643001B (en) 2024-08-15 2024-08-15 Memory data transmission bandwidth improving method based on parallel data channel architecture

Publications (2)

Publication Number Publication Date
CN118643001A CN118643001A (en) 2024-09-13
CN118643001B true CN118643001B (en) 2024-11-29

Family

ID=92663472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411118561.1A Active CN118643001B (en) 2024-08-15 2024-08-15 Memory data transmission bandwidth improving method based on parallel data channel architecture

Country Status (1)

Country Link
CN (1) CN118643001B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005546A (en) * 2015-06-23 2015-10-28 中国兵器工业集团第二一四研究所苏州研发中心 Asynchronous AXI bus structure with built-in cross point queue
CN110704351A (en) * 2019-09-24 2020-01-17 山东华芯半导体有限公司 Host equipment data transmission expansion method based on AXI bus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050182863A1 (en) * 2004-02-18 2005-08-18 Arm Limited, Direct memory access control
KR100597468B1 (en) * 2005-02-03 2006-07-05 삼성전자주식회사 Data interface system and data interface method in transmission and reception mode
CN112214945B (en) * 2020-10-13 2023-11-14 安徽芯纪元科技有限公司 AXI bus isolation protection structure and protection method thereof
CN114153775B (en) * 2021-12-10 2024-02-09 中国兵器工业集团第二一四研究所苏州研发中心 FlexRay controller based on AXI bus
CN117130957B (en) * 2023-08-31 2024-08-02 成都奥瑞科电子科技有限公司 Multichannel high-speed cache system and device based on signal processing
CN118133733A (en) * 2024-01-23 2024-06-04 华中科技大学 Access processing method and device for high-bandwidth memory on FPGA

Also Published As

Publication number Publication date
CN118643001A (en) 2024-09-13

Similar Documents

Publication Publication Date Title
CN102232215B (en) Many serial interface stacked-die memory architectures
TWI409815B (en) Memory system and method for controlling receipt of read data timing
US10339072B2 (en) Read delivery for memory subsystem with narrow bandwidth repeater channel
EP1738267B1 (en) System and method for organizing data transfers with memory hub memory modules
US6425044B1 (en) Apparatus for providing fast memory decode using a bank conflict table
JPH1078934A (en) Multi-size bus connection system for packet switching computer system
CN110633229A (en) DIMM for high bandwidth memory channel
US20220005521A1 (en) Programmable Memory Controller Circuits And Methods
US20170289850A1 (en) Write delivery for memory subsystem with narrow bandwidth repeater channel
JP5007337B2 (en) Control of power consumption in data processing equipment.
US20240021239A1 (en) Hardware Acceleration System for Data Processing, and Chip
EP4312104A1 (en) Memory module adapter card with multiplexer circuitry
JP3516431B2 (en) Processor bus for transporting I/O traffic
US20220229790A1 (en) Buffer communication for data buffers supporting multiple pseudo channels
CN113312304B (en) A kind of interconnection device, motherboard and server
CN110633230A (en) High bandwidth DIMM
CN118643001B (en) Memory data transmission bandwidth improving method based on parallel data channel architecture
CN100456275C (en) Split t-chain memory command and address bus topology
US20220413768A1 (en) Memory module with double data rate command and data interfaces supporting two-channel and four-channel modes
JP3469521B2 (en) System for bridging a system bus with multiple PCI buses
CN109033002A (en) A kind of multipath server system
WO2021139733A1 (en) Memory allocation method and device, and computer readable storage medium
US20120089771A1 (en) Data Processing Apparatus
CN118503191B (en) High-power storage module and communication method
US20230342035A1 (en) Method and apparatus to improve bandwidth efficiency in a dynamic random access memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant