CN103218304B - Method for on-chip/off-chip allocation of embedded memory data - Google Patents
Method for on-chip/off-chip allocation of embedded memory data
- Publication number
- CN103218304B CN103218304B CN201310114684.3A CN201310114684A CN103218304B CN 103218304 B CN103218304 B CN 103218304B CN 201310114684 A CN201310114684 A CN 201310114684A CN 103218304 B CN103218304 B CN 103218304B
- Authority
- CN
- China
- Prior art keywords
- data
- data object
- tcg
- cache
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
Description
Technical Field
The invention belongs to the field of embedded memory technology, and in particular relates to a method for on-chip/off-chip allocation of embedded memory data. The invention achieves optimal performance for a specific application on a specific memory configuration, and is especially suited to optimizing the performance of multimedia applications on a hybrid scratch-pad memory (SPM)/cache on-chip memory architecture.
Background Art
Owing to differences in manufacturing processes and circuit logic structure, processor execution units have always been faster than memory reads and writes, and as semiconductor process technology advances, the performance gap caused by this speed mismatch keeps widening. An important technique for bridging the speed mismatch between the processor and external memory is a hierarchical storage system: a small but faster memory is integrated on-chip to improve memory-access performance.
As an important part of an embedded system, the on-chip memory architecture directly affects key parameters such as performance, power consumption, and cost. On-chip memory comes in two types: cache and scratch-pad memory (SPM). Compared with a cache, an SPM costs less area and power per bit of storage, so hybrid SPM/cache on-chip memory architectures are gradually becoming a trend in embedded systems. However, the small capacity and application-specific nature of the SPM make effective use of on-chip memory resources a key issue in embedded system design.
Existing research on software data-storage optimization focuses mainly on raising the cache hit rate or on increasing the number of SPM accesses; little work addresses data memory-access optimization for a hybrid cache/SPM on-chip memory architecture.
On-chip/off-chip data allocation is an embedded-system storage optimization technique: the allocation policy it produces decides which data are accessed through the SPM (called on-chip) and which through the cache (called off-chip). By optimizing the placement of data between the SPM and the cache, the technique can achieve optimal performance for a specific application, and it has become a hot topic in embedded-system storage optimization research.
Summary of the Invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for on-chip/off-chip allocation of embedded memory data that achieves optimal performance for a specific application on a specific memory configuration.
To solve the above technical problem, the technical solution adopted by the present invention comprises the following steps:
Step 1. Use compiler and simulator tools to extract information about the specific application;
Step 2. Build a TCG model from this information;
Step 3. Apply a data allocation method that places data objects with large TCG values into the SPM;
Step 4. Apply a data layout method that maps the remaining data objects with large TCG values to different cache sets to avoid conflicts.
The application information described in step 1 includes each data object's size, lifetime, access count, temporal locality, and spatial locality. Temporal locality is represented by a Temporal Relationship Graph (TRG); spatial locality is represented by the maximum number of consecutive accesses.
The TCG model described in step 2 combines the data object size, lifetime, access count, temporal locality, and spatial locality extracted in step 1 into the following formula:
TCG = (access count × lifetime × TRG value) / (maximum consecutive accesses × object size).
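The TCG score above can be sketched as a small helper. This is a minimal illustration, not code from the patent: the `DataObject` class, its field names, and the units (bytes for size, cycles for lifetime) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    size: int             # object size in bytes (assumed unit)
    lifetime: int         # cycles between first and last access (assumed unit)
    accesses: int         # total number of accesses
    trg_value: float      # temporal-locality weight taken from the TRG
    max_consecutive: int  # longest run of consecutive accesses (spatial locality)

    def tcg(self) -> float:
        # TCG = (access count * lifetime * TRG value)
        #       / (max consecutive accesses * object size)
        return (self.accesses * self.lifetime * self.trg_value) / (
            self.max_consecutive * self.size
        )
```

A high score thus favors small, long-lived, frequently but non-sequentially accessed objects, which are exactly the objects most likely to cause cache conflicts.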
The data allocation method described in step 3 comprises the following steps:
3-1. Sort all data objects in descending order of TCG value, initialize them, and place them in off-chip memory as the set of objects awaiting allocation;
3-2. Among all objects awaiting allocation, select, in descending TCG order, the first whose size is less than or equal to the remaining SPM capacity, and move it into the on-chip SPM;
3-3. Repeat step 3-2 until every object awaiting allocation is larger than the remaining SPM capacity, then stop.
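Steps 3-1 to 3-3 amount to a greedy first-fit pass over the objects in descending TCG order. The sketch below is one possible reading of those steps, not the patent's code; the `Obj` record and the function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical record: name, size in bytes, precomputed TCG score.
Obj = namedtuple("Obj", ["name", "size", "tcg"])

def allocate_to_spm(objects, spm_capacity):
    """Greedy SPM allocation (steps 3-1 to 3-3): walk the objects in
    descending TCG order and place each one that still fits in the
    remaining SPM space; everything else stays off-chip (cache-managed)."""
    remaining = spm_capacity
    on_chip, off_chip = [], []
    for obj in sorted(objects, key=lambda o: o.tcg, reverse=True):
        if obj.size <= remaining:
            on_chip.append(obj)
            remaining -= obj.size
        else:
            off_chip.append(obj)
    return on_chip, off_chip
```

A single descending-order scan is equivalent to repeatedly re-selecting the first object that fits, because the remaining capacity only shrinks: an object skipped once can never fit later.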
The data layout method described in step 4 comprises the following steps:
4-1. Compute the number of cache sets each remaining object needs, using the formula:
number of sets = object size / cache set size;
4-2. Assign the cache's current set number to the object, increment the current set number by one, and decrement the object's required set count by one;
4-3. Repeat step 4-2 until the object's required set count reaches zero;
4-4. Repeat steps 4-1, 4-2, and 4-3 until all remaining objects have been laid out.
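Steps 4-1 to 4-4 can be sketched as a round-robin assignment of cache sets. This is an illustrative reading, not the patent's code; rounding the set count up for objects that partially fill a set, and wrapping the set counter modulo the number of sets, are my assumptions (the patent does not spell out either).

```python
import math

def layout_off_chip(objects, num_sets, set_size):
    """Fixed-cache data layout (steps 4-1 to 4-4): give each remaining
    object a run of consecutive cache set numbers so that high-TCG
    objects land in different sets and avoid conflict misses.

    `objects` is a list of (name, size) pairs, assumed already ordered
    by descending TCG value."""
    placement = {}
    set_no = 0  # the cache's "current set number"
    for name, size in objects:
        needed = math.ceil(size / set_size)  # sets spanned (round-up assumed)
        placement[name] = [(set_no + k) % num_sets for k in range(needed)]
        set_no = (set_no + needed) % num_sets  # wrap-around is an assumption
    return placement
```

Because consecutive set numbers map to consecutive cache-sized address strides, this assignment also packs objects back-to-back in off-chip memory, which serves the stated goal of leaving fewer holes after layout.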
The beneficial effects of the present invention are as follows:
The method models application information with the TCG model and jointly considers effective use of the SPM and a sensible layout of data objects in off-chip memory. It optimizes the distribution of data between the SPM and the cache, reduces both the time the program spends on data-storage accesses and the energy those accesses consume, and achieves optimal performance for a specific application on a specific memory configuration.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a structural diagram of the TCG model proposed by the method;
Fig. 3 is a flowchart of the SPM/cache data allocation method of the present invention;
Fig. 4 is a flowchart of the fixed-cache data layout method of the present invention.
Detailed Description
The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.
As shown in Fig. 1, this embodiment first uses compiler and simulator tools to extract information about the specific application: (1) the application is compiled statically with the -O3 optimization option of the GCC-2.7.1-MIPS compiler to obtain MIPS assembly code; (2) a MIPS simulator is used to configure the on-chip memory, including capacity, access latency, and organization (replacement policy, write policy, write-miss policy, and associativity), and its performance-statistics tool is enabled to simulate the program's data-storage access performance. A TCG model is then built from this information; next, the SPM/cache data allocation method places data objects with large TCG values into the SPM; finally, the fixed-cache data layout method maps the remaining data objects with large TCG values to different cache sets to avoid conflicts.
The application information includes each data object's size, lifetime, access count, temporal locality, and spatial locality; temporal locality is represented by a Temporal Relationship Graph (TRG), and spatial locality by the maximum number of consecutive accesses.
As shown in Fig. 2, the TCG model combines the data object size, lifetime, access count, temporal locality, and spatial locality extracted in step 1 into the following formula:
TCG = (access count × lifetime × TRG value) / (maximum consecutive accesses × object size).
For the Temporal Relationship Graph (TRG) and the computation of TRG values, see the paper "Procedure placement using temporal ordering information" by N. Gloy et al. and the references therein.
As shown in Fig. 3, the SPM/cache data allocation in this embodiment aims to place the data objects most prone to conflicts into the SPM. It comprises the following steps:
Step 1. Sort all data objects in descending order of TCG value, initialize them, and place them in off-chip memory as the set of objects awaiting allocation;
Step 2. Among all objects awaiting allocation, select, in descending TCG order, the first whose size is less than or equal to the remaining SPM capacity, and move it into the on-chip SPM;
Step 3. Repeat Step 2 until every object awaiting allocation is larger than the remaining SPM capacity, then stop.
As shown in Fig. 4, the fixed-cache data layout in this embodiment has two goals: (1) reduce the number of cache misses; (2) reduce the off-chip memory footprint (that is, leave fewer holes in off-chip memory after layout). It comprises the following steps:
Step 4. Compute the number of cache sets j needed by each of the remaining i objects awaiting layout, using the formula: number of sets j = object size / cache set size;
Step 5. Assign the cache's current set number setNO to the object, increment setNO by one, and decrement the object's required set count j by one;
Step 6. Repeat Step 5 until the object's required set count j reaches zero;
Step 7. Repeat Steps 4, 5, and 6 until all of the remaining i objects have been laid out.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310114684.3A CN103218304B (en) | 2013-04-03 | 2013-04-03 | Method for on-chip/off-chip allocation of embedded memory data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103218304A CN103218304A (en) | 2013-07-24 |
CN103218304B true CN103218304B (en) | 2016-07-20 |
Family
ID=48816120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310114684.3A Expired - Fee Related CN103218304B (en) | 2013-04-03 | 2013-04-03 | Off-chip distribution method in a kind of embedded memory data slice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103218304B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559148B (en) * | 2013-11-15 | 2016-03-23 | 山东大学 | On-chip scratch-pad memory management method for multi-task embedded operating systems |
CN103793339B (en) * | 2014-01-13 | 2016-08-24 | 杭州电子科技大学 | Heuristic method for data cache performance based on memory-access stack distance |
CN105204940A (en) * | 2014-05-28 | 2015-12-30 | 中兴通讯股份有限公司 | Memory allocation method and device |
CN106940682B (en) * | 2017-03-07 | 2020-06-09 | 武汉科技大学 | Embedded system optimization method based on-chip programmable memory |
WO2021232183A1 (en) * | 2020-05-18 | 2021-11-25 | 华为技术有限公司 | Memory arrangement optimization method and apparatus |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763316A (en) * | 2009-12-25 | 2010-06-30 | 东南大学 | Method for dynamically allocating heterogeneous storage resources at instruction granularity based on a virtual memory mechanism |
CN101901192A (en) * | 2010-07-27 | 2010-12-01 | 杭州电子科技大学 | A static allocation method of on-chip and off-chip data objects |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9405683B2 (en) * | 2007-11-06 | 2016-08-02 | Samsung Electronics Co., Ltd. | Processor and memory control method for allocating instructions to a cache and a scratch pad memory |
2013-04-03: Application CN201310114684.3A granted as patent CN103218304B (status: not active, Expired - Fee Related)
Non-Patent Citations (1)
Title |
---|
Research on Low-Power Techniques Based on Scratch-Pad Memory; Yuan Mingju; China Master's Theses Full-text Database, Information Science and Technology; 2011-03-15; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN103218304A (en) | 2013-07-24 |
Legal Events
Code | Title | Description |
---|---|---|
C06 / PB01 | Publication | |
C10 / SE01 | Entry into force of request for substantive examination | |
C14 / GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2016-07-20; termination date: 2017-04-03 |