CN115130032B - Deletion rate curve construction method and system in heterogeneous granularity storage system - Google Patents
Deletion rate curve construction method and system in heterogeneous granularity storage system Download PDFInfo
- Publication number
- CN115130032B CN115130032B CN202210789994.4A CN202210789994A CN115130032B CN 115130032 B CN115130032 B CN 115130032B CN 202210789994 A CN202210789994 A CN 202210789994A CN 115130032 B CN115130032 B CN 115130032B
- Authority
- CN
- China
- Prior art keywords
- access
- access request
- module
- histogram
- reuse distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000010276 construction Methods 0.000 title claims abstract description 36
- 238000012217 deletion Methods 0.000 title abstract 4
- 230000037430 deletion Effects 0.000 title abstract 4
- 238000005070 sampling Methods 0.000 claims abstract description 42
- 238000000034 method Methods 0.000 claims abstract description 17
- 238000012546 transfer Methods 0.000 claims description 14
- 230000008569 process Effects 0.000 claims description 4
- 230000000717 retained effect Effects 0.000 claims description 2
- 238000004364 calculation method Methods 0.000 description 5
- 230000008901 benefit Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000006872 improvement Effects 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9574—Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Description
技术领域Technical Field
本发明属于计算机数据存储领域,更具体地,涉及一种异构粒度存储系统中的缺失率曲线构建方法和系统。The present invention belongs to the field of computer data storage, and more specifically, relates to a method and system for constructing a missing rate curve in a heterogeneous granularity storage system.
背景技术Background technique
对象级缓存是网络中非常常见的缓存技术,其主要评价指标有字节缺失率(Bytemissing rate,简称BMR)和对象缺失率(Object missing rate,简称OMR)两种。在网络缓存中,字节丢失率可以直接反映带宽开销和缓存提供商的流量开销,因此其应用非常重要。更好的缓存策略可以在给定的缓存大小上获得更低的丢失率,并且,更好的缓存资源管理可以提高共享缓存的所有应用程序的总体性能。缺失率曲线(Miss rate curve,简称MRC)是一种定量的缓存性能建模技术,其构建的方法在缓存资源管理中起着至关重要的作用。Object-level caching is a very common caching technology in the network. Its main evaluation indicators are byte missing rate (BMR) and object missing rate (OMR). In network caching, the byte missing rate can directly reflect the bandwidth overhead and traffic overhead of the cache provider, so its application is very important. Better caching strategies can achieve lower miss rates for a given cache size, and better cache resource management can improve the overall performance of all applications sharing the cache. Miss rate curve (MRC) is a quantitative cache performance modeling technology, and its construction method plays a vital role in cache resource management.
针对MRC的构建方法在过去得到了广泛的研究,其构建方法主要有:准确的MRC构建方法、近似的MRC构建方法、基于空间哈希采样的MRC构建方法等。准确的MRC构建方法通过逐对象计算重用距离,可以得到较为精确的构建结果;近似的MRC构建通过近似和等效的处理方法,使得计算开销较小,且误差在可容忍范围之内;基于空间哈希采样的MRC构建方法通过哈希采样技术,能以较低的计算开销在异构粒度存储系统中实现MRC的构建。The construction method of MRC has been widely studied in the past, and its construction methods mainly include: accurate MRC construction method, approximate MRC construction method, MRC construction method based on spatial hash sampling, etc. The accurate MRC construction method can obtain more accurate construction results by calculating the reuse distance of each object; the approximate MRC construction uses approximate and equivalent processing methods to reduce the computational overhead and the error is within the tolerable range; the MRC construction method based on spatial hash sampling can realize MRC construction in heterogeneous granularity storage systems with lower computational overhead through hash sampling technology.
然而,现有的MRC构建方法在异构粒度存储系统的运用中存在者一些问题:第一、若要实现MRC的准确构建,必然伴随着极高的时间和空间复杂度,导致计算开销很大;第二、在异构粒度存储系统缓存(如对象存储)中,对象尺寸的差异会带来MRC的构建误差,以及字节缺失率曲线(Byte missing rate curve,简称BMRC)与对象缺失率曲线(Object missingrate curve,简称OMRC)之间存在较大的构建误差;第三、不同对象的不同内容流行程度(即请求概率)会导致前端MRC构建不准确,从而影响MRC的整体准确性。However, there are some problems with the existing MRC construction methods in the application of heterogeneous granularity storage systems: First, if the accurate construction of MRC is to be achieved, it is inevitably accompanied by extremely high time and space complexity, resulting in a large computational overhead; Second, in the cache of heterogeneous granularity storage systems (such as object storage), the difference in object size will lead to MRC construction errors, and there will be a large construction error between the Byte missing rate curve (BMRC) and the Object missing rate curve (OMRC); Third, the different content popularity of different objects (i.e., request probability) will lead to inaccurate front-end MRC construction, thereby affecting the overall accuracy of MRC.
发明内容Summary of the invention
针对现有技术的以上缺陷或改进需求,本发明提供了一种异构粒度存储系统中的缺失率曲线构建方法,其目的在于,解决现有MRC构建方法在实现MRC的准确构建期间,必然伴随着极高的时间和空间复杂度,从而导致计算开销很大的技术问题,以及在异构粒度存储系统缓存(如对象存储)中,对象尺寸的差异会带来MRC的构建误差,以及BMRC与OMRC之间存在较大的构建误差的技术问题,以及不同对象的不同内容流行程度会导致前端MRC构建不准确,从而影响MRC整体准确性的技术问题。In response to the above defects or improvement needs of the prior art, the present invention provides a method for constructing a miss rate curve in a heterogeneous granularity storage system, with the aim of solving the technical problems that the existing MRC construction method is inevitably accompanied by extremely high time and space complexity during the accurate construction of MRC, resulting in a large computational overhead, as well as the technical problems that in the cache of a heterogeneous granularity storage system (such as object storage), the difference in object size will cause MRC construction errors, and there is a large construction error between BMRC and OMRC, and the different content popularity of different objects will lead to inaccurate front-end MRC construction, thereby affecting the overall accuracy of MRC.
为实现上述目的,按照本发明的一个方面,提供了一种用于缺失率曲线构建的采样方法,包括以下步骤:To achieve the above object, according to one aspect of the present invention, a sampling method for constructing a missing rate curve is provided, comprising the following steps:
(1)接收访问请求序列,并设置计数器i=1;(1) Receive access request sequence and set counter i=1;
(2)判断i是否等于访问请求序列中的访问请求总数,如果是则进入步骤(8),否则转入步骤(3);(2) Determine whether i is equal to the total number of access requests in the access request sequence. If so, proceed to step (8); otherwise, proceed to step (3);
(3)将访问请求序列中的第i条访问请求载入缓存过滤器中,并判断该第i条访问请求对应的访问对象是否在缓存过滤器中命中,如果是则转入步骤(4),否则转入步骤(5);(3) Load the i-th access request in the access request sequence into the cache filter, and determine whether the access object corresponding to the i-th access request hits in the cache filter. If so, proceed to step (4); otherwise, proceed to step (5);
(4)获取该第i条访问请求对应的访问对象的重用距离,然后转入步骤(7);(4) Obtain the reuse distance of the access object corresponding to the i-th access request, and then proceed to step (7);
(5)根据第i条访问请求对应的访问对象的大小获取该访问对象的采样率;(5) obtaining a sampling rate of the access object according to the size of the access object corresponding to the i-th access request;
(6)根据步骤(5)得到的第i条访问请求对应的访问对象的采样率对该访问对象进行采样,并计算该访问对象的重用距离,然后转入步骤(7);(6) sampling the access object according to the sampling rate of the access object corresponding to the i-th access request obtained in step (5), calculating the reuse distance of the access object, and then proceeding to step (7);
(7)设置计数器i=i+1,并返回步骤(2);(7) Set counter i=i+1 and return to step (2);
(8)根据访问请求队列中所有在缓存过滤器中命中的访问对象的重用距离构建第一直方图,然后转入步骤(9)。(8) Construct the first histogram based on the reuse distances of all access objects in the access request queue that hit the cache filter, and then proceed to step (9).
(9)根据访问请求队列中所有未在缓存过滤器中命中的访问对象的重用距离构建第二直方图,然后转入步骤(10);(9) constructing a second histogram according to the reuse distances of all access objects in the access request queue that are not hit in the cache filter, and then proceeding to step (10);
(10)合并第一直方图与第二直方图,构建完整的重用距离直方图,并对重用距离直方图进行积分处理,以获得最终的缺失率曲线。(10) The first histogram and the second histogram are combined to construct a complete reuse distance histogram, and the reuse distance histogram is integrated to obtain the final missing rate curve.
优选地,步骤(1)是从缓存系统中获取的实际运行的访问请求序列。Preferably, step (1) is to obtain the actual running access request sequence from the cache system.
优选地,每条访问请求仅对应一个访问对象,一个访问对象对应至少一条访问请求。Preferably, each access request corresponds to only one access object, and one access object corresponds to at least one access request.
优选地,缓存过滤器中仅保留最近的大小不大于L的访问对象,其中L的取值范围是0到5×1/r×log2(1/r),优选为1/r×log2(1/r),其中r为预设的基础采样率大小。Preferably, the cache filter only retains the most recently accessed objects whose size is not greater than L, where L ranges from 0 to 5×1/r×log 2 (1/r), preferably 1/r×log 2 (1/r), where r is a preset basic sampling rate.
优选地,访问对象i的重用距离为该访问对象i在距上一次被访问的间隔之内除了该访问对象之外的其余所有访问对象的大小之和;Preferably, the reuse distance of the access object i is the sum of the sizes of all other access objects except the access object i within the interval from the last access of the access object i;
访问对象D的采样率rD是按照以下公式计算得到:The sampling rate r D of the access object D is calculated according to the following formula:
rD=r×sD/savg r D =r×s D /s avg
其中sD为访问对象的大小,savg为访问请求序列中所有访问请求对应的访问对象的平均值;Where s D is the size of the access object, s avg is the average value of the access objects corresponding to all access requests in the access request sequence;
优选地,步骤(6)包括以下子步骤:Preferably, step (6) comprises the following sub-steps:
(6-1)根据步骤(5)中计算得到的第i条访问请求对应的访问对象的采样率rD对第i条访问请求对应的访问对象进行采样,以生成一个0到1的随机数,并判断该随机数是否小于等于rD,如果是则转入步骤(6-2),否则过程结束;(6-1) Sampling the access object corresponding to the i-th access request according to the sampling rate r D of the access object corresponding to the i-th access request calculated in step (5) to generate a random number between 0 and 1, and determining whether the random number is less than or equal to r D . If so, proceed to step (6-2); otherwise, the process ends.
(6-2)构建二叉树Tws,使得二叉树Tws的中序遍历等价于步骤(5)得到的第i条访问请求对应的访问对象的先后顺序;(6-2) Construct a binary tree T ws , so that the in-order traversal of the binary tree T ws is equivalent to the order of access objects corresponding to the i-th access request obtained in step (5);
(6-3)根据步骤(6-2)得到的二叉树Tws计算第i条访问请求对应的访问对象的重用距离sBRD;(6-3) Calculate the reuse distance sBRD of the access object corresponding to the i-th access request based on the binary tree T ws obtained in step (6-2);
(6-4)使用步骤(6-3)得到的第i条访问请求对应的访问对象的重用距离并根据公式eBRD=sBRD/r+size(TCF)获取第i条访问请求对应的访问对象的最终重用距离eBRD,其中size(TCF)是缓存过滤器中存储的所有访问对象的总大小。(6-4) Use the reuse distance of the access object corresponding to the i-th access request obtained in step (6-3) and obtain the final reuse distance eBRD of the access object corresponding to the i-th access request according to the formula eBRD=sBRD/r+size(T CF ), where size(T CF ) is the total size of all access objects stored in the cache filter.
优选地,在第一直方图中,横坐标1表示的是重用距离,其对应的纵坐标,是重用距离为1的访问对象的总数。Preferably, in the first histogram, the abscissa 1 represents the reuse distance, and the corresponding ordinate is the total number of access objects with a reuse distance of 1.
在第二直方图中,横坐标1表示的是重用距离,其对应的纵坐标,是重用距离为1的访问对象的总数。In the second histogram, the horizontal axis 1 represents the reuse distance, and the corresponding vertical axis is the total number of access objects with a reuse distance of 1.
按照本发明的另一方面,提供了一种异构粒度存储系统中的缺失率曲线构建系统,包括:According to another aspect of the present invention, a system for constructing a miss rate curve in a heterogeneous granularity storage system is provided, comprising:
第一模块,用于接收访问请求序列,并设置计数器i=1;The first module is used to receive an access request sequence and set a counter i=1;
第二模块,用于判断i是否等于访问请求序列中的访问请求总数,如果是则进入第八模块,否则转入第三模块;The second module is used to determine whether i is equal to the total number of access requests in the access request sequence, if yes, then enter the eighth module, otherwise, then transfer to the third module;
第三模块,用于将访问请求序列中的第i条访问请求载入缓存过滤器中,并判断该第i条访问请求对应的访问对象是否在缓存过滤器中命中,如果是则转入第四模块,否则转入第五模块;The third module is used to load the i-th access request in the access request sequence into the cache filter, and determine whether the access object corresponding to the i-th access request hits in the cache filter, if yes, transfer to the fourth module, otherwise transfer to the fifth module;
第四模块,用于获取该第i条访问请求对应的访问对象的重用距离,然后转入第七模块;The fourth module is used to obtain the reuse distance of the access object corresponding to the i-th access request, and then transfer to the seventh module;
第五模块,用于根据第i条访问请求对应的访问对象的大小获取该访问对象的采样率;A fifth module is used to obtain a sampling rate of the access object according to the size of the access object corresponding to the i-th access request;
第六模块,用于根据第五模块得到的第i条访问请求对应的访问对象的采样率对该访问对象进行采样,并计算该访问对象的重用距离,然后转入第七模块;The sixth module is used to sample the access object corresponding to the i-th access request according to the sampling rate of the access object obtained in the fifth module, calculate the reuse distance of the access object, and then transfer to the seventh module;
第七模块,用于设置计数器i=i+1,并返回第二模块;The seventh module is used to set the counter i=i+1 and return to the second module;
第八模块,用于根据访问请求队列中所有在缓存过滤器中命中的访问对象的重用距离构建第一直方图,然后转入第九模块。The eighth module is used to construct a first histogram according to the reuse distances of all access objects in the access request queue that hit the cache filter, and then transfer to the ninth module.
第九模块,用于根据访问请求队列中所有未在缓存过滤器中命中的访问对象的重用距离构建第二直方图,然后转入第十模块;The ninth module is used to construct a second histogram according to the reuse distances of all access objects in the access request queue that are not hit in the cache filter, and then transfer to the tenth module;
第十模块,用于合并第一直方图与第二直方图,以构建完整的重用距离直方图,并对重用距离直方图进行积分处理,以获得最终的缺失率曲线。The tenth module is used to merge the first histogram and the second histogram to construct a complete reuse distance histogram, and integrate the reuse distance histogram to obtain a final missing rate curve.
总体而言,通过本发明所构思的以上技术方案与现有技术相比,能够取得下列有益效果:In general, the above technical solutions conceived by the present invention can achieve the following beneficial effects compared with the prior art:
1、由于本发明采用了步骤(5)到步骤(6),应用了采样的计算方式,能够降低时间与空间复杂度,从而降低MRC曲线构建的计算开销;此外,由于采用了步骤(3)到步骤(6),分开考虑缺失率曲线的头部和尾部的构建问题,在降低开销的同时保证了准确性满足要求。1. Since the present invention adopts step (5) to step (6) and applies a sampling calculation method, it can reduce the time and space complexity, thereby reducing the calculation overhead of MRC curve construction; in addition, since step (3) to step (6) are adopted, the construction problems of the head and tail of the missing rate curve are considered separately, which reduces the overhead while ensuring that the accuracy meets the requirements.
2、由于本发明采用了步骤(5)到步骤(6),对不同尺寸的对象采取不同的采样率进行变权重采样的处理,能够避免对象大小差异造成的缺失率曲线构建不准确问题,降低BMRC与OMRC之间存在的构建误差,大大提高了缓存缺失率曲线构建的准确率。2. Since the present invention adopts steps (5) to (6) and adopts different sampling rates for objects of different sizes to perform variable weight sampling processing, it can avoid the problem of inaccurate construction of the miss rate curve caused by the difference in object size, reduce the construction error between BMRC and OMRC, and greatly improve the accuracy of cache miss rate curve construction.
3、由于本发明采用了步骤(3)到步骤(4),将内容流行度高的对象单独计算其重用距离,分开考虑缺失率曲线的头部和尾部的构建问题,提高了高内容流行度对象的重用距离统计精度,大大提高了MRC曲线的头部构建精度。3. Since the present invention adopts step (3) to step (4), the reuse distance of objects with high content popularity is calculated separately, and the construction problems of the head and tail of the missing rate curve are considered separately, which improves the statistical accuracy of the reuse distance of objects with high content popularity and greatly improves the head construction accuracy of the MRC curve.
4、本发明作为通用性的工具,很容易在内容发布网络、网页缓存等其他应用场景灵活迁移适用,可以帮助缓存系统实现更灵活、更高效的多目标缓存优化。4. As a universal tool, the present invention can be easily and flexibly migrated and applied in other application scenarios such as content distribution networks and web page caches, and can help cache systems achieve more flexible and efficient multi-objective cache optimization.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是本发明异构粒度存储系统中的缺失率曲线构建方法的流程图在所有附图中,相同的附图标记用来表示相同的元件或结构。FIG. 1 is a flow chart of a method for constructing a miss rate curve in a heterogeneous granularity storage system according to the present invention. In all drawings, the same reference numerals are used to represent the same elements or structures.
具体实施方式Detailed ways
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。此外,下面所描述的本发明各个实施方式中所涉及到的技术特征只要彼此之间未构成冲突就可以相互组合。In order to make the purpose, technical solutions and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
如图1所示,本发明提供了一种异构粒度存储系统中的缺失率曲线构建方法,其主要包含两个部分:缓存过滤器以及变权重采样阶段,两部分分别用于构建缺失率曲线的头部和尾部,以提高缺失率曲线的构建精度。As shown in Figure 1, the present invention provides a method for constructing a miss rate curve in a heterogeneous granularity storage system, which mainly includes two parts: a cache filter and a variable weight sampling stage. The two parts are respectively used to construct the head and tail of the miss rate curve to improve the construction accuracy of the miss rate curve.
具体而言,本发明异构粒度存储系统中的缺失率曲线构建方法包括以下步骤:Specifically, the method for constructing a missing rate curve in a heterogeneous granularity storage system of the present invention comprises the following steps:
(1)接收访问请求序列,并设置计数器i=1;(1) Receive access request sequence and set counter i=1;
具体而言,本步骤是从缓存系统中获取的实际运行的访问请求序列记录。Specifically, this step is to obtain the actual running access request sequence record from the cache system.
(2)判断i是否等于访问请求序列中的访问请求总数,如果是则进入步骤(8),否则转入步骤(3);(2) Determine whether i is equal to the total number of access requests in the access request sequence. If so, proceed to step (8); otherwise, proceed to step (3);
(3)将访问请求序列中的第i条访问请求载入缓存过滤器中,并判断该第i条访问请求对应的访问对象是否在缓存过滤器中命中,如果是则转入步骤(4),否则转入步骤(5);(3) Load the i-th access request in the access request sequence into the cache filter, and determine whether the access object corresponding to the i-th access request hits in the cache filter. If so, proceed to step (4); otherwise, proceed to step (5);
具体而言,每条访问请求仅仅对应一个访问对象,一个访问对象可以对应多条访问请求。Specifically, each access request corresponds to only one access object, and one access object can correspond to multiple access requests.
缓存过滤器使用二叉树TCF来构建访问请求序列,使得二叉树的中序遍历等价于访问请求序列访问对象的先后顺序,以快速计算访问请求序列对应的访问对象的重用距离。并且,缓存过滤器中仅保留最近的大小不大于L的访问对象,其中L的取值范围是0到5×1/r×log2(1/r),优选为1/r×log2(1/r),其中r为预设的基础采样率大小,可以在0到1之间任意选择,与步骤(5)中的基础采样率大小相同。The cache filter uses a binary tree T CF to construct an access request sequence, so that the in-order traversal of the binary tree is equivalent to the order of accessing objects in the access request sequence, so as to quickly calculate the reuse distance of the access objects corresponding to the access request sequence. In addition, only the most recent access objects whose size is not greater than L are retained in the cache filter, where the value range of L is 0 to 5×1/r×log 2 (1/r), preferably 1/r×log 2 (1/r), where r is a preset basic sampling rate, which can be arbitrarily selected between 0 and 1, and is the same as the basic sampling rate in step (5).
(4)获取该第i条访问请求对应的访问对象的重用距离,然后转入步骤(7);(4) Obtain the reuse distance of the access object corresponding to the i-th access request, and then proceed to step (7);
具体而言,访问对象i的重用距离为,访问对象i在距上一次被访问的间隔之内除了该访问对象之外的其余所有访问对象的大小之和。Specifically, the reuse distance of access object i is the sum of the sizes of all other access objects except the access object i within the interval from the last access to the access object i.
上述步骤(3)到(4)的优点在于,对内容流行度高的对象单独计算重用距离,后续对其构建准确的重用距离直方图,可以提高缺失率曲线在头部的准确率,从而整体提高缺失率曲线的准确程度。The advantage of the above steps (3) to (4) is that the reuse distance is calculated separately for objects with high content popularity, and an accurate reuse distance histogram is subsequently constructed for them, which can improve the accuracy of the missing rate curve at the head, thereby improving the accuracy of the missing rate curve as a whole.
(5)根据第i条访问请求对应的访问对象的大小获取该访问对象的采样率;(5) obtaining a sampling rate of the access object according to the size of the access object corresponding to the i-th access request;
具体而言,访问对象D的采样率rD是按照以下公式计算得到:Specifically, the sampling rate r D of the access object D is calculated according to the following formula:
rD=r×sD/savg r D =r×s D /s avg
其中sD为访问对象的大小,savg为访问请求序列中所有访问请求对应的访问对象的平均值;Where s D is the size of the access object, s avg is the average value of the access objects corresponding to all access requests in the access request sequence;
(6)根据步骤(5)得到的第i条访问请求对应的访问对象的采样率对该访问对象进行采样,并计算该访问对象的重用距离,然后转入步骤(7);(6) sampling the access object according to the sampling rate of the access object corresponding to the i-th access request obtained in step (5), calculating the reuse distance of the access object, and then proceeding to step (7);
具体而言,重用距离的计算策略是:Specifically, the calculation strategy of reuse distance is:
(6-1)根据步骤(5)中计算得到的第i条访问请求对应的访问对象的采样率rD对第i条访问请求对应的访问对象进行采样,以生成一个0到1的随机数,并判断该随机数是否小于等于rD,如果是则转入步骤(6-2),否则过程结束;(6-1) Sampling the access object corresponding to the i-th access request according to the sampling rate r D of the access object corresponding to the i-th access request calculated in step (5) to generate a random number between 0 and 1, and determining whether the random number is less than or equal to r D . If so, proceed to step (6-2); otherwise, the process ends.
(6-2)构建二叉树Tws,使得二叉树Tws的中序遍历等价于步骤(5)得到的第i条访问请求对应的访问对象的先后顺序;(6-2) Construct a binary tree T ws , so that the in-order traversal of the binary tree T ws is equivalent to the order of access objects corresponding to the i-th access request obtained in step (5);
(6-3)根据步骤(6-2)得到的二叉树Tws计算第i条访问请求对应的访问对象的重用距离sBRD;(6-3) Calculate the reuse distance sBRD of the access object corresponding to the i-th access request based on the binary tree T ws obtained in step (6-2);
具体而言,本步骤中的计算方法与步骤(4)中完全相同,在此不再赘述。Specifically, the calculation method in this step is exactly the same as that in step (4), and will not be repeated here.
(6-4)使用步骤(6-3)得到的第i条访问请求对应的访问对象的重用距离并根据公式eBRD=sBRD/r+size(TCF)获取第i条访问请求对应的访问对象的最终重用距离eBRD,其中size(TCF)是缓存过滤器中存储的所有访问对象的总大小。(6-4) Use the reuse distance of the access object corresponding to the i-th access request obtained in step (6-3) and obtain the final reuse distance eBRD of the access object corresponding to the i-th access request according to the formula eBRD=sBRD/r+size(T CF ), where size(T CF ) is the total size of all access objects stored in the cache filter.
上述步骤(6-1)到(6-4)的优点在于,对不同大小的对象采用不同的抽样率,减少了随机采样的方差,进一步降低了缺失率曲线在尾部的误差,降低了对象尺寸差异对MRC构建的影响,从而提高整体缺失率曲线的准确程度。The advantage of the above steps (6-1) to (6-4) is that different sampling rates are used for objects of different sizes, which reduces the variance of random sampling, further reduces the error at the tail of the missing rate curve, and reduces the impact of object size differences on MRC construction, thereby improving the accuracy of the overall missing rate curve.
(7)设置计数器i=i+1,并返回步骤(2);(7) Set counter i=i+1 and return to step (2);
(8)根据访问请求队列中所有在缓存过滤器中命中的访问对象的重用距离构建第一直方图,然后转入步骤(9);(8) constructing a first histogram according to the reuse distances of all access objects in the access request queue that hit the cache filter, and then proceeding to step (9);
具体而言,在第一直方图中,横坐标1表示的是重用距离,其对应的纵坐标,是重用距离为1的访问对象的总数。Specifically, in the first histogram, the horizontal axis 1 represents the reuse distance, and the corresponding vertical axis is the total number of access objects with a reuse distance of 1.
(9)根据访问请求队列中所有未在缓存过滤器中命中的访问对象的重用距离构建第二直方图,然后转入步骤(10);(9) constructing a second histogram according to the reuse distances of all access objects in the access request queue that are not hit in the cache filter, and then proceeding to step (10);
具体而言,在第二直方图中,横坐标1表示的是重用距离,其对应的纵坐标,是重用距离为1的访问对象的总数。Specifically, in the second histogram, the horizontal axis 1 represents the reuse distance, and the corresponding vertical axis is the total number of access objects with a reuse distance of 1.
(10)合并第一直方图与第二直方图,以构建完整的重用距离直方图,并对重用距离直方图进行积分处理,以获得最终的缺失率曲线。(10) The first histogram and the second histogram are combined to construct a complete reuse distance histogram, and the reuse distance histogram is integrated to obtain the final missing rate curve.
从缺失率曲线的工作流程可以看出,相较于传统的构建流程,本发明増加了缓存过滤器,使得内容流行度高的对象单独计算,从而减少了传统采样计算造成的缺失率曲线头部误差大的问题。此外,本发明在传统采样的基础上,根据对象的大小对不同对象采用不同的采样率,减少了对象大小差异对缺失率曲线构建准确度的影响。It can be seen from the workflow of the missing rate curve that, compared with the traditional construction process, the present invention adds a cache filter so that objects with high content popularity are calculated separately, thereby reducing the problem of large errors in the head of the missing rate curve caused by traditional sampling calculations. In addition, based on traditional sampling, the present invention uses different sampling rates for different objects according to the size of the object, thereby reducing the impact of object size differences on the accuracy of missing rate curve construction.
本领域的技术人员容易理解,以上所述仅为本发明的较佳实施例而已,并不用以限制本发明,凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等,均应包含在本发明的保护范围之内。It will be easily understood by those skilled in the art that the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the protection scope of the present invention.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210789994.4A CN115130032B (en) | 2022-07-05 | 2022-07-05 | Deletion rate curve construction method and system in heterogeneous granularity storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210789994.4A CN115130032B (en) | 2022-07-05 | 2022-07-05 | Deletion rate curve construction method and system in heterogeneous granularity storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115130032A CN115130032A (en) | 2022-09-30 |
CN115130032B true CN115130032B (en) | 2024-07-09 |
Family
ID=83381680
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210789994.4A Active CN115130032B (en) | 2022-07-05 | 2022-07-05 | Deletion rate curve construction method and system in heterogeneous granularity storage system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115130032B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992579A (en) * | 2019-03-28 | 2019-07-09 | 湖北交投智能检测股份有限公司 | A kind of data recovery method and system of highway infrastructures multi-resources Heterogeneous data |
CN113419976A (en) * | 2021-06-29 | 2021-09-21 | 华中科技大学 | Self-adaptive segmented caching method and system based on classification prediction |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8141058B2 (en) * | 2006-06-05 | 2012-03-20 | Rogue Wave Software, Inc. | System for and method of capturing application characteristics data from a computer system and modeling target system |
WO2018152267A1 (en) * | 2017-02-14 | 2018-08-23 | Bahram Ghaffarzadeh Kermani | Reliable and secure detection techniques for processing genome data in next generation sequencing (ngs) |
-
2022
- 2022-07-05 CN CN202210789994.4A patent/CN115130032B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992579A (en) * | 2019-03-28 | 2019-07-09 | 湖北交投智能检测股份有限公司 | A kind of data recovery method and system of highway infrastructures multi-resources Heterogeneous data |
CN113419976A (en) * | 2021-06-29 | 2021-09-21 | 华中科技大学 | Self-adaptive segmented caching method and system based on classification prediction |
Also Published As
Publication number | Publication date |
---|---|
CN115130032A (en) | 2022-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9898206B2 (en) | Memory access processing method, apparatus, and system | |
US8032713B2 (en) | Structure for handling data access | |
CN111737168B (en) | A cache system, cache processing method, device, equipment and medium | |
CN109240946A (en) | The multi-level buffer method and terminal device of data | |
CN103856567A (en) | Small file storage method based on Hadoop distributed file system | |
CN112328700B (en) | A distributed database | |
CN110287010A (en) | A cache data prefetch method for Spark time window data analysis | |
US20150067646A1 (en) | System and Method to Predict Elapsed Response Time for a Query during Application Development Stage | |
CN107329910A (en) | A kind of web front end data based on localStorage are locally stored and access method | |
CN108509723B (en) | LRU Cache prefetching mechanism performance gain evaluation method based on artificial neural network | |
CN103049392A (en) | Method and device for achieving cache catalogue | |
WO2023165543A1 (en) | Shared cache management method and apparatus, and storage medium | |
CN105808358A (en) | Data dependency thread group mapping method for many-core system | |
CN113157605A (en) | Resource allocation method and system for two-level cache, storage medium and computing device | |
CN115130032B (en) | Deletion rate curve construction method and system in heterogeneous granularity storage system | |
US9053031B2 (en) | System and method for handling data access | |
US20240289275A1 (en) | Data processing method and apparatus, and cache, processor and electronic device | |
CN112083877B (en) | A data grouping method of vehicle Internet of things cloud storage system | |
CN115981555A (en) | Data processing method and device, electronic equipment and medium | |
CN118796896A (en) | Database cache strategy adjustment method, device, and equipment | |
CN118193540A (en) | Index processing method, device, electronic device and readable storage medium | |
CN118113458A (en) | SQL traffic scheduling method, system, medium and device based on machine learning | |
CN118012906A (en) | Multi-level cache adaptive system and strategy based on machine learning | |
US10942859B2 (en) | Computing system and method using bit counter | |
CN110019362A (en) | A kind of method and device accessing database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |