CN107291878A - A kind of data-erasure method and device of distributed storage file system - Google Patents
- Publication number
- CN107291878A (application number CN201710463930.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- detection table
- label
- validity period
- set time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical field
The present application relates to the field of data storage, and in particular to a data deletion method and device for a distributed storage file system.
Background
In the era of big data, data plays an increasingly important role, and storing it is inevitably a critical problem. Any data store accumulates some invalid data. This data occupies a considerable share of storage resources and interferes with valid data, which seriously affects both the use of the data resources in the database and the results of data analysis.
The most popular storage systems today are distributed file storage systems built on a distributed file storage architecture, used mainly to store data such as documents, images, and videos. Such a system manages its resources globally: it can schedule any storage resource in the cluster, and the scheduling process is transparent.
A distributed storage system uses a scalable architecture, which improves reliability, availability, and access efficiency and also makes the system easy to expand. A distributed storage system whose main features are high performance and high capacity generally meets four conditions: it is used in a network environment; the data of a single file is distributed across different nodes; it supports concurrent access by multiple processes on multiple terminals; and it provides a unified directory space and access names.
At present, however, invalid data in a distributed storage file system is screened and deleted mainly by hand, which is inefficient and carries a risk of misoperation. Because the database is very large, manual deletion of invalid data inevitably carries a high risk of mistakes caused by factors such as fatigue.
Summary of the invention
In view of this, the present application provides a data deletion method and device. A validity period is set for the data, and a detection pass finds the data whose validity period has expired and deletes it, which addresses the inefficiency and high risk of manually screening and deleting invalid data in the prior art. The method is as follows:
setting a validity period label for the data;
storing the validity period label in a detection table;
traversing the detection table at a set time, and screening out the labels in the detection table whose validity period has expired;
deleting the expired labels in the detection table and their corresponding data.
Wherein, traversing the detection table at the set time includes:
computing the set time by linear regression from the validity period labels of the data remaining in the detection table.
Wherein, after deleting the expired labels in the detection table and their corresponding data, the method further includes:
computing a set time again by linear regression, and using it to correct the previously computed set time.
Wherein, deleting the data labels in the detection table and their corresponding data includes:
calling a deletion program to delete the expired labels in the detection table and their corresponding data.
The present application also includes a data deletion device for a distributed storage file system, the device comprising:
a label setting module, configured to set a validity period label for the data;
a label mapping module, configured to store the validity period label in a detection table;
a label traversal module, configured to traverse the detection table at a set time and screen out the labels in the detection table whose validity period has expired;
a data deletion module, configured to delete the expired labels in the detection table and their corresponding data.
Wherein, the label traversal module includes:
a time calculation sub-module, configured to compute the time for traversing the detection table by linear regression from the validity periods of the data remaining in the detection table.
Wherein, the label traversal module further includes:
a time correction sub-module, configured to use a set time newly computed by linear regression to correct the previously computed set time.
Wherein, the data deletion module is specifically configured to call a deletion program to delete the data labels in the detection table and their corresponding data.
The present application provides a data deletion method for a distributed file storage system. The method sets a validity period for the data, checks the validity period at a preset time, and deletes the data once the validity period has expired. This lets the data storage system screen out and delete invalid data automatically, avoiding the low efficiency and high risk of manual operation. The present application also provides a data deletion device for a distributed file storage system, which has the same beneficial effects; they are not repeated here.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a data deletion method for a distributed file storage system provided by an embodiment of the present application;
FIG. 2 is a flowchart of another data deletion method for a distributed file storage system provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a data deletion device for a distributed file storage system provided by an embodiment of the present application.
Detailed description
The core of the present invention is to provide a method and device for deleting invalid data in a distributed file storage system, so that the system can screen out invalid data and delete it automatically, which addresses the low efficiency and high risk of screening invalid data by hand.
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1
Please refer to FIG. 1, which is a flowchart of a method for deleting invalid data in a distributed file storage system provided by the present application. The method is as follows:
S101: Set a validity period label for the data.
A validity period label is set for the data in the distributed file storage system; its type and form are not limited. For example, a separate piece of label data can be created for a given piece of data, containing one or more of the validity period, data type, data size, data location, and similar information. The label can also be an attribute of the original data, attached at the physical or logical location of that data.
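A standalone label of the kind described in S101 could look like the following minimal sketch. The class name `ExpiryLabel`, its field names, and the helper `make_label` are illustrative assumptions; the source does not prescribe any concrete layout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ExpiryLabel:
    """Hypothetical standalone label; field names are illustrative only."""
    location: str                      # logical location used to find the data
    expires_at: datetime               # end of the validity period
    data_type: Optional[str] = None    # optional extra information
    size_bytes: Optional[int] = None

    def is_expired(self, now: datetime) -> bool:
        # Assumption: a label counts as expired once `now` reaches expires_at.
        return now >= self.expires_at


def make_label(location: str, created: datetime, ttl: timedelta) -> ExpiryLabel:
    """Build a label whose validity period runs for `ttl` after `created`."""
    return ExpiryLabel(location=location, expires_at=created + ttl)


label = make_label("/A/data1", datetime(2017, 6, 1), timedelta(days=30))
```

Storing the location inside the label is what later allows the data to be found from the label alone.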
S102: Store the validity period label in a detection table.
Create a detection table in the system, or reuse an existing table-type data structure, and store the validity period labels set in S101 in it. Storing can mean moving an existing label into the table or copying it there. In terms of storage volume, moving the label saves space: the label already contains the information about its data, and the data can be addressed through it, so no second copy is needed.
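The difference between moving and copying labels into the detection table can be sketched as follows. This is an in-memory illustration under stated assumptions: labels are plain dicts, the detection table is a dict keyed by data location, and `store_labels` is a hypothetical helper, not part of any real file system API.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins: each label is a plain dict, and the
# detection table maps a data location to its label.
Label = Dict[str, str]


def store_labels(labels: List[Label], table: Dict[str, Label], move: bool = True) -> None:
    """Store labels in the detection table, either by moving or by copying."""
    for label in list(labels):             # iterate over a snapshot of the list
        table[label["location"]] = label if move else dict(label)
        if move:
            labels.remove(label)           # the table now holds the only copy


pending = [
    {"location": "/A/data1", "expires": "2017-06-02"},
    {"location": "/A/data2", "expires": "2017-06-01"},
]
detection_table: Dict[str, Label] = {}
store_labels(pending, detection_table, move=True)
```

With `move=True` the source list ends up empty, which mirrors the space-saving argument in the text; with `move=False` each label exists twice.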
S103: Traverse the detection table at a set time, and screen out the labels in the detection table whose validity period has expired.
The set time may be computed by a mathematical algorithm, for example by applying linear regression to the existing data labels to obtain the next traversal time, which is then used as the set time. When a corrected set time is later computed by linear regression again, it is used to correct the set time. The intended benefit is that the corrected set time is the optimal traversal time, which keeps the consumption of system resources as low as possible. Alternatively, a period can be set manually, for example one traversal every three days, or a shorter period when the data volume is large and a longer period when it is small, which saves some resources.
When traversing the detection table, each label can be checked for expiry as soon as it is stored in the table; or all labels to be checked can be stored first and then checked together; or labels that share some attribute can be checked once they have all been stored, which gives block-wise checking of the data. The shared attribute can be any one of date, data type, data length, data storage location, and similar attributes.
Screening means picking out all data whose validity period has expired. The expired labels can be collected in a storage area for centralized processing; or expired data can be handled immediately when it is found during the traversal; or labels can be marked when they are found to be expired and then processed in a batch after the traversal ends.
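Of the screening options above, the mark-then-batch variant can be sketched as below. The table layout, the `expired` flag, and the rule that a label expires on its expiry date are assumptions made for illustration.

```python
from datetime import date
from typing import Dict, List


def mark_expired(table: Dict[str, dict], today: date) -> List[str]:
    """Traverse the table, mark expired labels, and return their keys."""
    marked = []
    for key, label in table.items():
        # Assumption: a label is expired once `today` reaches its expiry date.
        if label["expires"] <= today:
            label["expired"] = True     # mark only; deletion happens later
            marked.append(key)
    return marked


table = {
    "/A/data1": {"expires": date(2017, 6, 1)},
    "/A/data2": {"expires": date(2017, 6, 20)},
    "/B/data3": {"expires": date(2017, 5, 15)},
}
expired_keys = mark_expired(table, today=date(2017, 6, 2))
```

Separating marking from deletion keeps the traversal read-mostly, so the deletion step can run as one batch afterwards.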
S104: Delete the expired labels in the detection table and their corresponding data.
After all the data in the detection table has been traversed, any of the screening approaches in S103 yields all labels whose validity period has expired. The expired, invalid data then has to be deleted from the storage system, either manually or by machine. Machine deletion means calling a deletion program that completely deletes the expired labels in the system and the data corresponding to them. This is efficient, has a very low error rate, and removes the manual step, making the process more automated.
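A deletion program in the sense of S104 might be sketched as follows. Local files and `os.remove` stand in for the distributed file system's own delete call, which the source does not name; the `path` field and the function `delete_expired` are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def delete_expired(table: Dict[str, dict], expired_keys: List[str]) -> int:
    """Delete each expired label and the file it points to; return the count."""
    deleted = 0
    for key in expired_keys:
        label = table.pop(key, None)       # remove the label from the table
        if label is None:
            continue                       # label was already removed
        try:
            os.remove(label["path"])       # remove the corresponding data
        except FileNotFoundError:
            pass                           # data already gone; label still dropped
        deleted += 1
    return deleted


# Demo with a temporary local file as stand-in data.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data1.bin")
with open(path, "wb") as f:
    f.write(b"stale")
table = {"data1": {"path": path}, "data2": {"path": os.path.join(workdir, "data2.bin")}}
removed = delete_expired(table, ["data1"])
```

Removing the label first and tolerating an already-missing file makes the step safe to repeat after a partial failure.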
Embodiment 2
Refer to FIG. 2, which is a flowchart of the second embodiment provided by the present application. Building on the embodiment above, this embodiment specifies linear regression as the algorithm for the set time and describes the traversal cycle in more detail. The method is as follows:
S201: Set a validity period label for the data.
A validity period label is set for the data in the distributed file storage system. The label contains the name of the data, its logical location information, and its validity period. The logical location information on the label makes it possible to find the actual address of the data for later operations.
S202: Store the validity period label in a detection table.
Create a new blank table in the system as the detection table, and move the validity period labels set in S201 into it. Labels are stored in the table according to the address relationship of their data. For example, if data 1 and data 2 are both files under folder A, their labels are stored side by side in the detection table. The benefit is that if an error occurs when the system sets a validity period label, so that the address for that label cannot be found in a later step, the neighboring labels can be used to determine which data it corresponds to, which improves the fault tolerance of the whole system.
S203: Compute the traversal time.
The traversal time of the system can be obtained by linear regression. This step does not depend on how the validity period labels are set up; only the validity period of each piece of data is needed. Plotting the validity period on the horizontal axis and the number of labels on the vertical axis gives a scatter plot, from which the system can derive an optimal time to traverse the data in the detection table. Optimal here means that the amount of data that can be deleted per unit time is largest. For example, if 20 pieces of data expire on the 2nd but only 1 expires on the 1st, the system prefers to traverse the detection table for expired data on the 2nd. Running the traversal on both the 1st and the 2nd would waste resources unnecessarily. Of course, if a lot of data also expires on the 1st, the linear regression algorithm can compute the optimal times, and the nearest optimal time is used as the start time of the next traversal.
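The source does not spell out how the regression turns the scatter plot into a traversal time. The sketch below is therefore only one possible reading, and every name in it is an assumption: it sorts the expiry days, fits a least-squares line of expiry day against expiry rank (the k-th label to expire), and schedules the next traversal for the day by which roughly `batch` labels are predicted to have expired.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def next_scan_day(expiry_days: List[int], batch: int) -> int:
    """Estimate the day by which about `batch` labels will have expired."""
    if not expiry_days:
        raise ValueError("detection table is empty")
    days = sorted(expiry_days)
    if len(days) < 2:
        return days[0]                     # nothing to fit; use the only expiry
    ranks = list(range(1, len(days) + 1))  # rank k = k-th label to expire
    intercept, slope = fit_line(ranks, days)
    k = min(batch, len(days))
    return math.ceil(intercept + slope * k)


# The example from the text: 1 label expires on day 1, 20 labels on day 2.
scan_day = next_scan_day([1] + [2] * 20, batch=10)
```

For the input from the text, this reading schedules the traversal for day 2 and skips the nearly empty day 1. The `batch` parameter is a tuning knob introduced here, not something the source defines.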
S204: Correct the traversal time.
Linear regression has to be applied repeatedly, and the traversal time computed from the latest remaining data is taken to be the best one. So after each linear regression calculation, the latest result replaces the previous optimal time. This adaptive time correction reduces the performance cost of the data screening process.
S205: Screen the detection table.
Traverse the detection table and screen out the labels in it whose validity period has expired.
Because the amount of data in a distributed file storage system is extremely large, the detection table is traversed in blocks. The system's data is classified by address, so there may be multiple corresponding detection tables; the present application places no limit on this. Once all labels of a block have been stored in its detection table, traversal of the validity periods of the labels in that table begins.
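Classifying data by address into several detection tables can be sketched as below, assuming that "address" means the parent directory of a POSIX-style path. That choice, and the helper name `split_into_tables`, are illustrative assumptions.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List


def split_into_tables(locations: Iterable[str]) -> Dict[str, List[str]]:
    """Group data locations by parent directory, one detection table per group."""
    tables: Dict[str, List[str]] = defaultdict(list)
    for loc in locations:
        tables[posixpath.dirname(loc)].append(loc)
    return dict(tables)


tables = split_into_tables(["/A/data1", "/A/data2", "/B/data3"])
```

Each resulting table can then be traversed on its own, so a single pass never has to hold the labels of the whole system.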
During screening, each label whose validity period has expired is given a mark, which serves only to distinguish expired labels from unexpired ones. When the traversal ends, screening ends with it: all expired labels have been marked, and the method moves to the next step.
S206: Delete the expired labels in the detection table and their corresponding data.
Call the deletion program in the system to delete the expired labels marked in S205 and their corresponding data.
S207: Determine whether the detection table is empty. If not, return to step S203; if so, end the process.
A system needs to screen out and automatically delete invalid data periodically. "Periodically" here is not strictly limited to a fixed time interval; it means that the process has to recur.
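The loop from S203 to S207 can be sketched as a small simulation. Days are plain integers, the detection table maps a data name to its expiry day, and the scheduler is passed in as a function. The placeholder scheduler `busiest_day` is an assumption standing in for the regression step, not the method of the source; a real system would also wait until the scheduled day rather than iterate immediately.

```python
from typing import Callable, Dict, List, Tuple


def run_cleanup(
    table: Dict[str, int],
    next_scan: Callable[[List[int]], int],
) -> List[Tuple[int, List[str]]]:
    """Repeat S203 to S207 on a table mapping data name to expiry day."""
    history = []
    while table:                                    # S207: stop when empty
        # S203/S204: recompute from the remaining labels; the new value
        # replaces the previous one.
        scan_day = next_scan(list(table.values()))
        marked = [k for k, d in table.items() if d <= scan_day]  # S205: mark
        for key in marked:                          # S206: delete label and data
            del table[key]
        history.append((scan_day, marked))
        if not marked:
            break                                   # guard against an endless loop
    return history


def busiest_day(days: List[int]) -> int:
    """Placeholder scheduler: most common expiry day, earliest on ties."""
    return max(sorted(set(days)), key=days.count)


log = run_cleanup({"a": 1, "b": 2, "c": 2, "d": 5}, busiest_day)
```

Because the scheduler is called again in every round, each traversal time is derived from the labels that are still in the table, which is the correction described in S204.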
Embodiment 3
The present application also provides a data deletion device for a distributed file storage system, the device comprising:
a label setting module 100, configured to set a validity period label for the data;
a label mapping module 200, configured to store the validity period label in a detection table;
a label traversal module 300, configured to traverse the detection table at a set time and screen out the labels in the detection table whose validity period has expired;
The label traversal module may further include a time calculation sub-module and a time correction sub-module. The time calculation sub-module computes the time for traversing the detection table by linear regression from the validity periods of the data remaining in the detection table. The time correction sub-module uses a set time newly computed by linear regression to correct the previously computed set time.
a data deletion module 400, configured to delete the expired labels in the detection table and their corresponding data.
The workflow of the device is therefore as follows:
1. The label setting module sets validity period labels for the data in the file storage system.
The label setting module sets a validity period label for the data in the distributed file storage system; the type and form of the label are not limited. For example, a separate piece of label data can be created for a given piece of data, containing one or more of the validity period, data type, data size, data location, and similar information. The label can also be an attribute of the original data, attached at the physical or logical location of that data.
2. The label mapping module stores the validity period labels in the detection table.
A detection table is created in the system, or an existing table-type data structure is reused, and the label mapping module stores the validity period labels set in step 1 in it. Storing can mean moving an existing label into the table or copying it there. In terms of storage volume, moving the label saves space: the label already contains the information about its data, and the data can be addressed through it, so no second copy is needed.
3. The label traversal module traverses the detection table at the set time and screens out all labels whose validity period has expired. The time calculation sub-module computes the time for traversing the detection table by linear regression from the validity periods of the data remaining in the table, and the time correction sub-module uses the newly computed time to correct the previously computed set time.
The set time may be computed by a mathematical algorithm, for example by applying linear regression to the existing data labels to obtain the next traversal time, which is then used as the set time. When a corrected set time is later computed by linear regression again, it is used to correct the set time. The intended benefit is that the corrected set time is the optimal traversal time, which keeps the consumption of system resources as low as possible. If the device has no time calculation sub-module and no time correction sub-module, a period can be set manually instead, for example one traversal every three days, or a shorter period when the data volume is large and a longer period when it is small, which saves some resources.
When traversing the detection table, each label can be checked for expiry as soon as it is stored in the table; or all labels to be checked can be stored first and then checked together; or labels that share some attribute can be checked once they have all been stored, which gives block-wise checking of the data. The shared attribute can be any one of date, data type, data length, data storage location, and similar attributes.
Screening means picking out all data whose validity period has expired. The expired labels can be collected in a storage area for centralized processing; or expired data can be handled immediately when it is found during the traversal; or labels can be marked when they are found to be expired and then processed in a batch after the traversal ends.
4. The data deletion module deletes all expired labels screened out in step 3 and their corresponding data.
The data deletion module calls the deletion program to completely delete all expired labels screened out in step 3 and their corresponding data. Alternatively, it can prompt the user to delete them manually.
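The four modules and their workflow can be sketched as cooperating classes. The class and method names are illustrative assumptions keyed to the module numbers 100 to 400; expiry days are integers, and a dict stands in for the stored data.

```python
from typing import Dict, List


class LabelSettingModule:                      # module 100
    def set_label(self, name: str, expiry_day: int) -> Dict[str, int]:
        return {name: expiry_day}


class LabelMappingModule:                      # module 200
    def __init__(self) -> None:
        self.table: Dict[str, int] = {}        # the detection table

    def store(self, label: Dict[str, int]) -> None:
        self.table.update(label)


class LabelTraversalModule:                    # module 300
    def expired(self, table: Dict[str, int], today: int) -> List[str]:
        return [name for name, day in table.items() if day <= today]


class DataDeletionModule:                      # module 400
    def delete(self, table: Dict[str, int], storage: Dict[str, bytes],
               names: List[str]) -> None:
        for name in names:
            table.pop(name, None)              # drop the label
            storage.pop(name, None)            # drop the corresponding data


storage = {"data1": b"x", "data2": b"y"}       # stand-in for stored files
setter, mapper = LabelSettingModule(), LabelMappingModule()
mapper.store(setter.set_label("data1", 1))     # steps 1 and 2
mapper.store(setter.set_label("data2", 9))
names = LabelTraversalModule().expired(mapper.table, today=2)   # step 3
DataDeletionModule().delete(mapper.table, storage, names)       # step 4
```

The time calculation and time correction sub-modules are left out here; they would decide the value passed as `today`.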
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and for parts that are the same or similar the embodiments can be read against one another. Note also that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and their variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined here can be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown here, but is to be accorded the widest scope consistent with the principles and novel features disclosed here.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710463930.4A CN107291878A (en) | 2017-06-19 | 2017-06-19 | A kind of data-erasure method and device of distributed storage file system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107291878A true CN107291878A (en) | 2017-10-24 |
Family
ID=60096560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710463930.4A Pending CN107291878A (en) | 2017-06-19 | 2017-06-19 | A kind of data-erasure method and device of distributed storage file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107291878A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108965387A (en) * | 2018-06-09 | 2018-12-07 | 西安电子科技大学 | A kind of equalization methods and system improving P2P data storage survivability |
CN109388624A (en) * | 2018-10-09 | 2019-02-26 | 郑州云海信息技术有限公司 | Distributed document delet method, device, system and computer readable storage medium |
CN109766317A (en) * | 2019-01-08 | 2019-05-17 | 浪潮电子信息产业股份有限公司 | A file deletion method, device, device and storage medium |
CN110083451A (en) * | 2019-04-16 | 2019-08-02 | 惠州Tcl移动通信有限公司 | Network parameter management method, device and the storage medium of mobile terminal |
CN114185947A (en) * | 2021-12-16 | 2022-03-15 | 上海金融期货信息技术有限公司 | Memory management method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6088720A (en) * | 1997-07-29 | 2000-07-11 | Lucent Technologies Inc. | Self-cleaning and forwarding feature for electronic mailboxes |
CN105653198A (en) * | 2014-11-13 | 2016-06-08 | 杭州迪普科技有限公司 | Data processing method and device |
CN105653635A (en) * | 2015-12-25 | 2016-06-08 | 北京奇虎科技有限公司 | Database management method and apparatus |
CN106095850A (en) * | 2016-06-02 | 2016-11-09 | 中国联合网络通信集团有限公司 | A kind of data processing method and equipment |
CN106326774A (en) * | 2016-08-30 | 2017-01-11 | 江苏名通信息科技有限公司 | User data processing method |
CN106599199A (en) * | 2016-12-14 | 2017-04-26 | 国云科技股份有限公司 | A data cache and synchronization method |
CN106681837A (en) * | 2016-12-29 | 2017-05-17 | 北京奇虎科技有限公司 | Data sheet based data eliminating method and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108965387A (en) * | 2018-06-09 | 2018-12-07 | 西安电子科技大学 | A kind of equalization methods and system improving P2P data storage survivability |
CN108965387B (en) * | 2018-06-09 | 2021-04-06 | 西安电子科技大学 | Balancing method and system for improving survivability of P2P data storage |
CN109388624A (en) * | 2018-10-09 | 2019-02-26 | 郑州云海信息技术有限公司 | Distributed document delet method, device, system and computer readable storage medium |
CN109766317A (en) * | 2019-01-08 | 2019-05-17 | 浪潮电子信息产业股份有限公司 | A file deletion method, device, device and storage medium |
CN109766317B (en) * | 2019-01-08 | 2022-04-22 | 浪潮电子信息产业股份有限公司 | File deletion method, device, equipment and storage medium |
CN110083451A (en) * | 2019-04-16 | 2019-08-02 | 惠州Tcl移动通信有限公司 | Network parameter management method, device and the storage medium of mobile terminal |
CN114185947A (en) * | 2021-12-16 | 2022-03-15 | 上海金融期货信息技术有限公司 | Memory management method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107291878A (en) | A kind of data-erasure method and device of distributed storage file system | |
CN103678494B (en) | Client synchronization services the method and device of end data | |
CN110825733B (en) | Multi-sampling-stream-oriented time series data management method and system | |
CN109255055B (en) | A method and device for accessing graph data based on grouping association table | |
JP2019220195A (en) | System and method for implementing data storage service | |
CN106354729B (en) | Graph data processing method, device and system | |
TW201636888A (en) | Multi-cluster management method and device | |
CN106708565A (en) | Method and device for removing useless picture resources in APK (Android Package) | |
CN105893542A (en) | Method and system for redistributing cold data files in cloud storage system | |
CN104361025B (en) | A kind of multi-source Spatial Data fusion and integrated method | |
CN110335022A (en) | Automatic audit method, device, equipment and storage medium | |
CN104572856A (en) | Converged storage method of service source data | |
CN104113605A (en) | Enterprise cloud application development monitoring processing method | |
CN108717457A (en) | A kind of e-commerce platform big data processing method and system | |
CN118689860A (en) | A data processing method, server, medium and program product for electricity consumption data | |
CN105577763A (en) | A dynamic copy consistency maintenance system, method and cloud storage platform | |
CN107480268A (en) | Data query method and device | |
GB2502076A (en) | Managing memory in a computer system | |
CN110858912A (en) | Streaming media caching method and system, caching policy server and streaming service node | |
CN114328759A (en) | Data construction and management method and terminal for data warehouse | |
CN111061802B (en) | Power data management processing method, device and storage medium | |
CN109885642A (en) | Classification storage method and device towards full-text search | |
CN108574718A (en) | A method and device for creating a cloud host | |
CN107632926B (en) | Service quantity statistical method, device, equipment and computer readable storage medium | |
CN114676961A (en) | Enterprise external migration risk prediction method and device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-10-24 |