CN111897490B - Method and device for deleting data

Info

Publication number: CN111897490B
Application number: CN202010653945.9A
Authority: CN
Inventors: 邵华西; 李阳; 李扬
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2024-06-11
Anticipated expiration: 2040-07-08
Also published as: CN111897490A

Abstract

Embodiments of this specification provide a method and a device for deleting data. The method comprises: recording, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data; when the first deletion task exits abnormally, extracting the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log; and deleting the first designated data according to the extracted primary key of the first designated data, and/or deleting the first associated data according to the extracted primary key of the first associated data.

Description

Method and device for deleting data

Technical Field

The embodiments of this specification relate to the field of data processing, and in particular to a method for deleting data. One or more embodiments of this specification also relate to a device for deleting data, a computing device, and a computer-readable storage medium.

Background

With the rapid development of Internet technology, humanity has entered the era of the industrial Internet, in which everything is connected. The extensive interconnection between intelligent machines and humans, and between machines, has generated massive amounts of data, such as spatiotemporal data carrying time and location information.

To support the storage, analysis, and computation of big data, a big data storage and computing architecture suited to the application scenario can be adopted, for example a spatiotemporal big data architecture based on Geomesa, Spark, and Hbase. However, when data is deleted in such an architecture and the deletion task exits abnormally because of an error, the data may already have been deleted from some data tables, where the database logically marks it as deleted, while remaining undeleted in other data tables. Such data becomes abnormal data and poses a threat to data security.

Summary of the Invention

In view of this, embodiments of this specification provide a method for deleting data. One or more embodiments of this specification also relate to a device for deleting data, a computing device, and a computer-readable storage medium, to address the technical defects in the prior art.

According to a first aspect of the embodiments of this specification, a method for deleting data is provided, comprising: recording, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data; when the first deletion task exits abnormally, extracting the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log; and deleting the first designated data according to the extracted primary key of the first designated data, and/or deleting the first associated data according to the extracted primary key of the first associated data.

Optionally, the method further comprises: finding the associated data table of a second designated data table; generating, from the primary keys of the second designated data table, the primary keys of second associated data associated with those primary keys; finding, in the associated data table of the second designated data table, data that is not within the primary key range of the second associated data; and deleting that data from the associated data table of the second designated data table.

Optionally, the method further comprises: performing a deletion legitimacy check on the second designated data table that a second deletion task designates for deletion; and, if the deletion legitimacy check passes, proceeding to the step of finding the associated data table of the second designated data table.

Optionally, deleting the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data comprises: deleting that data concurrently in batches.

Optionally, the first deletion task is used to first delete the data corresponding to the primary key of the first associated data and then delete the data corresponding to the primary key of the first designated data; deleting the first designated data and the first associated data according to the extracted primary key of the first designated data and the primary key of the first associated data comprises: first deleting the first associated data according to the primary key of the first associated data, and then deleting the first designated data according to the extracted primary key of the first designated data.

Optionally, the first deletion task comprises multiple deletion tasks executed concurrently.

Optionally, the method further comprises: concurrently executing multiple second deletion tasks based on a producer-consumer model, the multiple second deletion tasks being used to delete the second designated data table.

Optionally, the method is applied to a big data architecture that uses Spark as the computing layer. Concurrently executing multiple second deletion tasks based on the producer-consumer model comprises: obtaining the primary keys of the second designated data table from the database to the Spark master node through the Spark interface; the Spark master node, acting as the producer, sending the primary keys of the second designated data table in batches to multiple consumer queues; the multiple consumer queues each generating the primary keys of the second associated data from the primary keys they receive; the multiple consumer queues each generating, for the received primary keys and the primary keys of the second associated data, deletion requests corresponding to the second deletion task; and the multiple consumer queues each concurrently sending the deletion requests to the database. The method further comprises: the multiple consumer queues each writing the received primary keys and the correspondingly generated primary keys of the second associated data into the deletion transaction log.

Optionally, the deletion transaction log records log entries by means of memory-mapped files.

According to a second aspect of the embodiments of this specification, a device for deleting data is provided, comprising: a deletion log recording module, configured to record, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data; a primary key extraction module, configured to extract the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log when the first deletion task exits abnormally; and a deletion recovery module, configured to delete the first designated data according to the extracted primary key of the first designated data, and/or delete the first associated data according to the extracted primary key of the first associated data.

According to a third aspect of the embodiments of this specification, a computing device is provided, comprising a memory and a processor. The memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions to: record, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data; when the first deletion task exits abnormally, extract the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log; and delete the first designated data according to the extracted primary key of the first designated data, and/or delete the first associated data according to the extracted primary key of the first associated data.

According to a fourth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer instructions that, when executed by a processor, implement the steps of the method for deleting data described in any embodiment of this specification.

An embodiment of this specification implements a method for deleting data. The method records, in a deletion transaction log, the primary key of the first designated data to be deleted by the first deletion task and/or the primary key of the first associated data associated with the primary key of the first designated data. When the first deletion task exits abnormally, the method extracts these primary keys from the deletion transaction log and deletes the first designated data and/or the first associated data according to the extracted primary keys. Therefore, when a deletion task exits abnormally and leaves abnormal data behind, the abnormal data can be cleaned up according to the recorded deletion transaction log, which ensures that the deletion succeeds and improves data security.

Brief Description of the Drawings

FIG. 1 is a flowchart of a method for deleting data provided by an embodiment of this specification;

FIG. 2 is a flowchart of a method for deleting data provided by another embodiment of this specification;

FIG. 3 is a flowchart of a method for deleting data provided by yet another embodiment of this specification;

FIG. 4 is a schematic diagram of a spatiotemporal big data architecture provided by an embodiment of this specification;

FIG. 5 is a flowchart of a method for deleting data provided by still another embodiment of this specification;

FIG. 6 is a structural block diagram of a device for deleting data provided by an embodiment of this specification;

FIG. 7 is a structural block diagram of a device for deleting data provided by another embodiment of this specification;

FIG. 8 is a structural block diagram of a computing device provided by an embodiment of this specification.

Detailed Description

Many specific details are set out in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance, so this specification is not limited by the specific implementations disclosed below.

The terms used in one or more embodiments of this specification are only for the purpose of describing particular embodiments and are not intended to limit one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, and so on may be used in one or more embodiments of this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, the first may also be referred to as the second, and similarly the second may also be referred to as the first. Depending on the context, the word "if" as used here may be interpreted as "at the time of", "when", or "in response to determining".

This specification provides a method for deleting data, and also relates to a device for deleting data, a computing device, and a computer-readable storage medium, each of which is described in detail in the following embodiments.

FIG. 1 shows a flowchart of a method for deleting data according to an embodiment of this specification, including steps 110 to 130.

Step 110: Record, in a deletion transaction log, the primary key of the first designated data to be deleted by the first deletion task and/or the primary key of the first associated data associated with the primary key of the first designated data.

A primary key is an identifier that uniquely identifies a record in a data table. For example, it can be a column value of each row of data.

The embodiments of this specification do not limit the data to be deleted or the deletion conditions. For example, the first designated data may be relational data, key-value data, or data with another structure.

To ensure that the deletion transaction log is reliably recorded, the log can, for example, be written by means of memory-mapped files. Memory file mapping delegates the operation of persisting the log to disk to the operating system. If the program exits abnormally while sending commands to the data warehouse to delete data, the operating system can still write the log currently held in memory to disk, so data security is not compromised by log data having been written only to memory without actually being persisted to disk.
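The memory-mapped logging described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the class name `MmapDeletionLog`, the 4-byte length header, the fixed capacity, and the newline-separated record format are all assumptions made for the example, and keys are assumed not to contain newlines. Note that a shared file mapping protects against the process dying, since the dirty pages stay with the operating system; it does not by itself protect against an operating system crash or power loss.

```python
import mmap
import os
import struct
import tempfile


class MmapDeletionLog:
    """Append-only log of primary keys backed by a memory-mapped file (hypothetical layout)."""

    HEADER = struct.Struct("<I")  # first 4 bytes: number of payload bytes in use

    def __init__(self, path, capacity=64 * 1024):
        is_new = not os.path.exists(path)
        self._file = open(path, "w+b" if is_new else "r+b")
        if is_new:
            # mmap needs a non-empty file; pre-size it (assumed fixed capacity).
            self._file.truncate(capacity)
        self._map = mmap.mmap(self._file.fileno(), 0)

    def _used(self):
        return self.HEADER.unpack_from(self._map, 0)[0]

    def append(self, primary_key):
        """Write one key into the mapping; the OS owns flushing the page to disk."""
        record = primary_key.encode("utf-8") + b"\n"
        used = self._used()
        start = self.HEADER.size + used
        end = start + len(record)
        if end > len(self._map):
            raise ValueError("deletion log is full")
        self._map[start:end] = record
        self.HEADER.pack_into(self._map, 0, used + len(record))

    def keys(self):
        """Return every key recorded so far, in write order."""
        payload = self._map[self.HEADER.size:self.HEADER.size + self._used()]
        return [line.decode("utf-8") for line in payload.split(b"\n") if line]

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()
```

Reopening the same path after the writer has gone away returns the keys that were appended, which is the property the recovery step relies on.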

Optionally, for example, when the deletion task succeeds, the record of that deletion task can be cleared from the deletion transaction log to avoid wasting storage space.

To improve deletion efficiency, the first deletion task may, for example, comprise multiple deletion tasks executed concurrently.

To make it possible to delete abnormal data remaining in the database after a deletion task exits abnormally, the method provided by the embodiments of this specification can record a transaction log for each deletion task. For each deletion task, a dedicated log storage path, named by the time of the task, can be created under the deletion transaction log path to record the data table designated for deletion by that task and its associated data tables.

Step 120: When the first deletion task exits abnormally, extract the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log.

Step 130: Delete the first designated data according to the extracted primary key of the first designated data, and/or delete the first associated data according to the extracted primary key of the first associated data.

Because the method records the primary key of the first designated data and/or the primary key of the first associated data in the deletion transaction log before deleting, it can, when the first deletion task exits abnormally, extract those primary keys from the deletion transaction log and delete the first designated data and/or the first associated data accordingly. Therefore, when a deletion task exits abnormally and leaves abnormal data behind, the abnormal data can be cleaned up according to the recorded deletion transaction log, which ensures that the deletion succeeds and improves data security.

For example, after a deletion task exits abnormally, the data in the original table may have been deleted while part of the associated data has not, so the indexes of the tables no longer correspond to the undeleted associated data, which causes a data security problem. With the method provided by the embodiments of this specification, the residual associated data can be deleted according to the primary keys of the associated data recorded in the deletion transaction log, which ensures that the deletion succeeds and improves data security.
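Steps 110 to 130 can be illustrated with a small in-memory sketch. Everything named here is hypothetical: the "database" is a dictionary of tables, the log is a plain list, and `SimulatedCrash` stands in for an abnormal exit. The point is only the ordering: keys are logged before any delete is issued, and recovery replays the log with idempotent deletes.

```python
class SimulatedCrash(Exception):
    """Stand-in for an abnormal exit of the deletion task."""


def delete_with_log(store, log, targets, crash_after=None):
    """Step 110: log every (table, key) pair first, then delete them one by one."""
    log.extend(targets)
    for done, (table, key) in enumerate(targets):
        if crash_after is not None and done == crash_after:
            raise SimulatedCrash("task exited after %d deletes" % done)
        store[table].pop(key, None)
    log.clear()  # the task succeeded, so its record is no longer needed


def recover(store, log):
    """Steps 120-130: extract the logged keys and delete whatever is still present."""
    replayed = 0
    for table, key in list(log):
        store[table].pop(key, None)  # deleting an already-deleted key is a no-op
        replayed += 1
    log.clear()
    return replayed
```

Because each delete is a no-op when the key is already gone, replaying the whole log after a partial run is safe in this sketch; a real store would need the same idempotence for the replay to be correct.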

In one or more embodiments of this specification, before a deletion task is executed, the legitimacy of the deletion of the data table by the deletion task can first be checked, in order to further clear abnormal data remaining in the data tables related to that task. After the check passes, abnormal data already present in the designated data table is detected and deleted. Specifically, FIG. 2 shows a flowchart of a method for deleting data according to another embodiment of this specification. As shown in FIG. 2, the method further includes steps 101 to 105.

Step 101: Perform a deletion legitimacy check on the second designated data table that the second deletion task designates for deletion.

It should be noted that the second deletion task and the first deletion task may be the same task or different tasks. The second designated data table and the first designated data may refer to the same data or to different data.

For example, the deletion legitimacy check may include any one or more of the following: checking whether the database containing the data table exists, checking whether the data table exists, checking whether the deletion caller has deletion permission, and checking whether the data involved in the deletion filter conditions exists in the data table.
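The checks listed above can be sketched as a single function over a toy catalog. The `Catalog` structure, the grant triples, and the interpretation of "data involved in the filter conditions" as column names are assumptions for illustration; a real deployment would query the storage layer instead.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical metadata: database -> table -> set of column names."""
    schemas: dict
    # (caller, database, table) triples that are allowed to delete
    delete_grants: set = field(default_factory=set)


def check_deletion_legality(catalog, caller, database, table, filter_columns):
    """Return the list of failed checks; an empty list means the check passes."""
    if database not in catalog.schemas:
        return ["database does not exist"]
    if table not in catalog.schemas[database]:
        return ["table does not exist"]
    problems = []
    if (caller, database, table) not in catalog.delete_grants:
        problems.append("caller lacks delete permission")
    missing = sorted(set(filter_columns) - catalog.schemas[database][table])
    if missing:
        problems.append("unknown filter columns: " + ", ".join(missing))
    return problems
```

Returning a list of problems rather than a boolean is a design choice of the sketch; it lets the caller report every failed check at once.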

Step 102: If the deletion legitimacy check passes, find the associated data table of the second designated data table.

For example, the table name of the associated data table of the second designated data table can be generated according to the naming rule for associated tables provided by the big data architecture, and a request can then be sent to the database to look up the associated data table with that name.

Step 103: Generate, from the primary keys of the second designated data table, the primary keys of the second associated data associated with those primary keys.

For example, the primary keys of the second associated data can be generated according to the primary key generation rule for associated tables provided by the big data architecture.

Step 104: Find, in the associated data table of the second designated data table, data that is not within the primary key range of the second associated data.

Step 105: Delete the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data.

For example, to further improve deletion efficiency, the data can be deleted concurrently in batches. Specifically, the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data can be deleted concurrently in batches.

In this embodiment, the legitimacy of the second deletion task's deletion of the second designated data table is checked first, so that when the check passes it is established that residual abnormal data can be cleared based on the information of the second deletion task. When the check passes, the associated data table of the second designated data table is found. If some deletion task exited abnormally, that associated data table may contain residual abnormal data that was not successfully deleted. To clear this data, the embodiment generates, from the primary keys of the second designated data table, the primary keys of the second associated data, that is, the primary keys of the associated data that should normally exist. Data whose primary key is not within this range is residual abnormal data. The embodiment therefore finds such data in the associated data table of the second designated data table and deletes it, which achieves the purpose of detecting and deleting abnormal data remaining in the database tables.
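Steps 103 to 105, together with the batched concurrent deletion, can be sketched as follows. The key rule `derive_associated_key` (a fixed `idx:` prefix) is purely hypothetical and stands in for whatever generation rule the architecture provides; the tables are plain dictionaries, and the thread pool is only one possible way to run batches concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def derive_associated_key(primary_key):
    """Hypothetical generation rule for an associated-table key."""
    return "idx:" + primary_key


def find_orphans(main_keys, associated_keys):
    """Steps 103-104: keys in the associated table with no matching main-table key."""
    expected = {derive_associated_key(key) for key in main_keys}
    return sorted(key for key in associated_keys if key not in expected)


def delete_in_batches(table, keys, batch_size=2, workers=4):
    """Step 105: split the keys into batches and delete the batches concurrently."""
    lock = threading.Lock()
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

    def delete_batch(batch):
        with lock:  # the dict is a stand-in; a real store handles its own concurrency
            for key in batch:
                table.pop(key, None)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(delete_batch, batches))
```

The set difference is what identifies residual data: anything in the associated table that cannot be regenerated from a surviving main-table key is treated as abnormal.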

When data is deleted, the primary key of the associated data table can be obtained by concatenating and converting the primary key of the designated data table. If the data in the associated data table is deleted first and the data in the designated data table afterwards, then even if deleting from the associated data table fails, the deletion can be attempted again using the primary keys of the designated data table. Therefore, in one or more embodiments of this specification, the first deletion task is used to first delete the data corresponding to the primary key of the first associated data and then delete the data corresponding to the primary key of the first designated data. Deleting the first designated data and the first associated data according to the extracted primary key of the first designated data and the primary key of the first associated data may include: first deleting the first associated data according to the primary key of the first associated data, and then deleting the first designated data according to the extracted primary key of the first designated data.
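The reason for this ordering can be seen in a short sketch. The derivation rule and the `fail_on` switch are hypothetical; the sketch only shows that when the associated rows are removed first, a failure leaves the main-table keys intact, so the associated keys can be regenerated and the deletion retried.

```python
def derive_associated_key(primary_key):
    """Hypothetical rule: the associated key is built from the main-table key."""
    return "idx:" + primary_key


def delete_associated_first(main_table, associated_table, primary_keys, fail_on=None):
    """Delete all associated rows, then the main rows; fail_on simulates an error."""
    for key in primary_keys:
        associated_key = derive_associated_key(key)
        if associated_key == fail_on:
            raise RuntimeError("simulated failure at " + associated_key)
        associated_table.pop(associated_key, None)
    for key in primary_keys:
        main_table.pop(key, None)
```

Had the main rows been removed first, a failure partway through would leave associated rows whose keys could no longer be derived from anything still stored.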

In one or more embodiments of this specification, to improve deletion efficiency, the method may also concurrently execute multiple second deletion tasks based on a producer-consumer model, the multiple second deletion tasks being used to delete the second designated data table.

For example, the method can be applied to a big data architecture that uses Spark as the computing layer. FIG. 3 shows a flowchart of a method for deleting data according to yet another embodiment of this specification. As shown in FIG. 3, the method may further include steps 140 to 145.

Step 140: Obtain the primary keys of the second designated data table from the database to the Spark master node through the Spark interface.

Step 141: The Spark master node, acting as the producer, sends the primary keys of the second designated data table in batches to multiple consumer queues.

Step 142: The multiple consumer queues each generate the primary keys of the second associated data from the primary keys they receive.

Step 143: The multiple consumer queues each write the received primary keys and the correspondingly generated primary keys of the second associated data into the deletion transaction log.

Step 144: The multiple consumer queues each generate, for the received primary keys and the primary keys of the second associated data, deletion requests corresponding to the second deletion task.

Step 145: The multiple consumer queues each concurrently send the deletion requests to the database.

It can be seen that in this embodiment the Spark master node acts as the producer and sends the primary keys of the data table designated for deletion in batches to multiple consumer queues, and the multiple consumer queues each concurrently send deletion requests to the database. The deletion tasks corresponding to the individual deletion requests are therefore independent of one another and execute concurrently, which makes full use of the processing capacity available for deletion and improves deletion efficiency.
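The producer-consumer flow of steps 140 to 145 can be sketched with standard-library queues and threads. This is not Spark code: the producer loop stands in for the Spark master node, each thread with its own queue stands in for one consumer queue, the dictionaries stand in for the database, and `derive_associated_key` is a hypothetical key rule.

```python
import queue
import threading


def derive_associated_key(primary_key):
    """Hypothetical generation rule for an associated-table key."""
    return "idx:" + primary_key


def run_concurrent_delete(main_table, associated_table, log, batch_size=2, consumers=3):
    """Fan batches of primary keys out to consumer queues that log and then delete."""
    queues = [queue.Queue() for _ in range(consumers)]
    lock = threading.Lock()  # guards the shared stand-in log and tables

    def consume(inbox):
        while True:
            batch = inbox.get()
            if batch is None:  # sentinel: the producer has no more work
                return
            pairs = [(key, derive_associated_key(key)) for key in batch]  # step 142
            with lock:
                log.extend(pairs)  # step 143: log before deleting
            with lock:
                for key, associated_key in pairs:  # steps 144-145: "delete request"
                    associated_table.pop(associated_key, None)
                    main_table.pop(key, None)

    workers = [threading.Thread(target=consume, args=(q,)) for q in queues]
    for worker in workers:
        worker.start()
    keys = sorted(main_table)  # step 140: fetch the primary keys
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]
    for index, batch in enumerate(batches):  # step 141: distribute in batches
        queues[index % consumers].put(batch)
    for q in queues:
        q.put(None)
    for worker in workers:
        worker.join()
```

Round-robin distribution is an arbitrary choice here; the source only requires that batches reach multiple consumer queues that operate independently.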

下面，对结合了本说明书多个实施例的一种实施方式进行详细说明。例如，本说明书实施例所述删除数据的方法可以应用于如图4所示基于Geomesa、Spark、Hbase的时空大数据架构中。该大数据架构的客户接入方式灵活，例如，可以接入JDBC访问、Beeline访问等。其中，Spark、Geomesa作为计算层，Hbase作为存储层。Spark，是一种广泛应用的大规模数据处理而设计的快速通用的计算引擎。Hbase，是分布式的面向列的开源数据库。Geomesa：一种开源的进行时空数据处理的工具包。底层存储类型不限，例如：EXT4文件系统、NTFS文件系统等多种文件系统。Below, an implementation method that combines multiple embodiments of this specification is described in detail. For example, the method for deleting data described in the embodiments of this specification can be applied to the spatiotemporal big data architecture based on Geomesa, Spark, and Hbase as shown in Figure 4. The client access method of this big data architecture is flexible, for example, JDBC access, Beeline access, etc. can be accessed. Among them, Spark and Geomesa are used as the computing layer, and Hbase is used as the storage layer. Spark is a fast and general computing engine designed for large-scale data processing that is widely used. Hbase is a distributed column-oriented open source database. Geomesa: An open source toolkit for spatiotemporal data processing. The underlying storage type is not limited, for example: EXT4 file system, NTFS file system and other file systems.

Based on the spatiotemporal big data architecture shown in FIG. 4, FIG. 5 shows a flowchart of a method for deleting data provided according to an embodiment of this specification. As shown in FIG. 5, the method includes steps 502 to 528.

Step 502: Perform a deletion legality check on the second designated data table.

For example, in this embodiment, the check may include: calling the Hbase interface to query whether the database of the second designated data table exists; calling the Hbase interface to query whether the second designated data table exists; calling the Hbase interface to determine whether the caller has permission to delete from the second designated data table; and determining whether the columns involved in the filter condition of the second deletion task exist in the designated database table. More specifically, the method provided in the embodiments of this specification can be applied to the spatiotemporal data warehouse based on FIG. 4. The deletion statement can use SQL, which is commonly used in the database field, to describe the filter condition. The filter condition can be specified through column constraints or composed of compound functions, for example "where id>0 and name='hello'" or "where function(id,name)>0 and user_defined_function(id,name,id)<1". Based on the deletion statement, it can be checked whether columns corresponding to the two column names "id" and "name" exist in the database table.
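The column check at the end of this step can be illustrated with a small sketch. It is not the patented implementation: the helper names `referenced_columns` and `missing_columns` are hypothetical, the regular-expression scan is a simplification that only handles filters shaped like the two examples above, and a real system would use a proper SQL parser.

```python
import re

# Words that may appear in a filter but are not column names (simplified list).
SQL_KEYWORDS = {"where", "and", "or", "not", "in", "is", "null", "like", "between"}


def referenced_columns(where_clause):
    """Return the identifiers in a filter that look like column names."""
    # Drop string literals so that 'hello' is not mistaken for a column.
    without_literals = re.sub(r"'[^']*'", "", where_clause)
    columns = set()
    for match in re.finditer(r"[A-Za-z_][A-Za-z_0-9]*", without_literals):
        name = match.group(0)
        rest = without_literals[match.end():].lstrip()
        # Skip keywords and function names (identifiers followed by "(").
        if name.lower() in SQL_KEYWORDS or rest.startswith("("):
            continue
        columns.add(name)
    return columns


def missing_columns(where_clause, table_columns):
    """Columns used by the filter that the table does not have."""
    return referenced_columns(where_clause) - set(table_columns)


simple = referenced_columns("where id>0 and name='hello'")
nested = referenced_columns(
    "where function(id,name)>0 and user_defined_function(id,name,id)<1"
)
absent = missing_columns("where id>0 and name='hello'", ["id", "geom"])
```

With a table that only has the columns `id` and `geom`, the check would report `name` as missing and the deletion request would be rejected before any data is touched.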

Step 504: If the deletion legality check passes, look up the associated data tables of the second designated data table through the Hbase database.

For example, following the naming rules for associated data tables, the Geomesa interface can be called to determine, according to the type of the designated data table, whether associated data tables exist for it, such as a spatiotemporal index data table or an ordinary Btree index data table, and a request can be sent to the Hbase database to confirm whether each associated data table exists in Hbase. Specifically, Geomesa is an open source spatiotemporal computing project, and its source code gives the rules for generating the table names of associated data tables (such as index tables) from a given original data table. For example, for the data table "gdeltable", associated data tables with the following names are generated in the database according to these rules:

gdeltable_gdelt_id (id table)

gdeltable_gdelt_z2_v2 (z2 index table)

gdeltable_gdelt_z3_v2 (z3 index table)
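As a rough sketch, the three names above are consistent with joining a catalog name, a type name, and an index suffix with underscores. This pattern is inferred only from the example; the actual rules are defined in the Geomesa source code and may differ. The function `associated_table_names` and the `existing_tables` set, which stands in for an existence query against Hbase, are hypothetical.

```python
def associated_table_names(catalog, type_name, index_suffixes):
    """Build candidate associated-table names as catalog_type_suffix."""
    return [f"{catalog}_{type_name}_{suffix}" for suffix in index_suffixes]


def existing_associated_tables(candidates, existing_tables):
    """Keep only the candidates that the storage layer reports as present."""
    return [name for name in candidates if name in existing_tables]


candidates = associated_table_names("gdeltable", "gdelt", ["id", "z2_v2", "z3_v2"])

# Stand-in for asking Hbase which tables exist (assumption for illustration).
existing_tables = {"gdeltable", "gdeltable_gdelt_id", "gdeltable_gdelt_z2_v2"}
found = existing_associated_tables(candidates, existing_tables)
```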

Step 506: Extract the primary keys of the second designated data table.

For example, after the associated data tables have been found, Spark can call the Hbase interface to pull all the data of the designated data table to the Spark master node. The Spark master node can extract the primary keys from the designated data table and write them into the deletion transaction log. For example, the deletion transaction log can be persisted to one of the local file system, a distributed file system, or an Hbase temporary table. It can be understood that the deletion of abnormal data in steps 506 to 510 may itself fail and exit; in that case, the deletion can be resumed from the persisted deletion transaction log, which improves performance. In another implementation, the primary keys of step 506 need not be written into the deletion transaction log; residual abnormal data can instead be deleted by executing steps 506 to 510 again.

Step 508: Generate the primary keys of the second associated data of the second designated data table.

For example, after the primary keys of the designated data table have been extracted, Spark can call the Geomesa interface to generate locally, from the primary key column of the designated data table, the corresponding primary key column of the associated database table. Specifically, the Geomesa source code gives the rule for computing the associated column from a given original primary key, and according to this rule the primary keys of the associated database table can be generated from the primary keys of the designated data table. For example, for a piece of data containing a spatial data point whose original primary key is "FeatureID", the primary key of the associated "spatial Z2 index data table" is generated by the rule defined in Geomesa: hash key + Z2 index value + FeatureID.
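The rule "hash key + Z2 index value + FeatureID" can be illustrated as follows. This is only a sketch of the idea and not Geomesa's actual byte layout: the one-byte shard derived from a CRC32 of the feature ID, the 31-bit resolution per dimension, and the function names `z2_index` and `z2_row_key` are all assumptions made for the example.

```python
import zlib


def z2_index(lon, lat, bits=31):
    """Interleave the bits of normalised longitude and latitude (Z-order)."""
    max_cell = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * max_cell)
    y = int((lat + 90.0) / 180.0 * max_cell)
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bits come from longitude
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bits come from latitude
    return z


def z2_row_key(feature_id, lon, lat, num_shards=4):
    """Concatenate shard byte, 8-byte Z2 value, and the feature ID."""
    fid = feature_id.encode("utf-8")
    shard = zlib.crc32(fid) % num_shards      # hypothetical hash-key choice
    return bytes([shard]) + z2_index(lon, lat).to_bytes(8, "big") + fid


key = z2_row_key("feature-1", 116.4, 39.9)
```

Because every component is derived from the original row (its ID and its coordinates), the index key can be recomputed locally without reading the index table, which is what allows step 508 to run on the computing layer.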

Step 510: Treat the data in the associated data tables of the second designated data table that is not within the primary key range of the second associated data as abnormal data, and delete it in parallel.

For example, multiple threads can be started to call the Hbase interface and concurrently delete, from the Hbase associated database tables found in step 504, the data whose primary key values are not within the primary key range generated in step 508. Before deletion, the primary keys that are not within the range generated in step 508 can be written into the deletion transaction log so that, if the deletion fails, it can be resumed from the log. Alternatively, the primary keys to be deleted in step 510 need not be written into the deletion transaction log; residual abnormal data can instead be deleted by executing steps 506 to 510 again.
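A minimal sketch of this screening and concurrent deletion is shown below. It assumes that "not within the primary key range" can be treated as set membership, and it uses a plain dictionary as a stand-in for an Hbase index table; `find_orphan_keys` and `delete_in_batches` are hypothetical names, not Hbase calls.

```python
from concurrent.futures import ThreadPoolExecutor


def find_orphan_keys(index_table_keys, expected_keys):
    """Index keys that no row of the designated table would generate."""
    expected = set(expected_keys)
    return [key for key in index_table_keys if key not in expected]


def delete_in_batches(table, keys, batch_size=2, workers=4):
    """Delete keys from the stand-in table, one batch per worker task."""
    batches = [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

    def delete_batch(batch):
        for key in batch:
            table.pop(key, None)   # idempotent: a missing key is not an error
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(delete_batch, batches))


index_table = {"z|a": 1, "z|b": 1, "z|c": 1, "z|d": 1}
orphans = find_orphan_keys(list(index_table), ["z|a", "z|c"])
deleted = delete_in_batches(index_table, orphans)
```

Making each delete idempotent is what allows steps 506 to 510 to be repeated safely when no log was written.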

Step 512: Check the deletion transaction log to determine whether any abnormally exited deletion task remains.

For example, the first deletion task may be a deletion task that ran before the second deletion task, or, for example, the first deletion task may be step […]. When the first deletion task exits abnormally, the deletion transaction log will contain the primary keys of the first designated data that the first deletion task was to delete and the primary keys of the first associated data.

For example, the method provided in the embodiments of this specification can provide in advance a default parameter configuration and storage path for the deletion transaction log, or the parameter configuration and storage path can be determined through user-defined environment variables. When checking for the deletion transaction log, its path can be found from the environment variables, configuration parameters, or default parameters. For example, if a storage path is configured in the startup parameters, the configured parameter is used; if no parameter is configured, the environment variables are searched for a storage path; if no environment variable is set, the default path is used.
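The lookup order described here (startup parameter, then environment variable, then default) can be sketched as follows. The parameter name `log_path`, the environment variable `DELETE_TXN_LOG_PATH`, and the default path are hypothetical; the specification does not name them.

```python
import os

ENV_VAR = "DELETE_TXN_LOG_PATH"                 # hypothetical variable name
DEFAULT_LOG_PATH = "/tmp/delete_txn.log"        # hypothetical default


def resolve_log_path(startup_params, environ=None):
    """Pick the log path: startup parameter, then environment, then default."""
    environ = os.environ if environ is None else environ
    return (
        startup_params.get("log_path")
        or environ.get(ENV_VAR)
        or DEFAULT_LOG_PATH
    )


from_param = resolve_log_path({"log_path": "/data/a.log"}, {ENV_VAR: "/data/b.log"})
from_env = resolve_log_path({}, {ENV_VAR: "/data/b.log"})
from_default = resolve_log_path({}, {})
```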

In one implementation, the deletion transaction path can be checked for a deletion transaction log. If no deletion transaction log exists, no deletion transaction has exited abnormally, and the flow proceeds to step 520 for the normal deletion process. If a deletion transaction log exists, the flow proceeds to steps 514 to 516 to delete the abnormal data in the database according to the log.

Step 514: Delete in parallel the data corresponding to the primary keys of the first associated data recorded in the deletion transaction log.

For example, when a deletion transaction log exists, the associated-data portion of the log can be read, and the Hbase Delete interface can be called from multiple threads to delete the associated data concurrently in batches.

Step 516: Delete in parallel the data corresponding to the primary keys of the first designated data recorded in the deletion transaction log.

After the residual abnormal associated data has been deleted, the original-data portion of the deletion transaction log can be read, the primary keys of the original data to be deleted can be extracted, and the original data can be deleted concurrently in batches.
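The recovery order of steps 514 and 516 (associated data first, original data second) can be sketched as below. The log format, one JSON object per line with `kind`, `table`, and `key` fields, is an assumption made for the example, and dictionaries stand in for Hbase tables. The sketch runs sequentially for clarity, whereas the steps above delete in parallel.

```python
import json


def recover_from_log(log_lines, tables):
    """Replay a delete log: associated entries first, then primary entries."""
    records = [json.loads(line) for line in log_lines if line.strip()]
    applied = []
    for kind in ("associated", "primary"):
        for record in records:
            if record["kind"] == kind:
                # pop with a default keeps the replay idempotent
                tables[record["table"]].pop(record["key"], None)
                applied.append((kind, record["key"]))
    return applied


tables = {
    "t": {"k1": "row"},
    "t_z2": {"z|k1": "idx", "z|k2": "idx"},
}
log_lines = [
    json.dumps({"kind": "primary", "table": "t", "key": "k1"}),
    json.dumps({"kind": "associated", "table": "t_z2", "key": "z|k1"}),
]
applied = recover_from_log(log_lines, tables)
```

Even though the primary entry appears first in the log, the associated entry is deleted first, so an interruption during recovery never leaves an index entry without the original row needed to regenerate its key.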

After the above steps of deleting residual abnormal data for the second designated data table and recovering abnormal deletions based on the deletion transaction log are complete, the flow proceeds to step 520 for the normal deletion process for the second designated data table.

Step 520: The producer queue obtains, through the Spark interface, the primary keys of the second designated data table that satisfy the filter condition from the Hbase database.

For example, Spark can call the Hbase interface to pull the primary keys of the second designated data table that satisfy the filter condition to the Spark master node. Acting as the producer, the Spark master node distributes the locally pulled data in batches to multiple consumer queues in turn.
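Distributing batches "in turn" can be read as round-robin assignment. The sketch below assumes that reading and uses in-process `queue.Queue` objects in place of whatever queueing the Spark master node would actually use; `distribute_round_robin` is a hypothetical name.

```python
from queue import Queue


def distribute_round_robin(keys, queues, batch_size):
    """Split keys into batches and hand batch n to queue n mod len(queues)."""
    for n, start in enumerate(range(0, len(keys), batch_size)):
        queues[n % len(queues)].put(keys[start:start + batch_size])


queues = [Queue(), Queue()]
distribute_round_robin(["k1", "k2", "k3", "k4", "k5"], queues, batch_size=2)

first_queue = [queues[0].get_nowait(), queues[0].get_nowait()]
second_queue = [queues[1].get_nowait()]
```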

Step 522: The multiple consumer queues consume the data and call the Geomesa interface to compute the associated column primary keys, that is, the primary keys of the second associated data.

For example, multiple threads act as multiple consumers. Each consumer continuously pulls data from its own data queue, extracts from the pulled data the primary keys of the second designated data to be deleted, and calls the Geomesa interface to generate the corresponding primary keys of the second associated data.

Step 524: Each of the multiple consumer queues writes the primary keys of the second designated data and the primary keys of the second associated data into the memory file mapping area.

After the primary keys of the second designated data and of the second associated data have been obtained, the primary keys that the deletion transaction log needs to record can be written in this step so that recovery is possible after the deletion exits abnormally. If the second deletion task fails, the corresponding data can then be deleted according to the primary keys recorded in the deletion transaction log, that is, the primary keys of the data that needs to be deleted from the Hbase warehouse. It can be understood that a data store such as Hbase stores a primary key for each piece of data, and data can be deleted from a designated database table by primary key. Each consumer writes primary keys into the deletion transaction log through memory file mapping. Before writing, the consumer can first record the current position, and then write the primary keys of the data to be deleted, that is, the primary keys of the second designated data and of the second associated data, into the memory file mapping area through the interfaces commonly used for this purpose in memory file mapping, such as the standard interfaces of the Oracle JDK.
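The "record the position, then write through the mapping" sequence can be sketched with Python's standard `mmap` module instead of the JDK interfaces mentioned above. The class `MappedDeleteLog`, the newline-separated key format, and the fixed 4096-byte region are assumptions made for illustration.

```python
import mmap
import os
import tempfile


class MappedDeleteLog:
    """Hypothetical delete-transaction log backed by a memory-mapped file."""

    def __init__(self, path, size=4096):
        self._file = open(path, "w+b")
        self._file.truncate(size)              # reserve the mapped region
        self._map = mmap.mmap(self._file.fileno(), size)
        self.position = 0                      # next write offset

    def append_batch(self, keys):
        """Write one batch of keys; return the offset recorded before writing."""
        start = self.position                  # record the current position first
        payload = b"".join(key + b"\n" for key in keys)
        end = start + len(payload)
        if end > len(self._map):
            raise ValueError("mapped log region is full")
        self._map[start:end] = payload
        self._map.flush()                      # push the mapped pages to the file
        self.position = end
        return start

    def read(self, start, end):
        return self._map[start:end]

    def close(self):
        self._map.close()
        self._file.close()


log = MappedDeleteLog(os.path.join(tempfile.mkdtemp(), "delete_txn.log"))
batch_start = log.append_batch([b"feature-1", b"z2|feature-1"])
written = log.read(batch_start, log.position)
log.close()
```

Returning the offset recorded before the write gives the caller the marker it needs later to clear exactly this batch once the deletion has succeeded.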

Step 526: The multiple consumer queues concurrently send deletion requests to Hbase.

For example, after the primary keys of the second designated data and of the second associated data have been written into the deletion transaction log, the current consumer can assemble the primary keys of the second associated data of the current batch into an Hbase Delete request and delete the associated data, and then assemble the primary keys of the second designated data into a request and send it for deletion. In this step, the deletion tasks of the multiple consumers are independent of one another and execute in parallel, making maximum use of the concurrent processing capability of Hbase.

Step 528: After the deletion succeeds, clear the deletion transaction log in the memory file mapping area.

For example, because the current position can be recorded before primary keys are written into the deletion transaction log, each data consumer, after completing the deletion through the Hbase Delete interface, can overwrite the corresponding memory file mapping region with null characters based on the recorded position, so that the write marker of the deletion transaction log is reset to the initial position recorded in step 524. In this embodiment, the deletion transaction log is written first and the command to delete data is then sent to the Hbase data warehouse. Overwriting with null characters and rolling back the marker clears the log records of the batch once the deletion completes normally, which prevents wasted storage space.
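The clearing step can be sketched as follows. An anonymous memory mapping stands in for the mapped log file, and `write_batch` and `clear_batch` are hypothetical helpers; the point is only the sequence of recording the start, writing, deleting, overwriting with null bytes, and rolling the marker back.

```python
import mmap


def write_batch(log_map, position, payload):
    """Write payload at position and return the new write marker."""
    end = position + len(payload)
    log_map[position:end] = payload
    return end


def clear_batch(log_map, start, end):
    """Overwrite [start, end) with null bytes and roll the marker back."""
    log_map[start:end] = b"\x00" * (end - start)
    return start


log_map = mmap.mmap(-1, 64)        # anonymous mapping used as a stand-in
start = 0                          # position recorded before writing
end = write_batch(log_map, start, b"feature-1\nz2|feature-1\n")

# ... the batch is deleted from the storage layer successfully here ...

position = clear_batch(log_map, start, end)
remaining = log_map[0:end]
log_map.close()
```

If the process stops before `clear_batch` runs, the keys remain in the log, which is exactly the state the recovery steps 512 to 516 look for on the next run.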

For each data consumer in the deletion flow, the consumer's task can be considered finished when the data producer has finished pulling data, the consumer's queue is empty, and the data it has currently consumed has been deleted. When the tasks of all consumers have finished, the overall data deletion flow ends.

It can be seen that, in this embodiment, the deletion process uses a parallel flow based on the producer-consumer pattern: the producer pulls data batch by batch and hands it to multiple independent, parallel consumers that record the log and generate Hbase Delete requests. This improves the efficiency of concurrent deletion and makes maximum use of the concurrency of the Hbase storage layer.
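The overall flow of steps 520 to 528 can be condensed into a small thread-based sketch. It is a simplification under several assumptions: dictionaries stand in for Hbase tables, a `None` sentinel signals that the producer has finished, logging is omitted, and `run_delete` and `derive_index_key` are hypothetical names.

```python
import threading
from queue import Queue

STOP = None  # sentinel the producer sends when it has no more batches


def run_delete(keys, data_table, index_table, derive_index_key,
               consumers=2, batch_size=2):
    """Producer hands out key batches; each consumer deletes index, then data."""
    queues = [Queue() for _ in range(consumers)]

    def consume(queue):
        while True:
            batch = queue.get()
            if batch is STOP:                  # producer done and queue drained
                return
            for key in batch:                  # associated data first
                index_table.pop(derive_index_key(key), None)
            for key in batch:                  # then the designated data
                data_table.pop(key, None)

    threads = [threading.Thread(target=consume, args=(q,)) for q in queues]
    for thread in threads:
        thread.start()
    for n, start in enumerate(range(0, len(keys), batch_size)):
        queues[n % consumers].put(keys[start:start + batch_size])
    for queue in queues:
        queue.put(STOP)
    for thread in threads:
        thread.join()


data = {"a": 1, "b": 2, "c": 3}
index = {"z|a": 1, "z|b": 1, "z|c": 1, "z|x": 1}
run_delete(["a", "b", "c"], data, index, lambda key: "z|" + key)
```

Each consumer owns its queue and touches disjoint keys, so the consumers need no coordination with one another beyond the final join.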

Corresponding to the above method embodiments, this specification also provides embodiments of a device for deleting data. FIG. 6 shows a schematic structural diagram of a device for deleting data provided by an embodiment of this specification. As shown in FIG. 6, the device includes a deletion log recording module 602, a primary key extraction module 604, and a deletion recovery module 606.

The deletion log recording module 602 may be configured to record, in the deletion transaction log, the primary key of the first designated data to be deleted by the first deletion task and/or the primary key of the first associated data associated with the primary key of the first designated data.

The primary key extraction module 604 may be configured to extract the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log when the first deletion task exits abnormally.

The deletion recovery module 606 may be configured to delete the first designated data according to the extracted primary key of the first designated data, and/or to delete the first associated data according to the extracted primary key of the first associated data.

The device records, in the deletion transaction log, the primary key of the first designated data to be deleted by the first deletion task and/or the primary key of the first associated data associated with it. When the first deletion task exits abnormally, the device extracts these primary keys from the deletion transaction log and deletes the first designated data and/or the first associated data accordingly. Therefore, when a deletion task exits abnormally and leaves abnormal data behind, the abnormal data can be cleaned up according to the recorded deletion transaction log, which ensures that the deletion succeeds and improves data security.

FIG. 7 shows a schematic structural diagram of a device for deleting data provided by another embodiment of this specification. As shown in FIG. 7, the device may further include a legality detection module 608, an associated table lookup module 610, an associated primary key generation module 612, an abnormal data screening module 614, and an abnormal data deletion module 616.

The legality detection module 608 may be configured to perform a deletion legality check on the second designated data table designated for deletion by the second deletion task.

The associated table lookup module 610 may be configured to look up the associated data tables of the second designated data table if the deletion legality check passes.

The associated primary key generation module 612 may be configured to generate, from the primary keys of the second designated data table, the primary keys of the second associated data associated with them.

The abnormal data screening module 614 may be configured to find, in the associated data tables of the second designated data table, the data that is not within the primary key range of the second associated data.

The abnormal data deletion module 616 may be configured to delete, from the associated data tables of the second designated data table, the data that is not within the primary key range of the second associated data.

To improve deletion efficiency, the abnormal data deletion module 616 may optionally be configured to delete, concurrently and in batches, the data in the associated data tables of the second designated data table that is not within the primary key range of the second associated data.

When data is deleted, the primary keys of the associated data table can be obtained by concatenating and converting the primary keys of the designated data table. If the data in the associated data table is deleted first and the data in the designated data table afterwards, then even if deleting from the associated data table fails, the deletion can be attempted again using the primary keys of the designated data table. Therefore, in one or more embodiments of this specification, the first deletion task first deletes the data corresponding to the primary key of the first associated data and then deletes the data corresponding to the primary key of the first designated data. The deletion recovery module 606 may be configured to first delete the first associated data according to the primary key of the first associated data, and then delete the first designated data according to the extracted primary key of the first designated data.

In one or more embodiments of this specification, to improve deletion efficiency, as shown in FIG. 7, the device may further include a task execution module 618, which may be configured to execute multiple second deletion tasks concurrently based on the producer-consumer model, the multiple second deletion tasks being used to delete the second designated data table.

Optionally, the device can be deployed in a big data architecture that uses Spark as the computing layer. As shown in FIG. 7, the task execution module 618 may include a designated data acquisition submodule 6180, a designated data primary key distribution submodule 6182, an associated data primary key generation submodule 6184, a deletion request generation submodule 6186, and a deletion request sending submodule 6188. The device may further include a primary key writing submodule 620, which may be configured to write, in each of the multiple consumer queues, the received primary keys and the correspondingly generated primary keys of the second associated data into the deletion transaction log.

The designated data acquisition submodule 6180 may be configured to obtain the primary keys of the second designated data table from the database to the Spark master node through the Spark interface.

The designated data primary key distribution submodule 6182 may be configured to have the Spark master node, acting as the producer, send the primary keys of the second designated data table to multiple consumer queues in batches.

The associated data primary key generation submodule 6184 may be configured to generate, in each of the multiple consumer queues, the primary keys of the second associated data from the received primary keys.

The deletion request generation submodule 6186 may be configured to generate, in each of the multiple consumer queues, a deletion request corresponding to the second deletion task for the received primary keys and the primary keys of the second associated data.

The deletion request sending submodule 6188 may be configured to send, from each of the multiple consumer queues, the deletion requests to the database concurrently.

In this embodiment, the Spark master node acts as the producer and sends the primary keys of the data table designated for deletion to multiple consumer queues in batches, and the consumer queues then send deletion requests to the database concurrently. The deletion tasks corresponding to the individual deletion requests are therefore independent of one another and execute concurrently, which makes full use of the processing capacity available for deletion tasks and improves deletion efficiency.

Optionally, as shown in FIG. 7, the device may further include a log clearing module 630, which may be configured to clear the record of the deletion task from the deletion transaction log when the deletion task succeeds.

The above is a schematic solution of a device for deleting data according to this embodiment. It should be noted that the technical solution of the device belongs to the same concept as the technical solution of the method for deleting data described above; for details not described in the technical solution of the device, refer to the description of the technical solution of the method.

FIG. 8 shows a structural block diagram of a computing device 800 provided according to an embodiment of this specification. The components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820. The processor 820 is connected to the memory 810 through a bus 830, and a database 850 is used to store data.

The computing device 800 also includes an access device 840, which enables the computing device 800 to communicate via one or more networks 860. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 840 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and the like.

In one embodiment of this specification, the above components of the computing device 800 and other components not shown in FIG. 8 may also be connected to one another, for example through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 8 is for illustrative purposes only and does not limit the scope of this specification. Those skilled in the art may add or replace components as needed.

The computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, or a netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses), or another type of mobile device, or a stationary computing device such as a desktop computer or PC. The computing device 800 may also be a mobile or stationary server.

The processor 820 is configured to execute the following computer-executable instructions:

recording, in the deletion transaction log, the primary key of the first designated data to be deleted by the first deletion task and/or the primary key of the first associated data associated with the primary key of the first designated data;

when the first deletion task exits abnormally, extracting the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log;

deleting the first designated data according to the extracted primary key of the first designated data, and/or deleting the first associated data according to the extracted primary key of the first associated data.

The above is a schematic solution of a computing device according to this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the method for deleting data described above; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the method.

An embodiment of this specification further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, are used to:

The above is a schematic solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above method for deleting data belong to the same concept; for details not described in the technical solution of the storage medium, reference may be made to the description of the technical solution of the above method for deleting data.

Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be appropriately expanded or narrowed according to the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

It should be noted that, for brevity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of this specification are not limited by the described order of actions, because according to the embodiments of this specification certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments of this specification.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.

The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The optional embodiments do not set out every detail exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the embodiments of this specification. These embodiments were selected and specifically described in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and use this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of deleting data, comprising:

recording, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data;

extracting the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log in the case where the first deletion task exits abnormally;

deleting the first designated data according to the extracted primary key of the first designated data and/or deleting the first associated data according to the extracted primary key of the first associated data;

wherein the first deletion task is used for first deleting the data corresponding to the primary key of the first associated data and then deleting the data corresponding to the primary key of the first designated data;

the deleting of the first designated data and the first associated data according to the extracted primary key of the first designated data and the extracted primary key of the first associated data comprises:

first deleting the first associated data according to the extracted primary key of the first associated data;

then deleting the first designated data according to the extracted primary key of the first designated data;

the method further comprises:

searching out an associated data table of a second designated data table;

generating, according to a primary key of the second designated data table, a primary key of second associated data associated with the primary key of the second designated data table;

searching the associated data table of the second designated data table for data that is not within the primary key range of the second associated data;

and deleting the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data.
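By way of non-limiting illustration only, and not forming part of the claims, the last four steps of claim 1 may be sketched as follows. This is a minimal sketch under stated assumptions: dictionaries stand in for the second designated data table and its associated data table, the "primary key range" is modeled as a set of derivable keys, and the derivation rule in `associated_keys_for` (an index name joined to the primary key) and the index names are hypothetical.

```python
def associated_keys_for(primary_key, index_names=("idx_time", "idx_space")):
    """Hypothetical rule: an associated key is '<index name>:<primary key>'."""
    return {f"{name}:{primary_key}" for name in index_names}


def delete_orphans(designated_table, associated_table):
    """Delete associated rows whose key cannot be derived from any
    primary key still present in the designated table."""
    # Generate the associated primary keys from the designated primary keys.
    valid_keys = set()
    for primary_key in designated_table:
        valid_keys |= associated_keys_for(primary_key)

    # Search the associated table for rows outside that set of keys.
    orphans = [key for key in associated_table if key not in valid_keys]

    # Delete those rows.
    for key in orphans:
        del associated_table[key]
    return sorted(orphans)
```

For example, with a designated table holding only key `a`, a leftover row `idx_time:b` in the associated table has no matching designated key and is removed, while `idx_time:a` and `idx_space:a` are kept.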

2. The method of claim 1, further comprising:

performing deletion legitimacy detection on the second designated data table designated for deletion by a second deletion task;

and if the deletion legitimacy detection is passed, proceeding to the step of searching out the associated data table of the second designated data table.

3. The method of claim 1, wherein the deleting of the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data comprises:

deleting, in batches and concurrently, the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data.

4. The method of claim 1, wherein the first deletion task comprises a plurality of deletion tasks that are performed concurrently.

5. The method of claim 1, further comprising:

concurrently performing a plurality of second deletion tasks based on a producer-consumer model, the plurality of second deletion tasks being used for deleting the second designated data table.

6. The method of claim 5, applied to a big data architecture that uses Spark as a compute layer;

the concurrently performing of a plurality of second deletion tasks based on the producer-consumer model comprises:

acquiring the primary keys of the second designated data table from a database to a Spark master node based on a Spark interface;

distributing, by the Spark master node acting as a producer, the primary keys of the second designated data table to a plurality of consumer queues in batches;

generating, by each of the plurality of consumer queues, primary keys of second associated data according to the received primary keys;

generating, by each of the plurality of consumer queues, a deletion request corresponding to the second deletion task for the received primary keys and the primary keys of the second associated data;

sending, by the plurality of consumer queues, the deletion requests to the database concurrently;

the method further comprises:

writing, by each of the plurality of consumer queues, the received primary keys and the correspondingly generated primary keys of the second associated data into the deletion transaction log.
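By way of non-limiting illustration only, and not forming part of the claims, the producer-consumer flow of claim 6 may be sketched as follows. This sketch does not use Spark: standard-library threads and `queue.Queue` objects stand in for the Spark master node and the consumer queues, a dictionary stands in for the database, and `BATCH_SIZE`, `associated_key_for`, `consumer`, and `run_deletion` are hypothetical names. A single lock guards the in-memory stand-ins; a real database client would handle concurrent requests itself.

```python
import queue
import threading

BATCH_SIZE = 2  # hypothetical batch size
STOP = None     # sentinel telling a consumer to finish


def associated_key_for(primary_key):
    """Hypothetical rule deriving an associated key from a primary key."""
    return f"idx:{primary_key}"


def consumer(inbox, database, log, lock):
    """Take batches of primary keys, log them, then issue the deletes."""
    while True:
        batch = inbox.get()
        if batch is STOP:
            break
        for primary_key in batch:
            assoc_key = associated_key_for(primary_key)
            with lock:
                # Write the deletion transaction log before deleting.
                log.append((primary_key, assoc_key))
                database["associated"].pop(assoc_key, None)
                database["designated"].pop(primary_key, None)


def run_deletion(database, num_consumers=3):
    """Producer: hand out the primary keys in batches, round-robin."""
    log, lock = [], threading.Lock()
    keys = list(database["designated"])  # snapshot before any delete
    inboxes = [queue.Queue() for _ in range(num_consumers)]
    workers = [
        threading.Thread(target=consumer, args=(inbox, database, log, lock))
        for inbox in inboxes
    ]
    for worker in workers:
        worker.start()
    for start in range(0, len(keys), BATCH_SIZE):
        target = (start // BATCH_SIZE) % num_consumers
        inboxes[target].put(keys[start:start + BATCH_SIZE])
    for inbox in inboxes:
        inbox.put(STOP)
    for worker in workers:
        worker.join()
    return log
```

In this sketch each consumer records a key pair in the log before deleting it, so that a crashed consumer leaves behind the keys needed to complete its deletions later.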

7. The method of claim 1 or 6, wherein the deletion transaction log is recorded by means of memory-mapped files.
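By way of non-limiting illustration only, and not forming part of the claims, logging through a memory-mapped file may be sketched with Python's standard `mmap` module. The fixed-width record layout, the constants `RECORD_SIZE` and `LOG_CAPACITY`, and the function names are assumptions made for the sketch; the specification does not state a log format.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 32   # hypothetical fixed record width in bytes
LOG_CAPACITY = 4   # hypothetical number of record slots


def create_log(path):
    """Pre-size the log file so that it can be memory-mapped."""
    with open(path, "wb") as f:
        f.write(b"\x00" * RECORD_SIZE * LOG_CAPACITY)


def append_key(path, slot, key):
    """Write one primary key into the given slot through the mapping."""
    data = key.encode("utf-8")
    if len(data) > RECORD_SIZE:
        raise ValueError("key too long for one record")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mapped:
        start = slot * RECORD_SIZE
        mapped[start:start + RECORD_SIZE] = data.ljust(RECORD_SIZE, b"\x00")
        mapped.flush()  # ask the OS to write the dirty pages to the file


def read_keys(path):
    """Extract every non-empty record, e.g. after an abnormal exit."""
    keys = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        for slot in range(LOG_CAPACITY):
            start = slot * RECORD_SIZE
            raw = mapped[start:start + RECORD_SIZE].rstrip(b"\x00")
            if raw:
                keys.append(raw.decode("utf-8"))
    return keys


def demo():
    path = os.path.join(tempfile.mkdtemp(), "delete_tx.log")
    create_log(path)
    append_key(path, 0, "t1")
    append_key(path, 1, "idx:t1")
    return read_keys(path)
```

Writing through a mapping lets the log be updated with memory stores rather than a write call per record, which is one plausible reason for choosing this logging approach.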

8. An apparatus for deleting data, comprising:

a deletion log recording module configured to record, in a deletion transaction log, a primary key of first designated data to be deleted by a first deletion task and/or a primary key of first associated data associated with the primary key of the first designated data;

a primary key extraction module configured to extract the primary key of the first designated data and/or the primary key of the first associated data from the deletion transaction log in the case where the first deletion task exits abnormally;

a deletion restoration module configured to delete the first designated data according to the extracted primary key of the first designated data and/or delete the first associated data according to the extracted primary key of the first associated data;

the apparatus further comprises:

an association table searching module configured to search out an associated data table of a second designated data table;

an associated primary key generation module configured to generate, according to primary keys of the second designated data table, primary keys of second associated data associated with the primary keys of the second designated data table;

an abnormal data screening module configured to search the associated data table of the second designated data table for data that is not within the primary key range of the second associated data;

and an abnormal data deleting module configured to delete the data in the associated data table of the second designated data table that is not within the primary key range of the second associated data.

9. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions:

the computer-executable instructions further comprise:

searching out an associated data table of the second designated data table;

10. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of deleting data of any one of claims 1 to 7.