CN118535658A - Data processing method, device, equipment, medium and program product - Google Patents

Info

Publication number
CN118535658A
Authority
CN
China
Prior art keywords
metadata
target
data
processing
management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410713470.6A
Other languages
Chinese (zh)
Inventor
雷经纬
夏冰沁
唐家星
王睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202410713470.6A
Publication of CN118535658A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a data processing method, device, equipment, medium and program product, which can be applied to the fields of data processing technology and financial technology. The method includes: in response to a scheduled task in a data management system being triggered, determining a target cluster from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster; obtaining processing instructions for processing the metadata in the target cluster; determining, from the processing instructions, a target instruction for changing the metadata in the target cluster; determining target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata as processed by the target instruction; and processing the metadata in the data management system according to the target metadata.

Description

Data processing method, device, equipment, medium and program product

Technical Field

The present disclosure relates to the fields of data processing technology and financial technology, and specifically to a data processing method, device, equipment, medium and program product.

Background Art

Metadata is data that describes data and records descriptive information about it. Metadata can be used to manage and maintain data. For example, metadata includes information such as the storage format, indexing, or partitioning strategy of a data set, and such information can be looked up by querying the metadata.

In the process of implementing the concept of the present disclosure, the inventors found at least the following problem in the related art: the methods used to process metadata are inefficient and consume substantial resources.

Summary of the Invention

In view of the above problems, the present disclosure provides a data processing method, device, equipment, medium and program product.

According to a first aspect of the present disclosure, there is provided a data processing method, comprising:

in response to a scheduled task in a data management system being triggered, determining a target cluster from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster;

obtaining processing instructions for processing the metadata in the target cluster;

determining, from the processing instructions, a target instruction for changing the metadata in the target cluster;

determining target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata as processed by the target instruction; and

processing the metadata in the data management system according to the target metadata.
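
The step of picking target instructions out of the processing instructions can be illustrated with a minimal Python sketch. It rests on two assumptions that the disclosure does not state: that processing instructions are SQL statements, and that only table-level DDL (`CREATE`, `ALTER`, `DROP`) changes metadata. The function names and the mapping to change types are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: processing instructions are SQL text and only table DDL
# changes metadata. The disclosure itself does not fix either point.
_DDL_PATTERN = re.compile(
    r"^\s*(CREATE|ALTER|DROP)\s+TABLE\s+(?:IF\s+(?:NOT\s+)?EXISTS\s+)?([\w.]+)",
    re.IGNORECASE,
)

# Hypothetical mapping from DDL verb to the change types named in the text.
_CHANGE_TYPE = {"CREATE": "add", "ALTER": "modify", "DROP": "delete"}


def classify(instruction: str) -> Optional[Tuple[str, str]]:
    """Return (change_type, table_name) for a metadata-changing statement, else None."""
    match = _DDL_PATTERN.match(instruction)
    if match is None:
        return None
    verb, table = match.group(1).upper(), match.group(2).lower()
    return _CHANGE_TYPE[verb], table


def select_target_instructions(instructions: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only the instructions that change metadata, in their original order."""
    targets = []
    for instruction in instructions:
        result = classify(instruction)
        if result is not None:
            targets.append(result)
    return targets
```

Queries and row-level writes such as `SELECT` or `INSERT` fall through the filter, so only the tables named in the remaining statements need to be revisited.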

According to an embodiment of the present disclosure, determining the target metadata from the target cluster according to the target instruction includes:

determining, according to the target instruction, a target identifier corresponding to the target instruction;

determining, from the target cluster, a target data table matching the target identifier; and

determining the target metadata according to the update time and update content of the target data table.
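
A minimal sketch of this lookup follows, assuming the cluster exposes each table's update time and column definitions in memory and that the management system remembers when it last synchronized. `TableInfo`, `last_sync`, and the `schema.table` identifier format are illustrative names, not terms from the disclosure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TableInfo:
    identifier: str          # hypothetical target identifier, e.g. "schema.table"
    updated_at: datetime     # update time recorded by the cluster
    columns: Dict[str, str]  # update content: column name -> column type


def determine_target_metadata(
    target_ids: Iterable[str],
    cluster_tables: Dict[str, TableInfo],
    last_sync: datetime,
) -> List[TableInfo]:
    """Match identifiers to tables and keep those updated after the last sync."""
    selected = []
    for identifier in target_ids:
        table = cluster_tables.get(identifier)
        # A table that no longer exists in the cluster is skipped here;
        # in this sketch it would be handled by a separate delete path.
        if table is not None and table.updated_at > last_sync:
            selected.append(table)
    return selected
```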

According to an embodiment of the present disclosure, processing the metadata in the data management system according to the target metadata includes:

determining, according to the change type of the target metadata, a processing method for the metadata in the data management system; and

processing the metadata in the data management system according to the processing method.

According to an embodiment of the present disclosure, the change type includes a modification type, an addition type and a deletion type, wherein determining the processing method for the metadata in the data management system according to the change type of the target metadata includes:

when the change type of the target metadata is determined to be the modification type, determining that the processing method for the metadata of the data management system is a modification processing method;

when the change type of the target metadata is determined to be the addition type, determining that the processing method for the metadata of the data management system is an addition processing method; and

when the change type of the target metadata is determined to be the deletion type, determining that the processing method for the metadata of the data management system is a deletion processing method.

According to an embodiment of the present disclosure, processing the metadata in the data management system according to the processing method includes:

when the processing method is determined to be the modification processing method, obtaining, from the data management system, first metadata to be modified that is related to the target metadata;

modifying the first metadata to be modified according to the target metadata;

when the processing method is the addition processing method, determining second metadata to be modified according to the target metadata;

adding the second metadata to be modified to the data management system;

when the processing method is the deletion processing method, obtaining, from the data management system, third metadata to be modified that is related to the target metadata; and

deleting the third metadata to be modified from the data management system.
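
The three branches above amount to a dispatch on the change type. The sketch below models the data management system's metadata as a plain dictionary keyed by table identifier. That storage model, the string labels for change types, and the name `apply_change` are assumptions made for illustration only.

```python
from typing import Dict, Optional

Catalog = Dict[str, dict]  # table identifier -> metadata held by the management system


def apply_change(catalog: Catalog, change_type: str, table_id: str,
                 target_metadata: Optional[dict] = None) -> None:
    """Apply one change to the management system's copy of the metadata."""
    if change_type == "modify":
        existing = catalog[table_id]                      # first metadata to be modified
        existing.update(target_metadata or {})
    elif change_type == "add":
        catalog[table_id] = dict(target_metadata or {})   # second metadata to be modified
    elif change_type == "delete":
        catalog.pop(table_id, None)                       # third metadata to be modified
    else:
        raise ValueError(f"unknown change type: {change_type}")
```

Rejecting unknown change types explicitly keeps a malformed instruction from silently leaving the two copies of the metadata out of step.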

According to an embodiment of the present disclosure, the method further includes:

in response to the data management system receiving a query request about business data from a target object, performing identity authentication on the target object;

when the identity authentication of the target object is determined to have passed, obtaining metadata related to the business data from the data management system;

determining a target query template from a set of business data query templates according to the metadata related to the business data; and

in response to receiving a processing request from the target object regarding the target query template, obtaining the business data from the data warehouse.
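
One way to read this query flow is sketched below. The disclosure does not say how a template is matched to metadata, so the rule used here (pick the first template whose required columns all appear in the table's metadata) is an assumption, as are the names `choose_template`, `handle_query_request`, `render`, and the `requires`/`sql` fields. Executing the rendered statement against the data warehouse is left out.

```python
from typing import Callable, Dict, List, Optional

Template = Dict[str, object]


def choose_template(metadata: Dict[str, object],
                    templates: List[Template]) -> Optional[Template]:
    """Return the first template whose required columns all exist in the metadata."""
    available = set(metadata["columns"])
    for template in templates:
        if set(template["requires"]) <= available:
            return template
    return None


def handle_query_request(user: str, table_id: str,
                         catalog: Dict[str, Dict[str, object]],
                         templates: List[Template],
                         is_authenticated: Callable[[str], bool]) -> Optional[Template]:
    """Authenticate, read metadata from the management system, then pick a template."""
    if not is_authenticated(user):
        raise PermissionError(f"authentication failed for {user!r}")
    return choose_template(catalog[table_id], templates)


def render(template: Template, table_id: str) -> str:
    """Fill the table name into the chosen template; running it is left to the caller."""
    return str(template["sql"]).format(table=table_id)
```

Because the template is chosen from metadata held by the management system, the warehouse is only contacted once the user confirms the template.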

According to an embodiment of the present disclosure, the method further includes:

determining, according to the cluster size of the target cluster in the data warehouse, a parallel processing count and a processing frequency that match the target cluster; and

generating the scheduled task according to the parallel processing count and the processing frequency.
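
The disclosure gives no formula relating cluster size to parallelism or frequency. The sketch below uses arbitrary thresholds purely to show the shape of such a rule. The node-count input, the cap of 8 workers, and the interval tiers are all illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncSchedule:
    parallelism: int        # number of concurrent sync workers
    interval_minutes: int   # how often the scheduled task fires


def plan_schedule(node_count: int) -> SyncSchedule:
    """Derive a schedule from cluster size using illustrative thresholds."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    # Illustrative rule: one worker per 16 nodes, at least 1, capped at 8.
    parallelism = min(8, max(1, node_count // 16))
    # Illustrative rule: larger clusters are polled more often.
    if node_count <= 32:
        interval = 60
    elif node_count <= 128:
        interval = 30
    else:
        interval = 15
    return SyncSchedule(parallelism, interval)
```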

According to an embodiment of the present disclosure, the method further includes:

when it is determined that the data management system is updating the metadata of the target cluster for the first time, updating all of the metadata in the target cluster to the data management system.
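
The first-time case can be sketched as a branch in front of the incremental path. Here the management system's metadata is modeled as a dictionary keyed by cluster name, and a missing key is taken to mean "never synchronized". Both choices, and the name `sync_cluster`, are assumptions for illustration.

```python
from typing import Dict, Iterable


def sync_cluster(catalog: Dict[str, Dict[str, dict]], cluster: str,
                 cluster_metadata: Dict[str, dict],
                 changed_ids: Iterable[str]) -> str:
    """Full copy on the first sync of a cluster, incremental update afterwards."""
    if cluster not in catalog:
        # First sync: copy every metadata entry from the cluster.
        catalog[cluster] = {k: dict(v) for k, v in cluster_metadata.items()}
        return "full"
    local = catalog[cluster]
    for table_id in changed_ids:
        if table_id in cluster_metadata:
            local[table_id] = dict(cluster_metadata[table_id])  # added or modified
        else:
            local.pop(table_id, None)                           # deleted in the cluster
    return "incremental"
```

After the first run only the identifiers passed in `changed_ids` are touched, which is the point of avoiding a full update.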

A second aspect of the present disclosure provides a data processing device, including:

a first determination module, configured to determine, in response to a scheduled task in a data management system being triggered, a target cluster from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster;

a first acquisition module, configured to obtain processing instructions for processing the metadata in the target cluster;

a second determination module, configured to determine, from the processing instructions, a target instruction for changing the metadata in the target cluster;

a third determination module, configured to determine target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata as processed by the target instruction; and

a processing module, configured to process the metadata in the data management system according to the target metadata.

A third aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the steps of the above method.

A fourth aspect of the present disclosure further provides a computer-readable storage medium on which a computer program or instructions are stored, the computer program or instructions implementing the steps of the above method when executed by a processor.

A fifth aspect of the present disclosure further provides a computer program product, including a computer program or instructions that implement the steps of the above method when executed by a processor.

According to the data processing method, device, equipment, medium and program product provided by the present disclosure, in response to a scheduled task in a data management system being triggered, a target cluster is determined from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster; processing instructions for processing the metadata in the target cluster are obtained; a target instruction for changing the metadata in the target cluster is determined from the processing instructions; target metadata is determined from the target cluster according to the target instruction, wherein the target metadata represents the metadata as processed by the target instruction; and the metadata in the data management system is processed according to the target metadata.

Because setting a scheduled task allows the target cluster in the data warehouse to be determined automatically, and the processing instructions for the metadata in the target cluster are filtered to obtain the target instructions that change that metadata, the target metadata that has changed can be identified. The data management system is then processed on the basis of the target metadata, without a full update over all the metadata in the target cluster, which avoids the heavy resource usage of a full update and improves resource utilization efficiency. In addition, determining the target metadata through instructions is convenient and fast and requires no comparison between old and new data.

Brief Description of the Drawings

The above content and other objects, features and advantages of the present disclosure will become clearer from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically shows an application scenario diagram of a data processing method, device, equipment, medium and program product according to an embodiment of the present disclosure;

FIG. 2 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure;

FIG. 3 schematically shows a flow chart of a data processing method according to another embodiment of the present disclosure;

FIG. 4 schematically shows a block diagram of a data processing framework according to an embodiment of the present disclosure;

FIG. 5 schematically shows a structural block diagram of a data processing device according to an embodiment of the present disclosure; and

FIG. 6 schematically shows a block diagram of an electronic device suitable for implementing a data processing method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are exemplary only and are not intended to limit the scope of the present disclosure. In the following detailed description, for ease of explanation, many specific details are set forth to provide a comprehensive understanding of the embodiments of the present disclosure. It is apparent, however, that one or more embodiments may also be implemented without these specific details. In addition, descriptions of well-known structures and technologies are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.

The terms used herein are only for describing specific embodiments and are not intended to limit the present disclosure. The terms "include", "comprise" and the like used herein indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.

All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification and should not be interpreted in an idealized or overly rigid manner.

Where an expression such as "at least one of A, B, and C, etc." is used, it should generally be interpreted according to the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" includes but is not limited to systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C).

In the technical solution of the present disclosure, the user information involved (including but not limited to user personal information, user image information, and user device information such as location information) and data (including but not limited to data used for analysis, stored data, and displayed data) are all information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of the relevant data comply with relevant laws, regulations and standards, take necessary confidentiality measures, do not violate public order and good morals, and provide corresponding operation entrances through which users can choose to grant or refuse authorization.

Data management systems mainly serve users such as data analysts, who frequently need to query metadata online. For example, a data analyst may need to perform the following operations:

Data lineage analysis: the analyst needs quick access to metadata to trace where data came from, how it has changed over time, and how it was processed and transformed, which is critical to ensuring the accuracy and reliability of the data.

Performance optimization queries: to improve query efficiency, the analyst may need to view the storage format, indexing, or partitioning strategy of a specific data set, all of which is contained in the metadata.

Security and compliance audits: knowing who can access a specific data set, along with its access rights and access history, is another key task for the analyst, and metadata usually contains this information.

Resource management and estimation: the analyst needs to know the size and usage of each data set in order to allocate resources and predict future storage needs, which also involves querying metadata.

These scenarios show that online metadata queries are both frequent and critical for users of a data management system, so improving the efficiency and performance of this step is crucial to the success of the entire data governance system.

In the related art, a data warehouse is usually built on a distributed cluster of a massively parallel processing database (Massively Parallel Processing Database, MPPDB). The data warehouse processes data through a batch scheduling system and then makes it available for data analysts to query. The typical stages of data import, processing, and use are as follows:

Data import: the large volume of business data generated every day is imported into the data warehouse. During import, the data may need format conversion, cleaning, and verification to ensure data quality.

Data processing: once the data has been imported, the batch scheduling system starts processing it. Processing includes operations such as merging, aggregation, and calculation to generate data sets that can be used for analysis, for example calculating the total transaction volume for a day or generating a summary of customer behavior.

Data use: after processing, data analysts can query the processed data through a query interface, for purposes such as risk analysis, trend prediction, and behavioral analysis. At this stage, efficient metadata queries are crucial because they let analysts quickly understand the structure and characteristics of a data set.

Current data management systems often need full processing to acquire metadata, and full processing has limitations. Because an MPPDB cluster is mainly oriented to high-throughput processing of massive data, it may hit performance bottlenecks when serving high-frequency, diverse metadata queries: the cluster is optimized mainly for data, not for frequent and flexible queries on metadata. In addition, batch processing may encounter the following problems when handling extremely large amounts of metadata:

Low batch processing efficiency: when processing a large amount of metadata, the batch processing module runs for so long that jobs are prone to interruption.

Mismatch between resource consumption and benefit: to perform full cleaning and loading, the data management system must invest substantial job resources, yet because metadata query is a non-core requirement, the resulting benefit is often not proportional to that investment.

Difficulty in identifying incrementally changed metadata: because a data warehouse usually contains a huge amount of data, even incremental updates involve very large data sets, which makes identifying and processing the updates complex and time-consuming.

Timely data updates are crucial for decision support, risk management, compliance, and other purposes, so incremental updates must be fast as well as accurate. Incremental updates must also preserve data consistency and accuracy, which means every update or delete operation must be precisely reflected in the metadata to prevent inconsistencies or information loss.

In view of this, an embodiment of the present disclosure provides a data processing method, including: in response to a scheduled task in a data management system being triggered, determining a target cluster from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster; obtaining processing instructions for processing the metadata in the target cluster; determining, from the processing instructions, a target instruction for changing the metadata in the target cluster; determining target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata as processed by the target instruction; and processing the metadata in the data management system according to the target metadata.

图1示意性示出了根据本公开实施例的数据处理方法、装置、设备、介质和程序产品的应用场景图。FIG1 schematically shows an application scenario diagram of a data processing method, apparatus, device, medium, and program product according to an embodiment of the present disclosure.

如图1所示,根据该实施例的应用场景100可以包括第一终端设备101、第二终端设备102、第三终端设备103、网络104和服务器105。网络104用以在第一终端设备101、第二终端设备102、第三终端设备103和服务器105之间提供通信链路的介质。网络104可以包括各种连接类型,例如有线、无线通信链路或者光纤电缆等等。As shown in Fig. 1, the application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is used to provide a medium for a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or optical fiber cables, etc.

用户可以使用第一终端设备101、第二终端设备102、第三终端设备103通过网络104与服务器105交互,以接收或发送消息等。第一终端设备101、第二终端设备102、第三终端设备103上可以安装有各种通讯客户端应用,例如购物类应用、网页浏览器应用、搜索类应用、即时通信工具、邮箱客户端、社交平台软件等(仅为示例)。The user can use the first terminal device 101, the second terminal device 102, and the third terminal device 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications can be installed on the first terminal device 101, the second terminal device 102, and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, social platform software, etc. (only for example).

第一终端设备101、第二终端设备102、第三终端设备103可以是具有显示屏并且支持网页浏览的各种电子设备,包括但不限于智能手机、平板电脑、膝上型便携计算机和台式计算机等等。The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

服务器105可以是提供各种服务的服务器,例如对用户利用第一终端设备101、第二终端设备102、第三终端设备103所浏览的网站提供支持的后台管理服务器(仅为示例)。后台管理服务器可以对接收到的用户请求等数据进行分析等处理,并将处理结果(例如根据用户请求获取或生成的网页、信息、或数据等)反馈给终端设备。The server 105 may be a server that provides various services, such as a background management server (only as an example) that provides support for websites browsed by users using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as user requests, and feed back the processing results (such as web pages, information, or data obtained or generated according to user requests) to the terminal device.

需要说明的是,本公开实施例所提供的数据处理方法一般可以由服务器105执行。相应地,本公开实施例所提供的数据处理装置一般可以设置于服务器105中。本公开实施例所提供的数据处理方法也可以由不同于服务器105且能够与第一终端设备101、第二终端设备102、第三终端设备103和/或服务器105通信的服务器或服务器集群执行。相应地,本公开实施例所提供的数据处理装置也可以设置于不同于服务器105且能够与第一终端设备101、第二终端设备102、第三终端设备103和/或服务器105通信的服务器或服务器集群中。It should be noted that the data processing method provided in the embodiment of the present disclosure can generally be executed by the server 105. Accordingly, the data processing device provided in the embodiment of the present disclosure can generally be set in the server 105. The data processing method provided in the embodiment of the present disclosure can also be executed by a server or server cluster that is different from the server 105 and can communicate with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105. Accordingly, the data processing device provided in the embodiment of the present disclosure can also be set in a server or server cluster that is different from the server 105 and can communicate with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

FIG. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present disclosure.

As shown in FIG. 2, the data processing method of this embodiment includes operations S210 to S250.

In operation S210, in response to a scheduled task in the data management system being triggered, a target cluster is determined from the data warehouse according to the scheduled task, wherein the data management system is used to manage the metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster.

In operation S220, processing instructions for processing the metadata in the target cluster are obtained.

In operation S230, a target instruction for performing change processing on the metadata in the target cluster is determined from the processing instructions.

In operation S240, target metadata is determined from the target cluster according to the target instruction, wherein the target metadata represents the metadata after it has been processed by the target instruction.

In operation S250, the metadata in the data management system is processed according to the target metadata.

According to an embodiment of the present disclosure, the data warehouse may include business data and metadata. The business data is the actual data of the business, and the metadata is data that describes the business data.

According to an embodiment of the present disclosure, to enable better analysis of the metadata, the metadata may be managed through the data management system.

According to an embodiment of the present disclosure, to ensure that the metadata in the data management system remains consistent with the metadata in the target cluster, a scheduled task is set in the data management system so that the metadata in the data warehouse is periodically updated to the data management system.

According to an embodiment of the present disclosure, the scheduled task may specify the range of clusters to be synchronized in the current run, and the target cluster is determined from the data warehouse according to the scheduled task.

According to an embodiment of the present disclosure, the processing instructions for processing the metadata in the target cluster may be obtained from the processing log of the target cluster. The processing instructions may include call instructions, view instructions, change instructions, and the like for the metadata.

According to an embodiment of the present disclosure, a change instruction may be an instruction for performing change processing on metadata. Once change processing has been performed, the metadata in the target cluster differs from the copy in the data management system, so the metadata in the data management system needs to be processed to restore consistency. The change instruction makes it easier to determine the target metadata that has changed, so that the metadata in the data management system can be processed accordingly and kept consistent with the metadata in the target cluster.

According to an embodiment of the present disclosure, setting a scheduled task allows the target cluster in the data warehouse to be determined automatically. The processing instructions that processed metadata in the target cluster are screened to obtain the target instructions that performed change processing on that metadata, so that the changed target metadata can be determined. The data management system is then processed based on the target metadata. A full update based on all the metadata in the target cluster is not required, which avoids the large resource consumption of a full update and improves resource utilization efficiency. In addition, determining the target metadata from instructions is convenient and fast, and requires no comparison between new and old data.
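The screening step in operation S230 can be sketched as follows. This is a minimal illustration only: it assumes the processing log yields SQL-like statements and that statements beginning with CREATE, ALTER, or DROP TABLE count as change instructions. The pattern `CHANGE_PATTERN` and the function name are hypothetical and are not taken from the disclosure.

```python
import re

# Assumption: metadata-changing instructions are DDL statements on tables.
# The disclosure does not fix an instruction format; this pattern is illustrative.
CHANGE_PATTERN = re.compile(r"^\s*(CREATE|ALTER|DROP)\s+TABLE\b", re.IGNORECASE)


def select_target_instructions(processing_instructions):
    """Keep only instructions that change metadata; call and view instructions are skipped."""
    return [sql for sql in processing_instructions if CHANGE_PATTERN.match(sql)]


# Hypothetical processing log of the target cluster.
log = [
    "SELECT * FROM CustomerInfo",
    "ALTER TABLE CustomerInfo ADD COLUMN Email VARCHAR(64)",
    "DESCRIBE CustomerInfo",
    "drop table OldSalesData",
]
targets = select_target_instructions(log)
```

Only the two change instructions survive the filter, so later steps never touch tables that were merely read.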

According to an embodiment of the present disclosure, determining the target metadata from the target cluster according to the target instruction includes: determining, according to the target instruction, a target identifier corresponding to the target instruction; determining, from the target cluster, a target data table that matches the target identifier; and determining the target metadata according to the update time and update content of the target data table.

According to an embodiment of the present disclosure, after a processing instruction processes metadata, it may leave an identifier in the data table where the metadata is located. The target instruction therefore also leaves an identifier in the data table whose metadata it processed. The target data table can be determined from the target cluster according to the target identifier corresponding to the target instruction. A target data table carries the target identifier, which means it has been processed by the target instruction.

According to an embodiment of the present disclosure, the update time and update content of the target data table may be obtained, and when the update time satisfies a condition, the target metadata may be determined according to the update content.

According to an embodiment of the present disclosure, the update time satisfying the condition may mean that the update time falls within the period between the previous scheduled task and the current scheduled task. The update content may be a change to a field in the target data table. An update file may be generated from the target cluster in which the target data table is located, the database name, the table name, the field changes, the update time, and so on.

According to an embodiment of the present disclosure, the target data table is identified by the target identifier corresponding to the target instruction. Compared with identifying the target data table by comparing new and old data, this does not require comparing all the data and is therefore more efficient.
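The identifier match and the update-time condition can be sketched together. This is a hedged illustration: the `TableInfo` record, its `identifiers` field, and the identifier value `"op-42"` are hypothetical, and the half-open window (after the previous run, up to and including the current run) is one reasonable reading of "between the previous and the current scheduled task".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TableInfo:
    cluster: str
    database: str
    table: str
    identifiers: frozenset  # identifiers left by instructions that processed the table (hypothetical)
    updated_at: datetime


def find_target_tables(tables, target_identifier, last_run, this_run):
    """Return tables that carry the target identifier and were updated since the previous run."""
    return [
        t for t in tables
        if target_identifier in t.identifiers and last_run < t.updated_at <= this_run
    ]


tables = [
    TableInfo("ClusterA", "Database1", "CustomerInfo", frozenset({"op-42"}), datetime(2024, 1, 1)),
    TableInfo("ClusterA", "Database1", "Orders", frozenset({"op-42"}), datetime(2023, 12, 1)),
    TableInfo("ClusterA", "Database1", "Products", frozenset(), datetime(2024, 1, 1)),
]
hits = find_target_tables(tables, "op-42", datetime(2023, 12, 31), datetime(2024, 1, 2))
```

Only `CustomerInfo` is selected: `Orders` was updated before the previous run, and `Products` does not carry the identifier.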

According to an embodiment of the present disclosure, processing the metadata in the data management system according to the target metadata includes: determining, according to the change type of the target metadata, a processing mode for the metadata in the data management system; and processing the metadata in the data management system according to the processing mode.

According to an embodiment of the present disclosure, the target metadata may have different change types, and different change types correspond to different processing modes. The processing mode that applies to the metadata of the data management system can be determined from the change type of the target metadata, and the metadata in the data management system is then processed according to that processing mode.

According to an embodiment of the present disclosure, the change types may include a modification type, an addition type, and a deletion type.

According to an embodiment of the present disclosure, when the change type of the target metadata is determined to be the modification type, the processing mode for the metadata of the data management system is determined to be the modification processing mode. When the processing mode is the modification processing mode, first metadata to be modified that is related to the target metadata is obtained from the data management system, and the first metadata to be modified is modified according to the target metadata.

According to an embodiment of the present disclosure, when the change type of the target metadata is the modification type, the first metadata to be modified may be the metadata in the data management system that is identical to the target metadata as it was before the change.

According to an embodiment of the present disclosure, the first metadata to be modified may be modified so that it is identical to the target metadata, ensuring that the metadata of the data management system remains consistent with the metadata in the target cluster.

According to an embodiment of the present disclosure, when the change type of the target metadata is determined to be the addition type, the processing mode for the metadata of the data management system is determined to be the addition processing mode. When the processing mode is the addition processing mode, second metadata to be modified is determined according to the target metadata, and the second metadata to be modified is added to the data management system.

According to an embodiment of the present disclosure, because the change type of the target metadata is the addition type, the data management system does not yet contain this metadata. The second metadata to be modified may be identical to the target metadata. Adding the second metadata to be modified to the data management system ensures that the metadata of the data management system remains consistent with the metadata in the target cluster.

According to an embodiment of the present disclosure, when the change type of the target metadata is determined to be the deletion type, the processing mode for the metadata of the data management system is determined to be the deletion processing mode. When the processing mode is the deletion processing mode, third metadata to be modified that is related to the target metadata is obtained from the data management system, and the third metadata to be modified is deleted from the data management system.

According to an embodiment of the present disclosure, when the change type of the target metadata is the deletion type, the data in the data management system that is identical to the target metadata also needs to be deleted. The third metadata to be modified may be identical to the target metadata. Deleting the third metadata to be modified from the data management system ensures that the metadata of the data management system remains consistent with the metadata in the target cluster.

According to an embodiment of the present disclosure, the metadata in the data management system is processed in the mode that corresponds to the modification type, addition type, or deletion type of the target metadata, ensuring that the metadata of the data management system remains consistent with the metadata in the target cluster.
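The mapping from change type to processing mode can be sketched as a small dispatch function. This is an assumption-laden illustration: the catalog is modeled as a plain dictionary keyed by (table, field), the change-type strings `"modify"`, `"add"`, and `"delete"` are hypothetical labels, and the `Fax` field exists only to demonstrate deletion.

```python
def apply_change(catalog, key, change_type, target_metadata=None):
    """Apply one change to the data management system's catalog (modeled as a dict)."""
    if change_type == "modify":
        # Modification mode: the entry must already exist and is overwritten.
        if key not in catalog:
            raise KeyError(f"nothing to modify for {key!r}")
        catalog[key] = target_metadata
    elif change_type == "add":
        # Addition mode: the catalog does not contain this metadata yet.
        catalog[key] = target_metadata
    elif change_type == "delete":
        # Deletion mode: remove the entry that mirrors the deleted metadata.
        catalog.pop(key, None)
    else:
        raise ValueError(f"unknown change type: {change_type!r}")
    return catalog


# Hypothetical catalog: (table, field) -> field type.
catalog = {("CustomerInfo", "Age"): "Integer", ("CustomerInfo", "Fax"): "Varchar"}
apply_change(catalog, ("CustomerInfo", "Age"), "modify", "Varchar")
apply_change(catalog, ("CustomerInfo", "Email"), "add", "Varchar")
apply_change(catalog, ("CustomerInfo", "Fax"), "delete")
```

Raising on an unknown change type, rather than ignoring it, keeps a malformed record from silently leaving the two systems inconsistent.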

The following example illustrates how the target metadata is determined.

According to an embodiment of the present disclosure, assume that the following change information is detected in the data warehouse: cluster name: ClusterA; database name: Database1; table name: CustomerInfo; field changes: (1) field name: Age, field type changed from Integer to Varchar; (2) new field name: Email, field type: Varchar; last update time: January 1, 2024; status: update. The relevant information of the target metadata may be recorded in an update file, as shown in Table 1 below:

Table 1

In this example, the CustomerInfo table in the Database1 database has undergone a structural change: the type of one field was modified and one new field was added. These changes are recorded in the update file so that the data management system can synchronize them and keep its records consistent with the actual state of the data warehouse.

According to an embodiment of the present disclosure, tables that have been deleted by the business can be identified and confirmed from the drop table information in the log, and their status is set to delete. A table deletion scenario is now added to the example above. Assume that the data asset application system also detects the deletion of a table during the same incremental update: cluster name: ClusterA; database name: Database2; table name: OldSalesData; last update time: January 1, 2024; status: delete. The relevant information of the target metadata may be recorded in the update file, as shown in Table 2 below:

Table 2

In this added scenario, the OldSalesData table in the Database2 database is deleted. This change is likewise recorded in the update file. The field name and field type columns are empty because the entire table was deleted. Based on this record, the data management system deletes the metadata of the corresponding table from its own database to keep the data consistent and accurate.

According to an embodiment of the present disclosure, records in the update state undergo modification processing or addition processing, and records in the delete state undergo deletion processing, so that the data management system stays consistent with the clusters in the data warehouse.
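The two scenarios above can be replayed against a small in-memory catalog. This sketch uses the values given in the text (ClusterA, Database1, CustomerInfo, Age, Email, Database2, OldSalesData, and the date). The dictionary keys, the status strings, and the pre-existing `Name` and `Amount` fields are assumptions made for illustration, not the layout of the original update file.

```python
# Update-file records built from the values described in the text.
update_file = [
    {"cluster": "ClusterA", "database": "Database1", "table": "CustomerInfo",
     "field": "Age", "type": "Varchar", "updated": "2024-01-01", "status": "update"},
    {"cluster": "ClusterA", "database": "Database1", "table": "CustomerInfo",
     "field": "Email", "type": "Varchar", "updated": "2024-01-01", "status": "update"},
    # Field name and type are empty because the whole table was dropped.
    {"cluster": "ClusterA", "database": "Database2", "table": "OldSalesData",
     "field": "", "type": "", "updated": "2024-01-01", "status": "delete"},
]


def apply_update_file(catalog, records):
    """Update records modify or add a field; delete records remove the table's metadata."""
    for r in records:
        key = (r["cluster"], r["database"], r["table"])
        if r["status"] == "delete":
            catalog.pop(key, None)
        elif r["status"] == "update":
            catalog.setdefault(key, {})[r["field"]] = r["type"]
        else:
            raise ValueError(f"unknown status: {r['status']!r}")
    return catalog


# Hypothetical state of the data management system before the run.
catalog = {
    ("ClusterA", "Database1", "CustomerInfo"): {"Age": "Integer", "Name": "Varchar"},
    ("ClusterA", "Database2", "OldSalesData"): {"Amount": "Decimal"},
}
apply_update_file(catalog, update_file)
```

After the run, `CustomerInfo` has `Age` retyped and `Email` added, and `OldSalesData` is gone from the catalog.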

FIG. 3 schematically shows a flowchart of a data processing method according to another embodiment of the present disclosure.

As shown in FIG. 3, the data processing method of this embodiment includes operations S310 to S340.

In operation S310, in response to the data management system receiving a query request for business data from a target object, identity authentication is performed on the target object.

In operation S320, when it is determined that the target object has passed identity authentication, metadata related to the business data is obtained from the data management system.

In operation S330, a target query template is determined from a set of business data query templates according to the metadata related to the business data.

In operation S340, in response to receiving a processing request from the target object for the target query template, the business data is obtained from the data warehouse.

According to an embodiment of the present disclosure, the target object may be a data analyst. The target object may query business data, and the business data is queried by means of metadata. When the data management system receives a query request for business data from the target object, it may perform identity authentication on the target object to ensure data security. When the target object is determined to have passed identity authentication, metadata related to the business data may be obtained from the data management system according to the identification information of the business data. The metadata related to the business data is then matched against the set of business data query templates to determine the target query template. The target query template may be a template that matches the metadata of the business data being queried; that is, the metadata can be filled into the target query template, and the business data can be queried precisely with it. The target query template is presented to the target object, and the business data is obtained from the data warehouse according to the target object's processing request for the target query template.

According to an embodiment of the present disclosure, the target object queries business data as follows. Identity authentication is performed in the data management system. After authentication passes, the target object can determine, from the metadata obtained from the data management system, the metadata related to the business data to be queried, and thereby determine the target query template among the business data query templates. The target object may fill in the target query template according to that metadata, or the data management system may fill it in automatically. Once the template is complete, the target object clicks query, a connection to the data warehouse is made, and the business data is obtained.

According to an embodiment of the present disclosure, authenticating the target object improves data security, and providing query templates makes querying more convenient, so that users are not left unsure where to begin a query.
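Operations S310 to S330 can be sketched as one function that authenticates, looks up metadata, and picks a template. Everything here is an illustrative assumption: the token-based credential check, the `requires` field of a template, the rule of taking the first template whose required fields are all present, and the field names `Transaction_Id` and `Balance`. The sketch returns the rendered query text and leaves the actual connection to the data warehouse (operation S340) out of scope.

```python
import hmac


def handle_query(user, token, table, credentials, catalog, templates):
    """Authenticate, fetch metadata, and pick the first matching query template."""
    # S310: identity authentication (a shared-token check stands in for the real scheme).
    if user not in credentials or not hmac.compare_digest(credentials[user], token):
        raise PermissionError("identity authentication failed")
    # S320: metadata related to the requested business data.
    fields = catalog[table]
    # S330: choose the first template whose required fields all exist in the metadata.
    for template in templates:
        if template["requires"] <= set(fields):
            return template["sql"].format(table=table)
    raise LookupError(f"no query template matches {table!r}")


credentials = {"analyst": "s3cret"}
catalog = {
    "Customer_Transactions": {"Transaction_Id": "Integer", "Transaction_Location": "Varchar"},
    "Account_Balances": {"Balance": "Decimal"},
}
templates = [
    {"requires": {"Transaction_Location"},
     "sql": "SELECT Transaction_Location, COUNT(*) FROM {table} GROUP BY Transaction_Location"},
    {"requires": set(), "sql": "SELECT * FROM {table}"},
]
query = handle_query("analyst", "s3cret", "Customer_Transactions",
                     credentials, catalog, templates)
```

Ordering the templates from most to least specific lets the generic template act as a fallback when the metadata lacks the fields a specialized template needs.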

According to an embodiment of the present disclosure, the data processing method may further include: determining, according to the cluster scale of the target cluster in the data warehouse, a parallel processing count and a processing frequency that match the target cluster; and generating the scheduled task according to the parallel processing count and the processing frequency.

According to an embodiment of the present disclosure, the cluster scale may include a cluster level, a database level, a schema level, and a table-level blacklist. The cluster level may be the default option and suits smaller clusters; enabling it for large clusters is not recommended. The database level may suit medium-sized clusters or clusters with a single business scenario. The schema level may suit very large clusters in which a single database runs multiple business workloads and the data is classified by schema. The table-level blacklist may further narrow the scan range through regular-expression matching, excluding temporary tables, external tables, and the like in the cluster. Different levels may correspond to different parallel processing counts and processing frequencies. For example, the parallel processing count for a cluster-level cluster may be 5 with a daily processing frequency, the parallel processing count for a database-level cluster may be 10 with a weekly processing frequency, and the parallel processing count for a schema-level cluster may be a parallelism of 20 tables with a monthly processing frequency.

According to an embodiment of the present disclosure, the level corresponding to the target cluster may be determined from its cluster scale, and the corresponding parallel processing count and processing frequency may then be determined, thereby generating the scheduled task.
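Generating a scheduled task from the cluster level can be sketched as a lookup. The counts and frequencies below are the example values from the text; the dictionary layout, the function names, and the blacklist patterns `^tmp_` and `_ext$` are hypothetical.

```python
import re

# Example values from the text: level -> (parallel processing count, processing frequency).
LEVEL_SETTINGS = {
    "cluster": (5, "daily"),
    "database": (10, "weekly"),
    "schema": (20, "monthly"),
}


def build_scheduled_task(cluster_name, level, table_blacklist=()):
    """Create a task description for one target cluster."""
    parallelism, frequency = LEVEL_SETTINGS[level]
    return {
        "cluster": cluster_name,
        "level": level,
        "parallelism": parallelism,
        "frequency": frequency,
        # Table-level blacklist: regular expressions for tables that are never scanned.
        "table_blacklist": [re.compile(p) for p in table_blacklist],
    }


def is_scanned(task, table_name):
    """A table is scanned unless a blacklist pattern matches its name."""
    return not any(p.search(table_name) for p in task["table_blacklist"])


task = build_scheduled_task("ClusterA", "database", table_blacklist=[r"^tmp_", r"_ext$"])
```

With these settings, `CustomerInfo` is scanned while `tmp_load` and `sales_ext` are skipped.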

According to an embodiment of the present disclosure, the data management system may process and organize the metadata in the data warehouse to facilitate analysis. Therefore, when data is obtained from the data warehouse, the following may be obtained: the name of the data warehouse cluster, the names of the databases in the cluster, the names of the tables in each database, the names of the fields in each table, the data type of each field, the last update time of each table or field, the status, and so on.

According to an embodiment of the present disclosure, setting the scheduled task according to the size of the target cluster makes better use of resources and avoids waste.

According to an embodiment of the present disclosure, the data processing method may further include: when it is determined that the data management system is updating the metadata of the target cluster for the first time, updating all the metadata in the target cluster to the data management system.

According to an embodiment of the present disclosure, the first update needs to obtain the full data of the target cluster, and all subsequent updates build on the full data obtained the first time.

According to an embodiment of the present disclosure, the metadata of the first update may include the cluster name, database name, table name, field name, field type, last update time, and status (all marked as update). The result file is loaded in full into the data management system for the target object to query.

According to an embodiment of the present disclosure, the first full update brings all the metadata into the data management system, which ensures data integrity and provides the basis for subsequent processing.
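The choice between a first full update and a later incremental update can be sketched as follows. This is a simplified illustration: "first update" is detected by the cluster being absent from the catalog, and the two callables `read_all_metadata` and `read_changes` are hypothetical stand-ins for reading from the target cluster.

```python
def sync_metadata(management_catalog, cluster, read_all_metadata, read_changes):
    """Load everything on the first run; apply only changes on later runs."""
    if cluster not in management_catalog:
        # First update: copy all metadata of the target cluster.
        management_catalog[cluster] = dict(read_all_metadata())
        return "full"
    # Later updates: each change is (key, new_value), with None meaning deletion.
    for key, value in read_changes():
        if value is None:
            management_catalog[cluster].pop(key, None)
        else:
            management_catalog[cluster][key] = value
    return "incremental"


management_catalog = {}
snapshot = {("CustomerInfo", "Age"): "Integer"}
first = sync_metadata(management_catalog, "ClusterA", lambda: snapshot, lambda: [])
second = sync_metadata(
    management_catalog, "ClusterA", lambda: snapshot,
    lambda: [(("CustomerInfo", "Age"), "Varchar"), (("CustomerInfo", "Email"), "Varchar")],
)
```

Passing callables means the expensive full read is only performed when it is actually needed.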

FIG. 4 schematically shows a block diagram of a data processing framework according to an embodiment of the present disclosure.

As shown in FIG. 4, the data processing framework 400 includes a data warehouse 410, a data management system 420, and a data analysis service system 430.

The data warehouse 410 includes a Massively Parallel Processing Database (MPPDB) cluster module and a batch processing module. The data management system 420 includes a data batch update module, a data real-time update module, a data query interface module, and a data online module. The data analysis service system 430 includes a business data query template module, an identity authentication module, a business data query module, and a metadata query module.

According to an embodiment of the present disclosure, the MPPDB cluster module is the core of data processing and storage and is responsible for storing large data sets that include both business data and metadata, for example transaction records, market data, and user information. The batch processing module may process large volumes of data from multiple sources and perform data cleaning, transformation, and aggregation to prepare the data for subsequent analysis. This module is also responsible for generating and updating metadata, such as the structure of data sets, their change history, and access permissions.

According to an embodiment of the present disclosure, the data batch update module may periodically update the processed metadata in the data warehouse 410 to the data management system 420. This does not involve the business data itself. It concerns the descriptive part of the data assets, namely the metadata, such as data models, field definitions, and data quality indicators. The data real-time update module complements the batch update module: it handles metadata changes that need to be reflected immediately, which supports real-time, accurate data analysis. For example, when a new data set is added or the structure of an existing data set changes, the data real-time update module updates the metadata promptly. The data query interface module may provide the target object with an interface for accessing and querying metadata, and may support complex query operations on data assets, such as data source analysis and data set structure queries. The data online module may process metadata queries efficiently. The database in the data online module may store all the metadata information and support high-frequency queries, so that the target object can quickly access and understand the details of the data assets.

According to an embodiment of the present disclosure, the business data query template module provides a set of predefined business data query templates for common business data queries. The template set is connected directly to the MPPDB cluster module of the data warehouse 410, allowing the target object to run standardized data queries quickly without a deep understanding of the underlying data structures. The identity authentication module verifies the identity of target objects that access the data analysis service system 430 and ensures that only authorized target objects can access sensitive business data and metadata, thereby protecting data security and compliance. The business data query module allows the target object to run custom queries, based on the target query template, to access the business data in the data warehouse 410. It is tightly integrated with the MPPDB cluster module to provide efficient data access and processing. The metadata query module is used to query and access the metadata stored in the data management system. Through it, the target object can obtain detailed information about the data, such as table structures, field definitions, and data quality indicators.

The following example illustrates the data processing method of the embodiments of the present disclosure.

According to an embodiment of the present disclosure, assume that the data warehouse includes two tables, Customer_Transactions and Account_Balances, in the database instance DB_Finance of an MPPDB cluster ClusterA. The scheduled task selects these two tables in the DB_Finance database.

When the data management system runs for the first time, it extracts the full metadata of the two tables, including the field names and types of each table, and stores it in the online database of the data asset management platform.

At some later point the scheduled task is triggered. By then a new field, Transaction_Location, has been added to the Customer_Transactions table, and new records have been added to the Account_Balances table and its Last_Updated_Timestamp field has been modified, so both tables have changed. The data management system identifies the changes to the two tables through the audit log or the last modification time. For Customer_Transactions, the data management system extracts only the information of the new field Transaction_Location for the update. For Account_Balances, the system updates the latest records and the Last_Updated_Timestamp field information.

When the target object needs the latest structure of the two tables, it sends a query request through the data analysis service system. The data asset query module accesses the data management system and returns the latest structure information for the two tables, including the new field of the Customer_Transactions table and the updated records of the Account_Balances table. The target object uses this up-to-date table structure information to perform complex data analysis, possibly across tables, such as analyzing the correlation between customer transaction behavior and account balances. The analysis results can be used to generate reports or insights that support business decisions.

As the example shows, the data management system and the data analysis service system manage metadata changes for multiple tables at the same time, ensuring that the target object can access up-to-date and accurate multi-source data information and carry out effective data analysis. This improves data processing efficiency, reduces repeated full data extraction, and suits dynamically changing, large-scale financial data warehouse environments.

According to an embodiment of the present disclosure, the data processing method can improve the efficiency with which the data warehouse processes metadata and reduce the time cost of updating and maintaining metadata. Precise metadata management and incremental update processing ensure the quality and accuracy of the metadata in the data management system, providing a solid data foundation for data analysis and business decisions. The method can also meet the needs of complex data analysis.

Based on the above data processing method, the present disclosure further provides a data processing device, which is described in detail below with reference to FIG. 5.

FIG. 5 schematically shows a structural block diagram of a data processing device according to an embodiment of the present disclosure.

As shown in FIG. 5, the data processing device 500 of this embodiment includes a first determination module 510, a first acquisition module 520, a second determination module 530, a third determination module 540, and a processing module 550.

The first determination module 510 is configured to determine, in response to a scheduled task in the data management system being triggered, a target cluster from the data warehouse according to the scheduled task, wherein the data management system is used to manage the metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster. In one embodiment, the first determination module 510 may be used to perform operation S210 described above, which is not repeated here.

The first acquisition module 520 is configured to acquire processing instructions for processing the metadata in the target cluster. In one embodiment, the first acquisition module 520 may be used to perform operation S220 described above, which is not repeated here.

The second determination module 530 is configured to determine, from the processing instructions, a target instruction for performing change processing on the metadata in the target cluster. In one embodiment, the second determination module 530 may be used to perform operation S230 described above, which is not repeated here.

The third determination module 540 is configured to determine target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata after processing by the target instruction. In one embodiment, the third determination module 540 may be used to perform operation S240 described above, which is not repeated here.

The processing module 550 is configured to process the metadata in the data management system according to the target metadata. In one embodiment, the processing module 550 may be used to perform operation S250 described above, which is not repeated here.
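The role of the second determination module 530 can be sketched as a filter over the acquired processing instructions. The sketch assumes, purely for illustration, that processing instructions are SQL statements and that the metadata-changing ones begin with ALTER, CREATE, or DROP; the disclosure does not specify this, and `select_target_instructions` is a hypothetical name.

```python
from typing import List

# Assumption: metadata-changing instructions are statements starting with these keywords.
CHANGE_KEYWORDS = ("ALTER", "CREATE", "DROP")


def select_target_instructions(processing_instructions: List[str]) -> List[str]:
    """Keep only the instructions that change metadata (the "target instructions")."""
    return [
        stmt for stmt in processing_instructions
        if stmt.lstrip().upper().startswith(CHANGE_KEYWORDS)
    ]


# Hypothetical audit-log content.
audit_log = [
    "SELECT * FROM Account_Balances",
    "ALTER TABLE Customer_Transactions ADD COLUMN Transaction_Location VARCHAR(64)",
    "INSERT INTO Account_Balances VALUES (1, 100.00)",
]
targets = select_target_instructions(audit_log)  # keeps only the ALTER statement
```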

According to an embodiment of the present disclosure, the third determination module 540 for determining target metadata from the target cluster according to the target instruction includes:

a first determination unit, configured to determine, according to the target instruction, a target identifier corresponding to the target instruction;

a second determination unit, configured to determine, from the target cluster, a target data table matching the target identifier; and

a third determination unit, configured to determine the target metadata according to the update time and update content of the target data table.
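The three units above can be sketched as a short lookup chain: extract an identifier from the instruction, find the matching table, and return its metadata only if the table changed after the last synchronization. This is a minimal sketch under stated assumptions: the identifier is taken to be the table name following "ALTER/CREATE/DROP TABLE", the cluster is modelled as an in-memory dictionary, and the dates and field types are invented. Optional clauses such as IF EXISTS are not handled.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumption: the target identifier is the table name that follows "<KEYWORD> TABLE".
_TABLE_NAME = re.compile(r"^\s*(?:ALTER|CREATE|DROP)\s+TABLE\s+([\w.]+)", re.IGNORECASE)


def target_identifier(instruction: str) -> Optional[str]:
    """First unit: derive the target identifier from the target instruction."""
    match = _TABLE_NAME.match(instruction)
    return match.group(1) if match else None


def target_metadata(cluster: Dict[str, dict], instruction: str,
                    last_sync: datetime) -> Optional[dict]:
    """Second and third units: find the table, then use its update time and content."""
    table = cluster.get(target_identifier(instruction) or "")
    if table is None or table["updated_at"] <= last_sync:
        return None
    return table["fields"]


# Hypothetical in-memory stand-in for the target cluster.
cluster = {
    "Customer_Transactions": {
        "updated_at": datetime(2024, 6, 2),
        "fields": {"Transaction_ID": "BIGINT", "Transaction_Location": "VARCHAR(64)"},
    }
}
stmt = "ALTER TABLE Customer_Transactions ADD COLUMN Transaction_Location VARCHAR(64)"
fresh = target_metadata(cluster, stmt, last_sync=datetime(2024, 6, 1))
```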

According to an embodiment of the present disclosure, the processing module 550 for processing the metadata in the data management system according to the target metadata includes:

a first processing unit, configured to determine, according to the change type of the target metadata, a processing mode for the metadata in the data management system; and

a second processing unit, configured to process the metadata in the data management system according to the processing mode.

According to an embodiment of the present disclosure, the change type includes a modification type, an addition type, and a deletion type, and the first processing unit for determining the processing mode for the metadata in the data management system according to the change type of the target metadata includes:

a first processing subunit, configured to determine, when the change type of the target metadata is the modification type, that the processing mode for the metadata of the data management system is a modification processing mode;

a second processing subunit, configured to determine, when the change type of the target metadata is the addition type, that the processing mode for the metadata of the data management system is an addition processing mode; and

a third processing subunit, configured to determine, when the change type of the target metadata is the deletion type, that the processing mode for the metadata of the data management system is a deletion processing mode.

According to an embodiment of the present disclosure, the second processing unit for processing the metadata in the data management system according to the processing mode includes:

a fourth processing subunit, configured to acquire, when the processing mode is the modification processing mode, first metadata to be modified that is related to the target metadata from the data management system;

a fifth processing subunit, configured to modify the first metadata to be modified according to the target metadata;

a sixth processing subunit, configured to determine, when the processing mode is the addition processing mode, second metadata to be modified according to the target metadata;

a seventh processing subunit, configured to add the second metadata to be modified to the data management system;

an eighth processing subunit, configured to acquire, when the processing mode is the deletion processing mode, third metadata to be modified that is related to the target metadata from the data management system; and

a ninth processing subunit, configured to delete the third metadata to be modified from the data management system.
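The mapping from change type to processing mode, and the three resulting operations, can be sketched as a small dispatch function. The sketch assumes the data management system's store can be modelled as a dictionary keyed by "table.field"; `ChangeType`, `apply_change`, and the sample values are hypothetical names and data introduced for illustration.

```python
from enum import Enum
from typing import Dict, Optional


class ChangeType(Enum):
    MODIFY = "modify"
    ADD = "add"
    DELETE = "delete"


def apply_change(catalog: Dict[str, dict], change: ChangeType, key: str,
                 target: Optional[dict] = None) -> None:
    """Apply one change to `catalog`, a stand-in for the data management system's store."""
    if change is ChangeType.MODIFY:
        catalog[key].update(target or {})   # fetch the existing entry, then modify it
    elif change is ChangeType.ADD:
        catalog[key] = dict(target or {})   # build the new entry, then add it
    elif change is ChangeType.DELETE:
        catalog.pop(key, None)              # locate the entry, then delete it


# Hypothetical sample data.
catalog = {"Customer_Transactions.Amount": {"type": "DECIMAL(18,2)"}}
apply_change(catalog, ChangeType.ADD,
             "Customer_Transactions.Transaction_Location", {"type": "VARCHAR(64)"})
apply_change(catalog, ChangeType.MODIFY,
             "Customer_Transactions.Amount", {"type": "DECIMAL(20,2)"})
apply_change(catalog, ChangeType.DELETE, "Customer_Transactions.Transaction_Location")
```

In this sketch a modification of a missing entry raises `KeyError`, which surfaces a catalog that has drifted from the cluster instead of silently creating the entry.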

According to an embodiment of the present disclosure, the device 500 further includes:

an authentication module, configured to authenticate the identity of a target object in response to the data management system receiving a query request about business data from the target object;

a second acquisition module, configured to acquire metadata related to the business data from the data management system when the identity authentication of the target object passes;

a fourth determination module, configured to determine a target query template from the set of business data query templates according to the metadata related to the business data; and

a third acquisition module, configured to acquire the business data from the data warehouse in response to receiving a processing request from the target object regarding the target query template.
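The query path described by these modules (authenticate, read metadata, choose a template) can be sketched as follows. Everything in the sketch is an assumption made for illustration: the user list, the catalog, the template structure with a `requires` field set, and the rule "pick the first template whose required fields exist" are not taken from the disclosure.

```python
from typing import Dict, List, Set

# Everything below is hypothetical sample data, not taken from the disclosure.
AUTHORIZED_USERS: Set[str] = {"analyst_01"}
CATALOG: Dict[str, Set[str]] = {
    "Customer_Transactions": {"Transaction_ID", "Amount", "Transaction_Location"},
}
TEMPLATES: List[dict] = [
    {"requires": {"Amount", "Transaction_Location"},
     "sql": "SELECT Transaction_Location, SUM(Amount) FROM {table} "
            "GROUP BY Transaction_Location"},
    {"requires": set(), "sql": "SELECT COUNT(*) FROM {table}"},
]


def choose_query(user: str, table: str) -> str:
    """Authenticate, read the table's metadata, then pick the first template it satisfies."""
    if user not in AUTHORIZED_USERS:
        raise PermissionError(f"{user} is not authorized")
    fields = CATALOG[table]
    for template in TEMPLATES:
        if template["requires"] <= fields:  # all required fields are present
            return template["sql"].format(table=table)
    raise LookupError(f"no template fits {table}")


query = choose_query("analyst_01", "Customer_Transactions")
```

Because the template is chosen from current metadata, a template that depends on Transaction_Location only becomes selectable once the incremental update has recorded that field.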

According to an embodiment of the present disclosure, the device 500 further includes:

a fifth determination module, configured to determine, according to the cluster size of the target cluster in the data warehouse, a number of parallel processes and a processing frequency that match the target cluster; and

a generation module, configured to generate the scheduled task according to the number of parallel processes and the processing frequency.
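A minimal sketch of deriving a scheduled task from cluster size is given below. The disclosure does not state how the number of parallel processes or the frequency is computed, so the heuristic here (one worker per four nodes, capped at 16, and a longer interval for clusters above eight nodes) and the names `ScheduledTask` and `build_task` are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:
    cluster: str
    parallelism: int        # number of parallel processes
    interval_minutes: int   # processing frequency, expressed as a period


def build_task(cluster: str, node_count: int) -> ScheduledTask:
    """Derive a scheduled task from the cluster size using an assumed heuristic."""
    # Assumption: one worker per 4 nodes, at least 1 and at most 16.
    parallelism = max(1, min(16, node_count // 4))
    # Assumption: small clusters are scanned every 15 minutes, larger ones hourly.
    interval_minutes = 15 if node_count <= 8 else 60
    return ScheduledTask(cluster, parallelism, interval_minutes)


task = build_task("ClusterA", node_count=32)  # node count is invented
```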

According to an embodiment of the present disclosure, the device 500 further includes:

an update module, configured to update all of the metadata in the target cluster to the data management system when it is determined that the data management system is updating the metadata of the target cluster for the first time.
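The distinction between the first full update and later incremental updates can be sketched as below. The sketch assumes that an empty catalog means the cluster has never been synchronized and that the list of changed tables is supplied by an earlier detection step; `sync_metadata` is a hypothetical helper.

```python
from typing import Dict, Iterable


def sync_metadata(catalog: Dict[str, dict], cluster: Dict[str, dict],
                  changed: Iterable[str]) -> str:
    """Copy everything on the first sync; afterwards copy only the changed tables."""
    if not catalog:  # assumption: an empty catalog means this cluster was never synced
        catalog.update({name: dict(meta) for name, meta in cluster.items()})
        return "full"
    for name in changed:
        catalog[name] = dict(cluster[name])
    return "incremental"


# Hypothetical sample data.
catalog: Dict[str, dict] = {}
cluster = {
    "Customer_Transactions": {"Amount": "DECIMAL(18,2)"},
    "Account_Balances": {"Balance": "DECIMAL(18,2)"},
}
first = sync_metadata(catalog, cluster, changed=[])
cluster["Customer_Transactions"]["Transaction_Location"] = "VARCHAR(64)"
second = sync_metadata(catalog, cluster, changed=["Customer_Transactions"])
```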

According to an embodiment of the present disclosure, any number of the first determination module 510, the first acquisition module 520, the second determination module 530, the third determination module 540, and the processing module 550 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first determination module 510, the first acquisition module 520, the second determination module 530, the third determination module 540, and the processing module 550 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application-specific integrated circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging circuits, or may be implemented in any one of software, hardware, and firmware or in a suitable combination of them. Alternatively, at least one of the first determination module 510, the first acquisition module 520, the second determination module 530, the third determination module 540, and the processing module 550 may be implemented at least partially as a computer program module which, when run, performs the corresponding functions.

FIG. 6 schematically shows a block diagram of an electronic device suitable for implementing the data processing method according to an embodiment of the present disclosure.

As shown in FIG. 6, the electronic device 600 according to an embodiment of the present disclosure includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The processor 601 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)). The processor 601 may also include on-board memory for caching purposes. The processor 601 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.

The RAM 603 stores various programs and data required for the operation of the electronic device 600. The processor 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. The processor 601 performs the various operations of the method flow according to the embodiments of the present disclosure by executing the programs in the ROM 602 and/or the RAM 603. Note that the programs may also be stored in one or more memories other than the ROM 602 and the RAM 603. The processor 601 may also perform the various operations of the method flow according to the embodiments of the present disclosure by executing the programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 600 may further include an input/output (I/O) interface 605, which is also connected to the bus 604. The electronic device 600 may further include one or more of the following components connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.

The present disclosure also provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments, or may exist separately without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present disclosure.

According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but without limitation: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device. For example, according to an embodiment of the present disclosure, the computer-readable storage medium may include the ROM 602 and/or the RAM 603 described above and/or one or more memories other than the ROM 602 and the RAM 603.

The embodiments of the present disclosure also include a computer program product, which includes a computer program containing program code for executing the method shown in the flowchart. When the computer program product runs in a computer system, the program code causes the computer system to implement the method provided by the embodiments of the present disclosure.

When the computer program is executed by the processor 601, the above functions defined in the system/device of the embodiments of the present disclosure are performed. According to an embodiment of the present disclosure, the systems, devices, modules, units, and the like described above may be implemented by computer program modules.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed as a signal over a network medium, downloaded and installed through the communication portion 609, and/or installed from the removable medium 611. The program code contained in the computer program may be transmitted over any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the above.

In such an embodiment, the computer program may be downloaded and installed from the network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the processor 601, the above functions defined in the system of the embodiments of the present disclosure are performed. According to an embodiment of the present disclosure, the systems, equipment, devices, modules, units, and the like described above may be implemented by computer program modules.

According to an embodiment of the present disclosure, the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages. Specifically, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, and similar programming languages. The program code may be executed entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. Note also that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. Note further that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in many ways, even if such combinations or integrations are not explicitly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in many ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. These embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the individual embodiments cannot be used together to advantage. The scope of the present disclosure is defined by the appended claims and their equivalents. Those skilled in the art may make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications shall fall within the scope of the present disclosure.

Claims (12)

1.一种数据处理方法,其特征在于,所述方法包括:1. A data processing method, characterized in that the method comprises: 响应于数据管理系统中的定时任务被触发,根据所述定时任务从数据仓库中确定目标集群,其中,所述数据管理系统用于对所述数据仓库中的元数据进行管理,所述定时任务用于将所述数据管理系统中的元数据与所述目标集群中的元数据保持一致;In response to a scheduled task in a data management system being triggered, determining a target cluster from a data warehouse according to the scheduled task, wherein the data management system is used to manage metadata in the data warehouse, and the scheduled task is used to keep the metadata in the data management system consistent with the metadata in the target cluster; 获取对所述目标集群中的元数据进行处理的处理指令;Obtaining a processing instruction for processing metadata in the target cluster; 从所述处理指令中确定用于对所述目标集群中的元数据进行变更处理的目标指令;Determining a target instruction for performing a change process on the metadata in the target cluster from the processing instructions; 根据所述目标指令,从所述目标集群中确定目标元数据,其中,所述目标元数据表征经过所述目标指令处理后的元数据;以及Determining target metadata from the target cluster according to the target instruction, wherein the target metadata represents metadata processed by the target instruction; and 根据所述目标元数据,对所述数据管理系统中的元数据进行处理。The metadata in the data management system is processed according to the target metadata. 2.根据权利要求1所述的方法,其特征在于,所述根据所述目标指令,从所述目标集群中确定目标元数据,包括:2. The method according to claim 1, wherein determining target metadata from the target cluster according to the target instruction comprises: 根据所述目标指令,确定与所述目标指令相对应的目标标识;According to the target instruction, determining a target identifier corresponding to the target instruction; 从所述目标集群中确定与所述目标标识相匹配的目标数据表;以及Determining a target data table matching the target identifier from the target cluster; and 根据所述目标数据表的更新时间和更新内容,确定所述目标元数据。The target metadata is determined according to the update time and update content of the target data table. 3.根据权利要求1所述的方法,其特征在于,根据所述目标元数据,对所述数据管理系统中的元数据进行处理,包括:3. 
The method according to claim 1, characterized in that, according to the target metadata, processing the metadata in the data management system comprises: 根据所述目标元数据的变更类型,确定对所述数据管理系统中的元数据的处理方式;Determining, according to the change type of the target metadata, a processing method for the metadata in the data management system; 根据所述处理方式,对所述数据管理系统中的元数据进行处理。According to the processing method, the metadata in the data management system is processed. 4.根据权利要求3所述的方法,其特征在于,所述变更类型包括修改类型、增加类型和删除类型,其中,所述根据所述目标元数据的变更类型,确定对所述数据管理系统中的元数据的处理方式,包括:4. The method according to claim 3, wherein the change type includes a modification type, an addition type, and a deletion type, wherein determining a processing method for the metadata in the data management system according to the change type of the target metadata comprises: 在确定所述目标元数据的变更类型为修改类型的情况下,确定对所述数据管理系统的元数据的处理方式为修改处理方式;In the case where it is determined that the change type of the target metadata is a modification type, determining that a processing method for the metadata of the data management system is a modification processing method; 在确定所述目标元数据的变更类型为增加类型的情况下,确定对所述数据管理系统的元数据的处理方式为增加处理方式;In the case where it is determined that the change type of the target metadata is an addition type, determining that a processing method for the metadata of the data management system is an addition processing method; 在确定所述目标元数据的变更类型为删除类型的情况下,确定对所述数据管理系统的元数据的处理方式为删除处理方式。When it is determined that the change type of the target metadata is a deletion type, the processing method for the metadata of the data management system is determined to be a deletion processing method. 5.根据权利要求4所述的方法,其特征在于,所述根据所述处理方式,对所述数据管理系统中的元数据进行处理,包括:5. 
The method according to claim 4, characterized in that processing the metadata in the data management system according to the processing mode comprises:
when the processing mode is determined to be the modification processing mode, obtaining, from the data management system, first metadata to be modified that is related to the target metadata;
modifying the first metadata to be modified according to the target metadata;
when the processing mode is the adding processing mode, determining second metadata to be modified according to the target metadata;
adding the second metadata to be modified to the data management system;
when the processing mode is the deletion processing mode, obtaining, from the data management system, third metadata to be modified that is related to the target metadata;
deleting the third metadata to be modified from the data management system.
6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
in response to the data management system receiving a query request about business data from a target object, performing identity authentication on the target object;
when it is determined that the target object passes the identity authentication, obtaining metadata related to the business data from the data management system;
determining a target query template from a set of business data query templates according to the metadata related to the business data;
in response to receiving a processing request from the target object regarding the target query template, acquiring the business data from the data warehouse.
7. The method according to claim 1, characterized in that the method further comprises:
determining, according to the cluster size of the target cluster in the data warehouse, a parallel processing quantity and a processing frequency that match the target cluster;
generating the scheduled task according to the parallel processing data volume and the processing frequency.
8. The method according to claim 1, characterized in that the method further comprises:
when it is determined that the metadata in the target cluster is being updated in the data management system for the first time, updating all metadata in the target cluster to the data management system.
9. A data processing device, characterized in that the device comprises:
a first determination module, configured to, in response to a scheduled task in a data management system being triggered, determine a target cluster from a data warehouse according to the scheduled task, wherein the data management system is configured to manage metadata in the data warehouse, and the scheduled task is configured to keep the metadata in the data management system consistent with the metadata in the target cluster;
a first acquisition module, configured to acquire processing instructions for processing the metadata in the target cluster;
a second determination module, configured to determine, from the processing instructions, a target instruction for performing change processing on the metadata in the target cluster;
a third determination module, configured to determine target metadata from the target cluster according to the target instruction, wherein the target metadata represents the metadata after processing by the target instruction; and
a processing module, configured to process the metadata in the data management system according to the target metadata.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more computer programs,
characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 8.
11. A computer-readable storage medium having a computer program or instructions stored thereon, characterized in that the computer program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
12. A computer program product, comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
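As an illustration only, the three processing modes recited in claim 5 and the first-time full update recited in claim 8 might be sketched as follows. The names `Mode`, `TargetMetadata`, `apply_target_metadata` and `initial_sync`, and the use of an in-memory dictionary keyed by table name to stand in for the data management system's metadata store, are assumptions made for this sketch and do not appear in the application.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the three processing modes of claim 5."""
    MODIFY = "modify"
    ADD = "add"
    DELETE = "delete"


@dataclass(frozen=True)
class TargetMetadata:
    """Metadata as it stands in the cluster after the target instruction ran."""
    table: str                                   # assumed key of the metadata entry
    schema: dict = field(default_factory=dict)   # assumed column-name -> type mapping


def apply_target_metadata(catalog: dict, mode: Mode, target: TargetMetadata) -> None:
    """Update the management-system catalog (assumed: a dict) from one target entry."""
    if mode is Mode.MODIFY:
        # Obtain the related entry, then modify it according to the target.
        entry = catalog[target.table]  # raises KeyError if no related entry exists
        entry.clear()
        entry.update(target.schema)
    elif mode is Mode.ADD:
        # Derive the new entry from the target and add it to the catalog.
        catalog[target.table] = dict(target.schema)
    elif mode is Mode.DELETE:
        # Look up the related entry and delete it if present.
        catalog.pop(target.table, None)


def initial_sync(catalog: dict, cluster_metadata: dict) -> None:
    """First-time update: copy all of the cluster's metadata into the catalog."""
    catalog.clear()
    catalog.update(copy.deepcopy(cluster_metadata))


# Usage: start from a full copy, then apply incremental changes.
catalog: dict = {}
initial_sync(catalog, {"orders": {"id": "int"}})
apply_target_metadata(catalog, Mode.MODIFY, TargetMetadata("orders", {"id": "bigint"}))
apply_target_metadata(catalog, Mode.ADD, TargetMetadata("users", {"name": "string"}))
apply_target_metadata(catalog, Mode.DELETE, TargetMetadata("orders"))
```

Keeping the mode dispatch in one function means the scheduled task only has to supply a mode and the corresponding target metadata for each change instruction it finds.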

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410713470.6A CN118535658A (en) 2024-06-04 2024-06-04 Data processing method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410713470.6A CN118535658A (en) 2024-06-04 2024-06-04 Data processing method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN118535658A (en) 2024-08-23

Family

ID=92382401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410713470.6A Pending CN118535658A (en) 2024-06-04 2024-06-04 Data processing method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN118535658A (en)

Similar Documents

Publication Publication Date Title
US20230126005A1 (en) Consistent filtering of machine learning data
US10521404B2 (en) Data transformations with metadata
US10366053B1 (en) Consistent randomized record-level splitting of machine learning data
US11100420B2 (en) Input processing for machine learning
US11308095B1 (en) Systems and methods for tracking sensitive data in a big data environment
US11823072B2 (en) Customer behavior predictive modeling
US8533235B2 (en) Infrastructure and architecture for development and execution of predictive models
CN111427971B (en) Business modeling method, device, system and medium for computer system
US10360394B2 (en) System and method for creating, tracking, and maintaining big data use cases
CN112925859B (en) Data storage method and device
CN112445866A (en) Data processing method and device, computer readable medium and electronic equipment
CN118535658A (en) Data processing method, device, equipment, medium and program product
CN112131257B (en) Data query method and device
CN118861102A (en) Method, device, equipment and medium for comparing parameter data differences
CN118550833A (en) Method, apparatus, device, medium and program product for generating application security image
KR20220039325A (en) Apparatus for analyzing meter data of ami
CN118445209A (en) Software detection method, device, equipment, medium and program product
CN118708584A (en) A report processing method and device
CN119316435A (en) Cross-cloud increment synchronization method, device, equipment, medium and program product
CN118972403A (en) Data sharing method, device and system
CN117093609A (en) Query statement processing method, device, equipment, medium and program product
CN115576935A (en) Storage cleaning method and device for Hadoop, computer equipment and storage medium
CN118897658A (en) Method, device and equipment for processing object record data based on backend service
CN118796801A (en) Data migration method, device and electronic equipment
CN119025586A (en) Big data processing platform, big data processing method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination