CN117635371A - An enterprise data quality management method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117635371A
Authority
CN
China
Prior art keywords
data
quality
determining
information
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311594531.3A
Other languages
Chinese (zh)
Inventor
金柳
李弘思
姚琦
张剑
�田�浩
楚钦钦
刘守华
姚兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Communications Information Technology Group Co ltd
Original Assignee
China Communications Information Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Communications Information Technology Group Co ltd
Priority to CN202311594531.3A
Publication of CN117635371A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Factory Administration (AREA)

Abstract

This application relates to the technical field of data management, and in particular to an enterprise data quality management method, device, electronic equipment and storage medium. The method includes: obtaining business data and determining original data based on the business data; inputting the original data into the source layer to determine incremental data information; determining problem data based on the incremental data information, where the problem data is data that does not meet quality standards; and determining a rectification report from the problem data and feeding it back for display. This application can improve the accuracy of data quality management.

Description

An enterprise data quality management method, device, electronic equipment and storage medium

Technical field

This application relates to the technical field of data management, and in particular to an enterprise data quality management method, device, electronic equipment and storage medium.

Background art

Construction group enterprises are diversified enterprises whose main businesses are architectural design, construction, and management. Such enterprises usually have complex business structures and diverse data sources, and because of the nature of their business they also place stricter requirements on data quality.

Traditional enterprise data management methods mainly achieve comprehensive data management by manually formulating data standards, standardizing data processing procedures, and establishing data quality management platforms.

However, traditional enterprise data quality management methods often focus only on normalizing and standardizing data and are easily affected by human factors, which results in low data quality accuracy for group enterprises in the construction industry.

Summary of the invention

In order to improve the accuracy of data quality management, this application provides an enterprise data quality management method, device, electronic equipment and storage medium.

In a first aspect, this application provides an enterprise data quality management method that adopts the following technical solution:

An enterprise data quality management method, including:

obtaining business data and determining original data based on the business data, where the business data is data corresponding to the enterprise's business and the original data is data on which data quality checks need to be performed;

inputting the original data into the source layer to determine incremental data information, the incremental data information being;

determining problem data based on the incremental data information, where the problem data is data that does not meet quality standards; and

determining a rectification report from the problem data and feeding it back for display.

With this technical solution, business data is obtained, which identifies the specific enterprise data of the construction group enterprise. Original data is then determined from the business data, with duplicate or blank business data removed so that it suits subsequent processing and analysis. The original data is then input into the source layer and the incremental data information is identified, which establishes exactly which data changed and what the changes were. Based on the incremental data information, the data that does not meet quality standards, that is, the problem data, is determined. Once problem data is found, a rectification report is determined from it and fed back for display, giving the relevant technical personnel corresponding rectification suggestions and thereby improving the accuracy of data quality management.

In a possible implementation, determining problem data based on the incremental data information includes:

determining data table information based on the incremental data information;

obtaining quality problems;

determining a quality subject based on the data table information and the quality problems;

determining quality rule information based on the quality problems and the quality subject;

if the quality rule information is consistent with a preset quality rule threshold, determining the quality rule information to be target quality rule information;

determining an execution mode based on the target quality rule information; and

determining the problem data according to the execution mode.

With this technical solution, data table information is obtained by analyzing the incremental data information. After the quality problems are obtained, the quality problems and the data table information are identified and analyzed to obtain the quality subject, which supports better quality management. Quality rule information is then determined from the quality problems and the quality subject, providing guidance and constraints for subsequent data processing and analysis. The quality rule information is compared with the preset quality rule threshold; if the two are consistent, the quality rule information meets the requirements and is retained as the target quality rule information. The corresponding execution mode is determined from the target quality rule information, and the problem data is determined through that execution mode, which improves the accuracy of determining problem data.

In a possible implementation, determining the quality subject based on the data table information and the quality problems includes:

obtaining business requirements;

determining change information based on the business requirements;

determining a target table based on the change information and the data table information;

determining a first keyword from the target table;

determining a second keyword from the quality problems; and

if the first keyword is consistent with the second keyword, entering the quality problem corresponding to the second keyword into a preset table to determine the quality subject.

With this technical solution, business requirements are obtained and change information is determined from them, giving a clearer picture of how the data has changed. The change information and the data table information are analyzed to identify the data table associated with the change information, that is, the target table. Keywords are then assigned to the target table and to the quality problems: the target table corresponds to the first keyword and the quality problem to the second keyword. The two keywords are then matched; if they are consistent, the quality problem corresponding to the second keyword is entered into the preset table and the quality subject is determined, which improves the efficiency of data use.
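The keyword matching described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `determine_quality_subjects`, the dictionary layout, and the sample table names and keywords are hypothetical and do not come from the application, which does not specify how keywords are derived or compared.

```python
from collections import defaultdict

def determine_quality_subjects(target_tables, quality_problems):
    """Group quality problems under the target tables whose keyword they share.

    target_tables: mapping of table name -> first keyword (assumed layout).
    quality_problems: list of (problem description, second keyword) pairs.
    Returns table name -> list of problem descriptions, standing in for the
    "preset table" that records the quality subject.
    """
    preset_table = defaultdict(list)
    for table, first_keyword in target_tables.items():
        for description, second_keyword in quality_problems:
            # Only problems whose keyword equals the table's keyword are recorded.
            if first_keyword == second_keyword:
                preset_table[table].append(description)
    return dict(preset_table)

# Hypothetical sample data.
tables = {"contract_table": "contract", "project_table": "project"}
problems = [
    ("contract amount is empty", "contract"),
    ("supplier code is malformed", "supplier"),
]
subjects = determine_quality_subjects(tables, problems)
```

Exact string equality is used here for "consistent"; a real system might normalize or fuzzy-match keywords instead.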

In a possible implementation, determining quality rule information based on the quality problems and the quality subject includes:

determining a quality template based on the quality problem;

determining filling information from the quality subject; and

if the quality template is successfully matched with the quality problem, adding the filling information to the quality rule threshold to determine the quality rule information.

With this technical solution, after a quality problem is obtained, the corresponding quality template is determined from the description of the problem, which fixes how the problem is to be handled and what is required. Filling information is then determined from the quality subject and used to complete the quality rule threshold. The quality template is matched against the quality problem; if the match succeeds, the filling information is added to the quality rule threshold and the quality rule information is determined, providing guidance and constraints for subsequent data processing and analysis.
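One way to picture the template-and-fill step is with parameterized rule templates. The sketch below is an assumption-laden illustration, not the applicant's implementation: the template names, the SQL text inside them, and the `build_quality_rule` helper are all hypothetical.

```python
from string import Template

# Hypothetical templates keyed by quality-problem type; the SQL text is illustrative only.
QUALITY_TEMPLATES = {
    "null_check": Template("SELECT * FROM $table WHERE $column IS NULL"),
    "length_check": Template("SELECT * FROM $table WHERE LENGTH($column) > $max_len"),
}

def build_quality_rule(problem_type, filling_info):
    """Return the filled-in rule text if a template matches the problem type, else None."""
    template = QUALITY_TEMPLATES.get(problem_type)
    if template is None:
        return None  # no template matched, so no rule is produced
    # substitute() raises KeyError if the filling information is incomplete.
    return template.substitute(filling_info)

rule = build_quality_rule("null_check", {"table": "contract", "column": "amount"})
```

Here the filling information (table and column names) plays the role of the values derived from the quality subject, and the filled template stands in for the quality rule information.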

In a possible implementation, determining the execution mode based on the target quality rule information includes:

obtaining a quality task;

determining task attribute information based on the quality task; and

determining the execution mode based on the task attribute information and the target quality rule information.

With this technical solution, after the quality task is obtained, it is analyzed to determine the corresponding task attribute information, which provides the basis for determining the execution mode. The execution mode is then determined from the task attribute information and the target quality rule information, so that different tasks are managed effectively and tasks are executed more efficiently and accurately.

In a possible implementation, determining the problem data according to the execution mode includes:

determining quality statement details based on the execution mode;

querying the incremental data information according to the quality statement details to determine queried data; and

if the queried data does not exist in a preset exception database, entering the queried data into a problem database to determine the problem data.

With this technical solution, once the execution mode is determined, the corresponding quality statement details are determined from it. The incremental data information is then queried according to the quality statement details to determine the queried data. The queried data is matched against the preset exception database; if it does not exist there, the corresponding data is transferred to the problem database and the problem data is determined, which improves the accuracy of determining problem data.
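The query-then-filter flow above can be illustrated with an in-memory SQLite database. This is a minimal sketch under assumptions: the table names (`incremental_contract`, `exception_data`, `problem_data`), their columns, and the sample quality statement are hypothetical, and the application does not state which database engine is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incremental_contract (id INTEGER PRIMARY KEY, amount REAL, added_at TEXT);
    CREATE TABLE exception_data (record_id INTEGER PRIMARY KEY);
    CREATE TABLE problem_data (record_id INTEGER PRIMARY KEY, rule TEXT);
    INSERT INTO incremental_contract VALUES
        (1, 100.0, '2023-11-01'), (2, NULL, '2023-11-02'), (3, NULL, '2023-11-03');
    INSERT INTO exception_data VALUES (3);
""")

# Hypothetical quality statement: contracts must have an amount.
quality_statement = "SELECT id FROM incremental_contract WHERE amount IS NULL"

def find_problem_data(conn, statement, rule_name):
    """Run a quality statement and store hits that are not listed as exceptions."""
    queried = [row[0] for row in conn.execute(statement)]
    exceptions = {row[0] for row in conn.execute("SELECT record_id FROM exception_data")}
    problems = [rid for rid in queried if rid not in exceptions]
    conn.executemany(
        "INSERT INTO problem_data VALUES (?, ?)",
        [(rid, rule_name) for rid in problems],
    )
    return problems

problem_ids = find_problem_data(conn, quality_statement, "amount_not_null")
```

Record 3 violates the rule but is listed in the exception table, so only record 2 is written to the problem table.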

In a possible implementation, after determining the rectification report from the problem data and feeding it back for display, the method further includes:

obtaining the rectified data; and

verifying the rectified data to determine a verification result.

With this technical solution, the rectified data is obtained so that the data can be examined after rectification. The rectified data is then verified and a verification result is determined, which shows the quality level of the rectified data.
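The application does not describe how verification is performed. One plausible reading, sketched below purely as an assumption, is to re-apply the quality rules to the rectified records and report which ones still fail. The rule names, record layout, and `verify_rectification` function are hypothetical.

```python
def verify_rectification(rectified_records, rules):
    """Re-apply each rule to the rectified records and report the outcome per record."""
    result = {}
    for record in rectified_records:
        failed = [name for name, check in rules.items() if not check(record)]
        result[record["id"]] = "passed" if not failed else "failed: " + ", ".join(failed)
    return result

# Hypothetical rules expressed as predicates over a record.
rules = {
    "amount_present": lambda r: r.get("amount") is not None,
    "amount_non_negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
}
rectified = [{"id": 2, "amount": 250.0}, {"id": 5, "amount": None}]
verification = verify_rectification(rectified, rules)
```

A record that still fails could be routed back into the problem database for another rectification round.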

In a second aspect, this application provides an enterprise data quality management device that adopts the following technical solution:

An enterprise data quality management device, including an original data determination module, an incremental data information determination module, a problem data determination module, and a rectification report determination module, where:

the original data determination module is configured to obtain business data and determine original data based on the business data;

the incremental data information determination module is configured to input the original data into the source layer and determine incremental data information;

the problem data determination module is configured to determine problem data based on the incremental data information, where the problem data is data that does not meet quality standards; and

the rectification report determination module is configured to determine a rectification report from the problem data and feed it back for display.

With this technical solution, the original data determination module obtains business data, which identifies the specific enterprise data of the construction group enterprise, and then determines original data from the business data, removing duplicate or blank business data so that it suits subsequent processing and analysis. The incremental data information determination module inputs the original data into the source layer and identifies the incremental data information, which establishes exactly which data changed and what the changes were. The problem data determination module determines, from the incremental data information, the data that does not meet quality standards, that is, the problem data. Once problem data is found, the rectification report determination module determines a rectification report from it and feeds it back for display, giving the relevant technical personnel corresponding rectification suggestions and thereby improving the accuracy of data quality management.

In a possible implementation, the problem data determination module includes a data table information determination unit, a quality problem acquisition unit, a quality subject determination unit, a quality rule information determination unit, a target quality rule information determination unit, an execution mode determination unit, and a problem data determination unit, where:

the data table information determination unit is configured to determine data table information based on the incremental data information;

the quality problem acquisition unit is configured to obtain quality problems;

the quality subject determination unit is configured to determine a quality subject based on the data table information and the quality problems;

the quality rule information determination unit is configured to determine quality rule information based on the quality problems and the quality subject;

the target quality rule information determination unit is configured to determine the quality rule information to be target quality rule information if the quality rule information is consistent with a preset quality rule threshold;

the execution mode determination unit is configured to determine an execution mode based on the target quality rule information; and

the problem data determination unit is configured to determine the problem data according to the execution mode.

In a possible implementation, the quality subject determination unit is specifically configured to:

obtain business requirements;

determine change information based on the business requirements;

determine a target table based on the change information and the data table information;

determine a first keyword from the target table;

determine a second keyword from the quality problems; and

if the first keyword is consistent with the second keyword, enter the quality problem corresponding to the second keyword into a preset table to determine the quality subject.

In a possible implementation, the quality rule information determination unit is specifically configured to:

determine a quality template based on the quality problem;

determine filling information from the quality subject; and

if the quality template is successfully matched with the quality problem, add the filling information to the quality rule threshold to determine the quality rule information.

In a possible implementation, the execution mode determination unit is specifically configured to:

obtain a quality task;

determine task attribute information based on the quality task; and

determine the execution mode based on the task attribute information and the target quality rule information.

In a possible implementation, the problem data determination unit is specifically configured to:

determine quality statement details based on the execution mode;

query the incremental data information according to the quality statement details to determine queried data; and

if the queried data does not exist in a preset exception database, enter the queried data into a problem database to determine the problem data.

In a possible implementation, the enterprise data quality management device further includes a rectification data acquisition module and a verification result determination module, where:

the rectification data acquisition module is configured to obtain the rectified data; and

the verification result determination module is configured to verify the rectified data and determine a verification result.

In a third aspect, this application provides electronic equipment that adopts the following technical solution:

Electronic equipment, including:

at least one processor;

a memory; and

at least one application program, where the at least one application program is stored in the memory, is configured to be executed by the at least one processor, and is configured to execute the enterprise data quality management method described above.

In a fourth aspect, this application provides a computer-readable storage medium that adopts the following technical solution:

A computer-readable storage medium storing a computer program that can be loaded by a processor to execute the enterprise data quality management method described above.

In summary, this application provides the following beneficial technical effects:

Business data is obtained, which identifies the specific enterprise data of the construction group enterprise. Original data is then determined from the business data, with duplicate or blank business data removed so that it suits subsequent processing and analysis. The original data is then input into the source layer and the incremental data information is identified, which establishes exactly which data changed and what the changes were. Based on the incremental data information, the data that does not meet quality standards, that is, the problem data, is determined. Once problem data is found, a rectification report is determined from it and fed back for display, giving the relevant technical personnel corresponding rectification suggestions and thereby improving the accuracy of data quality management.

Description of drawings

Figure 1 is a schematic flow chart of the enterprise data quality management method of this application;

Figure 2 is a block diagram of the enterprise data quality management device of this application;

Figure 3 is a schematic diagram of the electronic equipment of an embodiment of this application.

Detailed description

This application is described in further detail below with reference to Figures 1-3.

To make the purpose, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments that a person of ordinary skill in the art obtains from the embodiments in this application without creative effort fall within the scope of protection of this application.

An embodiment of this application provides an enterprise data quality management method executed by electronic equipment, which may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system made up of multiple physical servers, or a cloud server that provides cloud computing services. The terminal device may be, but is not limited to, a smartphone, tablet computer, notebook computer, or desktop computer. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, and the embodiments of this application place no restriction on this.

Referring to Figure 1, the method includes step S101, step S102, step S103, and step S104, where:

Step S101: obtain business data and determine original data based on the business data.

In the embodiments of this application, the business data includes master data, production and operation data, and marketing data. The master data specifically includes project master data, management organization master data, counterparty master data, and financial institution master data. The production and operation data specifically includes contract information, project information, handover information, and completion information. The marketing data specifically includes real estate monthly output, enterprise adjustment tables, newly signed contract tables, and contract tables. The original data is the data on which data quality checks need to be performed.

Specifically, after receiving the business data transmitted by technical personnel, the electronic equipment cleans the data and removes invalid, duplicate, or blank records. It then converts the business data from each business system into a unified data format, stores the converted business data in the STG (buffer layer) of the data lake, and designates the business data that lands in the STG as the original data.
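The cleaning and format-unification step could look something like the sketch below. It is only an illustration under assumptions: the application does not define what counts as invalid or what the unified format is, so the `required_fields` check, the key normalization, and the sample records are hypothetical.

```python
def clean_business_data(records, required_fields):
    """Drop blank and duplicate records and normalize keys and string values."""
    seen = set()
    original_data = []
    for record in records:
        # Unify the format: lower-case, trimmed keys and trimmed string values.
        normalized = {
            key.strip().lower(): (value.strip() if isinstance(value, str) else value)
            for key, value in record.items()
        }
        # Treat a record as blank or invalid if any required field is missing or empty.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Keys are unique within a record, so sorting the items never compares values.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue  # duplicate of a record already kept
        seen.add(fingerprint)
        original_data.append(normalized)
    return original_data

# Hypothetical records from two business systems with different key styles.
raw = [
    {"Project_ID": " P001 ", "Name": "Bridge A"},
    {"project_id": "P001", "name": "Bridge A"},
    {"project_id": "", "name": "Tunnel B"},
]
stg_records = clean_business_data(raw, required_fields=["project_id"])
```

Normalizing before de-duplicating matters here: the first two records only become recognizable as duplicates once their keys and values share one format.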

After the electronic equipment determines the original data, not all of it needs to undergo data checks. The electronic equipment therefore generates the SQL statement that creates a view in the database, either from hand-written SQL or from drag-and-drop configuration, and uses that view to control the scope of the data check.
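As a rough illustration of restricting the check scope with a view, the sketch below uses SQLite from Python's standard library. The table `stg_contract`, the view name, the filter condition, and the `create_scope_view` helper are assumptions made for the example; the application does not name the database or the view definitions.

```python
import sqlite3

def create_scope_view(conn, view_name, table, condition):
    """Create a view that exposes only the rows that should be checked.

    The identifiers and condition are interpolated into the SQL text, so they
    must come from trusted configuration, never from untrusted input.
    """
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS SELECT * FROM {table} WHERE {condition}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_contract (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_contract VALUES (?, ?, ?)",
    [(1, "signed", 100.0), (2, "draft", 50.0), (3, "signed", None)],
)

# Only signed contracts fall within the hypothetical check scope.
create_scope_view(conn, "v_contract_check", "stg_contract", "status = 'signed'")
in_scope = conn.execute("SELECT id FROM v_contract_check ORDER BY id").fetchall()
```

Quality rules can then be written against the view rather than the underlying table, so changing the scope means redefining one view instead of editing every rule.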

Step S102: Input the original data into the source layer, and determine incremental data information.

In this embodiment of the application, the incremental data information includes the incremental data and, for each incremental data entry, information on the period in which the entry was added.

Specifically, after obtaining the original data, the electronic device integrates the original data in the STG (buffer layer) of the data lake into the source layer through operations such as data extraction, transformation, and loading. The electronic device then identifies incremental data in the source layer: it determines which original data is updated or new, sets that data as incremental data, and determines the addition period information for each incremental data entry, for example by attaching a timestamp or date to each entry. This helps ensure the accuracy and completeness of the data.
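
One way to picture the identification of incremental data is a comparison between the previous and the current load, keyed on a record identifier. The sketch below assumes exactly that; the snapshot comparison, the function name identify_incremental, and the fields change_type and added_at are illustrative assumptions rather than details given in the application.

```python
from datetime import datetime, timezone


def identify_incremental(previous, current, key, loaded_at):
    """Return rows of `current` that are new or changed relative to `previous`.

    Each returned row carries a change type and the load timestamp, which
    stands in for the "addition period information" of the entry.
    """
    prev_by_key = {row[key]: row for row in previous}
    incremental = []
    for row in current:
        old = prev_by_key.get(row[key])
        if old is None:
            change = "new"
        elif old != row:
            change = "updated"
        else:
            continue  # unchanged rows are not incremental data
        incremental.append(
            {**row, "change_type": change, "added_at": loaded_at.isoformat()}
        )
    return incremental


previous_load = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
current_load = [{"id": 1, "name": "A"}, {"id": 2, "name": "B2"}, {"id": 3, "name": "C"}]
stamp = datetime(2023, 11, 27, tzinfo=timezone.utc)
increments = identify_incremental(previous_load, current_load, "id", stamp)
```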

Step S103: Determine problem data based on the incremental data information.

In this embodiment of the application, the problem data is data that does not meet the quality standards.

Specifically, the electronic device performs a quality check on the data in the source layer. By checking the incremental data against rules, it evaluates the completeness and accuracy of the data; by comparing and analyzing the incremental data information, it identifies abnormal or erroneous data entries. The electronic device then sets the abnormal or erroneous entries as problem data and stores the identified problem data in the problem database for subsequent rectification and processing.

Step S104: Determine a rectification report based on the problem data, and feed it back for display.

Specifically, based on the identified problem data and the results of the data quality check, the electronic device reads the problem database and displays the problem data both summarized by different dimensions and in detail. Based on the historical data in the problem database, the electronic device determines the weight of each piece of problem data, and scores the data according to that historical data and those weights. The electronic device determines the corresponding rectification report according to the business requirements transmitted to it by the technician, transmits the report to the technician's display device, and prompts the technician to rectify the data according to the report. After the technician completes the rectification, the electronic device obtains the rectified data and verifies it; if the data is exception data, the corresponding data is transferred to the exception database.
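
The application does not state how the summary dimensions, weights, or scores are computed. Purely as an illustration, the sketch below counts problem records per dimension and derives a weight for each rule from its share of historical problem records; the field names rule and table and the frequency-based weighting are assumptions.

```python
from collections import Counter


def summarize(problems, dimension):
    """Count problem records per value of the chosen dimension."""
    return Counter(p[dimension] for p in problems)


def weights_from_history(history, dimension="rule"):
    """Assumed weighting: each value's share of historical problem records."""
    counts = Counter(h[dimension] for h in history)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


current_problems = [
    {"rule": "not_null", "table": "project"},
    {"rule": "unique", "table": "project"},
    {"rule": "not_null", "table": "contract"},
]
history = [{"rule": "not_null"}] * 3 + [{"rule": "unique"}]

by_table = summarize(current_problems, "table")
rule_weights = weights_from_history(history)
```

A report generator could then order or score the detail rows using these per-rule weights; any concrete scoring formula would have to come from the business requirements.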

The embodiment of the present application provides an enterprise data quality management method. Business data is obtained, which identifies the specific enterprise data of the construction group enterprise. Original data is then determined based on the business data, so that duplicate or blank business data is removed in preparation for subsequent data processing and analysis. The original data is then input into the source layer and incremental data information is identified, which determines the specific data that has changed and the specific content of the change. Based on the incremental data information, data that does not meet the quality standards, that is, problem data, is determined. Once problem data is found, a rectification report is determined based on the problem data and fed back for display, providing the relevant technicians with corresponding rectification suggestions. This improves the accuracy of data quality management.

In step S103, determining problem data based on the incremental data information specifically includes: determining data table information based on the incremental data information; obtaining a quality issue; determining a quality subject based on the data table information and the quality issue; determining quality rule information based on the quality issue and the quality subject; if the quality rule information is consistent with a preset quality rule threshold, determining the quality rule information as target quality rule information; determining an execution method based on the target quality rule information; and determining the problem data according to the execution method.

In this embodiment of the application, a quality issue is an offline document compiled from actual business practice, which mainly describes in business language the content to be checked; a work group is a collection of users; and a rule group is a collection of SQL statements.

Specifically, by identifying the incremental data information, the electronic device determines the data table to which each piece of incremental data information corresponds. After obtaining the quality issue transmitted by the technician, the electronic device matches the data table information from the different data tables against the quality issue, then integrates and links the data table information that matches successfully, building a quality subject that contains all the information that needs to be applied or displayed. The electronic device then extracts fields from the quality issue and from the quality subject, matches the extracted fields, and determines the corresponding quality rule information.

Further, after the quality rule information is determined, it is compared with the preset quality rule threshold. If the quality rule information is consistent with the preset quality rule threshold, the corresponding quality rule information is quality rule information that needs to be executed uniformly, and it is determined to be the target quality rule information. The electronic device then collects and organizes the target quality rules and determines the execution method. After determining the execution method, the electronic device determines, according to the execution method, the data that does not meet the quality standard requirements, that is, the problem data.

Each quality rule has corresponding permissions. Rule permissions are managed and controlled through work groups and rule groups, and the electronic device manages SQL statements in batches by configuring rule groups.

Determining the quality subject based on the data table information and the quality issue specifically includes: obtaining business requirements; determining change information based on the business requirements; determining a target table based on the change information and the data table information; determining a first keyword according to the target table; determining a second keyword according to the quality issue; and, if the first keyword is consistent with the second keyword, inputting the quality issue corresponding to the second keyword into a preset table to determine the quality subject.

In this embodiment of the application, the change information is the data table information that needs to be added, deleted, or changed.

Specifically, after receiving the business requirements transmitted by the technician, the electronic device searches and matches the data table information against the business requirements, determines the data table information corresponding to the business requirements, and sets it as the change information. The electronic device then consolidates the change information with the data table information to obtain a new data table, which it sets as the target table. Next, the electronic device extracts keywords from the target table and from the quality issue, obtaining the first keyword and the second keyword respectively, and matches the two. If the first keyword and the second keyword match, the target table exhibits the corresponding quality issue; the electronic device then inputs the quality issue corresponding to the second keyword into the preset table and stores it in the designated subject view repository, thereby determining the quality subject.
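
The keyword matching between a target table and the quality issues might look like the sketch below. The application does not describe how keywords are extracted or what counts as "consistent", so the simple word tokenizer, the use of table and column names as first keywords, and the rule that any shared keyword is a match are all assumptions.

```python
import re


def keywords(text):
    """Assumed extraction: lowercase alphanumeric/underscore tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def match_issues(target_table, issues):
    """Return the quality issues that share a keyword with the target table."""
    # First keywords: drawn from the table name and its column names.
    first = keywords(target_table["name"]) | {
        c.lower() for c in target_table["columns"]
    }
    matched = []
    for issue in issues:
        second = keywords(issue)  # second keywords: drawn from the issue text
        if first & second:
            matched.append(issue)
    return matched


table = {"name": "project", "columns": ["project_name", "owner"]}
issue_list = [
    "project_name must not be empty",
    "contract amount must be positive",
]
subject_issues = match_issues(table, issue_list)
```

Only the first issue shares a keyword (project_name) with the table, so only it would be recorded for this quality subject.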

Storage into the designated subject view repository works as follows: when determining the quality subject, the electronic device selects the database in which to store it, specifying information such as the database address, port, user name, and password. Storage then consists of executing the SQL statement that creates the view. Each quality subject corresponds to one view, and the name of the view is the name of the quality subject.

Determining the quality rule information based on the quality issue and the quality subject specifically includes: determining a quality template based on the quality issue; determining filling information according to the quality subject; and, if the quality template matches the quality issue successfully, adding the filling information to the quality rule threshold to determine the quality rule information.

In this embodiment of the application, the filling information consists of the configured rule name, rule description, rule group to which the rule belongs, data source to be checked, whitelist, and quality template, together with the data tables and fields selected in the database to fill the variables defined in the quality template.

Specifically, the electronic device identifies the quality issue, selects the corresponding data to be checked, determines the quality template, and matches the SQL statement corresponding to the quality template. For example, if the quality issue is that the project name in the project table must not be empty, the quality template can be determined directly as select count(*) FROM ${table_a} WHERE ${column_a} IS NULL. Because in actual data services a quality template cannot be applied unchanged to every other data table, the electronic device matches the quality template against preset quality templates to determine a reusable quality template. The electronic device determines the filling information by identifying the quality subject. If the quality template matches the quality issue successfully, the electronic device adds the filling information to the quality rule threshold and determines the quality rule information that fits the quality subject.
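
The ${table_a} and ${column_a} placeholders in the example template happen to follow the syntax of Python's string.Template, so filling a template can be sketched as below. The project table, its columns, the fill_template helper, and the identifier check are illustrative assumptions; the application does not say which templating mechanism or database it uses.

```python
import sqlite3
from string import Template

# The template text is taken from the example in the description.
QUALITY_TEMPLATE = Template(
    "select count(*) FROM ${table_a} WHERE ${column_a} IS NULL"
)


def fill_template(template, table, column):
    """Substitute table and column names into a quality template.

    Identifiers cannot be passed as bound SQL parameters, so this sketch
    only accepts plain identifier-like names to avoid SQL injection.
    """
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return template.substitute(table_a=table, column_a=column)


# Hypothetical project table with one missing project name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (project_id TEXT, project_name TEXT)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [("P1", "Bridge A"), ("P2", None)],
)

rule_sql = fill_template(QUALITY_TEMPLATE, "project", "project_name")
null_count = conn.execute(rule_sql).fetchone()[0]
```

A non-zero count indicates rows that violate the "project name must not be empty" rule.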

Determining the execution method based on the target quality rule information specifically includes: obtaining a quality task; determining task attribute information based on the quality task; and determining the execution method according to the task attribute information and the target quality rule information.

In this embodiment of the application, the task attribute information includes the task name, scheduling period, and business label.

Specifically, the technician creates a new quality task in the quality task operation interface and transmits it to the electronic device. After receiving the quality task, the electronic device determines the task attribute information corresponding to each quality task, so that the requirements and goals of the task are better understood. The electronic device then determines the corresponding execution method according to the task attribute information and the target quality rule information, thereby arriving at an execution method that meets the requirements.

Determining the problem data according to the execution method specifically includes: determining quality statement details based on the execution method; querying the incremental data information according to the quality statement details to determine query data; and, if the query data does not exist in the preset exception database, inputting the query data into the problem database to determine the problem data.

Specifically, once the execution method is determined, the electronic device immediately reads the task code configured on the quality task platform and generates a quality statement detail instruction that retrieves all target quality rule information under that quality task. The electronic device queries or retrieves with this instruction, obtains the quality statement details for all target quality rule information under the corresponding quality task, and stores them in the ODS layer. The electronic device then queries the quality statement details stored in the ODS layer and sets the data returned as the query data. After determining the query data, the electronic device compares it with the data in the preset exception database: if the query data does not exist in the preset exception database, it is transferred to the problem database and set as problem data; if the query data exists in the preset exception database, it is deleted. After determining the problem data, the electronic device places it into different data tables in the database according to the quality task dimension.
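
The comparison against the exception database and the grouping by quality task can be sketched as follows. The record layout (record_id, task), the use of a set of identifiers to stand in for the exception database, and the dictionary of lists standing in for per-task problem tables are assumptions made for illustration.

```python
from collections import defaultdict


def route_query_data(query_rows, exception_ids):
    """Drop rows listed as exceptions; group the rest by quality task."""
    problem_tables = defaultdict(list)
    for row in query_rows:
        if row["record_id"] in exception_ids:
            continue  # present in the exception database: discarded
        problem_tables[row["task"]].append(row)  # becomes problem data
    return dict(problem_tables)


query_data = [
    {"record_id": "P-2", "task": "project_checks"},
    {"record_id": "C-7", "task": "contract_checks"},
    {"record_id": "P-9", "task": "project_checks"},
]
known_exceptions = {"C-7"}
problems_by_task = route_query_data(query_data, known_exceptions)
```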

After determining the rectification report based on the problem data and feeding it back for display, the method further includes: obtaining the rectification data produced by the rectification; and verifying the rectification data to determine a verification result.

Specifically, after the technician rectifies the data according to the rectification report, the electronic device extracts the rectified data and determines the extracted data to be the rectification data. To determine whether the data meets the requirements after rectification, the electronic device verifies the rectification data against the issued rectification report. If the rectification data exists in the preset exception database at this point, the corresponding rectification data is transferred to the exception database.
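
A rough sketch of the verification step is given below, under the assumption that verification means re-applying the quality rule to each rectified record and that records listed as exceptions are routed separately. The three result categories, the record fields, and the rule expressed as a Python predicate are hypothetical.

```python
def verify_rectified(rows, rule, exception_ids):
    """Classify rectified rows as passed, exception, or still failing."""
    result = {"passed": [], "exception": [], "failed": []}
    for row in rows:
        if rule(row):
            result["passed"].append(row["record_id"])
        elif row["record_id"] in exception_ids:
            result["exception"].append(row["record_id"])  # goes to exception database
        else:
            result["failed"].append(row["record_id"])     # needs further rectification
    return result


def name_not_empty(row):
    """Example rule: the project name must not be empty."""
    return bool(row["project_name"])


rectified = [
    {"record_id": "P-2", "project_name": "Bridge A"},
    {"record_id": "P-9", "project_name": ""},
    {"record_id": "P-11", "project_name": None},
]
verification = verify_rectified(rectified, name_not_empty, exception_ids={"P-11"})
```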

Referring to Figure 2, the enterprise data quality management device 20 may specifically include: an original data determination module 201, an incremental data information determination module 202, a problem data determination module 203, and a rectification report determination module 204, wherein:

the original data determination module 201 is used to obtain business data and determine original data based on the business data;

the incremental data information determination module 202 is used to input the original data into the source layer and determine incremental data information;

the problem data determination module 203 is used to determine problem data based on the incremental data information, the problem data being data that does not meet the quality standards;

the rectification report determination module 204 is used to determine a rectification report based on the problem data and feed it back for display.

In a possible implementation of the embodiment of the present application, the problem data determination module 203 includes: a data table information determination unit, a quality issue acquisition unit, a quality subject determination unit, a quality rule information determination unit, a target quality rule information determination unit, an execution method determination unit, and a problem data determination unit, wherein:

the data table information determination unit is used to determine data table information based on the incremental data information;

the quality issue acquisition unit is used to obtain a quality issue;

the quality subject determination unit is used to determine a quality subject based on the data table information and the quality issue;

the quality rule information determination unit is used to determine quality rule information based on the quality issue and the quality subject;

the target quality rule information determination unit is used to determine the quality rule information as target quality rule information if the quality rule information is consistent with the preset quality rule threshold;

the execution method determination unit is used to determine an execution method based on the target quality rule information;

the problem data determination unit is used to determine the problem data according to the execution method.

In a possible implementation of the embodiment of the present application, the quality subject determination unit is specifically used to:

obtain business requirements;

determine change information based on the business requirements;

determine a target table based on the change information and the data table information;

determine a first keyword according to the target table;

determine a second keyword according to the quality issue;

if the first keyword is consistent with the second keyword, input the quality issue corresponding to the second keyword into a preset table to determine the quality subject.

In a possible implementation of the embodiment of the present application, the quality rule information determination unit is specifically used to:

determine a quality template based on the quality issue;

determine filling information according to the quality subject;

if the quality template matches the quality issue successfully, add the filling information to the quality rule threshold to determine the quality rule information.

In a possible implementation of the embodiment of the present application, the execution method determination unit is specifically used to:

obtain a quality task;

determine task attribute information based on the quality task;

determine the execution method according to the task attribute information and the target quality rule information.

In a possible implementation of the embodiment of the present application, the problem data determination unit is specifically used to:

determine quality statement details based on the execution method;

query the incremental data information according to the quality statement details to determine query data;

if the query data does not exist in the preset exception database, input the query data into the problem database to determine the problem data.

In a possible implementation of the embodiment of the present application, the enterprise data quality management device 20 further includes a rectification data acquisition module and a verification result determination module, wherein:

the rectification data acquisition module is used to obtain the rectification data produced by the rectification;

the verification result determination module is used to verify the rectification data and determine a verification result.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

The embodiment of the present application also describes an electronic device from the perspective of a physical apparatus. As shown in Figure 3, the electronic device 30 includes a processor 301 and a memory 303. The processor 301 and the memory 303 are connected, for example through a bus 302. Optionally, the electronic device 30 may also include a transceiver 304. It should be noted that in practical applications the number of transceivers 304 is not limited to one, and the structure of the electronic device 30 does not limit the embodiments of the present application.

The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor 301 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 302 may include a path that carries information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in Figure 3, but this does not mean that there is only one bus or one type of bus.

The memory 303 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, or a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions. It may also be an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 303 is used to store the application program code that executes the solution of the present application, and execution is controlled by the processor 301. The processor 301 is used to execute the application program code stored in the memory 303 to implement the content shown in the foregoing method embodiments.

The electronic device includes, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. It may also be a server or the like. The electronic device shown in Figure 3 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and they need not be performed sequentially; they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.

The above are only some of the embodiments of the present application. It should be noted that those of ordinary skill in the art may make further improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims (10)

1. A method for enterprise data quality management, comprising:
acquiring service data, and determining original data based on the service data, wherein the service data is data corresponding to enterprise service, and the original data is data needing to be subjected to data quality detection;
inputting the original data into a source pasting layer, and determining incremental data information, wherein the incremental data information is;
based on the incremental data information, determining problem data, wherein the problem data is data which does not accord with a quality standard;
and determining a correction report according to the problem data, and feeding back and displaying the correction report.
2. The method of claim 1, wherein said determining problem data based on said incremental data information comprises:
determining data table information based on the incremental data information;
acquiring quality problems;
determining a quality main body according to the data table information and the quality problem;
determining quality rule information based on the quality problem and the quality main body;
if the quality rule information is consistent with a preset quality rule threshold, determining the quality rule information as target quality rule information;
determining an execution mode based on the target quality rule information;
and determining the problem data according to the execution mode.
3. The method of claim 2, wherein said determining a quality body based on said data table information and said quality problem comprises:
acquiring service requirements;
determining change information based on the service requirements;
determining a target table based on the change information and the data table information;
determining a first keyword according to the target table;
determining a second keyword according to the quality problem;
and if the first keyword is consistent with the second keyword, inputting a quality problem corresponding to the second keyword into a preset table, and determining the quality main body.
4. The method of claim 2, wherein said determining quality rule information based on said quality problem and said quality main body comprises:
determining a quality template based on the quality problem;
determining filling information according to the quality main body;
and if the quality template is successfully matched with the quality problem, adding the filling information into a quality rule threshold value, and determining the quality rule information.
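One possible reading of claim 4, offered only as a sketch, is that each kind of quality problem selects a rule template and that the quality main body supplies the values used to fill it. The template text, the problem-type names and `build_rule` are hypothetical. The example uses Python's standard `string.Template`, and a real system would need to take table and column identifiers from trusted metadata rather than free text.

```python
from string import Template
from typing import Mapping, Optional

# Hypothetical quality templates, keyed by the kind of quality problem.
QUALITY_TEMPLATES = {
    "null_check": Template("SELECT * FROM $table WHERE $column IS NULL"),
    "negative_check": Template("SELECT * FROM $table WHERE $column < 0"),
}


def build_rule(problem_type: str, filling_info: Mapping[str, str]) -> Optional[str]:
    """Return filled-in rule text, or None when no template matches the problem."""
    template = QUALITY_TEMPLATES.get(problem_type)
    if template is None:
        return None
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which keeps incomplete rules from being produced silently.
    return template.substitute(filling_info)


rule = build_rule("null_check", {"table": "orders", "column": "amount"})
```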
5. The method for managing quality of enterprise data according to claim 2, wherein said determining an execution mode based on said target quality rule information comprises:
acquiring a quality task;
determining task attribute information based on the quality task;
and determining the execution mode according to the task attribute information and the target quality rule information.
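The claim does not say which task attributes exist or which execution modes are available. The sketch below therefore assumes, purely for illustration, a `schedule` attribute on the task, an `enabled` flag on the target rule, and three mode names.

```python
def determine_execution_mode(task_attributes: dict, target_rule: dict) -> str:
    # Assumption: a disabled rule is never executed.
    if not target_rule.get("enabled", True):
        return "skip"
    # Assumption: a task carrying a schedule expression runs periodically.
    if task_attributes.get("schedule"):
        return "scheduled"
    # Otherwise the rule is run once, on demand.
    return "immediate"


mode = determine_execution_mode({"schedule": "0 2 * * *"}, {"enabled": True})
```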
6. The method for managing quality of enterprise data according to claim 2, wherein said determining said problem data according to said execution mode comprises:
determining quality statement details based on the execution mode;
referring to the incremental data information according to the quality statement details, and determining reference data;
and if the reference data does not exist in a preset exception database, inputting the reference data into a problem database, and determining the problem data.
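The filtering step of claim 6 can be illustrated, under stated assumptions, with an in-memory SQLite database. The quality statement details are taken to be a SQL query, the exception database is modelled as a set of exempt keys, and the table names `increment` and `problem_data` are invented for the example.

```python
import sqlite3
from typing import List, Set, Tuple


def find_problem_data(
    conn: sqlite3.Connection, quality_statement: str, exception_keys: Set[str]
) -> List[Tuple]:
    # Run the quality statement against the incremental data to get candidate rows.
    candidates = conn.execute(quality_statement).fetchall()
    # Rows whose key is listed in the exception set are exempt.
    problems = [row for row in candidates if row[0] not in exception_keys]
    # Everything else is written to the problem table.
    conn.executemany("INSERT INTO problem_data (id, amount) VALUES (?, ?)", problems)
    return problems


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE increment (id TEXT, amount REAL)")
conn.execute("CREATE TABLE problem_data (id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO increment VALUES (?, ?)",
    [("a", 10.0), ("b", None), ("c", None)],
)
problems = find_problem_data(
    conn,
    "SELECT id, amount FROM increment WHERE amount IS NULL ORDER BY id",
    exception_keys={"c"},
)
```

Rows "b" and "c" both match the statement, but "c" is exempt, so only "b" is stored as problem data.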
7. The method for managing enterprise data quality as claimed in claim 1, wherein, after said determining a correction report according to said problem data and feeding back and displaying the correction report, the method further comprises:
obtaining rectified data after rectification;
and verifying the rectified data, and determining a verification result.
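A minimal sketch of the verification in claim 7, assuming that verifying means re-applying the original quality rule to the rectified records. The rule is modelled as a hypothetical predicate that returns True for a compliant record, and the shape of the result is an assumption.

```python
from typing import Callable, Iterable


def verify_rectification(
    rectified: Iterable[dict], rule: Callable[[dict], bool]
) -> dict:
    # Re-apply the quality rule; anything still failing remains a problem.
    remaining = [row for row in rectified if not rule(row)]
    return {"passed": not remaining, "remaining": remaining}


def amount_present(row: dict) -> bool:
    """Sample rule (assumed): the amount field must not be empty."""
    return row.get("amount") is not None


result = verify_rectification(
    [{"id": "b", "amount": 12.5}, {"id": "c", "amount": None}],
    amount_present,
)
```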
8. An enterprise data quality management apparatus, comprising:
an original data determining module, used for acquiring service data and determining original data based on the service data;
an incremental data information determining module, used for inputting the original data into a source-aligned (ODS) layer and determining incremental data information;
the problem data determining module is used for determining problem data based on the incremental data information, wherein the problem data is data which does not accord with the quality standard;
and the correction report determining module is used for determining a correction report according to the problem data and feeding back and displaying the correction report.
9. An electronic device, comprising:
at least one processor;
a memory;
at least one application program, wherein the at least one application program is stored in the memory and configured to be executed by the at least one processor, the at least one application program being configured to perform the enterprise data quality management method as claimed in any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed in a computer, causes the computer to perform an enterprise data quality management method according to any of claims 1-7.
CN202311594531.3A 2023-11-25 2023-11-25 An enterprise data quality management method, device, electronic equipment and storage medium Pending CN117635371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311594531.3A CN117635371A (en) 2023-11-25 2023-11-25 An enterprise data quality management method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311594531.3A CN117635371A (en) 2023-11-25 2023-11-25 An enterprise data quality management method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117635371A (en) 2024-03-01

Family

ID=90017460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311594531.3A Pending CN117635371A (en) 2023-11-25 2023-11-25 An enterprise data quality management method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117635371A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704590A (en) * 2017-09-30 2018-02-16 深圳市华傲数据技术有限公司 Data processing method and system based on a data warehouse
CN110472109A (en) * 2019-07-30 2019-11-19 深圳中科保泰科技有限公司 Dynamic data quality analysis method and platform system
CN112181967A (en) * 2020-09-29 2021-01-05 中国平安人寿保险股份有限公司 Method and device for monitoring source data quality, computer equipment and medium
CN113282588A (en) * 2021-06-11 2021-08-20 亿景智联(北京)科技有限公司 Method and device for evaluating quality of spatio-temporal data
CN115129716A (en) * 2022-06-27 2022-09-30 浪潮工业互联网股份有限公司 A data management method, device and storage medium for industrial big data
CN115543981A (en) * 2022-10-12 2022-12-30 浪潮软件股份有限公司 Data quality detection method and device, medium, equipment

Similar Documents

Publication Publication Date Title
CN109522746B (en) A data processing method, electronic device and computer storage medium
US7788213B2 (en) System and method for a multiple disciplinary normalization of source for metadata integration with ETL processing layer of complex data across multiple claim engine sources in support of the creation of universal/enterprise healthcare claims record
CN111383101A (en) Post-loan risk monitoring method, device, equipment and computer-readable storage medium
US8060532B2 (en) Determining suitability of entity to provide products or services based on factors of acquisition context
US20080306984A1 (en) System and method for semantic normalization of source for metadata integration with etl processing layer of complex data across multiple data sources particularly for clinical research and applicable to other domains
US20230169065A1 (en) System for uploading information into a metadata repository
US9292486B2 (en) Validation of formulas with external sources
CN110162516A (en) Data governance method and system based on mass data processing
CN113722370A (en) Data management method, device, equipment and medium based on index analysis
CN112162922A (en) Method, device, server and storage medium for determining difference of new and old systems
CN114356928A (en) Risk analysis method and device, electronic equipment and storage medium
CN116594683A (en) Code annotation information generation method, device, equipment and storage medium
US20230185549A1 (en) Automatic Workflow Generation
CN109947797B (en) Data inspection device and method
CN114185791B (en) A data mapping file testing method, device, equipment and storage medium
CN111582754A (en) Risk checking method, device and equipment and computer readable storage medium
CN115136129A (en) Digital inspection method and device applied to clinical test and related equipment
CN112990741B (en) A workload assessment method, device, equipment and storage medium
CN114218925B (en) Data processing method, device, equipment, medium and program product
CN117635371A (en) An enterprise data quality management method, device, electronic equipment and storage medium
CN109697141B (en) Method and device for visual testing
CN107273293B (en) Big data system performance test method and device and electronic equipment
CN114925050A (en) Data verification method, device, electronic device and storage medium based on knowledge base
CN115809228A (en) Data comparison method and device, storage medium and electronic equipment
CN112488862A (en) Error reporting information processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20240301)