CN117033356A

CN117033356A - Real-time data quality optimization method for power enterprise production

Info

Publication number: CN117033356A
Application number: CN202310917437.0A
Authority: CN
Inventors: 李欢欢; 曹光; 崔钰杰
Original assignee: Everbright Envirotech China Ltd; Everbright Environmental Protection Research Institute Nanjing Co Ltd; Everbright Environmental Protection Technology Research Institute Shenzhen Co Ltd
Current assignee: Everbright Envirotech China Ltd; Everbright Environmental Protection Research Institute Nanjing Co Ltd; Everbright Environmental Protection Technology Research Institute Shenzhen Co Ltd
Priority date: 2023-07-25
Filing date: 2023-07-25
Publication date: 2023-11-10

Abstract

The invention discloses a real-time data quality optimization method for power enterprise production. The method comprises: classifying and grading the production data of a power enterprise; constructing a basic data quality model for evaluating the quality of the production data; collecting historical data and real-time data for both automatically acquired data and manually reported data; cleaning, screening and performing correlation analysis on the collected production data based on the basic data quality model; and constructing a data optimization model for quality optimization and verification of the production data. The method provides online data quality monitoring, analysis and optimization for the important automatically acquired field data of a power enterprise, improves the quality of the enterprise's core data, and safeguards the effective conduct of production and operation management.

Description

Real-time data quality optimization method for power enterprise production

Technical Field

The invention relates to the field of online data quality monitoring, in particular to a real-time data quality optimization method for power enterprise production.

Background

At present, the real-time data generated during the production and operation of a power enterprise are mainly acquired by sensors on the production site and then fed into the enterprise's information systems and big data platforms through data acquisition instruments, on-site automatic control systems (such as DCS), information reporting modules and the like. Over time, the on-site sensors and automation/information systems may age or fail, and equipment precision and reliability decline. This causes data quality problems such as deviation, distortion, instability, missed acquisition and large drift, which seriously impair the effectiveness and accuracy of production and operation management and control. Continuous, automatic online monitoring, analysis and computational optimization of the quality of real-time production data are therefore required, so that operation managers are prompted to handle data anomalies promptly, data quality improves continuously, and the management and control of the production and operation process become more accurate and reliable.

Existing real-time data quality monitoring methods automatically judge the quality of the real-time data collected by the power enterprise, raise alarms, and prompt the relevant personnel to correct data of abnormal quality. However, they judge data quality only by timeliness (data not refreshed in time) and a limited notion of accuracy (checking against a conventional normal value interval). They are therefore suitable only for data quality checks under stable working conditions, and their accuracy is low under variable working conditions and complex situations.

The present method combines judgment by a basic data quality model with neural-network-based modeling and data tuning. It monitors and optimizes real-time data quality step by step in four respects: integrity, normalization, timeliness and accuracy, so that the data quality improvement process covers the important data without blind spots. The method can perform basic quality analysis and judgment on general data, and can also perform higher-level computation and tuning on core, important, correlated data, so that data quality is judged and improved accurately and in detail.

Disclosure of Invention

The purpose of the invention is as follows. The invention provides a real-time data quality optimization method for power enterprise production, comprising basic preparation (data classification and grading, and design of the basic data quality model), data acquisition, data screening and cleaning (screening and cleaning of historical data, and basic data quality judgment and marking of target data), data correlation analysis, construction and verification of a data tuning model based on a historical-data modified input self-learning neural network (Modified Input Neural Network, MINN), data tuning based on the MINN model, and result verification and data correction. The quality of real-time production data is improved through the combined use of three means: judgment by the basic data quality model, data tuning by the machine learning model, and manual judgment and correction. This comprehensively improves the reliability and credibility of the core real-time data.

The invention designs a real-time data quality optimization method for power enterprise production, which executes the following steps S1-S8 to complete the verification and quality optimization of the production data of the power enterprise:

step S1: classifying and grading production data of various types of power enterprises respectively, wherein the classifying comprises classifying the production data of various types into automatic acquisition data and manual reporting data, and the grading comprises grading the importance of the production data of various types according to the influence range, the direct reference times and the importance coefficient of the production data of various types;

step S2: constructing a basic data quality model for evaluating the quality of the production data, wherein the basic data quality model defines the integrity score, the normalization score, the timeliness score and the accuracy score of the production data of various types, and respectively calculating the total data quality score of the production data of various types according to the four scores;

step S3: according to the importance grading of the various types of production data, acquiring historical data and real-time data for the automatically acquired data and the manually reported data of a preset level, storing the historical data into a temporary historical data table of a real-time database, and storing the real-time data into the real-time database;

step S4: cleaning and screening historical data, carrying out quality judgment on real-time data according to a basic data quality model, judging that the real-time data is abnormal data or normal data according to a calculation result of the basic data quality model, and recording the abnormal data into a basic data quality abnormal data detail table;

performing time mark alignment operation on the cleaned and screened historical data and the real-time data which is judged to be normal data;

step S5: performing correlation analysis by adopting a principal component analysis method according to the historical data and the real-time data obtained in the step S4 respectively, and eliminating production data which have no correlation with other production data;

step S6: constructing a data tuning model based on a correction self-learning neural network, correcting the training sample by taking the historical data obtained in the step S4 as an input training sample, and carrying out iterative training on the data tuning model by taking the target value and the adjustment quantity of the training sample as output to obtain a trained data tuning model;

step S7: inputting the real-time data obtained in the step S4 into a trained data tuning model, tuning the real-time data based on the data tuning model, taking the output of the data tuning model as tuning data, and judging the quality of each tuning data;

step S8: after the calculation and quality judgment of a batch of tuning data are completed, calculating the relative error percentage between the tuning data and the originally acquired data for all production data, sorting the results in descending order and recording them, thereby completing the verification and quality optimization of the production data of the power enterprise.
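
As an illustration of step S8, the following minimal Python sketch computes the relative error percentage between tuned and originally acquired values and sorts the items in descending order. The formula |tuned - raw| / |raw| x 100, the function name `rank_by_relative_error`, the handling of zero raw values and the sample values are assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, List, Tuple


def rank_by_relative_error(raw: Dict[str, float],
                           tuned: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (name, relative error %) pairs, largest error first.

    Assumption: relative error = |tuned - raw| / |raw| * 100.
    An item whose raw value is 0 but whose tuned value differs gets an
    infinite error, because the ratio is undefined for it.
    """
    ranked = []
    for name, raw_value in raw.items():
        diff = abs(tuned[name] - raw_value)
        if raw_value == 0:
            percent = float("inf") if diff > 0 else 0.0
        else:
            percent = diff / abs(raw_value) * 100.0
        ranked.append((name, percent))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical measuring points and values.
raw_values = {"main_steam_pressure": 100.0,
              "main_steam_temp": 540.0,
              "feedwater_flow": 50.0}
tuned_values = {"main_steam_pressure": 98.0,
                "main_steam_temp": 541.08,
                "feedwater_flow": 50.0}
ranking = rank_by_relative_error(raw_values, tuned_values)
```

Items at the top of the ranking are the ones whose acquired values deviate most from the model output and are the natural candidates for manual review.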

The beneficial effects are as follows. The quality of real-time data at power enterprises currently declines because data acquisition and transmission equipment and systems age, among other reasons. To check and correct data quality, power enterprises generally rely on simple rule-based judgment, for example judging data accuracy by simple threshold-range rules or judging the validity of specific data by particular specifications. This approach has a limited scope of application and low accuracy, and cannot meet the need to check and optimize the quality of massive real-time data efficiently.

The invention designs a real-time data quality optimization method for power enterprise production, which realizes comprehensive monitoring and optimizing real-time data quality by steps from four aspects of data quality integrity, normalization, timeliness and accuracy through data classification, constructing a basic data quality rule model and a data optimization model based on a correction self-learning neural network, realizes data quality verification and online correction of important real-time data for production, reminds operation managers of positioning and analyzing problem data and possible reasons thereof according to the result, processes data abnormality problems through a proper method, thereby realizing comprehensive improvement of data quality and guaranteeing the effectiveness of normal development of production operation management of enterprises.

Drawings

FIG. 1 is a flow chart of a method for optimizing quality of real-time data produced by an electrical power enterprise according to an embodiment of the present invention;

FIG. 2 is a metadata analysis schematic of index X provided according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the MINN network structure according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

At present, most power enterprises have implemented real-time acquisition and online monitoring of production data and have brought production and operation management systems online to help raise their level of operation management. Over time, however, aging field devices and declining system reliability degrade the quality (integrity, normalization, timeliness, accuracy and so on) of the real-time data collected by the enterprise's information systems and big data platforms. This affects the accuracy of daily management, may cause the on-site production control system to fail, and may even lead to production safety accidents such as shutdowns and loss of output, with a significant impact on normal operation. Most power enterprises currently monitor and manage the quality of collected real-time data using simple threshold-range rules, an approach that is narrow in scope, limited, and low in accuracy, and that cannot meet the need for efficient data quality optimization.

The invention aims to design a real-time data quality optimization method for power enterprise production, which realizes data quality verification and data online correction of important real-time production data by data classification, constructing a basic data quality rule model and a data optimization model based on a correction self-learning neural network, reminds operation management personnel to locate and analyze problem data and possible reasons thereof according to the result, and processes data abnormality problems by a proper method, thereby realizing comprehensive improvement of data quality and guaranteeing the effectiveness of normal development of production operation management of enterprises.

Referring to FIG. 1, the real-time data quality optimization method for power enterprise production provided by the embodiment of the invention executes the following steps S1-S8 to complete the verification and quality optimization of the production data of the power enterprise:

First, the scope of data quality optimization is determined: a subset of important real-time production data is selected from the enterprise's massive data/index library for quality optimization. High-priority data objects are screened out through data classification and data grading, as follows:

(1) Data classification

The data targeted for quality improvement fall into two categories, automatically acquired data and manually reported data, defined as follows:

Automatically acquired data: data collected at different time frequencies, such as second level, minute level and hour level, mainly through automatic acquisition by sensors, for example voltage, current, power, real-time generated energy, pressure, temperature, flow and real-time calculated efficiency.

Manually reported data: data that cannot be obtained through automatic acquisition and are supplemented by manual entry, including data entered periodically (such as fuel calorific value and liquid levels that cannot be acquired automatically) and data entered once at initialization (such as design values of performance parameters of equipment or systems).

(2) Data grading

Each business domain of the power enterprise (such as finance, operation and personnel) grades the two categories of data according to the data influence range $N_{all}$, the number of direct references $N_y$ and the importance coefficient $\alpha$, calculating the level index $S$ and the data level of each data item in both categories (three levels are assumed in this description).

The level index $S$, which characterizes the importance level of each type of production data, is defined as follows:

$$S = k_1 N_{all} + k_2 N_y + k_3 \alpha$$

where $N_{all}$ is the influence range of the production data, $N_y$ is the number of direct references, $\alpha$ is the importance coefficient, and $k_1$, $k_2$ and $k_3$ are the influence factors of the influence range, the number of direct references and the importance coefficient respectively, each taking a value in the range 0 to 1.

The two categories of production data, automatically acquired data and manually reported data, are each arranged in descending order of the level index $S$, giving the automatically acquired data sequence $\{S1_i \mid i = 1, 2, \ldots, m\}$ and the manually reported data sequence $\{S2_i \mid i = 1, 2, \ldots, n\}$, where $m$ is the number of production data types in the automatically acquired data sequence and $n$ is the number of production data types in the manually reported data sequence. Each sequence is divided into three equal parts, giving the subsequences $\{S1_{j1}\}$, $\{S1_{j2}\}$, $\{S1_{j3}\}$ and $\{S2_{j1}\}$, $\{S2_{j2}\}$, $\{S2_{j3}\}$, whose importance levels are the first, second and third level respectively; the first level is the highest and the third level is the lowest. Once classification and grading are complete, online data quality optimization can be carried out with priority on the high-level (first-level) data as needed.
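
The grading procedure can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the factor values `k1`, `k2`, `k3`, the item names, the sample inputs and the way a remainder is distributed when the sequence length is not divisible by three are all assumptions.

```python
from typing import Dict


def level_index(n_all: int, n_y: int, alpha: float,
                k1: float = 0.4, k2: float = 0.3, k3: float = 0.3) -> float:
    """S = k1 * N_all + k2 * N_y + k3 * alpha (factor values are illustrative)."""
    return k1 * n_all + k2 * n_y + k3 * alpha


def grade_by_thirds(scores: Dict[str, float]) -> Dict[str, int]:
    """Sort items by S in descending order and split them into three parts.

    Returns item -> level, where 1 is the highest level and 3 the lowest.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    count = len(ordered)
    return {name: rank * 3 // count + 1 for rank, name in enumerate(ordered)}


# Hypothetical automatically acquired items: (N_all, N_y, alpha).
inputs = {
    "main_steam_temp": (10, 3, 0.9),
    "generator_power": (8, 2, 0.8),
    "feedwater_flow": (6, 2, 0.5),
    "condenser_vacuum": (4, 1, 0.6),
    "ambient_temp": (2, 1, 0.3),
    "spare_signal": (1, 0, 0.1),
}
scores = {name: level_index(*values) for name, values in inputs.items()}
levels = grade_by_thirds(scores)
```

With six items, the two highest values of S fall in the first level, the next two in the second, and the last two in the third; the same function would be applied separately to the manually reported data sequence.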

The above $N_{all}$, $N_y$ and $\alpha$ are calculated as follows:

First, an enterprise index dictionary is formulated based on three aspects of the production data: index classification, natural attributes and management attributes. According to the index dictionary, the influence range $N_{all}$ and the number of direct references $N_y$ of every index are calculated either by a metadata analysis module or manually. Second, an importance coefficient $\alpha$ is assigned to every data item according to its business importance in the practical application of each business domain of the enterprise.

(1) Index dictionary design

The enterprise index dictionary definition table is designed as follows. The table is used for combining with a company index system to obtain a detailed index dictionary detail table in each service field.

(2) Metadata analysis

Influence analysis and association analysis are performed on each data item. The number of directly influenced data items $N_{zj}$ and the number of indirectly influenced data items $N_{jj}$ across all downstream levels are counted, and from these the influence range $N_{all}$ and the number of direct references $N_y$ of the production data are calculated:

$$N_{all} = N_{zj} + N_{jj}$$

$$N_y = N_{zj}$$

Here the directly influenced data are all influenced data at the first level downstream of the current production data, and the indirectly influenced data are all influenced data at the second and all further downstream levels. In the schematic of FIG. 2, index X has $N_{zj} = 3$ and $N_{jj} = 7$, so $N_{all} = 10$ and $N_y = 3$.
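
The counting can be illustrated with a breadth-first traversal of a downstream dependency graph. The sketch below is an assumption-laden illustration: the graph structure, the item names and the function `influence_counts` are hypothetical, chosen only so that the counts match the FIG. 2 example (3 direct, 7 indirect), with the number of direct references taken equal to the first-level count as in that example.

```python
from collections import deque
from typing import Dict, List, Tuple


def influence_counts(downstream: Dict[str, List[str]], root: str) -> Tuple[int, int]:
    """Return (N_zj, N_jj) for root.

    N_zj counts the items one level downstream of root; N_jj counts the
    items reachable only at the second level or deeper.
    """
    direct = set(downstream.get(root, []))
    seen = {root} | direct
    indirect = set()
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                indirect.add(child)
                queue.append(child)
    return len(direct), len(indirect)


# Hypothetical lineage of index X: each key feeds the items in its list.
lineage = {
    "X": ["A", "B", "C"],
    "A": ["D", "E"],
    "B": ["F"],
    "C": ["G", "H"],
    "D": ["I"],
    "F": ["J"],
}
n_zj, n_jj = influence_counts(lineage, "X")
n_all = n_zj + n_jj   # influence range
n_y = n_zj            # number of direct references, as in the FIG. 2 example
```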

(3) Importance coefficient

Each business domain assigns an importance coefficient $\alpha$ to every index within its scope according to business importance. In general, important financial and business indexes that affect or reflect the enterprise's operating condition are assigned higher values, and other indexes lower values. The enterprise's operation assessment index model can generally serve as the reference for assigning $\alpha$.

In step S2, basic data quality judgment rules are preset, and the total data quality score $F_{total}$ of each type of production data is calculated as follows:

$$F_{total} = \mu_1 F_w + \mu_2 F_g + \mu_3 F_j + \mu_4 F_z = \mu_1 F_w + \mu_2 \sum_x (F_{gx} k_x) + \mu_3 F_j + \mu_4 F_z$$

where $F_w$ is the integrity score, $F_g$ is the normalization score, $F_j$ is the timeliness score, $F_z$ is the accuracy score, $F_{gx}$ is the score of each normalization subtype, $k_x$ is the weight of each normalization subtype, and $\mu_1$ to $\mu_4$ are the score coefficients of the four dimensions, with $\mu_1 + \mu_2 + \mu_3 + \mu_4 = 1$. Each of $\mu_1$ to $\mu_4$ is generally 0.25 and can be adjusted as needed. The score of each of the four dimensions is calculated as follows:

$$F = 100 - \Delta \cdot N_x$$

where $\Delta$ is the deduction per violation specified by the preset basic data quality judgment rule, and $N_x$ is the number of times that rule is violated.

The basic data quality judgment rules are as follows; data that do not comply with a rule are scored down accordingly:

1) Data integrity rules: whether it can be null.

2) The data normalization rules may be further refined into the following subdivision types:

data type: numerical value or character or date type etc

Units: data standard unit of measure

Whether or not it can be negative: yes/no

Whether or not it can be 0: yes/no

Decimal place number: non-negative integer

Others

3) Data timeliness rules: data update time.

4) Data accuracy rules: the value range rule, namely, setting the upper limit and the lower limit of the data.
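
The scoring scheme can be sketched in Python as follows. The deduction of 10 points per violation, the floor at 0, the subtype names, the subtype weights (assumed to sum to 1) and the violation counts are all illustrative assumptions; only the two formulas come from the text.

```python
from typing import Dict, Sequence


def dimension_score(violations: int, deduction: float = 10.0) -> float:
    """F = 100 - delta * N_x, floored at 0 (the floor is an assumption)."""
    return max(0.0, 100.0 - deduction * violations)


def total_quality_score(f_w: float,
                        f_gx: Dict[str, float],
                        k_x: Dict[str, float],
                        f_j: float,
                        f_z: float,
                        mu: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """F_total = mu1*F_w + mu2*sum(F_gx * k_x) + mu3*F_j + mu4*F_z."""
    f_g = sum(f_gx[name] * k_x[name] for name in f_gx)
    return mu[0] * f_w + mu[1] * f_g + mu[2] * f_j + mu[3] * f_z


# Hypothetical violation counts for one data item.
integrity = dimension_score(0)                      # no null values
normalization_parts = {
    "data_type": dimension_score(0),
    "unit": dimension_score(0),
    "decimal_places": dimension_score(2),           # two violations
}
normalization_weights = {"data_type": 0.4, "unit": 0.3, "decimal_places": 0.3}
timeliness = dimension_score(1)                     # one late update
accuracy = dimension_score(3)                       # three out-of-range values

f_total = total_quality_score(integrity, normalization_parts,
                              normalization_weights, timeliness, accuracy)
```

In this example the normalization score is 94, and the total is the equally weighted mean of 100, 94, 90 and 70.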

In step S3, production data of high importance level are selected from the automatically acquired data and the manually reported data, and a data acquisition task is started. Data acquisition comprises historical data acquisition and real-time data acquisition.

(1) Historical data collection

Historical data are collected from the time-series database through a generic OPC interface or another interface. The time range can be set to the past year, and the data sampling interval can be set to per second, per minute or another level as needed. The historical data collection task can be scheduled to run once a week or once a month; after each run, the newly collected historical data are stored in the temporary historical data table of the real-time database, overwriting the data from the previous run.

The collected historical data are mainly used in step S6, the construction and verification of the modified self-learning neural network data tuning model based on historical data, which in turn provides the model basis for step S7, data tuning based on the modified self-learning neural network model.

(2) Real-time data acquisition

Real-time data are collected from the automation systems that serve as data sources through a generic OPC interface or another interface and stored in the real-time database as they arrive. The collected real-time data are used for subsequent data quality judgment and data tuning. They are generally retained for three months, adjustable according to actual conditions; the retention period must be longer than the data quality judgment and data tuning cycle, and data older than the retention period are automatically deleted from the real-time database.

(1) Historical data screening and cleaning

In step S3, a batch of historical data is collected in every preset collection period. In step S4, each batch of historical data is cleaned and screened according to the following rules; if any one of the rules is not satisfied, all historical data collected in that batch are discarded:

1) The operation condition of the production system is in a steady state or a quasi-steady state;

steady state: the change rate of all core operation parameters of the system is smaller than the respective reference change rate (such as the change rate of the main steam temperature is smaller than 0.4%/second) and lasts for a certain period of time (such as 1 minute) or longer;

quasi-steady state: the change rate of all core operation parameters of the system is in the respective reference change rate interval (for example, the change rate of the main steam temperature is less than 1%/second and is more than or equal to 0.4%/second) and lasts for a certain period of time (for example, 1 minute) or more;

2) All the historical data do not violate the basic data quality judgment rule;

3) The logical relationships among all the historical data are reasonable and free of anomalies.

Qualified historical data that satisfy all of the above conditions are then transferred to the historical data table of the real-time database.
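
Rule 1) can be illustrated with a simple rate-of-change check. The sketch below is one possible reading, not the patented procedure: it classifies a window by the largest percentage change per second, uses the main-steam-temperature reference rates from the text (0.4 %/s and 1 %/s) for every parameter, assumes uniformly sampled, nonzero values, and the parameter names and series are hypothetical.

```python
from typing import Dict, Sequence


def classify_window(values: Sequence[float], dt_s: float = 1.0,
                    steady_rate: float = 0.4, quasi_rate: float = 1.0) -> str:
    """Classify one parameter's window by its largest change rate in %/s.

    Assumes nonzero values sampled every dt_s seconds.
    """
    max_rate = max(abs(b - a) / abs(a) * 100.0 / dt_s
                   for a, b in zip(values, values[1:]))
    if max_rate < steady_rate:
        return "steady"
    if max_rate < quasi_rate:
        return "quasi-steady"
    return "transient"


def batch_is_usable(parameters: Dict[str, Sequence[float]], dt_s: float = 1.0,
                    min_duration_s: float = 60.0) -> bool:
    """Keep a batch only if every core parameter stays steady or
    quasi-steady over a window of at least min_duration_s seconds."""
    for values in parameters.values():
        if (len(values) - 1) * dt_s < min_duration_s:
            return False
        if classify_window(values, dt_s) == "transient":
            return False
    return True


# Hypothetical one-minute windows sampled once per second.
slow_drift = [540.0 + 0.01 * i for i in range(61)]     # about 0.002 %/s
ramp = [540.0 + 3.0 * i for i in range(61)]            # about 0.56 %/s at most
step_change = [540.0] * 30 + [550.0] * 31              # about 1.85 %/s jump
```

A batch containing only `slow_drift` and `ramp` would be kept, while any batch containing `step_change` would be discarded as a whole.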

(2) Target real-time data basic data quality judgment and data marking

The real-time data are checked against the basic data quality judgment rules. Real-time data that violate any basic data quality rule are judged to be abnormal, recorded in the basic data quality abnormal data detail table, and reported to the relevant personnel by email, short message or similar means. Data that satisfy all basic data quality rules are normal data. The abnormal data detail table includes the data name, data type, data level, data acquisition time, data dimensions (such as business scope, date and type), total data quality score, and the score of each data quality element.

(3) Time scale alignment

A time-scale alignment operation is performed on the cleaned and screened historical data and on the real-time data judged to be normal. Using the timestamp of each production data item, values are aligned to whole-minute marks to obtain the values of all production data at a fixed interval (for example, 1 min). The alignment uses an equal-proportion linear estimation method, described as follows:

Let the values at the non-standard times $t_1$ and $t_2$ be $x_1$ and $x_2$. Then for a standard time $t_0$ ($t_1 < t_0 < t_2$), the standard data value $x_0$ is calculated as:

$$x_0 = x_1 + \frac{t_0 - t_1}{t_2 - t_1} (x_2 - x_1)$$
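
A minimal Python sketch of the alignment follows. It reads "equal-proportion linear estimation" as ordinary linear interpolation between the two samples that bracket each whole-minute mark, which is an interpretation rather than a quoted formula. The function names and sample values are hypothetical, timestamps are in seconds, and the samples are assumed to be sorted with strictly increasing timestamps and to contain at least two points.

```python
import math
from typing import List, Sequence, Tuple


def interpolate(t0: float, t1: float, x1: float, t2: float, x2: float) -> float:
    """Linear estimate of the value at t0 from samples (t1, x1) and (t2, x2)."""
    return x1 + (x2 - x1) * (t0 - t1) / (t2 - t1)


def align_to_grid(samples: Sequence[Tuple[float, float]],
                  step_s: float = 60.0) -> List[Tuple[float, float]]:
    """Resample (time, value) pairs onto multiples of step_s seconds."""
    aligned = []
    t_grid = math.ceil(samples[0][0] / step_s) * step_s
    i = 0
    while t_grid <= samples[-1][0]:
        # Advance to the pair of samples that brackets the grid time.
        while samples[i + 1][0] < t_grid:
            i += 1
        (t1, x1), (t2, x2) = samples[i], samples[i + 1]
        aligned.append((t_grid, interpolate(t_grid, t1, x1, t2, x2)))
        t_grid += step_s
    return aligned


# Hypothetical raw samples: (seconds since start, value).
raw_samples = [(50.0, 10.0), (70.0, 12.0), (110.0, 16.0), (130.0, 18.0)]
aligned_samples = align_to_grid(raw_samples)
```

Here the marks at 60 s and 120 s each lie halfway between two samples, so the aligned values are the midpoints 11.0 and 17.0.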

In step S5, the historical data and the real-time data obtained in step S4 are each standardized, the covariance between the production data items of the historical data and of the real-time data is calculated using principal component analysis (PCA), and any production data item whose covariance with all other production data items is 0 is removed.

Correlation analysis of the historical data screened and cleaned in step S4 provides the training data set for constructing the modified self-learning neural network data tuning model. The target data judged to be normal in the preceding steps likewise undergo correlation analysis and are then tuned using the constructed modified self-learning neural network data tuning model.

The invention adopts a principal component analysis (Principal Component Analysis, PCA) to realize the correlation analysis of target data, and converts a plurality of variable data with higher correlation into independent new variable data (namely, principal components).

1) Normalization process

The originally acquired data are mapped to the interval [-1, 1] using the following normalization formula:

$$X = \frac{2 (X' - X'_{min})}{X'_{max} - X'_{min}} - 1$$

where $X'$ is the acquired measurement value of the original variable, and $X'_{min}$ and $X'_{max}$ are the lower and upper limits of the original variable $X'$.

2) PCA calculation

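Assume that $m$ sampled values of the original $n$ target data variables, after normalization, are expressed as the following matrix $X$:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$

The covariance matrix $Y$ is:

$$Y = \begin{pmatrix} cov(X_1, X_1) & cov(X_1, X_2) & \cdots & cov(X_1, X_n) \\ cov(X_2, X_1) & cov(X_2, X_2) & \cdots & cov(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ cov(X_n, X_1) & cov(X_n, X_2) & \cdots & cov(X_n, X_n) \end{pmatrix}$$

where $cov(X_i, X_j)$ is the covariance of the data in the $i$-th and $j$-th columns of $X$.

Eigendecomposition of $Y$ gives:

$$Y = U \Lambda U^T$$

where $\Lambda$ is the diagonal matrix composed of the eigenvalues $\lambda_i$ of $Y$, and the columns of $U$ are the corresponding eigenvectors.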

The cumulative contribution rate $R_h$ of the first $h$ principal components is:

$$R_h = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_h}{\lambda_1 + \lambda_2 + \cdots + \lambda_n}$$

The number of principal components $h$ is determined as the value for which $R_h$ exceeds the cumulative contribution rate threshold $R_0$ (preferably 0.7 to 0.9), that is, $R_0 < R_h$. In addition, the covariance matrix $Y$ is used to screen out any data variable whose covariance with all other variables approaches 0. Such a variable has no correlation with the other data and can be removed, so that it does not take part in the subsequent training and tuning steps of the data tuning model; this yields the final number of data variables $n'$.
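
The normalization, covariance screening and choice of $h$ can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the tolerance `cov_tol` used for "covariance approaching 0", the default threshold of 0.85, the choice to screen variables before computing the eigenvalues, the function names and the small synthetic data set are all assumptions, and at least two variables are assumed to survive the screening.

```python
import numpy as np


def scale_to_unit_range(raw, lower, upper):
    """Map raw values to [-1, 1]: X = 2 (X' - X'_min) / (X'_max - X'_min) - 1."""
    return 2.0 * (raw - lower) / (upper - lower) - 1.0


def screen_and_count_components(X, r0=0.85, cov_tol=1e-6):
    """Return (keep_mask, h, cumulative_ratio) for an m-by-n sample matrix X.

    keep_mask marks variables whose covariance with at least one other
    variable exceeds cov_tol in magnitude. h is the smallest number of
    principal components whose cumulative contribution rate exceeds r0.
    """
    Y = np.cov(X, rowvar=False)                       # n-by-n covariance matrix
    off_diagonal = np.abs(Y - np.diag(np.diag(Y)))
    keep_mask = off_diagonal.max(axis=1) > cov_tol
    Y_kept = np.cov(X[:, keep_mask], rowvar=False)
    eigenvalues = np.linalg.eigvalsh(Y_kept)[::-1]    # descending order
    cumulative_ratio = np.cumsum(eigenvalues) / eigenvalues.sum()
    h = int(np.searchsorted(cumulative_ratio, r0, side="right")) + 1
    return keep_mask, h, cumulative_ratio


# Synthetic, already-scaled data: three correlated variables and one
# variable (last column) that is uncorrelated with all the others.
t = np.array([-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0])
u = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
X = np.column_stack([t, 0.5 * t, t + 0.2 * u, w])

keep_mask, h, cumulative_ratio = screen_and_count_components(X)
n_final = int(keep_mask.sum())                        # n' in the text
```

In this example the last column is removed, leaving $n' = 3$ variables, and a single principal component already explains about 99% of the variance, so $h = 1$.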

Step S6: constructing a data tuning model based on a modified self-learning neural network (MINN), correcting the training sample by taking the historical data obtained in the step S4 as an input training sample, and iteratively training the data tuning model by taking a target value and an adjustment amount of the training sample as output to obtain a trained data tuning model;

The prototype of the MINN is the input self-learning neural network (Input Neural Network, INN), which is well suited to data reduction and data error detection in nonlinear systems. The MINN of the invention is obtained by locally modifying the INN, which improves the stability and convergence of the network to a certain extent and thereby improves the accuracy of the model in application.

MINN training and modeling are performed using the historical data screened and cleaned in the preceding steps. The MINN consists of three layers of nodes: an input layer, an intermediate hidden layer and an output layer. The input layer corresponds to the principal components of the collected data variables, so its number of nodes $N_{in}$ is smaller than the number of output-layer nodes $N_{out}$. The input and output layers use linear activation functions, and the intermediate layer uses a nonlinear logistic activation function. A schematic diagram of the MINN network structure is shown in FIG. 3.

(1) Determining network structure

The network structure is defined by the number of input nodes, the number of intermediate hidden layer nodes and the number of output nodes:

the number of input nodes $N_{in} = h$, that is, the number of principal components of all the measurement point data;

the number of output nodes $N_{out} = n'$, that is, the final number of acquired data variables;

the number of intermediate hidden layer nodes, rounded to an integer;

(2) Initializing network parameters of MINN

That is, the input value matrix and the network weight matrices are initialized, with the initial random values of all of these quantities drawn from the range [-1, 1].

(3) Network training

Let the screened, cleaned and saved historical data to be used for training comprise $N_X$ batches in total. The training objective function of the original INN is:

where $X'_{ij}$ and $X_{ij}$ are the $j$-th network output value and the $j$-th training sample measurement value of the $i$-th batch, respectively.

The training objective of the above formula is to minimize the sum of the differences between the output values and the actual values over all batches and all measurement point data. Because this training process places no limit on the input values of the network, however, the input values may become abnormally large, which affects the stability of the network. The invention therefore modifies the training by adding a term to the objective function that limits the input parameters, so that the trained values of the network input parameters stay within a reasonable range and the stability of the network is improved to a certain extent. In the iterative training of the data tuning model in step S6, the training objective function is as follows:

where $A_{ij'}$ is the input value of the $j'$-th network input node for the $i$-th batch, $\mu$ is the weight coefficient of the input parameters, and $p_1$, $p_2$ and $p_3$ are three different limiting parameters on the input values of the data tuning model. Their values depend on the model structure and the number of training samples. Model training can be run several times with different values and the convergence results compared, and the most suitable values are selected according to the actual training effect, so that the model converges well and quickly to the given range.

The network is trained iteratively using the conventional BP algorithm (error back-propagation with gradient descent): the network output estimates and the network input value adjustments are calculated for each group of sample data, and then the network weight adjustments are calculated and the weights updated. This calculation is iterated until the objective function converges to the given range, at which point the final network weights, and hence the data tuning model, are determined.
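
The idea of training the network inputs together with the weights can be sketched with NumPy as follows. This is only a rough illustration and not the MINN objective of the invention: the exact objective function is not reproduced in this text, so the sketch minimizes a sum of squared output errors plus a simple L2 penalty `mu * sum(A**2)` on the inputs as a stand-in for the limiting term. The layer sizes, learning rate, zero-initialized biases, synthetic data and all names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_input_network(X, n_in, n_hidden, mu=0.01, lr=0.02, epochs=3000, seed=0):
    """Jointly fit per-sample inputs A and weights by gradient descent.

    Loss (an assumed stand-in, not the patent's objective):
        sum((output - X)**2) + mu * sum(A**2)
    Returns the fitted parameters and the loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_out = X.shape
    A = rng.uniform(-1.0, 1.0, (n_samples, n_in))    # trainable inputs
    W1 = rng.uniform(-1.0, 1.0, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-1.0, 1.0, (n_hidden, n_out))
    b2 = np.zeros(n_out)
    history = []
    for _ in range(epochs):
        H = sigmoid(A @ W1 + b1)                     # logistic hidden layer
        output = H @ W2 + b2                         # linear output layer
        error = output - X
        history.append(float((error ** 2).sum() + mu * (A ** 2).sum()))
        d_out = 2.0 * error
        d_z = (d_out @ W2.T) * H * (1.0 - H)
        grad_A = d_z @ W1.T + 2.0 * mu * A           # input adjustment
        # Weight gradients are averaged over samples; each row of A is
        # updated with its own per-sample gradient.
        W2 -= lr * (H.T @ d_out) / n_samples
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (A.T @ d_z) / n_samples
        b1 -= lr * d_z.mean(axis=0)
        A -= lr * grad_A
    return (A, W1, b1, W2, b2), history


# Synthetic scaled data: three variables driven by one latent factor.
latent = np.linspace(-1.0, 1.0, 40)
X_train = np.column_stack([latent, 0.5 * latent, -latent])
params, history = train_input_network(X_train, n_in=1, n_hidden=3)
A_fit = params[0]
```

The penalty keeps the learned inputs from growing without bound, which is the role the text assigns to the added limiting term; applying a trained model to new data would repeat the same loop with the weights held fixed and only the inputs updated.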

The model can be retrained and updated periodically on the latest collected historical data so that it is continuously refined, with version control applied to the model.

Unlike a conventional BP network, invoking the MINN model does not produce the output from the input in a single pass. Instead, as in network training, the calculation is iterated many times to minimize the verification objective function and thereby determine reasonable network input values; compared with the training process, the calculation of network weights is omitted. In step S7, when real-time data are tuned based on the data tuning model, the tuning objective function is as follows:

From the real-time data marked as normal in step S4, the target data items for self-learning neural-network tuning are selected and input into the trained MINN data tuning model, whose calculation gives the final tuning data. The calculation can be configured as either real-time calculation or scheduled batch calculation. If the number of data measuring points is small (for example, fewer than 10), real-time calculation can be used; if the number is large (for example, more than 10), scheduled batch calculation is recommended in view of system operating efficiency.

Let the set of measured values of a batch of collected real-time data be $\{X_i \mid i = 1, 2, \ldots, n\}$, and let the set of output values calculated by the data tuning model be $\{X'_i \mid i = 1, 2, \ldots, n\}$. If $|X'_i - X_i| \le \Omega_i$ holds for every i, where $\Omega_i$ is the residual threshold of the i-th data variable under the specified confidence limit obtained from model training, the tuning data from this calculation are qualified. Otherwise, the output target data items that do not satisfy the condition are corrected in the tuning objective function and the calculation is repeated until the results for all measured values satisfy the condition; the model output values are then taken as the final model tuning values of the respective data.
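The acceptance check described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the function name `failing_points` and the sample values are hypothetical, and the correction and recalculation of the failing items is left out because the tuning objective function is not reproduced here.

```python
from typing import List, Sequence


def failing_points(measured: Sequence[float],
                   tuned: Sequence[float],
                   thresholds: Sequence[float]) -> List[int]:
    """Return the indices i for which |X'_i - X_i| > Omega_i."""
    if not (len(measured) == len(tuned) == len(thresholds)):
        raise ValueError("all three sequences must have the same length")
    return [
        i
        for i, (x, x_tuned, omega) in enumerate(zip(measured, tuned, thresholds))
        if abs(x_tuned - x) > omega
    ]


# Hypothetical standardized values for a batch of three measuring points.
X = [0.10, -0.40, 1.20]
X_tuned = [0.12, -0.10, 1.18]
omega = [0.05, 0.05, 0.05]

# An empty list means the batch is qualified; otherwise the listed items
# would be corrected and the tuning calculation repeated.
bad = failing_points(X, X_tuned, omega)
print(bad)
```

In this hypothetical batch only the second measuring point exceeds its threshold, so only that item would be sent back for correction.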

Finally, the obtained model tuning values are inverse-standardized to recover the real, dimensional tuning values of the data, and the final results are written to the tuning calculation data table of the target data.
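A minimal Python sketch of the inverse standardization is given below. It assumes that the standardization was a z-score transform, z = (x - mean) / std, with the mean and standard deviation of each data item saved from the training data; the text does not state the transform, so this form, the function name `inverse_standardize` and the numbers are assumptions.

```python
from typing import List, Sequence


def inverse_standardize(z_values: Sequence[float],
                        means: Sequence[float],
                        stds: Sequence[float]) -> List[float]:
    """Map standardized tuning values back to dimensional values: x = z * std + mean."""
    return [z * s + m for z, m, s in zip(z_values, means, stds)]


# Hypothetical statistics saved when the training data were standardized.
means = [100.0, 20.0]
stds = [10.0, 2.0]

real_values = inverse_standardize([0.5, -1.0], means, stds)
print(real_values)
```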

After MINN model data tuning calculations have been completed for a certain period, the relative error percentage between the tuned value and the originally collected value is calculated for every data item of every collected batch, and the results are sorted and recorded in descending order. The top 10% to 20% of data items with the largest MINN tuning corrections are pushed as alarms to the relevant operation management personnel, who confirm manually whether to accept or ignore each corrected value. An ignored tuning result is invalid and is not recorded or counted as abnormal data.
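The ranking and alarm selection can be sketched in Python as follows. The sketch assumes that the relative error is taken as an absolute percentage of the original value, that a zero original value is ranked first for manual review, and that the pushed share is rounded up; none of these details is stated in the text, and the function name and data identifiers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rank_by_relative_error(
    records: Sequence[Tuple[str, float, float]],
    top_fraction: float = 0.1,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """records holds (data_id, original_value, tuned_value) tuples.

    Returns (all items sorted by relative error in descending order,
             the leading share to push to operation management personnel).
    """
    scored = []
    for data_id, original, tuned in records:
        if original == 0:
            # Assumption: the relative error is undefined, so rank it first.
            error_pct = float("inf")
        else:
            error_pct = abs(tuned - original) / abs(original) * 100.0
        scored.append((data_id, error_pct))
    scored.sort(key=lambda item: item[1], reverse=True)
    n_push = math.ceil(len(scored) * top_fraction) if scored else 0
    return scored, scored[:n_push]


# Hypothetical (id, original, tuned) values for five data items.
records = [
    ("T1", 100.0, 101.0),
    ("P1", 50.0, 60.0),
    ("F1", 200.0, 198.0),
    ("T2", 80.0, 80.0),
    ("P2", 10.0, 10.5),
]
ranked, pushed = rank_by_relative_error(records, top_fraction=0.2)
print([data_id for data_id, _ in pushed])
```

The manual accept or ignore step would then act on `pushed`; only accepted corrections would go on to be recorded.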

The system records together the valid data whose MINN tuning results changed significantly and the valid data judged abnormal under the basic data quality rules, forming an abnormal data information detail table whose fields are defined as follows:

Field name	Data type	Description
id	string	Data number
name	string	Data name
data_type	string	Data type
level	string	Data level
model_cal	string	Whether the data participates in data tuning: yes/no
time	datetime	Data acquisition time
col_type	string	Data acquisition mode: automatic/manual
ori_value	float	Collected data value
cal_value	float	Model tuning value
deviation	float	Relative deviation of the tuning calculation (%)
source_name	string	Data source name
remark	string	Remarks
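One row of the table above could be represented in Python as sketched below. The field names and types follow the table; the class name `AbnormalDataRecord`, the signed form of the deviation and all sample values are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class AbnormalDataRecord:
    id: str            # data number
    name: str          # data name
    data_type: str     # data type
    level: str         # data level
    model_cal: str     # participates in data tuning: "yes" / "no"
    time: datetime     # data acquisition time
    col_type: str      # acquisition mode: "automatic" / "manual"
    ori_value: float   # collected data value
    cal_value: float   # model tuning value
    deviation: float   # relative deviation of the tuning calculation, in %
    source_name: str   # data source name
    remark: str = ""   # remarks


# Hypothetical record for one automatically collected measuring point.
ori, cal = 100.0, 102.0
record = AbnormalDataRecord(
    id="D0001",
    name="main steam temperature",
    data_type="temperature",
    level="1",
    model_cal="yes",
    time=datetime(2023, 7, 25, 10, 0),
    col_type="automatic",
    ori_value=ori,
    cal_value=cal,
    deviation=(cal - ori) / ori * 100.0,
    source_name="DCS",
)
print(asdict(record))
```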

The data sources of the various abnormal data are located and confirmed, and staff are reminded to carry out targeted inspections. Hidden data quality risks at the data sources are eliminated by means such as upgrading system software and hardware, upgrading communication links or replacing sensors, so that data quality is continuously improved.

The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims

1. A real-time data quality optimization method for power enterprise production, characterized in that the following steps S1-S8 are executed to complete the verification and quality optimization of power enterprise production data:

2. The method for optimizing the quality of real-time data in power enterprise production according to claim 1, wherein the method for classifying the importance of each kind of production data in step S1 is as follows:

$$S = k_1 N_{all} + k_2 N_y + k_3 \alpha$$

where $N_{all}$ is the influence range of the production data, $N_y$ is the number of direct references, $\alpha$ is the importance coefficient, and $k_1$, $k_2$, $k_3$ are the influence factors of the influence range, the number of direct references and the importance coefficient of the production data, respectively;

the direct and indirect influence relations between production data are determined from three aspects of the production data, namely index classification, natural attributes and management attributes, and the influence range $N_{all}$ and the number of direct references $N_y$ of the production data are calculated by the following formulas:

$$N_{all} = N_{zj} + N_{jj}$$

$$N_y = N_{jj}$$

where $N_{zj}$ is the number of direct-influence data items of the production data and $N_{jj}$ is the number of indirect-influence data items of the production data;

the two categories of production data, automatically collected data and manually filled data, are each sorted in descending order of the level index S of each kind of production data, giving an automatically collected data sequence $\{S1_i \mid i = 1, 2, \ldots, m\}$ and a manually filled data sequence $\{S2_i \mid i = 1, 2, \ldots, n\}$, where m is the number of kinds of production data in the automatically collected data sequence and n is the number of kinds of production data in the manually filled data sequence; each sequence is divided into three equal parts, giving the subsequences $\{S1_{j1}\}$, $\{S1_{j2}\}$, $\{S1_{j3}\}$ and $\{S2_{j1}\}$, $\{S2_{j2}\}$, $\{S2_{j3}\}$, whose importance levels are level one, level two and level three respectively, where level one is the highest level and level three is the lowest level.
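As an illustration of the grading described in this claim, the following Python sketch computes the level index S and splits one descending sequence into three levels. The rule used when the number of kinds is not divisible by three, the function names, the data names and all numeric values are assumptions; $N_{all}$ and $N_y$ are taken as given inputs.

```python
from typing import Dict, Tuple


def level_index(n_all: float, n_y: float, alpha: float,
                k1: float, k2: float, k3: float) -> float:
    """S = k1 * N_all + k2 * N_y + k3 * alpha."""
    return k1 * n_all + k2 * n_y + k3 * alpha


def assign_levels(scores: Dict[str, float]) -> Dict[str, int]:
    """Sort by S in descending order and split into three parts (1 = highest)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    m = len(ordered)
    # rank * 3 // m is 0, 1 or 2, so the level is 1, 2 or 3.
    return {name: rank * 3 // m + 1 for rank, name in enumerate(ordered)}


# Hypothetical automatically collected data: (N_all, N_y, alpha) per kind.
auto_data: Dict[str, Tuple[int, int, int]] = {
    "main_steam_temp": (12, 5, 3),
    "furnace_pressure": (8, 3, 2),
    "flue_gas_o2": (6, 2, 2),
    "feedwater_flow": (4, 1, 1),
    "ambient_temp": (1, 0, 1),
    "aux_power": (2, 1, 1),
}
k1, k2, k3 = 1, 1, 1  # hypothetical influence factors

scores = {name: level_index(*v, k1, k2, k3) for name, v in auto_data.items()}
levels = assign_levels(scores)
print(levels)
```

The manually filled data sequence would be graded the same way with its own call to `assign_levels`.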

3. The method for optimizing real-time data quality in power enterprise production according to claim 1, wherein in step S2, basic data quality determination rules are preset, and a total data quality score $F_{total}$ of each kind of production data is calculated by the following formula:

$$F_{total} = \mu_1 F_w + \mu_2 F_g + \mu_3 F_j + \mu_4 F_z = \mu_1 F_w + \mu_2 \sum_x (F_{gx} k_x) + \mu_3 F_j + \mu_4 F_z$$

where $F_w$ is the integrity score, $F_g$ is the normalization score, $F_j$ is the timeliness score, $F_z$ is the accuracy score, $F_{gx}$ is the score of each subdivided normalization type, $k_x$ is the weight of each subdivided normalization type, and $\mu_1$ to $\mu_4$ are the weights of the four dimension scores; each of the four dimension scores is calculated as follows:

$$F = 100 - \Delta \cdot N_x$$
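The two scoring formulas of this claim can be combined as in the following Python sketch. It assumes that $\Delta$ is the deduction per violation and $N_x$ the number of violations in the corresponding dimension, which this passage does not define; the function names, weights and counts are likewise hypothetical.

```python
from typing import Sequence


def dimension_score(delta: float, n_x: int) -> float:
    """F = 100 - delta * N_x for a single quality dimension."""
    return 100 - delta * n_x


def total_score(f_w: float, f_gx: Sequence[float], k_x: Sequence[float],
                f_j: float, f_z: float, mu: Sequence[float]) -> float:
    """F_total = mu1*F_w + mu2*sum(F_gx*k_x) + mu3*F_j + mu4*F_z."""
    mu1, mu2, mu3, mu4 = mu
    f_g = sum(f * k for f, k in zip(f_gx, k_x))
    return mu1 * f_w + mu2 * f_g + mu3 * f_j + mu4 * f_z


# Hypothetical deductions and violation counts.
f_w = dimension_score(delta=5, n_x=2)    # integrity
f_j = dimension_score(delta=5, n_x=0)    # timeliness
f_z = dimension_score(delta=10, n_x=1)   # accuracy
f_gx = [100, 80]                         # two subdivided normalization types
k_x = [0.5, 0.5]                         # their weights

f_total = total_score(f_w, f_gx, k_x, f_j, f_z, mu=(0.25, 0.25, 0.25, 0.25))
print(f_total)
```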

4. The method for optimizing the quality of real-time data in power enterprise production according to claim 3, wherein in step S3, a batch of historical data is collected every preset collection period, and in step S4, each batch of historical data is cleaned and screened according to the following rules, and if any one of the following rules is not satisfied, all the historical data collected in that batch are discarded:

the operating condition of the production system is in a steady state or a quasi-steady state; none of the historical data violates the basic data quality determination rules; the logical relationships among all the historical data are reasonable and free of anomalies;

the qualified historical data meeting all the conditions are further transferred to a real-time database historical data table;

the real-time data are checked against the basic data quality determination rules; real-time data that do not satisfy the basic data quality rules are judged to be abnormal data, and their total data quality score $F_{total}$ and the four dimension scores are recorded in a basic data quality abnormal data detail table, while real-time data that satisfy all the basic data quality rules are normal data;

timestamp alignment is performed on the cleaned and screened historical data and on the real-time data judged to be normal: according to the timestamp of each production data item, the data are aligned to whole minutes to obtain data values of all the production data at a fixed time interval.
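The timestamp alignment at the end of this claim can be illustrated with the Python sketch below. The alignment rule is not specified in the text; the sketch assumes that each timestamp is rounded down to its whole minute and that the latest sample within a minute is kept. The function name and sample values are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def align_to_minutes(
    samples: Iterable[Tuple[datetime, float]]
) -> Dict[datetime, float]:
    """Map each sample to its whole minute, keeping the latest sample per minute."""
    aligned: Dict[datetime, float] = {}
    for ts, value in sorted(samples, key=lambda s: s[0]):
        # Later samples in the same minute overwrite earlier ones.
        aligned[ts.replace(second=0, microsecond=0)] = value
    return aligned


# Hypothetical raw samples of one production data item.
samples = [
    (datetime(2023, 7, 25, 10, 0, 12), 1.0),
    (datetime(2023, 7, 25, 10, 0, 48), 1.2),
    (datetime(2023, 7, 25, 10, 1, 5), 1.5),
]
aligned = align_to_minutes(samples)
for minute, value in aligned.items():
    print(minute.isoformat(), value)
```

Other rules, such as taking the nearest sample or interpolating, would fit the same interface.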

5. The method for optimizing the quality of real-time data produced by an electric power enterprise according to claim 1, wherein in step S5, the historical data and the real-time data obtained in step S4 are each standardized, the covariance between the production data items of the historical data and the real-time data is calculated using principal component analysis (PCA), and any production data item whose covariance with every other production data item is 0 is removed.
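The screening in this claim can be illustrated with the following Python sketch, which standardizes each production data item, computes the pairwise covariances (the first step of a PCA) and drops any item whose covariance with every other item is zero. The z-score standardization, the numerical tolerance `tol` used in place of an exact zero, the function names and the sample data are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization; a constant series maps to all zeros."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def covariance(x: List[float], y: List[float]) -> float:
    """Sample covariance of two equally long series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def drop_uncorrelated(columns: Dict[str, List[float]],
                      tol: float = 1e-8) -> Dict[str, List[float]]:
    """Keep only items with a non-zero covariance with at least one other item."""
    z = {name: standardize(vals) for name, vals in columns.items()}
    kept = {}
    for name, zx in z.items():
        others = [covariance(zx, zy) for other, zy in z.items() if other != name]
        if any(abs(c) > tol for c in others):
            kept[name] = zx
    return kept


# Hypothetical time-aligned data: "c" is uncorrelated with "a" and "b".
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, -1.0, -1.0, 1.0],
}
kept = drop_uncorrelated(columns)
print(list(kept))
```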

6. The method for optimizing the quality of real-time data produced by an electric power enterprise according to claim 1, wherein in the iterative training process for the data optimization model in step S6, a training objective function thereof is as follows:

where $A_{ij'}$ is the input value of the j'-th network node for the i-th batch, $\mu$ is the input-parameter weight coefficient, and $p_1$, $p_2$ and $p_3$ are three different limiting parameters on the input values of the data tuning model.

7. The method for optimizing the quality of real-time data produced by an electric power enterprise according to claim 6, wherein in the step S7, in the process of optimizing the real-time data based on the data optimizing model, an optimizing objective function is as follows:

let the set of measured values of a batch of collected real-time data be $\{X_i \mid i = 1, 2, \ldots, n\}$, and let the set of output values calculated by the data tuning model be $\{X'_i \mid i = 1, 2, \ldots, n\}$; if $|X'_i - X_i| \le \Omega_i$ holds for every i, where $\Omega_i$ is the residual threshold of the i-th data variable under the specified confidence limit obtained from model training, the tuning data from this calculation are qualified; otherwise, the output target data items that do not satisfy the condition are corrected in the tuning objective function and the calculation is repeated until the results for all measured values satisfy the condition, whereupon the model output values are taken as the final model tuning values of the respective data.