CN110597792A

CN110597792A - Multi-level redundant data fusion method and device based on synchronous line loss data fusion

Info

Publication number: CN110597792A
Application number: CN201910546801.0A
Authority: CN
Inventors: 王维洲; 拜润卿; 辛永; 黄文思; 胡航海; 刘福潮; 邢延东; 陆鑫; 陈婧; 张海龙; 史玉杰; 谷峪; 施炜炜; 陈力; 陈仕彬; 薛迎卫; 范成锋; 郝如海; 祁莹; 赵红
Original assignee: State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Gansu Electric Power Co Ltd; Electric Power Research Institute of State Grid Gansu Electric Power Co Ltd; National Network Information and Communication Industry Group Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Information and Telecommunication Co Ltd; State Grid Gansu Electric Power Co Ltd; Electric Power Research Institute of State Grid Gansu Electric Power Co Ltd; National Network Information and Communication Industry Group Co Ltd
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2019-12-20

Abstract

The invention discloses a multi-level redundant data fusion method and device based on synchronous line loss data fusion. The method comprises the following steps: cleaning power grid data with the Kettle tool; performing abnormal data identification, system cluster analysis and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result; and performing multi-level redundant data verification and correction according to the abnormal data screening result, so as to correct the power grid equipment parameters, calculation model, topology data and electric quantity data. The fusion method of the embodiments can use multi-level redundant data fusion to govern multi-source data, which improves data quality and effectively meets usage requirements.

Description

Multi-level redundant data fusion method and device based on synchronous line loss data fusion

Technical field

The invention relates to the technical field of power grids, and in particular to a multi-level redundant data fusion method and device based on synchronous line loss data fusion.

Background

The synchronous line loss system integrates six major business systems and three major platforms. Compared with traditional line loss statistics, synchronous line loss requires not only data transmission, association and fusion between different systems but also assurance of data accuracy. Identifying and governing abnormal data is therefore critical.

Big data processing technology: the Kettle tool is used to extract and transform big data. Kettle is an open-source ETL tool that runs on Windows and Linux and extracts data efficiently and stably. This ETL toolset manages data from different databases and provides a graphical user environment in which the user describes what is to be done.

Multi-level redundant data cleaning technology: this includes a fused redundant data cleaning method based on a sliding window and set theory, and a cross-redundant data cleaning method that borrows the reference tag idea and combines it with signal strength features. Applying these mass data cleaning methods to multi-level redundant data cleaning in the power system improves power system data quality.

Data redundancy processing technology: redundant data processing has three main stages: data collection, data identification and comparison, and data integration. In the data collection stage, the system obtains data from external sources and stores and classifies it to facilitate identification. In the data identification and comparison stage, the system judges duplication according to the characteristics of the stored data, using a suitable algorithm to determine whether each collected item is unique or a duplicate. In the data integration stage, the system processes the duplicates found during identification.

Cluster analysis: an exploratory analysis. No classification standard needs to be given in advance; cluster analysis classifies automatically, starting from the sample data. Different clustering methods often lead to different conclusions, and different researchers clustering the same data set will not necessarily obtain the same number of clusters.

Correlation analysis: in regression and correlation analysis, when the value of the dependent variable decreases (increases) as the value of the independent variable increases (decreases), the correlation coefficient between the two is negative, which is called negative correlation. Positive correlation means that the dependent variable increases as the independent variable increases: the two variables move in the same direction, so that when one changes from large to small or from small to large, the other does too.

However, the existing synchronous line loss system integrates a variety of source-end power business systems and platforms, and data quality seriously affects the accuracy of line loss calculation. Improvement is urgently needed.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to propose a multi-level redundant data fusion method based on synchronous line loss data fusion. The method can use multi-level redundant data fusion to govern multi-source data, which improves data quality and effectively meets usage requirements.

Another object of the present invention is to propose a multi-level redundant data fusion device based on synchronous line loss data fusion.

To achieve the above object, an embodiment of one aspect of the present invention proposes a multi-level redundant data fusion method based on synchronous line loss data fusion, comprising the following steps: cleaning the power grid data with the Kettle tool; performing abnormal data identification, system cluster analysis and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result; and performing multi-level redundant data verification and correction according to the abnormal data screening result, so as to correct the power grid equipment parameters, calculation model, topology data and electric quantity data.

The multi-level redundant data fusion method based on synchronous line loss data fusion of the embodiment of the present invention cleans, analyzes and applies multi-level fused redundant data on the basis of multi-level redundant data fusion. It identifies and corrects abnormal power grid data quickly and automatically, improves data quality and calculation accuracy, reduces the offline verification workload of business personnel, and effectively meets usage requirements.

In addition, the multi-level redundant data fusion method based on synchronous line loss data fusion according to the above embodiment of the present invention may also have the following additional technical features:

Further, in one embodiment of the present invention, performing abnormal data identification, system cluster analysis and positive-negative correlation analysis on the cleaned multi-level fused redundant data comprises: analyzing the power grid data with a big data processing method to identify first abnormal data; analyzing the power grid data with the system cluster analysis method to identify second abnormal data; and analyzing the power grid data with the positive-negative correlation analysis method to identify third abnormal data.

Further, in one embodiment of the present invention, the method further comprises: performing a transmission check on the data using a cyclic redundancy check method.

Further, in one embodiment of the present invention, cleaning the multi-level fused redundant data with the Kettle tool further comprises: cleaning fused redundant data using a sliding window and set theory; and/or cleaning redundant data according to the reference tag idea combined with signal strength features.

Further, in one embodiment of the present invention, performing multi-level redundant data verification and correction according to the abnormal data screening result comprises: using multi-level redundant data to verify and correct the equipment parameters at each level of the main network, distribution network and station area, the power grid topology data, the calculation model and the collected data.

To achieve the above object, an embodiment of another aspect of the present invention proposes a multi-level redundant data fusion device based on synchronous line loss data fusion, comprising: a cleaning module for cleaning the power grid data with the Kettle tool; a selection module for performing abnormal data identification, system cluster analysis and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result; and a correction module for performing multi-level redundant data verification and correction according to the abnormal data screening result, so as to correct the power grid equipment parameters, calculation model, topology data and electric quantity data.

The multi-level redundant data fusion device based on synchronous line loss data fusion of the embodiment of the present invention cleans, analyzes and applies multi-level fused redundant data on the basis of multi-level redundant data fusion. It identifies and corrects abnormal power grid data quickly and automatically, improves data quality and calculation accuracy, reduces the offline verification workload of business personnel, and effectively meets usage requirements.

In addition, the multi-level redundant data fusion device based on synchronous line loss data fusion according to the above embodiment of the present invention may also have the following additional technical features:

Further, in one embodiment of the present invention, the selection module comprises: a first selection unit for analyzing the power grid data with a big data processing method to identify first abnormal data; a second selection unit for analyzing the power grid data with the system cluster analysis method to identify second abnormal data; and a third selection unit for analyzing the power grid data with the positive-negative correlation analysis method to identify third abnormal data.

Further, in one embodiment of the present invention, the device further comprises: a check module for performing a transmission check on the data using a cyclic redundancy check method.

Further, in one embodiment of the present invention, the cleaning module is further used to clean fused redundant data using a sliding window and set theory, and/or to clean redundant data according to the reference tag idea combined with signal strength features.

Further, in one embodiment of the present invention, the correction module is further used to verify and correct, with multi-level redundant data, the equipment parameters at each level of the main network, distribution network and station area, the power grid topology data, the calculation model and the collected data.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of a multi-level redundant data fusion method based on synchronous line loss data fusion according to an embodiment of the present invention;

FIG. 2 is a flowchart of a multi-level redundant data fusion method based on synchronous line loss data fusion according to a specific embodiment of the present invention;

FIG. 3 is a schematic block diagram of a multi-level redundant data fusion device based on synchronous line loss data fusion according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

The multi-level redundant data fusion method and device based on synchronous line loss data fusion proposed according to embodiments of the present invention are described below with reference to the accompanying drawings. The method is described first.

FIG. 1 is a flowchart of a multi-level redundant data fusion method based on synchronous line loss data fusion according to one embodiment of the present invention.

As shown in FIG. 1, the multi-level redundant data fusion method based on synchronous line loss data fusion includes the following steps:

In step S101, the Kettle tool is used to clean the power grid data.

It can be understood that, as shown in FIG. 2, redundant data cleaning uses the big data tool Kettle to carry out multi-level redundant data cleaning of equipment parameters, topology relations, operating data and so on across the multi-source fused data of the synchronous line loss system. That is, Kettle is used to collect power grid data, including substations, transformers, switches, compensation equipment, transmission lines, distribution lines, distribution transformers, station areas, users, operating data and graphic data.

Further, in one embodiment of the present invention, cleaning the multi-level fused redundant data with the Kettle tool further comprises: cleaning fused redundant data using a sliding window and set theory; and/or cleaning redundant data according to the reference tag idea combined with signal strength features.

It can be understood that, as shown in FIG. 2, multi-level redundant data cleaning of the power grid is achieved with a multi-source fused redundant data cleaning method and a multi-source cross-redundant data cleaning method.

Specifically, the multi-level redundant data cleaning technology is as follows.

① Fused redundant data cleaning based on a sliding window and set theory. First, a fusion tag list is initialized with the unique identifier of the raw data as the key field; each node in the fusion tag list is a tuple cache queue made up of raw data tuples. Second, data streams are received in real time from multiple readers, the unique identifier attribute of each tuple is parsed, the storage location of that reader's tuple cache queue in the fusion tag list is determined from the identifier, and the new raw data tuple is inserted into the queue so that the queue stays in ascending time order. Then, according to the size of the time sliding window and the noise threshold, noise filtering and single-point redundancy elimination are performed on each tuple cache queue. Finally, a random algorithm selects the unique identifier of any tuple in the group as the unique identifier of the cleaned tuple, and data cleaning is carried out.

② Cross-redundant data cleaning that borrows the reference tag idea and combines it with signal strength features. First, the configuration file is read to bind the ownership relationship between reference tags and readers, an initial cross tag list is created, and the list is updated in real time from the raw data stream continuously pushed by the readers. Second, the cross tag list is checked for tags to be arbitrated other than the reference tags; if there are none, expired cross-information tuples are filtered out of the list to keep memory usage reasonable; if there are, the tuple cache queues in the tag group are examined according to the configured sliding window. Then, within the time sliding window, the signal strength of the tag to be arbitrated is computed for each reader that has detected it. Finally, the Euclidean distance between the signal strength vector of the tag to be arbitrated and that of each reader's reference tag is computed. The owning reader of the cross-redundant data is arbitrated by the method of minimizing relative position similarity, and after arbitration the cross-redundant data is eliminated through mutual-exclusion processing.
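The two cleaning procedures can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout `(tag_id, reader_id, timestamp)`, the function names `clean_fused` and `arbitrate_reader`, and the fixed (tumbling) time windows are all assumptions made for illustration. The random choice of a representative tuple is simplified away, because every tuple in a group shares the same identifier here.

```python
import math
from collections import defaultdict


def clean_fused(readings, window, noise_threshold):
    """Sliding-window cleaning of fused redundant data (illustrative only).

    readings: iterable of (tag_id, reader_id, timestamp) tuples (assumed layout).
    Returns one (tag_id, window_start) record for every window in which the
    tag was seen at least `noise_threshold` times.
    """
    # Fusion tag list: unique identifier -> tuple cache queue.
    queues = defaultdict(list)
    for tag_id, reader_id, ts in readings:
        queues[tag_id].append((ts, reader_id))

    cleaned = []
    for tag_id, queue in queues.items():
        queue.sort()  # keep each queue in ascending time order
        counts = defaultdict(int)
        for ts, _reader in queue:
            counts[ts // window * window] += 1
        for start in sorted(counts):
            # Too few sightings in a window are treated as noise; the
            # remaining duplicates collapse into a single record.
            if counts[start] >= noise_threshold:
                cleaned.append((tag_id, start))
    return cleaned


def arbitrate_reader(tag_rssi, reference_rssi):
    """Choose the owning reader of a tag seen by several readers.

    tag_rssi: signal strength vector of the tag to be arbitrated.
    reference_rssi: reader_id -> signal strength vector of that reader's
    reference tag. The reader whose reference tag is nearest in Euclidean
    distance wins.
    """
    return min(reference_rssi,
               key=lambda reader: math.dist(tag_rssi, reference_rssi[reader]))
```

After arbitration, readings of the tag reported by any reader other than the returned one would be discarded, which corresponds to the mutual-exclusion step.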

In step S102, abnormal data identification, system cluster analysis and positive-negative correlation analysis are performed on the cleaned power grid data to obtain an abnormal data screening result.

In one embodiment of the present invention, performing abnormal data identification, system cluster analysis and positive-negative correlation analysis on the cleaned multi-level fused redundant data comprises: analyzing the power grid data with a big data processing method to identify first abnormal data; analyzing the power grid data with the system cluster analysis method to identify second abnormal data; and analyzing the power grid data with the positive-negative correlation analysis method to identify third abnormal data.

Specifically, as shown in FIG. 2, for abnormal data identification, the synchronous line loss system integrates six major business systems and three major platforms, and big data processing technology is used to screen abnormal data such as missing values and jumps. For cluster analysis, the system cluster analysis method is applied to the power grid data to analyze and screen abnormal data. For positive-negative correlation analysis, the positive-negative correlation analysis method is applied to the power grid data to analyze and screen abnormal data.

For example, big data processing technology is used to identify and compare the collected data: duplication is judged from the characteristics of the data, a suitable algorithm determines whether each collected item is unique, and redundant data is processed accordingly.
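The text does not name the algorithm used to decide uniqueness. One common approach, shown here only as an assumed sketch, is to hash the identifying fields of each record and keep the first record per hash. The field names and the normalization (trimming and lower-casing) are hypothetical.

```python
import hashlib


def fingerprint(record, key_fields):
    """Hash the identifying fields of a record into a fixed-size fingerprint."""
    joined = "|".join(str(record[f]).strip().lower() for f in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(records, key_fields):
    """Split records into (unique, duplicates), keeping the first occurrence."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        fp = fingerprint(record, key_fields)
        if fp in seen:
            duplicates.append(record)   # same identifying fields seen before
        else:
            seen.add(fp)
            unique.append(record)
    return unique, duplicates
```

Which fields identify a record (for example an equipment identifier plus a source timestamp) is a modelling decision that depends on the source systems.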

Further, for synchronous line loss system data that integrates the six major business systems and three major platforms, big data processing technology is used to screen abnormal data such as missing equipment data, operating data collection failures and operating data jumps for substations, transformers, switches, compensation equipment, transmission lines, distribution lines, distribution transformers, station areas, users and other equipment.
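As a hedged illustration of screening for missing values and jumps, the sketch below scans one series of operating readings. The representation of a failed collection as `None` and the fixed `max_jump` threshold are assumptions; the text does not state how a jump is defined.

```python
def screen_readings(readings, max_jump):
    """Flag missing samples and jumps in a series of operating readings.

    readings: list of floats, with None marking a failed collection.
    max_jump: largest accepted change from the last trusted reading.
    Returns a list of (index, reason) pairs.
    """
    flagged = []
    previous = None  # last reading that was not flagged
    for i, value in enumerate(readings):
        if value is None:
            flagged.append((i, "missing"))
            continue
        if previous is not None and abs(value - previous) > max_jump:
            flagged.append((i, "jump"))
        else:
            previous = value  # only unflagged readings become the baseline
    return flagged
```

Because a flagged reading never becomes the baseline, an isolated spike is reported once and the return to normal is not reported. A genuine, lasting level shift would keep being flagged, which may or may not be desirable.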

Further, system cluster analysis uses a traditional statistical cluster analysis method. The clusters considered are mainly the parameters of equipment of the same type and the operating data of a single piece of equipment. First, for equipment of the same type, for example S9-250/10 transformers, parameters such as no-load loss, load loss, short-circuit voltage percentage, no-load current percentage, resistance and reactance vary with the manufacturer but are distributed within a certain range; a value that jumps abnormally outside that range is abnormal data. Second, the operating data of a single piece of equipment fluctuates with a certain continuity, so clustering can accurately locate sudden changes in the operating data.
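A minimal sketch of this idea follows, assuming that "system cluster analysis" means hierarchical (agglomerative) clustering, the usual sense of the Chinese term 系统聚类. For a single numeric parameter, cutting a single-linkage tree at distance `cut` is equivalent to sorting the values and splitting at every gap larger than `cut`, which is what the code does. The function names, the cut distance and the minimum cluster size are illustrative choices, not values from the patent.

```python
def single_linkage_1d(values, cut):
    """Single-linkage hierarchical clustering of 1-D values, cut at `cut`.

    Returns a list of clusters, each a list of indices into `values`.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = []
    for i in order:
        # Join the current cluster if the gap to its largest member is small.
        if clusters and values[i] - values[clusters[-1][-1]] <= cut:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def outliers(values, cut, min_size=2):
    """Indices that fall in clusters with fewer than `min_size` members."""
    return sorted(i
                  for cluster in single_linkage_1d(values, cut)
                  if len(cluster) < min_size
                  for i in cluster)
```

For example, with made-up no-load loss values `[560, 555, 570, 565, 1200, 558]` for transformers of one model and `cut=30`, the value 1200 forms a cluster of its own and is reported as suspect.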

Further, correlation analysis applies positive-negative correlation analysis to the power grid data to analyze and screen abnormal data. Correlation is an uncertain dependence between variables, and correlation analysis is a commonly used statistical method for studying this dependence and how close it is. It is usually measured by the correlation coefficient, a statistic that describes the degree and direction of correlation between variables. The coefficient is usually denoted r and satisfies -1 ≤ r ≤ 1. Given the data (x_i, y_i), i = 1, 2, ..., n, for the variables, the correlation coefficient of the sample data is calculated as follows:
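The formula itself is not reproduced in this text. Assuming the standard Pearson sample correlation coefficient is intended, which matches the stated range of r and the data (x_i, y_i), it reads as follows, where \bar{x} and \bar{y} denote the sample means:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```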

Correlation analysis is the analysis of how closely variables are related. Its task is to make a realistic judgment on whether a necessary connection exists between the variables, how close the connection is and in which direction it changes, to measure the closeness of the connection, and to test its validity. Multi-source data fusion produces a certain amount of abnormal data. These data visibly deviate far from the rest and lower the measured closeness between variables. Correlation analysis is used to accurately locate and correct the abnormal data in the multi-source data and to guide data governance work.
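One way to act on the observation that abnormal points weaken the correlation is a leave-one-out check: remove each point in turn and see which removal raises |r| the most. The sketch below assumes the Pearson coefficient and this leave-one-out rule; the text prescribes neither, and the function names are hypothetical. Both input series must be non-constant, otherwise the denominator is zero.

```python
import math


def pearson(xs, ys):
    """Pearson sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def most_suspect(xs, ys):
    """Return (index, gain) of the point whose removal raises |r| the most.

    Returns (None, 0.0) when no removal strengthens the correlation.
    """
    base = abs(pearson(xs, ys))
    best_index, best_gain = None, 0.0
    for i in range(len(xs)):
        r = abs(pearson(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]))
        if r - base > best_gain:
            best_index, best_gain = i, r - base
    return best_index, best_gain
```

In practice a minimum gain would be required before a point is treated as abnormal, since some removal almost always changes r slightly.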

In step S103, multi-level redundant data verification and correction is performed according to the abnormal data screening result, so as to correct the power grid equipment parameters, calculation model, topology data and electric quantity data.

Further, in one embodiment of the present invention, performing multi-level redundant data verification and correction according to the abnormal data screening result comprises: using multi-level redundant data to verify and correct the equipment parameters at each level of the main network, distribution network and station area, the power grid topology data, the calculation model and the collected data.

It can be understood that, as shown in FIG. 2, multi-level redundant data verification and correction builds on redundant data cleaning and on abnormal data analysis and screening: multi-level redundant data is used to verify and correct the abnormal data, including equipment parameters, topology data, calculation models and electric quantity data, which improves data quality. In other words, multi-level redundant data is used to verify and correct the equipment parameters at each level of the main network, distribution network and station area, the power grid topology data, the calculation model and the collected data, so that abnormal power grid data is identified and corrected quickly and automatically, improving data quality and calculation accuracy.
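The text does not specify the rule by which redundant data corrects a flagged value. Purely as an assumed illustration, the sketch below compares a flagged value with the median of the copies of the same quantity held by other source systems and replaces it only when it deviates by more than a tolerance. The function name, the median rule and the tolerance are all hypothetical.

```python
from statistics import median


def correct_value(flagged_value, redundant_values, tolerance):
    """Check a flagged value against redundant copies from other sources.

    Returns (value, corrected): the median of the redundant copies and True
    if the flagged value deviates from that median by more than `tolerance`,
    otherwise the original value and False.
    """
    if not redundant_values:
        return flagged_value, False  # nothing to check against
    reference = median(redundant_values)
    if abs(flagged_value - reference) > tolerance:
        return reference, True
    return flagged_value, False
```

For instance, `correct_value(95.0, [11.2, 11.0, 11.4], 1.0)` yields `(11.2, True)`, while a value of 11.1 checked against the same copies is left unchanged.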

Further, in one embodiment of the present invention, the method of the embodiment further comprises: performing a transmission check on the data using a cyclic redundancy check method.

In the embodiment of the present invention, cyclic redundancy check (CRC) technology implements a data transmission check function and improves data transmission quality. Specifically, the sender and receiver agree in advance on a generator polynomial G(X), which serves as the divisor polynomial. The bit sequence to be sent is taken as the coefficients of a polynomial F(X), the dividend polynomial. The coefficients of the remainder polynomial R(X) obtained by dividing F(X) by G(X) form the CRC code. The CRC code is appended to the binary data bit sequence to be sent, forming the transmitted code. On receipt, the receiver likewise treats the transmitted code as the coefficient sequence of a polynomial and divides it by the same generator polynomial. If the remainder is zero, the transmission is error-free; otherwise, a transmission error has occurred.
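The CRC procedure can be sketched with bit strings and modulo-2 long division. The sketch follows the usual convention of appending r zero bits to the data before dividing, where r is the degree of G(X); the text does not state this step, but it is what makes the receiver's remainder zero for an undamaged message. The function names and the example generator `10011` are illustrative, not taken from the patent.

```python
def mod2_remainder(bits, generator):
    """Remainder of bits / generator in modulo-2 arithmetic.

    Both arguments are strings of '0' and '1'; `bits` must be at least as
    long as `generator`, whose first bit must be '1'.
    Returns a string of len(generator) - 1 bits.
    """
    r = len(generator) - 1
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - r):
        if work[i]:  # leading bit set: subtract (XOR) the generator here
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-r:])


def crc_encode(data, generator):
    """Append the CRC code to the data bits, forming the transmitted code."""
    r = len(generator) - 1
    return data + mod2_remainder(data + "0" * r, generator)


def crc_check(received, generator):
    """True if the received code leaves a zero remainder."""
    r = len(generator) - 1
    return mod2_remainder(received, generator) == "0" * r
```

With data `1101011011` and generator `10011` (that is, X^4 + X + 1), the CRC code is `1110` and the transmitted code is `11010110111110`; flipping any single bit of that code makes `crc_check` return False.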

In summary, as shown in FIG. 2, on the basis of synchronous line loss multi-source data fusion, the big data tool Kettle is used to clean, analyze and correct multi-level fused redundant data, providing a method that uses multi-level redundant data to govern multi-source data and improve data quality. Data cleaning includes using Kettle for data collection, data identification and comparison, and data integration, which cleans the redundant data in the multi-source data. After redundant data cleaning, abnormal data identification, system cluster analysis and positive-negative correlation analysis are performed, using big data processing technology to screen anomalies in massive data. According to the abnormal data screening result, the multi-level redundant data verification and correction function corrects the power grid equipment parameters, calculation model, topology data and electric quantity data. The approach makes full use of multi-level redundant data to govern multi-source fused data, identifies and corrects abnormal power grid data quickly and automatically, and improves data quality and calculation accuracy.

According to the multi-level redundant data fusion method based on synchronous line loss data fusion of the embodiment of the present invention, multi-level fused redundant data is cleaned, analyzed and applied on the basis of multi-level redundant data fusion. Abnormal power grid data is identified and corrected quickly and automatically, data quality and calculation accuracy are improved, the offline verification workload of business personnel is reduced, and usage requirements are effectively met.

其次参照附图描述根据本发明实施例提出的基于同期线损数据融合的多级冗余数据融合装置。Next, a multi-level redundant data fusion device based on synchronous line loss data fusion proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.

FIG. 3 is a schematic block diagram of a multi-level redundant data fusion device based on synchronous line loss data fusion according to an embodiment of the present invention.

As shown in FIG. 3, the multi-level redundant data fusion device 10 based on synchronous line loss data fusion comprises a cleaning module 100, a selection module 200, and a correction module 300.

The cleaning module 100 is configured to clean power grid data using the Kettle tool. The selection module 200 is configured to perform abnormal data identification, hierarchical cluster analysis, and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result. The correction module 300 is configured to perform multi-level redundant data verification and correction according to the abnormal data screening result, so as to complete the correction of power grid equipment parameters, calculation models, topology data, and electric energy data. The fusion device 10 according to the embodiment of the present invention can use multi-level redundant data fusion to govern multi-source data, improve data quality, and effectively meet usage requirements.

Further, in an embodiment of the present invention, the selection module 200 comprises a first selection unit, a second selection unit, and a third selection unit. The first selection unit is configured to analyze the power grid data using a big data processing method to identify first abnormal data. The second selection unit is configured to analyze the power grid data using hierarchical cluster analysis to identify second abnormal data. The third selection unit is configured to analyze the power grid data using positive-negative correlation analysis to identify third abnormal data.
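
The description names positive-negative correlation analysis without detailing it. One plausible reading, sketched below in Python, is to check whether two quantities that are expected to move together (or in opposition) actually do so, and to flag the pair when the Pearson correlation contradicts the expected sign. The function names, the `min_strength` threshold, and the sample series are assumptions made for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def violates_expected_sign(xs, ys, expected_sign, min_strength=0.5):
    """Return True when the correlation is weaker than min_strength in
    the expected direction (+1 for positive, -1 for negative)."""
    return pearson(xs, ys) * expected_sign < min_strength


# Hypothetical daily energy values (kWh): supplied energy and sold
# energy are assumed to be positively correlated on a healthy feeder.
supplied = [100, 120, 140, 160, 180]
sold_ok = [96, 115, 135, 154, 173]
sold_suspect = [96, 90, 85, 80, 70]

print(violates_expected_sign(supplied, sold_ok, +1))
print(violates_expected_sign(supplied, sold_suspect, +1))
```

The first pair tracks closely and is not flagged; the second moves against the supplied energy and is flagged for further screening.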

Further, in an embodiment of the present invention, the fusion device 10 further comprises a verification module. The verification module is configured to verify data transmission using a cyclic redundancy check (CRC).
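
As a minimal illustration of a cyclic redundancy check on transmitted data, the Python sketch below appends a CRC-32 checksum to a payload and verifies it on receipt. The description does not say which CRC polynomial or frame layout is used; CRC-32 from the standard `zlib` module, the 4-byte big-endian trailer, and the sample payload are assumptions.

```python
import zlib


def frame_with_crc(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as a 4-byte big-endian trailer."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def verify_frame(frame: bytes) -> bytes:
    """Check the trailer and return the payload, or raise ValueError."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC trailer")
    payload = frame[:-4]
    received_crc = int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        raise ValueError("CRC mismatch: data corrupted in transit")
    return payload


# Hypothetical meter reading sent between systems.
frame = frame_with_crc(b"meter=M1;kwh=1234.5")
assert verify_frame(frame) == b"meter=M1;kwh=1234.5"

# Flipping a single bit in the payload is detected.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
try:
    verify_frame(corrupted)
except ValueError as err:
    print(err)
```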

Further, in an embodiment of the present invention, the cleaning module 100 is further configured to clean the fused redundant data using a sliding window and set theory, and/or to clean redundant data using the reference-tag approach combined with signal strength features.
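
The description does not give the details of the sliding-window and set-theory cleaning. A minimal Python sketch of one common interpretation follows: a record is treated as redundant if an identical record is still inside a window of the most recently kept records, with set membership providing the comparison. The record layout `(meter_id, timestamp, value)`, the function name, and the window size are hypothetical.

```python
from collections import deque


def dedupe_sliding_window(records, window_size):
    """Drop records that duplicate one of the last window_size kept records.

    Records must be hashable (for example tuples). The deque tracks the
    order of records in the window; the set answers membership queries.
    """
    window = deque()
    seen = set()
    kept = []
    for record in records:
        if record in seen:
            continue                      # redundant copy inside the window
        kept.append(record)
        window.append(record)
        seen.add(record)
        if len(window) > window_size:
            seen.discard(window.popleft())  # oldest record leaves the window
    return kept


# Hypothetical readings merged from several source systems.
readings = [
    ("M1", "2019-06-01", 10.0),
    ("M2", "2019-06-01", 7.5),
    ("M1", "2019-06-01", 10.0),   # same reading from a second source
    ("M3", "2019-06-01", 4.2),
    ("M2", "2019-06-01", 7.5),    # same reading from a third source
]
print(dedupe_sliding_window(readings, window_size=3))
```

A larger window catches duplicates that arrive further apart at the cost of more memory; a window of size 1 only removes immediately repeated records.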

Further, in an embodiment of the present invention, the correction module 300 is further configured to use multi-level redundant data to verify and correct equipment parameters at each level (main network, distribution network, and station area), power grid topology data, calculation models, and collected data.
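
How the redundant copies are reconciled is not spelled out in the description. The Python sketch below assumes a simple strict-majority vote: when more than half of the redundant sources agree on a parameter value, that value becomes the reference and the dissenting sources are corrected; otherwise nothing is changed and the item is left for manual verification. The source-system names and the rated-capacity values are hypothetical.

```python
from collections import Counter


def reconcile(values_by_source):
    """Return (reference_value, corrections) using a strict-majority vote.

    corrections maps each dissenting source to the value it should take.
    Without a strict majority the result is (None, {}).
    """
    if not values_by_source:
        return None, {}
    counts = Counter(values_by_source.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(values_by_source):
        return None, {}               # no strict majority: do not auto-correct
    corrections = {
        source: value
        for source, reported in values_by_source.items()
        if reported != value
    }
    return value, corrections


# Hypothetical rated capacity (MVA) of one transformer as recorded in
# three source systems.
capacity = {
    "dispatch_system": 31.5,
    "asset_system": 31.5,
    "marketing_system": 13.5,
}
print(reconcile(capacity))
```

Here two of the three systems agree on 31.5, so the sketch proposes correcting the third. With only two disagreeing sources there is no majority and no correction is proposed.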

It should be noted that the foregoing explanation of the embodiment of the multi-level redundant data fusion method based on synchronous line loss data fusion also applies to the multi-level redundant data fusion device of this embodiment, and is not repeated here.

The multi-level redundant data fusion device based on synchronous line loss data fusion according to the embodiment of the present invention cleans, analyzes, and applies the multi-level fused redundant data on the basis of multi-level redundant data fusion. It identifies and corrects abnormal power grid data quickly and automatically, improves data quality and calculation accuracy, reduces the offline verification workload of business personnel, and effectively meets usage requirements.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless expressly and specifically defined otherwise.

In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Illustrative uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

1. A multi-level redundant data fusion method based on synchronous line loss data fusion, characterized by comprising the following steps:

cleaning power grid data using the Kettle tool;

performing abnormal data identification, hierarchical cluster analysis, and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result; and

performing multi-level redundant data verification and correction according to the abnormal data screening result, so as to complete the correction of power grid equipment parameters, calculation models, topology data, and electric energy data.

2. The method according to claim 1, wherein performing abnormal data identification, hierarchical cluster analysis, and positive-negative correlation analysis on the cleaned multi-level fused redundant data comprises:

analyzing the power grid data using a big data processing method to identify first abnormal data;

analyzing the power grid data using hierarchical cluster analysis to identify second abnormal data; and

analyzing the power grid data using positive-negative correlation analysis to identify third abnormal data.

3. The method according to claim 1, further comprising:

verifying data transmission using a cyclic redundancy check.

4. The method according to claim 1, wherein cleaning the multi-level fused redundant data using the Kettle tool further comprises:

cleaning the fused redundant data using a sliding window and set theory; and/or

cleaning redundant data using the reference-tag approach combined with signal strength features.

5. The method according to any one of claims 1-4, wherein performing multi-level redundant data verification and correction according to the abnormal data screening result comprises:

using multi-level redundant data to verify and correct equipment parameters at each level (main network, distribution network, and station area), power grid topology data, calculation models, and collected data.

6. A multi-level redundant data fusion device based on synchronous line loss data fusion, characterized by comprising:

a cleaning module configured to clean power grid data using the Kettle tool;

a selection module configured to perform abnormal data identification, hierarchical cluster analysis, and positive-negative correlation analysis on the cleaned power grid data to obtain an abnormal data screening result; and

a correction module configured to perform multi-level redundant data verification and correction according to the abnormal data screening result, so as to complete the correction of power grid equipment parameters, calculation models, topology data, and electric energy data.

7. The device according to claim 6, wherein the selection module comprises:

a first selection unit configured to analyze the power grid data using a big data processing method to identify first abnormal data;

a second selection unit configured to analyze the power grid data using hierarchical cluster analysis to identify second abnormal data; and

a third selection unit configured to analyze the power grid data using positive-negative correlation analysis to identify third abnormal data.

8. The device according to claim 6, further comprising:

a verification module configured to verify data transmission using a cyclic redundancy check.

9. The device according to claim 6, characterized in that the cleaning module is further configured to clean the fused redundant data using a sliding window and set theory, and/or to clean redundant data using the reference-tag approach combined with signal strength features.

10. The device according to any one of claims 6-9, characterized in that the correction module is further configured to use multi-level redundant data to verify and correct equipment parameters at each level (main network, distribution network, and station area), power grid topology data, calculation models, and collected data.