CN120578708B

CN120578708B - A method and system for power grid data integration

Info

Publication number: CN120578708B
Application number: CN202511086416.4A
Authority: CN
Inventors: 于仕; 李申逸; 母祥才; 江小辉; 李敏弘; 刘帆; 王祉
Original assignee: Jiangxi Kechen Hongxing Information Technology Co ltd
Current assignee: Jiangxi Kechen Hongxing Information Technology Co ltd
Priority date: 2025-08-05
Filing date: 2025-08-05
Publication date: 2025-12-30
Anticipated expiration: 2045-08-05
Also published as: CN120578708A

Abstract

The invention discloses a power grid data integration method and system. The method comprises: constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body; building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, and importing the preprocessed measurement data into a measurement table in the columnar database; and performing data mapping according to the data information of the measurement table, then carrying out transfer and statistics according to the data mapping result. Large-volume, high-frequency measurement data can be compared after aggregation to ensure the uniqueness of the measurement data, and each record is given a unique identifier so that it can be located quickly. A write period and a classification capacity upper limit are also set: the measurement table is written into the columnar database once every write period, or as soon as the classified data volume reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves writing efficiency.

Description

Power grid data integration method and system

Technical Field

The invention relates to the field of data processing, in particular to a power grid data integration method and system.

Background

Power grid measurement data mainly refers to real-time or near-real-time operating data (such as voltage, current, power, and frequency), metering data (such as electric energy consumption and load curves), and equipment status data collected by devices such as sensors, smart meters, SCADA systems, and PMUs (synchronized phasor measurement units).

At present, power grid measurement data is integrated by periodically extracting data from each measurement system, such as the measurement automation system and the electric energy collection system, with an ETL tool. After cleaning and conversion, the data is stored in a relational database to form a historical measurement data center that supports report generation and offline modeling.

However, in the conventional ETL batch mode, the delay from acquisition to application of the measurement data reaches 5-10 minutes, and data writing is slow.

Disclosure of Invention

Based on the above, the invention aims to provide a power grid data integration method and system that solve the problems of the traditional ETL batch processing mode, in which the delay from acquisition to application of measurement data reaches 5-10 minutes and data writing is slow.

In order to achieve the above object, the present invention provides a method for integrating grid data, which includes:

constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body;

building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and parsing and classifying the measurement table and defining enumeration information for it, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; implicit enumeration candidate values are extracted by real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is taken as a forward-looking feature and fed back to guide the direction of real-time mining; and the load features of the cluster storage main body in the dynamic adjustment of association weights are taken as constraint features to limit the storage priority of the enumeration values;

performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and carrying out consistency statistics on the measurement table and the entity table.

According to an aspect of the foregoing technical solution, the step of constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body includes:

configuring a big data basic environment that supports compilation of the Java and Scala development languages, and supporting deployment and operation of the development environment through the cluster client;

creating the cluster storage main body and setting its data storage duration and partition size based on the actual data volume of the measurement data and the business rules.

According to an aspect of the above technical solution, the step of building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and parsing and classifying the measurement table and defining enumeration information for it (wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; implicit enumeration candidate values are extracted by real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is taken as a forward-looking feature and fed back to guide the direction of real-time mining; and the load features of the cluster storage main body in the dynamic adjustment of association weights are taken as constraint features to limit the storage priority of the enumeration values) includes:

designing a table name in the columnar database for the measurement information of each measurement table, setting the partition size used when parsing the measurement information data and the table fields of the measurement table, and completing automatic creation of the measurement table;

analyzing the uniqueness of the measurement table based on the collected information of the cluster client and the service information of the measurement table, and generating a unique identifier for each row of measurement data in the measurement table.

According to one aspect of the above technical solution, a serialization parser for the data parsing process is designed for the data information of the measurement table, and a communication path to the cluster storage main body is constructed;

checkpoints are constructed in the processing framework environment according to the data magnitude of the measurement table, and the cluster storage main body information for the measurement table is subscribed to in real time.

According to one aspect of the above technical solution, the measurement data in the measurement table is parsed by the processing framework; a domain knowledge graph is introduced and used to analyze the measurement data of the measurement table; the field logic of the tables associated with the current measurement table is checked using data lineage tracking; the degree of matching between the measurement data in the current measurement table and the columnar database is identified using the isolation forest algorithm; normal data and abnormal data are screened out; normal data is classified by the time period to which it belongs; and abnormal data is merged into a single time period;

enumeration information is defined for the classified measurement table and, according to the association between the enumeration information and the cluster storage main body, written into the columnar database under the table field to which it belongs; the table construction rule of the measurement table is parsed, and the measurement table written into the columnar database is checked against the table fields of that columnar database; if the content of the measurement table is consistent with the table fields, the write succeeds, and if it is inconsistent, a columnar database matching the measurement table is created automatically;

a write period and a classification capacity upper limit are set, and the measurement table is written into the columnar database once every write period or when the classified data volume reaches the classification capacity upper limit.

According to an aspect of the foregoing technical solution, the step of performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and performing consistency statistics on the measurement table and the entity table includes:

generating, based on the data information of the measurement table and according to a custom rule, an entity table with a unique identifier, wherein the entity table with the unique identifier covers the maximum data range of the columnar database;

reading, by a big data processing engine, the custom rule and the data content of the measurement tables under different table fields in the columnar database, so as to transfer the data content of the measurement tables to the entity table with the unique identifier;

carrying out consistency statistics on the data content of the measurement table and the data content of the entity table with the unique identifier.
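The text does not spell out how the consistency statistics are computed. One plausible reading is a comparison of record counts and unique identifiers between the two tables. The following is a minimal sketch under that assumption; the function name `consistency_stats`, the `row_id` key, and the in-memory list-of-dicts representation are hypothetical and stand in for whatever the big data processing engine would actually query.

```python
def consistency_stats(measurement_rows, entity_rows, key="row_id"):
    """Compare two tables (lists of dicts) by record count and unique identifier.

    `key` names the unique-identifier column; it is a hypothetical field name.
    """
    measurement_ids = {row[key] for row in measurement_rows}
    entity_ids = {row[key] for row in entity_rows}
    return {
        "measurement_count": len(measurement_rows),
        "entity_count": len(entity_rows),
        # Identifiers present in the measurement table but not transferred.
        "missing_in_entity": sorted(measurement_ids - entity_ids),
        # Identifiers present only in the entity table.
        "extra_in_entity": sorted(entity_ids - measurement_ids),
        "consistent": (
            measurement_ids == entity_ids
            and len(measurement_rows) == len(entity_rows)
        ),
    }
```

A non-empty `missing_in_entity` list would point at rows that still need to be transferred or retried.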

According to one aspect of the above technical solution, the data information of the measurement table is monitored during its transfer and statistics;

at least the cluster storage main body information and the data service information of the measurement table are collected; the number of records in the measurement table is counted in real time by the processing framework for real-time data monitoring; and the time period in which the data information of the measurement table is written into the columnar database is captured to monitor the timeliness of the data.

The invention also provides a power grid data integration system for implementing the above power grid data integration method, the system comprising:

a configuration module, used for constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body;

an import module, used for building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and parsing and classifying the measurement table and defining enumeration information for it, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; implicit enumeration candidate values are extracted by real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is taken as a forward-looking feature and fed back to guide the direction of real-time mining; and the load features of the cluster storage main body in the dynamic adjustment of association weights are taken as constraint features to limit the storage priority of the enumeration values;

a transfer module, used for performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and carrying out consistency statistics on the measurement table and the entity table.

The invention also proposes a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the power grid data integration method described above.

The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the power grid data integration method described above when executing the computer program.

In summary, in the power grid data integration method provided by the invention, a big data basic environment is configured to support compilation of multiple development languages; a cluster client is configured and a cluster storage main body is established, with a data storage duration and partition size set for it to accommodate large-volume data; a columnar database and a task channel are built, and the data information of the measurement table is obtained from the cluster storage main body through the task channel; the measurement data is aggregated while the uniqueness of the measurement table is maintained; and the aggregated measurement data is written into the measurement table in the columnar database under the write period and classification capacity upper limit rules. The measurement table is parsed and classified and enumeration information is defined for it, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted by real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is taken as a forward-looking feature and fed back to guide the direction of real-time mining; and the load features of the cluster storage main body in the dynamic adjustment of association weights are taken as constraint features to limit the storage priority of the enumeration values. Finally, data mapping is performed according to the data information of the measurement table, the mapping result is transferred to an entity table, and consistency statistics are carried out on the measurement table and the entity table.

The invention realizes feature sharing through data intercommunication, links current values, future values, and storage rules in decision-making to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table not only supports static field labeling but can also respond dynamically to service changes and adapt to future demands. Meanwhile, large-volume, high-frequency measurement data can be compared for time-sequence differences after aggregation to ensure the uniqueness of the measurement data, and each record is given a unique identifier so that it can be located quickly. In addition, a write period and a classification capacity upper limit are set, and the measurement table is written into the columnar database once every write period or when the classified data volume reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves writing efficiency.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

Fig. 1 is a flowchart of a power grid data integration method according to a first embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a power grid data integration system according to a second embodiment of the present invention;

Fig. 3 is a block diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Example 1

Referring to Fig. 1, which is a flowchart of the power grid data integration method according to the first embodiment of the present invention, the method includes steps S01-S03:

S01, constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body.

Constructing the basic environment includes configuring the big data basic environment, the cluster client, and the cluster storage main body. The big data basic environment configuration includes support for compiling the Java and Scala development languages; support for services such as MRS-HBASE, MRS-KAFKA, MRS-HDFS, MRS-YARN, MRS-KLINK, and DWS, together with the dependency packages and other configuration required to access them; and support for deploying, running, and debugging applications in the development environment through the MRS cluster client. The cluster storage main body is created, and its data storage duration and partition size are set, according to the actual data volume of the measurement data and the business rules.

S02, building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and parsing and classifying the measurement table and defining enumeration information for it, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; implicit enumeration candidate values are extracted by real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is taken as a forward-looking feature and fed back to guide the direction of real-time mining; and the load features of the cluster storage main body in the dynamic adjustment of association weights are taken as constraint features to limit the storage priority of the enumeration values.

A columnar database is built: for the measurement information of each measurement table, the table name in the columnar database is designed, the partition size used when parsing the measurement information data and the table fields of the measurement table are set, and the measurement table is created automatically.

Information about the corresponding cluster client is collected from the cluster storage main body and combined with the service information of the measurement table of that cluster storage main body.

The uniqueness of the measurement data in the current measurement table is then analyzed so that it conforms to the uniqueness and dispersion requirements of the columnar database, and a unique identifier is generated for each row of measurement data in the measurement table. The unique identifier can be generated in a "ciphertext + plaintext" form, where the plaintext part is designed by tracking and analyzing the measurement data of each measurement table and selecting a suitable combination of service fields. The "ciphertext + plaintext" form ensures the uniqueness, dispersion, and readability of identifiers in the columnar database, avoids hot-spot problems, and improves service usability.
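One way to picture the "ciphertext + plaintext" identifier is a short hash prefix followed by the readable service fields. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`meter_id`, `data_type`, `business_time`), the MD5 hash, the 4-character prefix length, and the `|` and `_` separators are all hypothetical choices.

```python
import hashlib


def make_row_key(meter_id: str, data_type: str, business_time: str,
                 prefix_len: int = 4) -> str:
    """Build a row identifier of the form <hash prefix>_<readable fields>.

    The hash prefix (the "ciphertext" part) spreads keys across the key space;
    the readable suffix (the "plaintext" part) keeps the key unique and
    human-readable. All field names here are hypothetical.
    """
    plaintext = f"{meter_id}|{data_type}|{business_time}"
    digest = hashlib.md5(plaintext.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{plaintext}"
```

Because the prefix is derived from the plaintext, the same record always maps to the same key, while records with consecutive timestamps tend to receive unrelated prefixes rather than clustering in one key range.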

A serialization parser suited to the actual service conditions of the data parsing process is designed for the data information of the measurement table, so that serialization and deserialization time is reduced and the data processing rate is improved during subsequent processing.

A communication path to the cluster storage main body is constructed, and checkpoints are constructed in the processing framework environment according to the data magnitude of the measurement table. The state of the measurement table's data in the columnar database is checked periodically at the checkpoints, and the cluster storage main body information for the measurement table is subscribed to in real time.

The measurement data in the measurement table is parsed by the processing framework, and abnormal data is screened out based on the business rules and handled separately; abnormal data includes, for example, records whose business time is more than one year old and records whose business time is malformed or null. Normal data is classified by the time period to which it belongs, and abnormal data is merged into a single time period, so that every record belongs to a specific day. This prepares for the subsequent merging and persisting of data and improves concurrent processing speed. More specifically, a domain knowledge graph, which may encode business rules of the electric power system and intelligent manufacturing, is introduced when parsing the measurement table and is used to analyze its measurement data; the field logic of the tables associated with the current measurement table is checked using data lineage tracking; and the degree of matching between the measurement data in the current measurement table and the columnar database is identified using the isolation forest algorithm, which prevents abnormal data from being classified as normal data.
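The rule-based part of this screening can be sketched as follows. This is only an illustration of the business rules named above (business time older than one year, malformed, or null); the knowledge graph, lineage tracking, and isolation forest steps are not covered. The record layout, the `business_time` field name, the timestamp format, and the sentinel day `1970-01-01` used to hold abnormal records are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_records(records, now, time_format="%Y-%m-%d %H:%M:%S",
                   abnormal_day="1970-01-01"):
    """Group records by day; send abnormal records to one sentinel day.

    A record is treated as abnormal when its business time is null,
    malformed, or more than one year (365 days) older than `now`.
    """
    buckets = defaultdict(list)
    cutoff = now - timedelta(days=365)
    for record in records:
        raw = record.get("business_time")
        try:
            ts = datetime.strptime(raw, time_format)
        except (TypeError, ValueError):
            ts = None  # null or wrongly formatted business time
        if ts is None or ts < cutoff:
            buckets[abnormal_day].append(record)
        else:
            buckets[ts.strftime("%Y-%m-%d")].append(record)
    return dict(buckets)
```

Every record ends up under exactly one day key, so downstream merging can process each day's bucket independently.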

Enumeration information is defined for the classified measurement table and, according to the association between the enumeration information and the cluster storage main body, written into the columnar database under the table field to which it belongs. The table construction rule of the measurement table is parsed, and the measurement table written into the columnar database is checked against the table fields of that columnar database: if the content of the measurement table is consistent with the table fields, the write succeeds; if it is inconsistent, a columnar database matching the measurement table is created automatically.

In the process of defining enumeration information for the measurement table, the definition can be extended to a mode that combines dynamic sensing with predictive enumeration, specifically:

Real-time enumeration mining: by combining NLP with time-series analysis, implicit enumeration values are extracted from classified data such as historical data, equipment logs, and service documents (for example, an unlabeled anomaly in equipment fault codes can be automatically classified as a new enumeration item), and semantic associations between enumeration values (for example, the strong correlation between "sudden temperature rise" and "sensor fault") are constructed through a knowledge graph. Specifically, NLP-based text enumeration extraction uses a named entity recognition model to identify state-type entities (such as "voltage overload" and "communication interruption") in the classified text data, and merges similar entities by word-vector clustering (such as Word2Vec + K-Means), so that, for example, "signal loss" and "communication interruption" are grouped into the same enumeration candidate value. For the time-series data of the measurement table (such as a temperature value recorded every 5 minutes), abnormal fluctuations (such as a temperature rise of 20 °C) are detected using an isolation forest or DBSCAN; combined with a time window (for example, the anomaly lasts for more than 10 minutes), the fluctuation is judged to be a persistent anomaly and a new enumeration value (such as "temperature rise abnormality") is generated. At the same time, codes that repeat at high frequency in equipment logs are automatically marked as "high-frequency undefined enumeration".
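The time-window step can be illustrated in isolation. In the sketch below, a plain threshold on the rise above a baseline is swapped in for the isolation forest or DBSCAN detector named in the text, purely to keep the example self-contained; the function name, the `(minute_offset, value)` sample layout, and the fixed `baseline` are assumptions.

```python
def find_persistent_anomalies(samples, baseline, rise_threshold=20.0,
                              min_duration_min=10):
    """Return (start, end) minute offsets of anomalous runs that persist.

    `samples` is a time-ordered list of (minute_offset, value). A sample is
    anomalous when it exceeds `baseline` by at least `rise_threshold`; a run
    counts as persistent when it lasts for more than `min_duration_min`.
    """
    events = []
    start = last = None
    for minute, value in samples:
        if value - baseline >= rise_threshold:
            if start is None:
                start = minute  # a new anomalous run begins
            last = minute
        else:
            if start is not None and last - start > min_duration_min:
                events.append((start, last))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and last - start > min_duration_min:
        events.append((start, last))
    return events
```

Each returned interval would correspond to one candidate enumeration value such as "temperature rise abnormality"; isolated single-sample spikes are ignored.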

Predictive enumeration definition: new enumeration values that may appear in the future (for example, new states predicted from the fluctuation features of seasonal measurement data) are predicted with a model such as an LSTM, and field expansion space is reserved in the columnar database in advance, which reduces how often new tables must be created when an "inconsistency" is later detected. Specifically, a bidirectional LSTM (Bi-LSTM) with an attention mechanism is adopted. The input layer takes the extracted features, including time features, fluctuation features, and correlation features; the output layer gives the probability distribution of new enumeration values that may appear in the next 1-3 months (for example, a probability of 0.85 that "winter low-temperature protection" appears in December). A base model is trained on the full historical data to learn general time-series patterns (such as annual seasonal fluctuation). The prediction model is evaluated by precision (the proportion of predicted enumeration values that actually occur) and recall (the proportion of actually occurring new enumeration values that were predicted), and retraining is triggered when the precision falls below a preset precision threshold. For a high-probability new enumeration value, a dynamic field is set up in the reserved field expansion space; its type can be a variable-length string so that it is compatible with new enumeration values in different formats. When the new enumeration value is formally stored, the reserved field is renamed automatically and the field mapping is recorded to avoid query errors on historical data. When a reserved field conflicts with an existing field, a field renaming rule is triggered automatically.
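The evaluation and retraining trigger described here reduce to set arithmetic over predicted and actually observed enumeration values. A minimal sketch, assuming enumeration values are plain strings and using a hypothetical threshold of 0.8 (the text only says "preset"):

```python
def evaluate_predictions(predicted, actual, precision_threshold=0.8):
    """Compute precision and recall of predicted new enumeration values.

    precision = predicted values that actually occurred / all predicted values
    recall    = occurred values that were predicted / all occurred values
    Retraining is flagged when precision falls below the threshold.
    """
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "retrain": precision < precision_threshold,
    }
```

For example, predicting four values of which two occur, while three values occur in total, gives a precision of 2/4 and a recall of 2/3, which would flag the model for retraining under the assumed threshold.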

Dynamic adjustment of association weights: the association weights between enumeration information and cluster storage main bodies are updated in real time by reinforcement learning (for example, the degree of association between a certain type of measurement table and the storage cluster of workshop A increases as the production plan changes); storage main bodies with higher weights are matched preferentially, which optimizes writing efficiency.
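The text attributes the weight update to reinforcement learning without giving the update rule. As a deliberately simplified stand-in, the sketch below uses an incremental, bandit-style value update, in which each weight moves toward the most recent reward. The reward signal (for instance, a normalized write success rate), the learning rate, and all names are assumptions for illustration only.

```python
def update_weight(weights, enum_value, storage, reward, learning_rate=0.1):
    """Move the (enum_value, storage) weight a step toward the observed reward."""
    key = (enum_value, storage)
    old = weights.get(key, 0.0)
    weights[key] = old + learning_rate * (reward - old)
    return weights[key]


def pick_storage(weights, enum_value, candidates):
    """Prefer the candidate storage with the highest association weight."""
    return max(candidates, key=lambda s: weights.get((enum_value, s), 0.0))
```

With this rule, a storage target that keeps producing good write outcomes for a given enumeration value gradually accumulates weight and is chosen first.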

In the enumeration information definition process, implicit enumeration candidate values extracted by real-time enumeration mining serve as basic features and are synchronized to the input layer of the prediction model; the future enumeration probability distribution generated by predictive enumeration definition serves as a forward-looking feature and is fed back to guide the focus of real-time mining; and finally, the load features of the cluster storage main body from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values. The application realizes feature sharing through data intercommunication, links current values, future values, and storage rules in decision-making to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table therefore not only supports static field labeling but can also respond dynamically to service changes and adapt to future requirements.

A write period and a classification capacity upper limit are set, and the measurement table is written into the columnar database once every write period or when the classified data volume reaches the classification capacity upper limit, so as to improve the writing speed of the measurement data. In this embodiment, the write period may be 20 s and the classification capacity upper limit may be 3000 records.
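The two flush conditions can be sketched as a small buffering writer. This is an illustrative sketch rather than the embodiment's program: the class name, the `sink` callback standing in for the columnar database write, and the injectable `clock` are assumptions; the defaults of 20 s and 3000 records come from the embodiment.

```python
import time


class BufferedWriter:
    """Buffer records and flush on a time period or a capacity limit."""

    def __init__(self, sink, write_period_s=20.0, capacity=3000,
                 clock=time.monotonic):
        self.sink = sink                  # callable that receives one batch
        self.write_period_s = write_period_s
        self.capacity = capacity
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, record):
        self.buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the write period has elapsed or the buffer is full."""
        due = self.clock() - self.last_flush >= self.write_period_s
        full = len(self.buffer) >= self.capacity
        if due or full:
            if self.buffer:
                self.sink(list(self.buffer))
                self.buffer.clear()
            self.last_flush = self.clock()
```

In a long-running job, `maybe_flush` would also be called from a timer so that a quiet stream still flushes once per write period.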

In addition, the program that writes the measurement table into the columnar database in real time can establish a connection with the columnar database every 30 minutes to ensure that data writing runs normally.

It should be noted that, to handle problems such as full garbage collection (Full GC) that may occur in the basic unit (region) of the columnar database, a task data retry mechanism is provided: when the measurement data writing program catches an exception, it executes the retry mechanism automatically. If the write still fails, the data is written into a high-performance key-value database, and a Java background program periodically reads the data from the key-value database and rewrites it into the columnar database.
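The retry-then-fallback flow can be sketched as below. The embodiment describes a Java background program and a high-performance key-value database; here a Python list stands in for the key-value database and `write_fn` stands in for the columnar database write, so both are placeholders, as is the retry count of 3.

```python
def write_with_retry(batch, write_fn, fallback_store, max_retries=3):
    """Try to write a batch; after repeated failures, park it in the fallback."""
    for _ in range(max_retries):
        try:
            write_fn(batch)
            return True
        except Exception:
            continue  # transient failure (for example, a busy region): retry
    fallback_store.append(batch)
    return False


def replay_fallback(fallback_store, write_fn):
    """Periodic job: rewrite parked batches; keep those that still fail."""
    remaining = []
    for batch in fallback_store:
        try:
            write_fn(batch)
        except Exception:
            remaining.append(batch)
    fallback_store[:] = remaining
    return len(remaining)
```

`replay_fallback` plays the role of the background program: it would run at a fixed interval and returns how many batches are still waiting.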

S03, performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and carrying out consistency statistics on the measurement table and the entity table.

The data information of the measurement table is monitored while it is being transferred and counted.

Example two

In another aspect, please refer to fig. 2, which is a schematic structural diagram of a grid data integration system according to a second embodiment of the present invention, the grid data integration system includes:

A configuration module 11, configured to construct a basic environment, configure a cluster client, and establish a cluster storage main body;

An import module 12, configured to build a columnar database and a task channel, obtain cluster storage main body information according to the task channel, preprocess measurement data, import the preprocessed measurement data into a measurement table in the columnar database, parse and classify the measurement table, and define enumeration information. The enumeration information definition process includes real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through the real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is used as a forward-looking feature and fed back into the real-time enumeration mining direction; and the cluster storage main body load feature in the dynamic adjustment of association weights is used as a constraint feature to limit the storage priority of the enumeration values;

A transfer module 13, configured to perform data mapping according to the data information of the measurement table, transfer the data mapping result to an entity table, and perform consistency statistics on the measurement table and the entity table.
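As one possible reading of the consistency statistics performed by the transfer module, the sketch below compares per-day record counts between the measurement table and the entity table. The function name `consistency_report` and the `day` field are hypothetical; the embodiment does not state which statistic is compared, so record counts are used here purely as an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def consistency_report(measurement_rows: Iterable[dict],
                       entity_rows: Iterable[dict],
                       key: str = "day") -> Dict[str, Tuple[int, int]]:
    """Return {group: (measurement_count, entity_count)} for groups whose counts differ."""
    m_counts = Counter(row[key] for row in measurement_rows)
    e_counts = Counter(row[key] for row in entity_rows)
    groups = sorted(set(m_counts) | set(e_counts))
    return {
        g: (m_counts.get(g, 0), e_counts.get(g, 0))
        for g in groups
        if m_counts.get(g, 0) != e_counts.get(g, 0)
    }
```

An empty result would indicate that, under this count-based assumption, the two tables agree for every day.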

Following the uniqueness and discreteness requirements of the columnar database, the uniqueness of the measurement data in the current measurement table is analyzed, and a unique identifier is generated for each row of measurement data in the measurement table. A "ciphertext + plaintext" scheme can be adopted when generating the unique identifier of each row, where the plaintext part is designed by tracking and analyzing the measurement data of each measurement table and selecting a suitable combination of business fields. The "ciphertext + plaintext" scheme ensures the uniqueness, discreteness, and readability of identifiers in the columnar database, avoids concentrated access to the same resources, and improves service availability.
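A "ciphertext + plaintext" identifier of this kind could look like the sketch below, in which a short hash prefix spreads rows across the key space and the readable business fields follow it. The choice of SHA-256, the prefix length, the separator, and the field names `meter_id` and `business_time` are all assumptions for illustration; the embodiment does not name a hash function or a field combination.

```python
import hashlib


def make_row_key(meter_id: str, business_time: str,
                 prefix_len: int = 8, sep: str = "|") -> str:
    """Build '<hash prefix>|<meter_id>|<business_time>' as a row identifier."""
    # Plaintext part: a readable combination of business fields (assumed fields).
    plaintext = f"{meter_id}{sep}{business_time}"
    # Ciphertext part: a hash prefix that scatters adjacent keys (assumed SHA-256).
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{digest}{sep}{plaintext}"
```

Because the prefix is derived from the plaintext, the same business fields always yield the same key, while keys for consecutive timestamps no longer sort next to each other.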

The measurement data in the measurement table is parsed through the processing framework, and abnormal data is screened out based on business rules and handled specially; abnormal data here means data whose business time is more than one year old, or whose business time is malformed or null. Normal data is classified by the time period to which it belongs, and the abnormal data is merged into one and the same time period. This guarantees that every record belongs to a specific day, prepares for the subsequent merging and persisting of the data, and improves the concurrent processing speed. More specifically, a domain knowledge graph is introduced when parsing the measurement table; the domain knowledge graph may encode business rules of power systems and intelligent manufacturing. The measurement data of the measurement table is parsed through the domain knowledge graph, the field logic of the tables associated with the current measurement table is verified by data lineage tracing, and the degree of matching between the measurement data in the current measurement table and the columnar database is identified using an isolation forest algorithm, so that abnormal data is not classified as normal data.
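The business-rule part of this screening (not the knowledge graph or the isolation forest) can be sketched as below. The field name `business_time`, the timestamp format, the 365-day cutoff as an approximation of "one year", and the placeholder day used for abnormal records are assumptions introduced for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def bucket_by_day(rows: Iterable[dict], now: datetime,
                  fmt: str = "%Y-%m-%d %H:%M:%S",
                  anomaly_day: str = "1970-01-01") -> Dict[str, List[dict]]:
    """Group rows by calendar day; abnormal rows share one placeholder day."""
    cutoff = now - timedelta(days=365)  # "more than one year old" (approximation)
    buckets: Dict[str, List[dict]] = {}
    for row in rows:
        raw = row.get("business_time")
        try:
            ts = datetime.strptime(raw, fmt)
        except (TypeError, ValueError):
            ts = None  # null or malformed business time
        if ts is None or ts < cutoff:
            day = anomaly_day  # all abnormal data merged into the same period
        else:
            day = ts.strftime("%Y-%m-%d")
        buckets.setdefault(day, []).append(row)
    return buckets
```

Every row ends up under exactly one day key, which is what allows the later merge and write steps to process days independently.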

In summary, according to the power grid data integration system provided by the invention, a big data basic environment is configured to support compilation of multiple development languages, a cluster client is configured, and a cluster storage main body is established, for which a data storage duration and a partition size are created to accommodate general quantity data. A columnar database and a task channel are built, and the data information of the measurement table of the cluster storage main body is obtained through the task channel. The measurement data is aggregated while the uniqueness of the measurement table is maintained, and the aggregated measurement data is written into the measurement table in the columnar database according to the write period and classification capacity upper limit rules. The measurement table is parsed and classified, and enumeration information is defined; the enumeration information definition process includes real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights. Implicit enumeration candidate values are extracted through the real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is used as a forward-looking feature and fed back into the real-time enumeration mining direction; and the cluster storage main body load feature in the dynamic adjustment of association weights is used as a constraint feature to limit the storage priority of the enumeration values. Finally, data mapping is performed on the data information of the measurement table, the mapping result is transferred to the entity table, and consistency statistics are performed on the measurement table and the entity table, so as to verify the accuracy of the data during the transfer process.
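The write period and classification capacity upper limit rule summarized above amounts to flushing a buffer when either a time threshold or a size threshold is reached. The sketch below illustrates that rule under stated assumptions: the class name `BufferedTableWriter`, the row-count interpretation of "capacity", and the injectable clock are illustrative choices, not details taken from the embodiment.

```python
import time
from typing import Callable, List


class BufferedTableWriter:
    """Flush buffered rows every write period or when the capacity limit is hit."""

    def __init__(self, flush_fn: Callable[[List], None],
                 write_period_s: float, capacity_limit: int,
                 clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn              # hypothetical columnar-database writer
        self.write_period_s = write_period_s  # write period
        self.capacity_limit = capacity_limit  # capacity upper limit (rows, assumed)
        self.clock = clock
        self.buffer: List = []
        self.last_flush = clock()

    def add(self, row) -> None:
        self.buffer.append(row)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so the time-based rule fires even without new rows."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        due = self.clock() - self.last_flush >= self.write_period_s
        full = len(self.buffer) >= self.capacity_limit
        if self.buffer and (due or full):
            self.flush_fn(list(self.buffer))
            self.buffer.clear()
            self.last_flush = self.clock()
```

The size threshold bounds memory use and batch size under heavy load, while the time threshold bounds how stale buffered measurement data can become under light load.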

Example III

In another aspect, the present invention also proposes a computer readable storage medium, on which one or more computer programs are stored, which when executed by a processor implement the grid data integration method described above.

Those of skill in the art will appreciate that the logic or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable storage medium would include an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable storage medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

Example IV

Fig. 3 is a block diagram of an electronic device according to a fourth embodiment. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the grid data integration method in the above embodiments. The electronic device 30 shown in fig. 3 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in Fig. 3, the electronic device 30 may be embodied in the form of a general-purpose computing device, which may be a server device, for example. The components of the electronic device 30 may include, but are not limited to, at least one processor 31, at least one memory 32, and a bus 33 that connects the various system components, including the memory 32 and the processor 31.

The bus 33 includes a data bus, an address bus, and a control bus.

Memory 32 may include volatile memory, such as RAM 321 (random access memory) and/or cache memory 322, and may further include ROM 323 (read-only memory).

Memory 32 may also include a program tool 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The processor 31 executes various functional applications and data processing, such as the grid data integration method of the present invention as described above, by running a computer program stored in the memory 32.

The electronic device 30 may also communicate with one or more external devices 34 (e.g., keyboard, pointing device, etc.). Such communication may be through an I/O interface 35 (input/output interface). Also, the electronic device 30 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through a network adapter 36. As shown in Fig. 3, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 30, including, but not limited to, microcode, device drivers, redundant processors, disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, among others.

It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing examples illustrate only a few embodiments of the invention and are described in relative detail, but they are not thereby to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims

1. A method for integrating power grid data, characterized in that the method includes:

Build the basic environment, configure the cluster client, and establish the cluster storage entity;

A columnar database and task channel are established, and cluster storage entity information is obtained based on the task channel. Measurement data is preprocessed and imported into a measurement table in the columnar database. The measurement table is then parsed, categorized, and its enumeration information is defined. The enumeration information definition process includes real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights. Combining NLP and time series analysis, unlabeled anomalies with enumeration attributes are extracted from the measurement table as implicit enumeration candidate values and used as basic features. Semantic associations of the enumeration values are constructed using a knowledge graph to achieve real-time enumeration mining. The implicit enumeration candidate values are synchronized to the input layer of the prediction model to extract the temporal, fluctuation, and correlation features of the implicit enumeration candidate values. Predictive enumeration definition is performed, and the future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in future time periods. The future enumeration probability distribution is used as a forward-looking feature to indicate the new enumeration direction and is fed back to the real-time enumeration mining direction. The cluster storage entity load feature in the dynamic adjustment of association weights is used as a constraint feature to prioritize matching high-weight storage entities, optimize write efficiency, and limit the storage priority of enumeration values.

Data mapping is performed based on the data information in the measurement table, the data mapping results are transferred to the entity table, and consistency statistics are performed on the measurement table and the entity table.

2. The power grid data integration method according to claim 1, characterized in that the steps of constructing the basic environment, configuring the cluster client, and establishing the cluster storage entity include:

Configure a big data infrastructure environment, support compilation of Java and Scala development languages, and support deployment of development environment jobs according to cluster clients;

Based on the actual data volume of the measurement data and business rules, create the cluster storage entity, data storage duration, and partition size.

3. The power grid data integration method according to claim 1, characterized in that: a columnar database and task channel are established, and cluster storage entity information is obtained according to the task channel; measurement data is preprocessed, and the preprocessed measurement data is imported into the measurement table in the columnar database; the measurement table is parsed and classified, and enumeration information is defined; the enumeration information definition process includes real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; combined with NLP and time series analysis, implicit enumeration candidate values with enumeration attributes that are not labeled as anomalies are extracted from the measurement table as basic features; and semantic associations of the enumeration values are constructed through a knowledge graph to achieve real-time enumeration mining. The implicit enumeration candidate value is synchronized to the input layer of the prediction model to extract the temporal, fluctuation, and association features of the implicit enumeration candidate value. Predictive enumeration definition is performed, and the future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in future time periods. This future enumeration probability distribution is used as a forward-looking feature to indicate new enumeration directions, feeding back into the real-time enumeration mining direction. The cluster storage entity load feature in the dynamic adjustment of association weights is used as a constraint feature to prioritize matching high-weight storage entities, optimize write efficiency, and limit the storage priority of enumeration values. These steps include:

Design the table name of the columnar database for the measurement information of any measurement table, set the partition size of the measurement information data parsing process and the table fields of the measurement table, and complete the automatic creation of the measurement table.

Based on the information collected by the cluster client and the business information of the measurement table, the uniqueness of the current measurement table is analyzed, and a unique identifier for each row of measurement data in the measurement table is generated.

4. The power grid data integration method according to claim 3, characterized in that, for the data information of the measurement table, a series of parsers for the data parsing process are designed, and the communication path of the cluster storage entity is constructed;

Based on the data volume of the measurement table, checkpoints are constructed in the processing framework environment, and the cluster storage entity information in the measurement table is subscribed to in real time.

5. The power grid data integration method according to claim 4, characterized in that: the measurement data in the measurement table is parsed through a processing framework, a domain knowledge graph is introduced, the measurement data in the measurement table is parsed through the domain knowledge graph, the field logic of the related tables of the current measurement table is verified by data lineage tracing, the degree of matching between the measurement data in the current measurement table and the columnar database is identified by the isolation forest algorithm, normal data and abnormal data are separated, the time period to which the normal data belongs is classified, and the abnormal data is merged into the same time period.

The enumeration information of the categorized measurement tables is defined, and the information is written into the columnar database of the corresponding table fields based on the association between the enumeration information and the cluster storage entity. The table creation rules of the measurement tables are analyzed, and the measurement tables written into the columnar database and the table fields of the corresponding columnar database are verified. If the content of the measurement table is consistent with the table fields, the writing is successful; otherwise, a columnar database matching the measurement table is automatically created.

Set the write cycle and the classification capacity limit. Write the measurement table to the columnar database every write cycle or when the classification capacity reaches the classification capacity limit.

6. The power grid data integration method according to claim 1, characterized in that the steps of performing data mapping based on the data information of the measurement table, transferring the data mapping result to the entity table, and performing consistency statistics on the measurement table and the entity table include:

Based on the data information of the measurement table and according to the custom rules, a uniquely identified entity table is generated, which covers the maximum data range that appears in the columnar database.

The big data processing engine reads the data content of the measurement table from different table fields in the columnar database and uses custom rules to transfer the data content of the measurement table to the entity table with a unique identifier.

Consistency statistics are performed on the data content of the measurement table and the data content of the uniquely identified entity table.

7. The power grid data integration method according to claim 6, characterized in that, during the process of transferring and performing statistics on the data information of the measurement table, the data information of the measurement table is monitored;

At a minimum, the cluster storage entity information and data service information of the measurement table should be collected. The number of records in the measurement table should be counted in real time through the processing framework for real-time data monitoring. At the same time, the time period for writing the data information of the measurement table into the columnar database should be captured to monitor the timeliness of the data.

8. A power grid data integration system, characterized in that the power grid data integration system is used to implement the power grid data integration method according to any one of claims 1-7, the system comprising:

The configuration module is used to build the basic environment, configure cluster clients, and establish the cluster storage entity.

The import module is used to build a columnar database and task channels, and obtain cluster storage entity information based on the task channels. It preprocesses the measurement data and imports the preprocessed measurement data into the measurement table in the columnar database. The measurement table is then parsed, categorized, and its enumeration information is defined. The enumeration information definition process includes real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights. Combining NLP and time series analysis, it extracts unlabeled anomalies with enumeration attributes as implicit enumeration candidate values from the measurement table as basic features, and constructs semantic associations for the enumeration values using a knowledge graph to perform real-time enumeration mining. The implicit enumeration candidate value is synchronized to the input layer of the prediction model to extract the time features, fluctuation features, and correlation features of the implicit enumeration candidate value. Predictive enumeration definition is performed, and the future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in future time periods. The future enumeration probability distribution is used as a forward-looking feature to indicate the new enumeration direction and is fed back to the real-time enumeration mining direction. The cluster storage entity load feature in the dynamic adjustment of association weights is used as a constraint feature to prioritize matching high-weight storage entities, optimize write efficiency, and limit the storage priority of enumeration values.

The transfer module is used to perform data mapping based on the data information of the measurement table, transfer the data mapping result to the entity table, and perform consistency statistics on the measurement table and the entity table.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the power grid data integration method as described in any one of claims 1-7.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and running on the processor, characterized in that the processor executes the computer program to implement the power grid data integration method as described in any one of claims 1-7.