
CN120578708B - A method and system for power grid data integration - Google Patents

A method and system for power grid data integration

Info

Publication number
CN120578708B
CN120578708B CN202511086416.4A CN202511086416A CN120578708B CN 120578708 B CN120578708 B CN 120578708B CN 202511086416 A CN202511086416 A CN 202511086416A CN 120578708 B CN120578708 B CN 120578708B
Authority
CN
China
Prior art keywords
data
enumeration
measurement
measurement table
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511086416.4A
Other languages
Chinese (zh)
Other versions
CN120578708A (en)
Inventor
于仕
李申逸
母祥才
江小辉
李敏弘
刘帆
王祉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Kechen Hongxing Information Technology Co ltd
Original Assignee
Jiangxi Kechen Hongxing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Kechen Hongxing Information Technology Co ltd
Priority to CN202511086416.4A
Publication of CN120578708A
Application granted
Publication of CN120578708B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a power grid data integration method and system. The method comprises: constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body; building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, and importing the preprocessed measurement data into a measurement table in the columnar database; and performing data mapping according to the data information of the measurement table, then transferring the data and compiling statistics according to the data mapping result. The invention can compare full-volume and high-frequency measurement data for differences after aggregation to ensure the uniqueness of the measurement data, and assigns each record a unique identifier so that it can be located quickly. It also sets a writing period and a classification capacity upper limit, and writes the measurement table into the columnar database once every writing period or whenever the accumulated amount reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves writing efficiency.

Description

Power grid data integration method and system
Technical Field
The invention relates to the field of data processing, in particular to a power grid data integration method and system.
Background
Power grid measurement data mainly refers to real-time or near-real-time operating data (such as voltage, current, power, and frequency), metering data (such as electric energy consumption and load curves), and equipment status data collected by sensors, smart meters, SCADA systems, PMUs (synchronized phasor measurement units), and similar equipment.
At present, power grid measurement data is integrated by using an ETL tool to periodically extract data from each measurement system, such as the measurement automation system and the electric energy collection system. The data is cleaned and converted and then stored in a relational database, forming a historical measurement data center that supports report generation and offline modeling.
However, in the conventional ETL batch mode, the delay between acquisition and application of the measurement data is as long as 5-10 minutes, and data writing is slow.
Disclosure of Invention
Based on the above, the invention aims to provide a power grid data integration method and system, which aim to solve the problems that the delay from acquisition to application of measurement data reaches 5-10 minutes and the data writing speed is slower in the traditional ETL batch processing mode.
In order to achieve the above object, the present invention provides a method for integrating grid data, which includes:
constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body;
building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and analyzing and classifying the measurement table and defining its enumeration information, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model used in predictive enumeration definition; the future enumeration probability distribution generated by predictive enumeration definition serves as a prospective feature and is fed back to guide the direction of real-time mining; and the cluster storage main body load features from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values;
and performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and compiling consistency statistics on the measurement table and the entity table.
According to an aspect of the foregoing technical solution, the steps of constructing a basic environment, configuring a cluster client, and establishing a cluster storage body include:
configuring a big data base environment that supports compiling the Java and Scala development languages, and supporting running and deployment in the development environment through the cluster client;
creating the cluster storage main body and setting its data storage duration and partition size according to the actual volume of the measurement data and the business rules.
According to an aspect of the above technical solution, the step of building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and analyzing and classifying the measurement table and defining its enumeration information (wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model used in predictive enumeration definition; the future enumeration probability distribution generated by predictive enumeration definition serves as a prospective feature and is fed back to guide the direction of real-time mining; and the cluster storage main body load features from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values) includes:
designing, for the measurement information of any measurement table, the table name in the columnar database, setting the partition size used when parsing the measurement information and the table fields of the measurement table, and completing automatic creation of the measurement table;
analyzing the uniqueness of the measurement table based on the collected information of the cluster client and the business information of the measurement table, and generating a unique identifier for each row of measurement data in the measurement table.
According to one aspect of the above technical solution, a serialization parser for the data parsing process is designed for the data information of the measurement table, and a communication path to the cluster storage main body is constructed;
checkpoints are constructed in the processing framework environment according to the data magnitude of the measurement table, and the cluster storage main body information in the measurement table is subscribed to in real time.
According to one aspect of the above technical solution, the measurement data in the measurement table is parsed by the processing framework; a domain knowledge graph is introduced and the measurement data of the measurement table is analyzed against it; the field logic of the tables associated with the current measurement table is checked by means of data lineage tracking; an isolation forest algorithm is used to identify how well the measurement data in the current measurement table matches the columnar database; normal data and abnormal data are screened out; normal data is classified by the time period to which it belongs; and abnormal data is merged into a single time period;
enumeration information is defined for the classified measurement table and, according to the association between the enumeration information and the cluster storage main body, written into the columnar database under the table fields to which it belongs; the table construction rule of the measurement table is analyzed, and the measurement table written into the columnar database is checked against the table fields of that columnar database; if the content of the measurement table is consistent with the table fields, the write succeeds, and if it is inconsistent, a columnar database table matching the measurement table is created automatically;
a writing period and a classification capacity upper limit are set, and the measurement table is written into the columnar database once every writing period or whenever the accumulated amount reaches the classification capacity upper limit.
According to an aspect of the foregoing technical solution, the step of performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and performing consistency statistics on the measurement table and the entity table includes:
generating, based on the data information of the measurement table and according to a custom rule, an entity table with a unique identifier, wherein the entity table with the unique identifier covers the maximum data range of the columnar database;
reading, by a big data processing engine, the custom rule and the data content of the measurement tables of the different table fields in the columnar database, so as to transfer the data content of the measurement tables to the entity table with the unique identifier;
and compiling consistency statistics on the data content of the measurement table and the data content of the entity table with the unique identifier.
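The text does not specify how the consistency statistics are computed. As a minimal sketch, assuming that consistency means both tables contain the same set of unique identifiers, the comparison could look like the following; the function name and the report keys are hypothetical.

```python
def consistency_stats(measurement_ids, entity_ids):
    """Compare the unique identifiers found in the measurement table
    with those found in the entity table (assumed consistency criterion)."""
    m, e = set(measurement_ids), set(entity_ids)
    return {
        "measurement_count": len(m),
        "entity_count": len(e),
        "missing_in_entity": sorted(m - e),  # rows not yet transferred
        "extra_in_entity": sorted(e - m),    # rows with no source record
        "consistent": m == e,
    }


report = consistency_stats(["a1", "a2", "a3"], ["a1", "a3"])
```

A non-empty `missing_in_entity` list would indicate rows that the transfer step has not yet carried over.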
According to one aspect of the above technical solution, the data information of the measurement table is monitored while it is being transferred and counted;
at least the cluster storage main body information and the data business information of the measurement table are collected, the number of records in the measurement table is counted in real time through the processing framework for real-time data monitoring, and the time at which the data information of the measurement table is written into the columnar database is captured to monitor the timeliness of the data.
The invention also provides a power grid data integration system for implementing the power grid data integration method described above, the system comprising:
a configuration module, used for constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body;
an import module, used for building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and analyzing and classifying the measurement table and defining its enumeration information, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model used in predictive enumeration definition; the future enumeration probability distribution generated by predictive enumeration definition serves as a prospective feature and is fed back to guide the direction of real-time mining; and the cluster storage main body load features from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values;
and a transfer module, used for performing data mapping according to the data information of the measurement table, transferring the data mapping result to the entity table, and compiling consistency statistics on the measurement table and the entity table.
The invention also proposes a computer readable storage medium on which a computer program is stored which, when executed by a processor, implements a grid data integration method as described above.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor realizes the power grid data integration method when executing the computer program.
In summary, in the power grid data integration method provided by the invention, a big data base environment is configured to support compiling of multiple development languages, a cluster client is configured, and a cluster storage main body is established, with its data storage duration and partition size set so that it can hold the full volume of data. A columnar database and a task channel are built, and the data information of the measurement table of the cluster storage main body is obtained through the task channel. The measurement data is aggregated while the uniqueness of the measurement table is maintained, and the aggregated measurement data is written into the measurement table in the columnar database under the writing period and classification capacity upper limit rules. The measurement table is analyzed and classified and its enumeration information is defined. The enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model used in predictive enumeration definition; the future enumeration probability distribution generated by predictive enumeration definition serves as a prospective feature and is fed back to guide the direction of real-time mining; and the cluster storage main body load features from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values. Finally, data mapping is performed according to the data information of the measurement table, the mapping result is transferred to an entity table, and consistency statistics are compiled on the measurement table and the entity table.
The invention shares features through data intercommunication, links the current values, future values, and storage rules in a joint decision to produce a complete enumeration information system, and continuously improves precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table not only provides static field labeling but also responds dynamically to business changes and adapts to future requirements. In addition, full-volume and high-frequency measurement data can be compared for differences after aggregation to ensure the uniqueness of the measurement data, and each record is given a unique identifier so that it can be located quickly. A writing period and a classification capacity upper limit are also set, and the measurement table is written into the columnar database once every writing period or whenever the accumulated amount reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves writing efficiency.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flowchart of a method for integrating grid data according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a grid data integration system according to a second embodiment of the present invention;
FIG. 3 is a block diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "mounted" on another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like are used herein for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Example 1
Fig. 1 is a flowchart of a power grid data integration method according to a first embodiment of the present invention, wherein the power grid data integration method includes steps S01-S03, in which:
S01, constructing a basic environment, configuring a cluster client and establishing a cluster storage main body.
Constructing the basic environment includes configuring the big data base environment, the cluster client, and the cluster storage main body. The big data base environment configuration covers support for compiling the Java and Scala development languages; support for services such as MRS-HBASE, MRS-KAFKA, MRS-HDFS, MRS-YARN, MRS-FLINK, and DWS; and configuration of the dependency packages required for access. The MRS cluster client supports running, deploying, and debugging applications in the development environment. The cluster storage main body is created, and its data storage duration and partition size are set, according to the actual volume of the measurement data and the business rules.
S02, building a columnar database and a task channel, acquiring cluster storage main body information through the task channel, preprocessing the measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and analyzing and classifying the measurement table and defining its enumeration information, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model used in predictive enumeration definition; the future enumeration probability distribution generated by predictive enumeration definition serves as a prospective feature and is fed back to guide the direction of real-time mining; and the cluster storage main body load features from the dynamic adjustment of association weights serve as constraint features that limit the storage priority of the enumeration values.
A columnar database is built: the table name in the columnar database is designed according to the measurement information of each measurement table, the partition size used when parsing the measurement information and the table fields of the measurement table are set, and the measurement table is created automatically.
The information of the corresponding cluster client is collected from the cluster storage main body and combined with the business information of the measurement table of that cluster storage main body.
The uniqueness of the measurement data in the current measurement table is analyzed so that identifiers satisfy the uniqueness and dispersion requirements of the columnar database, and a unique identifier is generated for each row of measurement data in the measurement table. The identifier can use a "ciphertext + plaintext" form, in which the plaintext part is designed by tracing and analyzing the measurement data of each measurement table and choosing a suitable combination of business fields. This form keeps identifiers unique, well dispersed, and readable in the columnar database, avoids hot spots, and improves business usability.
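One way to picture the "ciphertext + plaintext" identifier is a short hash prefix followed by readable business fields. The sketch below is an illustration under assumptions: the field names (`meter_id`, `point_code`, `ts`), the separators, the use of SHA-256, and the 8-character prefix are all hypothetical and are not taken from the patent.

```python
import hashlib


def make_row_id(meter_id: str, point_code: str, ts: str, prefix_len: int = 8) -> str:
    """Build a 'ciphertext + plaintext' row identifier.

    The plaintext part joins the chosen business fields so the key stays
    readable; the hash prefix spreads otherwise sequential keys across the
    key space, which is the usual way to avoid write hot spots.
    """
    plaintext = f"{meter_id}|{point_code}|{ts}"
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{plaintext}"


row_id = make_row_id("M0001", "VOLT_A", "20250101T000500")
```

Because the prefix is derived from the plaintext, the same business fields always yield the same identifier, so a repeated record maps to the same row instead of creating a duplicate.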
For the data information of the measurement table, a serialization parser that fits the actual business conditions of the data parsing process is designed, which reduces the time spent on deserialization and serialization in subsequent data processing and increases the data processing rate.
A communication path to the cluster storage main body is constructed, and checkpoints are constructed in the processing framework environment according to the data magnitude of the measurement table. The state of the measurement data of the measurement table in the columnar database is checked periodically at these checkpoints, and the cluster storage main body information in the measurement table is subscribed to in real time.
The measurement data in the measurement table is parsed by the processing framework, and abnormal data is screened out according to business rules and handled separately; examples are data whose business time is more than one year old, or whose business time is malformed or null. Normal data is classified by the time period to which it belongs, and abnormal data is merged into a single time period, so that every record belongs to a specific day. This prepares the data for subsequent merging and persistence and improves concurrent processing speed. More specifically, a domain knowledge graph is introduced during analysis of the measurement table; the domain knowledge graph can encode business rules of the electric power system and intelligent manufacturing. The measurement data of the measurement table is analyzed against the domain knowledge graph, the field logic of the tables associated with the current measurement table is checked using data lineage tracking, and an isolation forest algorithm is used to identify how well the measurement data in the current measurement table matches the columnar database, which prevents abnormal data from being classified as normal data.
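The rule-based part of this screening can be sketched as follows. This is a minimal illustration that covers only the business-time rules (older than one year, malformed, or null) and the per-day classification; the knowledge graph, lineage checks, and isolation forest are not modeled. The field name `biz_time`, the timestamp format, and the bucket label are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ABNORMAL_BUCKET = "abnormal"  # hypothetical label for the shared abnormal period


def bucket_by_day(records, now, time_key="biz_time", fmt="%Y-%m-%d %H:%M:%S"):
    """Group records by calendar day; route abnormal records to one bucket."""
    buckets = defaultdict(list)
    cutoff = now - timedelta(days=365)
    for rec in records:
        raw = rec.get(time_key)
        try:
            ts = datetime.strptime(raw, fmt)
        except (TypeError, ValueError):
            # Null or malformed business time.
            buckets[ABNORMAL_BUCKET].append(rec)
            continue
        if ts < cutoff:
            # Business time more than one year old.
            buckets[ABNORMAL_BUCKET].append(rec)
        else:
            buckets[ts.strftime("%Y-%m-%d")].append(rec)
    return dict(buckets)
```

Each resulting bucket can then be merged and written independently, which is what makes the per-day grouping useful for concurrent processing.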
Enumeration information is defined for the classified measurement table and, according to the association between the enumeration information and the cluster storage main body, written into the columnar database under the table fields to which it belongs. The table construction rule of the measurement table is analyzed, and the measurement table written into the columnar database is checked against the table fields of that columnar database. If the content of the measurement table is consistent with the table fields, the write succeeds; if it is inconsistent, a columnar database table matching the measurement table is created automatically.
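The field check and automatic creation step can be pictured with an in-memory stand-in for the database catalog. This is only a sketch: the `catalog` dictionary, the `ensure_table` helper, and the `_vN` naming of newly created tables are hypothetical, and a real columnar database would be reached through its own client API.

```python
def ensure_table(catalog, table_name, row):
    """Return the name of the table that the row should be written to.

    catalog maps table name -> set of field names (in-memory stand-in).
    """
    row_fields = set(row)
    if catalog.get(table_name) == row_fields:
        return table_name  # consistent with the table fields: write succeeds
    # Inconsistent: reuse a table that already matches, if one exists.
    for name, fields in catalog.items():
        if fields == row_fields:
            return name
    # Otherwise create a table that matches the measurement table's fields.
    new_name = f"{table_name}_v{len(catalog) + 1}"
    catalog[new_name] = row_fields
    return new_name
```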
When defining enumeration information for the measurement table, the definition can be extended into a mode that combines dynamic sensing with predictive enumeration. Specifically:
Real-time enumeration mining combines NLP and time-series analysis to extract implicit enumeration values from classified data such as historical data, equipment logs, and business documents (for example, an unlabeled anomaly in equipment fault codes can be automatically classified as a new enumeration item), and uses a knowledge graph to build semantic associations between enumeration values (for example, the strong correlation between "sudden temperature rise" and "sensor fault"). Specifically, NLP-based text enumeration extraction applies a named entity recognition model to the classified text data to identify state-type entities (such as "voltage overload" and "communication interruption"), then merges similar entities through word-vector clustering (such as Word2Vec + K-Means); for example, "signal loss" and "communication interruption" are merged into the same enumeration candidate value. For time-series data in the measurement table (such as temperature values recorded every 5 minutes), abnormal fluctuations (such as a sudden temperature rise of 20 °C) are detected with an isolation forest or DBSCAN; when a time-window check shows that the anomaly persists (for example, for more than 10 minutes), a new enumeration value (such as "temperature rise abnormality") is generated. In addition, codes that repeat at high frequency in equipment logs are automatically marked as "high-frequency undefined enumeration".
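The time-window check on time-series data can be illustrated with a small sketch. Note that it substitutes a plain threshold rule (a rise of at least 20 °C over the last normal reading) for the isolation forest or DBSCAN detector named in the text, so it only demonstrates the persistence logic; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def find_persistent_rises(samples, rise=20.0, min_duration=timedelta(minutes=10)):
    """Return (start, end) spans where the value stays elevated.

    samples: list of (datetime, value) pairs ordered by time.
    A reading is elevated when it exceeds the last normal reading by
    at least `rise`; a span counts only if it lasts longer than
    `min_duration`.
    """
    events = []
    baseline = None  # last reading considered normal
    start = last = None
    for ts, value in samples:
        if baseline is None:
            baseline = value
            continue
        if value - baseline >= rise:
            if start is None:
                start = ts
            last = ts
        else:
            if start is not None and last - start > min_duration:
                events.append((start, last))
            start = None
            baseline = value
    if start is not None and last - start > min_duration:
        events.append((start, last))
    return events
```

Each returned span would correspond to one candidate enumeration value such as "temperature rise abnormality"; a single elevated reading is ignored because it does not outlast the window.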
Predictive enumeration definition uses a model such as an LSTM to predict new enumeration values that may appear in the future (for example, new states predicted from the fluctuation features of seasonal measurement data), and reserves field expansion space in the columnar database in advance, which reduces how often new tables must be created when an inconsistency is later detected. Specifically, a bidirectional LSTM (Bi-LSTM) with an attention mechanism is adopted. The input layer takes the extracted features, which comprise time features, fluctuation features, and correlation features; the output layer gives the probability distribution of new enumeration values that may appear in the next 1 to 3 months (for example, a probability of 0.85 that "winter low-temperature protection" appears in December). A base model is trained on the full historical data to learn general time-series patterns (such as annual seasonal fluctuation). The prediction model is evaluated by precision (the proportion of predicted enumeration values that actually occur) and recall (the proportion of actually occurring new enumeration values that were predicted), and retraining is triggered when precision falls below a preset threshold. For a high-probability new enumeration value, a dynamic field is set up in the reserved field expansion space; its type can be a variable-length string so that it is compatible with new enumeration values in different formats. When the new enumeration value is formally stored, the reserved field is automatically renamed and the field mapping is recorded, so that queries on historical data are not disrupted. When a reserved field conflicts with an existing field, a field renaming rule is triggered automatically.
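The two evaluation measures described above (the share of predicted enumeration values that actually occur, and the share of actually occurring new values that were predicted) correspond to what is usually called precision and recall. A minimal sketch of the evaluation and the retraining trigger follows; the 0.8 threshold and the example enumeration names other than "winter low-temperature protection" are assumptions, since the text only says the threshold is preset.

```python
def evaluate_predictions(predicted, actual):
    """Return (precision, recall) for predicted vs. actually observed values."""
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


def needs_retraining(precision, threshold=0.8):
    """Trigger retraining when precision drops below the preset threshold."""
    return precision < threshold


precision, recall = evaluate_predictions(
    predicted=["winter low-temperature protection", "summer peak overload"],
    actual=["winter low-temperature protection"],
)
```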
Dynamic adjustment of association weights uses reinforcement learning to update, in real time, the association weights between enumeration information and cluster storage main bodies (for example, as the production plan changes, the association between a certain type of measurement table and the storage cluster of workshop A increases), and matches high-weight storage main bodies first to optimize writing efficiency.
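The patent does not say which reinforcement learning method is used. As a deliberately simplified stand-in, the sketch below uses an incremental value update of the kind found in simple bandit methods: each observed reward moves the weight of an (enumeration value, storage main body) pair toward that reward, and the highest-weight storage main body is matched first. The reward definition, the learning rate, and all names are assumptions.

```python
def update_weight(weights, key, reward, lr=0.1):
    """Move the association weight for `key` toward the observed reward."""
    old = weights.get(key, 0.0)
    weights[key] = old + lr * (reward - old)
    return weights[key]


def pick_storage(weights, enum_value, candidates):
    """Prefer the storage main body with the highest association weight."""
    return max(candidates, key=lambda c: weights.get((enum_value, c), 0.0))


weights = {}
update_weight(weights, ("temperature", "cluster_A"), reward=1.0)
update_weight(weights, ("temperature", "cluster_A"), reward=1.0)
update_weight(weights, ("temperature", "cluster_B"), reward=0.2)
best = pick_storage(weights, "temperature", ["cluster_A", "cluster_B"])
```

In practice the reward could reflect write latency or storage load, which would tie the weights to the load features that the text uses as constraints.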
During enumeration information definition, implicit enumeration candidate values are extracted through real-time enumeration mining, used as basic features, and synchronized to the input layer of the prediction model; the future enumeration probability distribution generated by predictive enumeration definition is used as a forward-looking feature and is fed back to guide the focus of real-time mining; finally, the load features of the cluster storage main bodies from the dynamic adjustment of association weights are used as constraint features to limit the storage priority of enumeration values. The application achieves feature sharing through data exchange, links decisions on current values, future values and storage rules to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table can thus not only label static fields but also respond dynamically to service changes and adapt to future requirements.
A writing period and a classification capacity upper limit are set, and the measurement table is written into the columnar database whenever a writing period elapses or the accumulated volume reaches the classification capacity upper limit, so as to improve the writing speed of the measurement data. In this embodiment, the writing period may be 20 s, and the classification capacity upper limit may be 3000 records.
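The flush rule above (write when the period elapses or when the capacity upper limit is reached) can be sketched as a small buffer in Python. The class name, the `flush` callback and the injectable `clock` are illustrative assumptions; only the default values of 20 s and 3000 records come from this embodiment.

```python
import time


class BufferedWriter:
    """Collects records and flushes them on a period or capacity trigger."""

    def __init__(self, flush, period_s=20.0, max_records=3000, clock=time.monotonic):
        self._flush = flush              # callable that writes one batch to the columnar database
        self._period_s = period_s        # writing period
        self._max_records = max_records  # classification capacity upper limit
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, record):
        self._buffer.append(record)
        self.maybe_flush()

    def maybe_flush(self):
        due = self._clock() - self._last_flush >= self._period_s
        full = len(self._buffer) >= self._max_records
        if self._buffer and (due or full):
            self._flush(self._buffer)
            self._buffer = []            # start a fresh batch; the flushed list is left untouched
            self._last_flush = self._clock()
```

In a real job, `maybe_flush` would also need to be called from a timer so that a quiet stream is still flushed once the period has elapsed.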
In addition, the program that writes the measurement table into the columnar database in real time may re-establish its connection with the columnar database every 30 minutes to ensure that data writing runs normally.
It should be noted that, for problems such as full garbage collection that may occur in the basic unit (region) of the columnar database, a task data retry mechanism is provided: when the measurement data writing program automatically catches an exception, the retry mechanism is executed. If the write still fails, the data is written into a high-performance key-value database, and a Java background program periodically reads the data from the high-performance key-value database and rewrites it into the columnar database.
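The retry-then-fallback flow can be sketched as follows. This is a simplified Python illustration, not the Java background program mentioned above; the number of retries, the callback names and the use of an in-memory list to stand in for the key-value database are assumptions.

```python
def write_with_retry(batch, write_columnar, write_fallback, max_retries=3):
    """Try the columnar database first; park the batch in the fallback store if all retries fail."""
    for _ in range(max_retries):
        try:
            write_columnar(batch)
            return "columnar"
        except Exception:  # a real program would catch the client's specific exception type
            pass
    write_fallback(batch)
    return "fallback"


def replay_fallback(pending, write_columnar):
    """Periodically rewrite parked batches; batches that fail again stay in `pending`."""
    replayed = 0
    for batch in list(pending):
        try:
            write_columnar(batch)
        except Exception:
            continue
        pending.remove(batch)
        replayed += 1
    return replayed
```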
S03, performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and performing consistency statistics on the measurement table and the entity table.
Based on the data information of the measurement table, an entity table with a unique identifier is generated according to a custom rule, where the entity table with the unique identifier covers the maximum data range of the columnar database;
A big data processing engine reads the custom rule and the measurement table data content of the different table fields in the columnar database, so as to transfer the measurement table data content to the entity table with the unique identifier;
Consistency statistics are performed on the data content of the measurement table and the data content in the entity table with the unique identifier.
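The embodiment does not define which statistics are computed. One plausible form, sketched here in Python under that assumption, compares the unique identifiers present in the measurement table with those present in the entity table; the function name and the result keys are hypothetical.

```python
def consistency_stats(measurement_ids, entity_ids):
    """Compare row identifiers of the measurement table and the entity table."""
    m, e = set(measurement_ids), set(entity_ids)
    return {
        "measurement_count": len(m),
        "entity_count": len(e),
        "missing_in_entity": sorted(m - e),     # rows not yet transferred
        "unexpected_in_entity": sorted(e - m),  # rows with no source record
        "consistent": m == e,
    }
```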
The data information of the measurement table is monitored while it is being transferred and counted;
At least the cluster storage main body information and the data service information of the measurement table are collected; the number of records in the measurement table is counted in real time through the processing framework for real-time data monitoring, and the time at which the data information of the measurement table is written into the columnar database is captured to monitor the timeliness of the data.
In summary, in the grid data integration method provided by the invention, a big data basic environment is configured to support compilation in multiple development languages, a cluster client is configured, a cluster storage main body is established, and a data storage duration and partition size are created for the cluster storage main body so that it can accommodate full-volume data. A columnar database and a task channel are built, and the data information of the measurement table of the cluster storage main body is obtained through the task channel; the measurement data are aggregated while the uniqueness of the measurement table is maintained, and the aggregated measurement data are written into the measurement table in the columnar database under the writing period and classification capacity upper limit rules. The measurement table is then parsed and classified, and its enumeration information is defined. The enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is used as a forward-looking feature that feeds back into the real-time mining direction; and the cluster storage main body load features in the dynamic adjustment of association weights are used as constraint features to limit the storage priority of enumeration values. Finally, data mapping is performed according to the data information of the measurement table, the mapping result is transferred to the entity table, and consistency statistics are performed on the measurement table and the entity table.
The invention achieves feature sharing through data exchange, links decisions on current values, future values and storage rules to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table can not only label static fields but also respond dynamically to service changes and adapt to future demands. Meanwhile, full-volume and high-frequency measurement data can be compared by differences in acquisition time to ensure the uniqueness of the measurement data, and a unique identifier is assigned so that measurement data can be located quickly. In addition, a writing period and a classification capacity upper limit are set, and the measurement table is written into the columnar database whenever a writing period elapses or the accumulated volume reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves its writing efficiency.
Example II
In another aspect, referring to fig. 2, a schematic structural diagram of a grid data integration system according to a second embodiment of the present invention is shown; the grid data integration system includes:
A configuration module 11, configured to construct a basic environment, configure a cluster client, and establish a cluster storage main body;
An importing module 12, configured to build a columnar database and a task channel, obtain cluster storage main body information according to the task channel, preprocess measurement data, import the preprocessed measurement data into a measurement table in the columnar database, parse and classify the measurement table, and define enumeration information, where the enumeration information definition process includes real-time enumeration mining, predictive enumeration definition and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of a prediction model in the predictive enumeration definition, the future enumeration probability distribution generated by the predictive enumeration definition is used as a forward-looking feature that feeds back into the real-time mining direction, and the cluster storage main body load features in the dynamic adjustment of association weights are used as constraint features to limit the storage priority of enumeration values;
A transfer module 13, configured to perform data mapping according to the data information of the measurement table, transfer the data mapping result to the entity table, and perform consistency statistics on the measurement table and the entity table.
Constructing the basic environment includes configuring the big data basic environment, the cluster client and the cluster storage main body. The basic configuration of the big data basic environment includes: support for compiling the Java and Scala development languages; support for services such as MRS-HBASE, MRS-KAFKA, MRS-HDFS, MRS-YARN, MRS-KLINK and DWS, together with the dependency packages that must be configured to access them; support, based on the MRS cluster client, for deploying jobs in the development environment and debugging applications; and creation of the cluster storage main body, the data storage duration and the partition size according to the actual volume of the measurement data and the business rules.
A columnar database is built: the table name in the columnar database is designed according to the measurement information of each measurement table, the partition size for the measurement information data parsing process and the table fields of the measurement table are set, and the measurement table is created automatically.
The cluster storage main body is collected to obtain the information of the corresponding cluster client, which is combined with the service information of the measurement table of the corresponding cluster storage main body.
The uniqueness of the measurement data of the current measurement table is analyzed so as to follow the uniqueness and discreteness principles of the columnar database, and a unique identifier is generated for each row of measurement data in the measurement table. In generating the unique identifier for each row, a "ciphertext + plaintext" scheme may be adopted, where the plaintext part is designed by tracking and analyzing the measurement data of each measurement table and selecting a suitable combination of service fields. The "ciphertext + plaintext" scheme ensures the uniqueness, discreteness and readability of the identifiers in the columnar database, avoids the problem of concentrated resource access, and improves service availability.
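One common way to realize a "ciphertext + plaintext" identifier is to prefix the readable business fields with a short hash of those same fields, so that keys are spread evenly while remaining human-readable. The Python sketch below assumes this interpretation; the choice of SHA-256, the prefix length of 4 and the two business fields (`meter_id`, `measured_at`) are hypothetical and are not taken from the embodiment.

```python
import hashlib


def make_row_key(meter_id: str, measured_at: str, prefix_len: int = 4) -> str:
    """Build a row identifier as <hash prefix>_<plaintext business fields>."""
    plaintext = f"{meter_id}_{measured_at}"                             # readable part
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{plaintext}"                         # hash prefix scatters adjacent keys
```

Because the plaintext part is kept in full, two different field combinations can never yield the same identifier, while the hash prefix keeps rows with consecutive timestamps from clustering in one place.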
For the data information of the measurement table, a serialization parser for the data parsing process is designed to suit the actual service conditions, so that deserialization and serialization time is reduced in subsequent data processing and the data processing rate is improved.
A communication path to the cluster storage main body is constructed; a checkpoint is constructed in the processing framework environment according to the data magnitude of the measurement table, the state of the measurement data of the measurement table in the columnar database is checked periodically according to the checkpoint, and the cluster storage main body information in the measurement table is subscribed to in real time.
The measurement data in the measurement table are parsed through the processing framework, and abnormal data are screened out based on business rules and handled specially; abnormal data include data whose business time is more than 1 year old and data whose business time is malformed or null. The time period to which normal data belong is classified, and abnormal data are merged into a single time period. This guarantees that every record belongs to a specific day, prepares for subsequent data merging and persistence, and improves the concurrent processing speed of the data. More specifically, a domain knowledge graph is introduced in the parsing of the measurement table; the domain knowledge graph may encode business rules of the power system and intelligent manufacturing. The measurement data of the measurement table are parsed with the domain knowledge graph, the field logic of the tables associated with the current measurement table is verified through data lineage tracking, and an isolation forest algorithm is used to identify the degree of matching between the measurement data in the current measurement table and the columnar database, so that abnormal data are not classified as normal data.
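The business-time screening rule above can be sketched in Python as a function that assigns every record to a day. The timestamp format and the sentinel day used to collect abnormal records are assumptions for illustration; the embodiment only states that abnormal data are merged into one time period.

```python
from datetime import datetime, timedelta

ABNORMAL_DAY = "1970-01-01"  # hypothetical sentinel day that collects all abnormal records


def day_bucket(business_time, now, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the day a record belongs to, or the sentinel day for abnormal business times."""
    if not business_time:                       # null or empty business time
        return ABNORMAL_DAY
    try:
        ts = datetime.strptime(business_time, fmt)
    except (ValueError, TypeError):             # malformed business time
        return ABNORMAL_DAY
    if ts < now - timedelta(days=365):          # more than 1 year old
        return ABNORMAL_DAY
    return ts.strftime("%Y-%m-%d")
```

The knowledge-graph, lineage and isolation forest checks described in the same paragraph are not covered by this sketch.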
Enumeration information is defined for the classified measurement table, and, according to the association between the enumeration information and the cluster storage main body, the measurement table is written into the columnar database of the corresponding table field. The table construction rule of the measurement table is parsed, and the measurement table written into the columnar database is checked against the table fields of that columnar database: if the content of the measurement table is consistent with the table fields, the write succeeds; if it is inconsistent, a columnar database table matching the measurement table is created automatically.
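The check-then-create behaviour can be sketched as follows in Python. Here "consistent" is taken to mean that every field of the record exists in the target table, and new tables are named with a `_v<n>` suffix; both the criterion and the naming rule are assumptions, as is the in-memory `schemas` dictionary standing in for the columnar database catalogue.

```python
def resolve_table(schemas: dict, table: str, record_fields) -> str:
    """Return the table a record can be written to, creating a matching one if necessary."""
    fields = set(record_fields)
    # Check the target table first, then variants created earlier for it.
    candidates = [table] + sorted(n for n in schemas if n.startswith(f"{table}_v"))
    for name in candidates:
        if name in schemas and fields <= schemas[name]:
            return name
    # No consistent table exists: create one that matches the record.
    suffix = 1
    while f"{table}_v{suffix}" in schemas:
        suffix += 1
    new_table = f"{table}_v{suffix}"
    schemas[new_table] = fields
    return new_table
```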
In defining the enumeration information of the measurement table, the enumeration information definition can be extended into a dynamic-sensing, prediction-combined enumeration mode, specifically:
Real-time enumeration mining: by combining NLP and time-series analysis, implicit enumeration values are extracted from the classified data, such as historical data, equipment logs and service documents (for example, an unlabeled anomaly in equipment fault codes can be automatically classified as a new enumeration item), and semantic associations between enumeration values (for example, the strong correlation between "sudden temperature rise" and "sensor fault") are constructed through a knowledge graph. Specifically, the NLP-based text enumeration extraction process includes: identifying state-type entities (such as "voltage overload" and "communication interruption") in the classified text data with a named entity recognition model; merging similar entities by word-vector clustering (such as Word2Vec + K-Means), for example grouping "signal loss" and "communication interruption" into the same enumeration candidate value; detecting abnormal fluctuations (such as a temperature rise of 20 ℃) in the time-series data of the measurement table (such as temperature values recorded every 5 minutes) with an isolation forest or DBSCAN, judging an anomaly to be sustained by means of a time window (for example, the anomaly lasts for more than 10 minutes), and generating a new enumeration value (such as "temperature rise anomaly"); and automatically marking high-frequency repeated codes in equipment logs as "high-frequency undefined enumerations".
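The time-window rule for sustained anomalies can be illustrated with a small Python sketch. Note that it deliberately replaces the isolation forest or DBSCAN detector with a plain threshold against the last normal reading, so it only demonstrates the "lasts for more than 10 minutes" logic; the function name and the baseline rule are assumptions.

```python
def sustained_rise(readings, rise=20.0, min_duration_min=10):
    """readings: list of (minute, value) pairs sorted by time.

    Returns True if values stay at least `rise` above the last normal reading
    for more than `min_duration_min` minutes.
    """
    baseline = None   # last reading considered normal
    start = None      # minute at which the current abnormal run began
    for minute, value in readings:
        if baseline is None:
            baseline = value
            continue
        if value - baseline >= rise:
            if start is None:
                start = minute
            if minute - start > min_duration_min:
                return True
        else:
            baseline = value
            start = None
    return False
```

With temperature values recorded every 5 minutes, a jump of 20 ℃ that persists over four consecutive samples spans 15 minutes and would be reported, whereas a single spiking sample would not.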
Predictive enumeration definition: new enumeration values that may occur in the future (e.g., new states predicted from the fluctuation features of seasonal measurement data) are predicted with a model such as an LSTM, and field expansion space is reserved in advance in the columnar database, reducing how often a new table must be created when a later "inconsistency" arises. Specifically, a bidirectional LSTM (Bi-LSTM) with an attention mechanism is adopted. The input layer takes the extracted features, which comprise time features, fluctuation features and correlation features; the output layer gives the probability distribution of new enumeration values that may appear in the next 1-3 months (e.g., a probability of 0.85 that "winter low-temperature protection" occurs in December). A base model is trained on the full historical data to learn general time-series patterns (such as annual seasonal fluctuation). The prediction model is evaluated by precision (the proportion of predicted enumeration values that actually occur later) and recall (the proportion of actually occurring new enumeration values that were predicted), and model retraining is triggered when the precision falls below a preset precision threshold. For a high-probability new enumeration value, a dynamic field is set in the reserved field expansion space; its field type may be a variable-length string so as to be compatible with new enumeration values of different formats. When the new enumeration value is formally stored, the reserved field is automatically renamed and the field mapping relation is recorded, to avoid query anomalies on historical data. When the reserved field conflicts with an existing field, a field renaming rule is automatically triggered.
Dynamic adjustment of association weights: the association weights between enumeration information and cluster storage main bodies are updated in real time by reinforcement learning (e.g., the degree of association between a certain type of measurement table and the workshop A storage cluster increases as the production plan changes), high-weight storage main bodies are matched preferentially, and writing efficiency is optimized.
During enumeration information definition, implicit enumeration candidate values are extracted through real-time enumeration mining, used as basic features, and synchronized to the input layer of the prediction model; the future enumeration probability distribution generated by predictive enumeration definition is used as a forward-looking feature and is fed back to guide the focus of real-time mining; finally, the load features of the cluster storage main bodies from the dynamic adjustment of association weights are used as constraint features to limit the storage priority of enumeration values. The application achieves feature sharing through data exchange, links decisions on current values, future values and storage rules to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table can thus not only label static fields but also respond dynamically to service changes and adapt to future requirements.
A writing period and a classification capacity upper limit are set, and the measurement table is written into the columnar database whenever a writing period elapses or the accumulated volume reaches the classification capacity upper limit, so as to improve the writing speed of the measurement data. In this embodiment, the writing period may be 20 s, and the classification capacity upper limit may be 3000 records.
In addition, the program that writes the measurement table into the columnar database in real time may re-establish its connection with the columnar database every 30 minutes to ensure that data writing runs normally.
It should be noted that, for problems such as full garbage collection that may occur in the basic unit (region) of the columnar database, a task data retry mechanism is provided: when the measurement data writing program automatically catches an exception, the retry mechanism is executed. If the write still fails, the data is written into a high-performance key-value database, and a Java background program periodically reads the data from the high-performance key-value database and rewrites it into the columnar database.
Based on the data information of the measurement table, an entity table with a unique identifier is generated according to a custom rule, where the entity table with the unique identifier covers the maximum data range of the columnar database;
A big data processing engine reads the custom rule and the measurement table data content of the different table fields in the columnar database, so as to transfer the measurement table data content to the entity table with the unique identifier;
Consistency statistics are performed on the data content of the measurement table and the data content in the entity table with the unique identifier.
The data information of the measurement table is monitored while it is being transferred and counted;
At least the cluster storage main body information and the data service information of the measurement table are collected; the number of records in the measurement table is counted in real time through the processing framework for real-time data monitoring, and the time at which the data information of the measurement table is written into the columnar database is captured to monitor the timeliness of the data.
In summary, in the grid data integration system provided by the invention, a big data basic environment is configured to support compilation in multiple development languages, a cluster client is configured, a cluster storage main body is established, and a data storage duration and partition size are created for the cluster storage main body so that it can accommodate full-volume data. A columnar database and a task channel are built, and the data information of the measurement table of the cluster storage main body is obtained through the task channel; the measurement data are aggregated while the uniqueness of the measurement table is maintained, and the aggregated measurement data are written into the measurement table in the columnar database under the writing period and classification capacity upper limit rules. The measurement table is then parsed and classified, and its enumeration information is defined. The enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition and dynamic adjustment of association weights: implicit enumeration candidate values are extracted through real-time enumeration mining as basic features and synchronized to the input layer of the prediction model in the predictive enumeration definition; the future enumeration probability distribution generated by the predictive enumeration definition is used as a forward-looking feature that feeds back into the real-time mining direction; and the cluster storage main body load features in the dynamic adjustment of association weights are used as constraint features to limit the storage priority of enumeration values. Finally, data mapping is performed according to the data information of the measurement table, the mapping result is transferred to the entity table, and consistency statistics are performed on the measurement table and the entity table.
The invention achieves feature sharing through data exchange, links decisions on current values, future values and storage rules to generate a complete enumeration information system, and continuously optimizes precision and efficiency through a closed feedback loop. The enumeration information definition of the measurement table can not only label static fields but also respond dynamically to service changes and adapt to future demands. Meanwhile, full-volume and high-frequency measurement data can be compared by differences in acquisition time to ensure the uniqueness of the measurement data, and a unique identifier is assigned so that measurement data can be located quickly. In addition, a writing period and a classification capacity upper limit are set, and the measurement table is written into the columnar database whenever a writing period elapses or the accumulated volume reaches the classification capacity upper limit, which ensures the timeliness of measurement data writing and improves its writing efficiency.
Example III
In another aspect, the present invention also proposes a computer readable storage medium, on which one or more computer programs are stored, which when executed by a processor implement the grid data integration method described above.
Those of skill in the art will appreciate that the logic or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable storage medium would include an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable storage medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
Example IV
Fig. 3 is a block diagram of an electronic device according to a fourth embodiment. The electronic device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the grid data integration method in the above embodiments. The electronic device 30 shown in fig. 3 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 3, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be a server device, for example. The components of the electronic device 30 may include, but are not limited to, at least one processor 31, at least one memory 32, and a bus 33 that connects the various system components, including the memory 32 and the processor 31.
The bus 33 includes a data bus, an address bus, and a control bus.
Memory 32 may include volatile memory such as RAM 321 (random access memory) and/or cache memory 322, and may further include ROM 323 (read only memory).
Memory 32 may also include a program tool 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the grid data integration method of the present invention as described above, by running a computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., keyboard, pointing device, etc.). Such communication may be through an I/O interface 35 (input/output interface). Also, the electronic device 30 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 36. As shown in fig. 3, network adapter 36 communicates with other modules of the electronic device 30 via bus 33. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 30, including, but not limited to, microcode, device drivers, redundant processors, disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. A power grid data integration method, characterized in that the power grid data integration method comprises:
constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body;
building a columnar database and a task channel, obtaining cluster storage main body information according to the task channel, preprocessing measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and performing parsing and classification and enumeration information definition on the measurement table, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition and dynamic adjustment of association weights; combining NLP and time-series analysis, implicit enumeration candidate values, being unlabeled anomalies that have enumeration attributes, are extracted from the measurement table as basic features, and semantic associations between enumeration values are constructed through a knowledge graph to realize real-time enumeration mining; the implicit enumeration candidate values are synchronized to the input layer of a prediction model to extract their time features, fluctuation features and association features for predictive enumeration definition, and a future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in a future period; the future enumeration probability distribution is used as a forward-looking feature to indicate new enumeration directions and feed back into the real-time enumeration mining direction; and the cluster storage main body load features in the dynamic adjustment of association weights are used as constraint features, so that high-weight storage main bodies are matched preferentially, writing efficiency is optimized, and the storage priority of enumeration values is limited;
performing data mapping according to the data information of the measurement table, transferring the data mapping result to an entity table, and performing consistency statistics on the measurement table and the entity table.
2. The power grid data integration method according to claim 1, characterized in that the step of constructing a basic environment, configuring a cluster client, and establishing a cluster storage main body comprises:
configuring a big data basic environment that supports compilation of the Java and Scala development languages and, based on the cluster client, supports job deployment in the development environment;
creating the cluster storage main body, the data storage duration and the partition size based on the actual data volume of the measurement data and business rules.
3. The power grid data integration method according to claim 1, characterized in that the step of building a columnar database and a task channel, obtaining cluster storage main body information according to the task channel, preprocessing measurement data, importing the preprocessed measurement data into a measurement table in the columnar database, and performing parsing and classification and enumeration information definition on the measurement table, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition and dynamic adjustment of association weights; combining NLP and time-series analysis, implicit enumeration candidate values, being unlabeled anomalies that have enumeration attributes, are extracted from the measurement table as basic features, and semantic associations between enumeration values are constructed through a knowledge graph to realize real-time enumeration mining; the implicit enumeration candidate values are synchronized to the input layer of a prediction model to extract their time features, fluctuation features and association features for predictive enumeration definition, and a future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in a future period; the future enumeration probability distribution is used as a forward-looking feature to indicate new enumeration directions and feed back into the real-time enumeration mining direction; and the cluster storage main body load features in the dynamic adjustment of association weights are used as constraint features, so that high-weight storage main bodies are matched preferentially, writing efficiency is optimized, and the storage priority of enumeration values is limited, comprises:
designing the table name of the columnar database for the measurement information of any measurement table, setting the partition size for the measurement information data parsing process and the table fields of the measurement table, and completing automatic creation of the measurement table;
analyzing the uniqueness of the current measurement table based on the information collected by the cluster client combined with the business information of the measurement table, and generating a unique identifier for each row of measurement data in the measurement table.
4. The power grid data integration method according to claim 3, characterized in that, for the data information of the measurement table, a serialization parser for the data parsing process is designed, and a communication path of the cluster storage main body is constructed;
a checkpoint in the processing framework environment is constructed according to the data magnitude of the measurement table, and the cluster storage main body information in the measurement table is subscribed to in real time.
5. 
The power grid data integration method according to claim 4, characterized in that: the measurement data in the measurement table is parsed through a processing framework, a domain knowledge graph is introduced, the measurement data in the measurement table is parsed through the domain knowledge graph, the field logic of the related tables of the current measurement table is verified by data lineage tracing, and the degree of matching between the measurement data in the current measurement table and the columnar database is identified by the isolated forest algorithm, normal data and abnormal data are filtered out, the time period to which the normal data belongs is classified, and the abnormal data is merged into the same time period. 对归类后的量测表进行枚举信息定义,并根据枚举信息与集群存储主体的关联性写入所属表字段的列式数据库,分析量测表的建表规则,对写入列式数据库的量测表和所属列式数据库的表字段进行校验,若量测表内容与所述表字段一致,则写入成功,若不一致,则自动创建与量测表匹配的列式数据库;The enumeration information of the categorized measurement tables is defined, and the information is written into the columnar database of the corresponding table fields based on the association between the enumeration information and the cluster storage entity. The table creation rules of the measurement tables are analyzed, and the measurement tables written into the columnar database and the table fields of the corresponding columnar database are verified. If the content of the measurement table is consistent with the table fields, the writing is successful; otherwise, a columnar database matching the measurement table is automatically created. 设定写入周期和归类容量上限,每隔一个写入周期或者过归类容量达到归类容量上限,将量测表写入列式数据库。Set the write cycle and the classification capacity limit. Write the measurement table to the columnar database every write cycle or when the classification capacity reaches the classification capacity limit. 6.根据权利要求1所述的电网数据整合方法,其特征在于,所述根据所述量测表的数据信息进行数据映射,将数据映射结果转存至实体表,并对所述量测表和所述实体表进行一致性统计的步骤包括:6. 
The power grid data integration method according to claim 1, characterized in that the steps of performing data mapping based on the data information of the measurement table, transferring the data mapping result to the entity table, and performing consistency statistics on the measurement table and the entity table include: 基于量测表的数据信息,并根据自定义规则,生成具有唯一标识的实体表,所述具有唯一标识的实体表覆盖列式数据库出现的最大数据范围;Based on the data information of the measurement table and according to the custom rules, a uniquely identified entity table is generated, which covers the maximum data range that appears in the columnar database. 通过大数据处理引擎读取列式数据库中不同表字段的量测表的数据内容,和自定义规则,以将量测表的数据内容转存至所述具有唯一标识的实体表;The big data processing engine reads the data content of the measurement table from different table fields in the columnar database and uses custom rules to transfer the data content of the measurement table to the entity table with a unique identifier. 针对量测表的数据内容和唯一标识的实体表中的数据内容,进行一致性统计。Consistency statistics are performed on the data content of the measurement table and the data content of the uniquely identified entity table. 7.根据权利要求6所述的电网数据整合方法,其特征在于,在量测表的数据信息的转存和统计的过程中,对量测表的数据信息进行监控;7. The power grid data integration method according to claim 6, characterized in that, during the process of transferring and statistically analyzing the data information of the measuring meters, the data information of the measuring meters is monitored; 至少采集量测表的集群存储主体信息、数据业务信息,通过处理框架实时统计量测表的记录数量以进行数据实时监控,同时抓取量测表的数据信息写入列式数据库的时间段,以监测数据的时效性。At a minimum, the cluster storage entity information and data service information of the measurement table should be collected. The number of records in the measurement table should be counted in real time through the processing framework for real-time data monitoring. At the same time, the time period for writing the data information of the measurement table into the columnar database should be captured to monitor the timeliness of the data. 
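
The consistency statistics described in claims 6 and 7 can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not the patented implementation: both tables are assumed to be available as in-memory lists of dictionaries, the unique identifier is assumed to live in a hypothetical `uid` field, and the names `ConsistencyReport` and `consistency_statistics` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyReport:
    """Summary of how a measurement table compares with its entity table."""
    measurement_count: int
    entity_count: int
    missing_in_entity: list = field(default_factory=list)
    extra_in_entity: list = field(default_factory=list)
    mismatched: list = field(default_factory=list)

    @property
    def consistent(self) -> bool:
        # Consistent only when no identifier is missing, extra, or different.
        return not (self.missing_in_entity or self.extra_in_entity or self.mismatched)


def consistency_statistics(measurement_rows, entity_rows, key="uid"):
    """Compare two tables row by row using the unique identifier column.

    `key` is a hypothetical column name; the claims do not name one.
    """
    measured = {row[key]: row for row in measurement_rows}
    stored = {row[key]: row for row in entity_rows}
    report = ConsistencyReport(len(measurement_rows), len(entity_rows))
    report.missing_in_entity = sorted(measured.keys() - stored.keys())
    report.extra_in_entity = sorted(stored.keys() - measured.keys())
    report.mismatched = sorted(
        uid for uid in measured.keys() & stored.keys() if measured[uid] != stored[uid]
    )
    return report


# Example: one row never reached the entity table, one row differs.
measurement = [
    {"uid": "a1", "value": 220.1},
    {"uid": "a2", "value": 219.8},
    {"uid": "a3", "value": 221.0},
]
entity = [
    {"uid": "a1", "value": 220.1},
    {"uid": "a2", "value": 219.0},
]
report = consistency_statistics(measurement, entity)
print(report.missing_in_entity, report.mismatched, report.consistent)
```

In a real deployment the two inputs would presumably be query results from the columnar database and the entity table rather than Python lists; the set-difference logic on the unique identifier would stay the same.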
8. A power grid data integration system, characterized in that the power grid data integration system is used to implement the power grid data integration method according to any one of claims 1-7, the system comprising:
a configuration module, used to build a basic environment, configure a cluster client, and establish a cluster storage entity;
an import module, used to set up a columnar database and a task channel, obtain cluster storage entity information through the task channel, preprocess measurement data, import the preprocessed measurement data into a measurement table in the columnar database, and parse and classify the measurement table and define its enumeration information, wherein the enumeration information definition process comprises real-time enumeration mining, predictive enumeration definition, and dynamic adjustment of association weights; combining NLP and time series analysis, implicit enumeration candidate values that have enumeration attributes and are not labeled as anomalies are extracted from the measurement table as basic features, and semantic associations between enumeration values are constructed through a knowledge graph to realize real-time enumeration mining; the implicit enumeration candidate values are synchronized to the input layer of a prediction model to extract their temporal, fluctuation, and association features for predictive enumeration definition, and a future enumeration probability distribution is output to show the probability distribution of new enumeration values appearing in a future time period; the future enumeration probability distribution is used as a forward-looking feature to indicate new enumeration directions and is fed back to guide the real-time enumeration mining; and the load feature of the cluster storage entity in the dynamic adjustment of association weights is used as a constraint feature to preferentially match high-weight storage entities, optimize write efficiency, and limit the storage priority of enumeration values;
a transfer module, used to perform data mapping according to the data information of the measurement table, transfer the data mapping result to an entity table, and perform consistency statistics on the measurement table and the entity table.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the power grid data integration method according to any one of claims 1-7.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the power grid data integration method according to any one of claims 1-7.
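
One way to read the module split in claim 8 is as three cooperating components. The sketch below is a rough illustration in Python built on several assumptions: the class names `ConfigurationModule`, `ImportModule`, and `TransferModule`, the in-memory dictionaries standing in for the columnar database and the entity table, the `device` and `value` fields, and the SHA-256 row identifier are all hypothetical and do not come from the patent.

```python
import hashlib
import json


class ConfigurationModule:
    """Holds environment settings; a stand-in for cluster and client setup."""

    def __init__(self, storage_entity: str, partitions: int = 3):
        self.storage_entity = storage_entity
        self.partitions = partitions


class ImportModule:
    """Preprocesses measurement rows and loads them into a measurement table."""

    def __init__(self, config: ConfigurationModule):
        self.config = config
        self.measurement_table = {}  # row id -> row; stand-in for a columnar table

    @staticmethod
    def row_id(row: dict) -> str:
        # Deterministic identifier derived from the row content (an assumption;
        # the claims only require that each row gets a unique identifier).
        payload = json.dumps(row, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:16]

    def load(self, rows) -> int:
        for row in rows:
            if row.get("value") is None:  # trivial preprocessing: drop empty readings
                continue
            self.measurement_table[self.row_id(row)] = row
        return len(self.measurement_table)


class TransferModule:
    """Maps measurement rows into an entity table and reports consistency."""

    def __init__(self):
        self.entity_table = {}

    def transfer(self, measurement_table: dict) -> dict:
        for uid, row in measurement_table.items():
            # Hypothetical mapping rule: keep only the device and its reading.
            self.entity_table[uid] = {"device": row["device"], "value": row["value"]}
        return {
            "measurement_rows": len(measurement_table),
            "entity_rows": len(self.entity_table),
            "consistent": measurement_table.keys() == self.entity_table.keys(),
        }


config = ConfigurationModule(storage_entity="grid-measurements")
importer = ImportModule(config)
importer.load([
    {"device": "T1", "ts": "2025-08-05T00:00:00", "value": 220.1},
    {"device": "T2", "ts": "2025-08-05T00:00:00", "value": None},
])
print(TransferModule().transfer(importer.measurement_table))
```

Hashing the row content means identical rows collapse to one entry, which is one possible reading of "uniqueness"; a production system might instead combine a device identifier with a timestamp.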
CN202511086416.4A 2025-08-05 2025-08-05 A method and system for power grid data integration Active CN120578708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511086416.4A CN120578708B (en) 2025-08-05 2025-08-05 A method and system for power grid data integration

Publications (2)

Publication Number Publication Date
CN120578708A (en) 2025-09-02
CN120578708B (en) 2025-12-30

Family

ID=96858696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511086416.4A Active CN120578708B (en) 2025-08-05 2025-08-05 A method and system for power grid data integration

Country Status (1)

Country Link
CN (1) CN120578708B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117971968A (en) * 2024-01-17 2024-05-03 广西电网有限责任公司 A storage system and data synchronization method based on multiple heterogeneous databases
CN119127921A (en) * 2024-07-26 2024-12-13 国网智能电网研究院有限公司 A method, system, device and medium for intelligent interaction of power equipment data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8086998B2 (en) * 2006-04-27 2011-12-27 International Business Machines Corporation transforming meta object facility specifications into relational data definition language structures and JAVA classes
US8321429B2 (en) * 2006-12-28 2012-11-27 Sybase, Inc. Accelerating queries using secondary semantic column enumeration
CN111046054A (en) * 2019-12-01 2020-04-21 国家电网有限公司客户服务中心 A method and system for data analysis of power marketing business
CN113342806A (en) * 2021-05-18 2021-09-03 湖北卓铸网络科技有限公司 Big data processing method and device, storage medium and processor
CN119415514A (en) * 2024-10-10 2025-02-11 浪潮云信息技术股份公司 A method and system for quickly writing into Hive partition table
CN120407554B (en) * 2025-07-04 2025-09-09 福建朴朴信息技术有限公司 Method for facilitating data lineage collection and analysis

Similar Documents

Publication Publication Date Title
CN113760891B (en) A method, device, equipment and storage medium for generating a data table
US20240264890A1 (en) Method and system for analyzing cloud platform logs, device and medium
CN114416855A (en) Visualization platform and method based on electric power big data
CN112100149B (en) Automatic log analysis system
US20220300505A1 (en) Method, electronic device for obtaining hierarchical data structure and processing log entires
CN117171410B (en) System status assessment method, device, equipment and medium based on photovoltaic power generation data
CN112559641A (en) Processing method and device of pull chain table, readable storage medium and electronic equipment
WO2021052168A1 (en) Disk fault prediction method and apparatus, computer-readable storage medium, and server
Cavallaro et al. Identifying anomaly detection patterns from log files: A dynamic approach
CN114764701A (en) Data processing method, device, medium and electronic equipment
CN117544482A (en) AI-based operation and maintenance fault determination methods, devices, equipment and storage media
Wu et al. An Auxiliary Decision‐Making System for Electric Power Intelligent Customer Service Based on Hadoop
CN118796794A (en) Heterogeneous database migration method, device and electronic device
CN117763144A (en) A log anomaly detection method and terminal
CN120578708B (en) A method and system for power grid data integration
Sun et al. Design and development of a log management system based on cloud native architecture
US20220343115A1 (en) Unsupervised classification by converting unsupervised data to supervised data
CN117453690A (en) Data processing method, device and computer medium for power grid data warehouse
CN114185957B (en) Intelligent mining method suitable for power big data service requirements
Luo et al. [Retracted] Design of Data Classification and Classification Management System for Big Data of Hydropower Enterprises Based on Data Standards
CN116578612A (en) Lithium battery product testing data asset construction method
CN116192976A (en) A multi-stimulus concurrent protocol interoperability testing method and system
CN113779215A (en) Data processing platform
CN112199475A (en) Content analysis method based on electricity price clause data
Liu et al. An anomaly detection method for provincial-side base operation logs based on vae and transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant