Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a method and a system for checking the user-transformer relationship of a transformer area based on the ensemble learning LSCP (Locally Selective Combination in Parallel Outlier Ensembles) algorithm. The method uses LSCP, a parallel ensemble anomaly detection algorithm based on local selective combination, with four classical anomaly detection algorithms as base anomaly detectors, to accurately detect differences between the voltage curves of users in the transformer area and thereby check the user-transformer relationship. In addition, the key input parameter of the LSCP algorithm is determined by calculating the line loss rate, which improves the practicability of the algorithm.
The invention adopts the following technical scheme.
A method for checking the user-transformer relationship of a transformer area based on the ensemble learning LSCP algorithm comprises the following steps:
Step 1, collecting transformer-area operation data of a plurality of transformer areas over N days;
Step 2, calculating the anomaly scores of the users in the transformer area based on a plurality of base anomaly detectors and the transformer-area operation data, so as to construct a user-transformer relationship correction model;
Step 3, using the user-transformer relationship correction model to screen the users in the transformer area whose user-transformer relationship is normal, and calculating the line loss rate of the transformer area from the operation data of those users to determine the optimal input parameter of the model;
Step 4, inputting the optimal input parameter into the user-transformer relationship correction model to obtain the users in the transformer area whose user-transformer relationship is wrong.
The operation data of the transformer area comprises daily electricity quantity and voltage readings of all users in the transformer area and daily electricity quantity and voltage readings of a power supply transformer in the transformer area.
The user-transformer relationship correction model comprises a plurality of base anomaly detectors, which are used for judging the user-transformer relationship of the users in the transformer area.
The construction process of the user-transformer relationship correction model comprises the following steps:
S1, inputting the voltage readings of all users in the transformer area, and calculating an anomaly score for each user's voltage readings with the plurality of base anomaly detectors;
S2, judging the users with the highest anomaly scores to be users whose user-transformer relationship is wrong, and the remaining users to be users whose user-transformer relationship is normal.
S1 further comprises training the base anomaly detectors: the voltage readings of all users in the transformer area are input into R base anomaly detectors to obtain R sets of local anomaly scores of the user voltage readings, a local anomaly score matrix is established, the correlation between the data in the matrix and the maximum local anomaly score is calculated for each detector, and the x base anomaly detectors with the greatest correlation are selected.
The construction process further comprises calculating x anomaly scores of user x_i in the i-th transformer area with the selected x base anomaly detectors, and taking the average of the x anomaly scores as the anomaly score of that user.
The anomaly scores of all users in the transformer area form an anomaly score set S. The Y highest-scoring entries of S are regarded as the users whose user-transformer relationship is wrong, where Y = n × c, n is the total number of users in the transformer area and c is the abnormal data proportion; the number Y of wrong-relationship users in the transformer area is thus controlled through the abnormal data proportion c.
The selected base anomaly detectors comprise Isolation Forest, OC-SVM, COPOD and the local outlier factor (LOF) algorithm;
the correlation between the data in the matrix and the maximum local anomaly score is calculated using the Pearson correlation coefficient.
Step 3 further comprises:
Step 301, traversing the abnormal data proportion over the interval (0, 0.1) in increments of 1/n to obtain a plurality of abnormal data proportion values, where n is the total number of users in the transformer area;
Step 302, inputting an abnormal data proportion value into the user-transformer relationship correction model to obtain the users in the transformer area whose user-transformer relationship is normal and those whose relationship is wrong;
Step 303, summing the daily electricity consumption of the normal-relationship users in the transformer area, calculating the line loss rate of the transformer area, and setting a minimum line loss rate threshold;
Step 304, judging whether the line loss rate of the transformer area is lower than the minimum line loss rate threshold; if so, a wrong-relationship user is considered to have been missed, the next abnormal data proportion value is taken and step 302 is executed again; if not, step 305 is executed;
Step 305, judging whether the line loss rate of the transformer area is the minimum line loss rate of the transformer area; if not, the next abnormal data proportion value is taken and step 302 is executed again; if so, the abnormal data proportion value corresponding to that line loss rate is taken as the optimal input parameter.
The calculation mode of the line loss rate of the station area is as follows:
The lowest line loss threshold value is within the range of [ -2%,5% ].
After the wrong-relationship users in the transformer area are obtained, they are verified manually and the system's user-transformer relationship file is updated.
A system for correcting the user-transformer relationship of a transformer area based on the ensemble learning LSCP algorithm comprises:
a data acquisition module, used for acquiring the transformer-area operation data, namely the daily electricity quantity and voltage readings of all users in the transformer area and the daily electricity quantity and voltage readings of the supply transformer of the transformer area;
a model building module, used for building the user-transformer relationship correction model from the plurality of anomaly detection algorithms and the transformer-area operation data;
a data calculation module, used for calculating the local anomaly scores of the user voltage readings in the transformer area, the line loss rate of the transformer area, and the correlation between the data in the matrix and the maximum local anomaly score;
an algorithm screening module, used for screening the base anomaly detectors according to the calculation results of the data calculation module;
an output module, used for obtaining the wrong-relationship users in the transformer area from the output of the model.
Compared with the prior art, the invention combines several anomaly detection models, so that the data are examined from different angles and their features are learned more thoroughly; this yields better performance and reliability and a more stable and accurate correction result. The invention applies an anomaly detection method with an ensemble framework to the identification and correction of user-transformer relationships. Working from the actual transformer-area operation data of existing systems such as the electricity consumption system and the marketing system, it uses a parallel ensemble anomaly detection algorithm based on local selective combination to check the user-transformer relationship, identifies the users of a given transformer area whose relationship is wrong, and gives on-site checks a clear direction at the system level.
The beneficial effects of the invention also include:
(1) The key input parameter of the algorithm is determined by optimization from the line loss rate, which avoids the influence of manually set parameters on the correction result, increases the reliability and accuracy of the result, and further improves the practicability of the proposed user-transformer relationship correction algorithm.
(2) For a large number of transformer areas, the method can accurately determine the number of abnormal data, i.e. the number of wrong-relationship users, by controlling the abnormal data proportion. Maintenance personnel can then check the specific wrong users one by one on site, which reduces the workload of on-site checks, improves their efficiency and saves checking cost.
Detailed Description
The application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and are not intended to limit the scope of the present application.
Fig. 1 is a schematic flow chart of the method for checking the user-transformer relationship of a transformer area based on the ensemble learning LSCP algorithm. As shown in Fig. 1, the method comprises the following steps:
Step 1, collecting transformer-area operation data of a plurality of transformer areas over N days, the operation data comprising the daily electricity quantity and voltage readings of all users in the transformer area and the daily electricity quantity and voltage readings of the supply transformer of the transformer area.
Step 2, calculating the anomaly scores of the users in the transformer area based on a plurality of base anomaly detectors and the transformer-area operation data, so as to construct a user-transformer relationship correction model.
The user-transformer relationship correction model comprises a plurality of base anomaly detectors, which are used for judging the user-transformer relationship of the users in the transformer area.
The construction process comprises the following steps:
S1, inputting the voltage readings of all users in the transformer area, and calculating an anomaly score for each user's voltage readings with the plurality of base anomaly detectors;
S2, judging the users with the highest anomaly scores to be users whose user-transformer relationship is wrong, and the remaining users to be users whose user-transformer relationship is normal.
Training the base anomaly detectors comprises: inputting the voltage readings of all users in the transformer area into R base anomaly detectors to obtain R sets of local anomaly scores of the user voltage readings, establishing a local anomaly score matrix, calculating for each detector the correlation between the data in the matrix and the maximum local anomaly score, and selecting the x base anomaly detectors with the greatest correlation.
The selected x base anomaly detectors are used to calculate x anomaly scores of user x_i in the i-th transformer area, and the average of the x anomaly scores is taken as the anomaly score of that user. The selected base anomaly detectors comprise Isolation Forest, OC-SVM, COPOD and the local outlier factor (LOF) algorithm, and the Pearson correlation coefficient is used to calculate the correlation between the data in the matrix and the maximum local anomaly score.
The anomaly scores of all users in the transformer area form an anomaly score set S. The Y highest-scoring entries of S are regarded as the users whose user-transformer relationship is wrong, where Y = n × c, n is the total number of users in the transformer area and c is the abnormal data proportion; the number Y of wrong-relationship users in the transformer area is thus controlled through the abnormal data proportion c.
Step 3, using the user-transformer relationship correction model to screen the users in the transformer area whose user-transformer relationship is normal, and calculating the line loss rate of the transformer area from the operation data of those users to determine the optimal input parameter of the model, which specifically comprises:
Step 301, traversing the abnormal data proportion over the interval (0, 0.1) in increments of 1/n to obtain a plurality of abnormal data proportion values, where n is the total number of users in the transformer area;
Step 302, inputting an abnormal data proportion value into the user-transformer relationship correction model to obtain the users in the transformer area whose user-transformer relationship is normal and those whose relationship is wrong;
Step 303, summing the daily electricity consumption of the normal-relationship users in the transformer area, calculating the line loss rate of the transformer area, and setting a minimum line loss rate threshold;
Step 304, judging whether the line loss rate of the transformer area is lower than the minimum line loss rate threshold; if so, a wrong-relationship user is considered to have been missed, the next abnormal data proportion value is taken and step 302 is executed again; if not, step 305 is executed;
Step 305, judging whether the line loss rate of the transformer area is the minimum line loss rate of the transformer area; if not, the next abnormal data proportion value is taken and step 302 is executed again; if so, the abnormal data proportion value corresponding to that line loss rate is taken as the optimal input parameter. The line loss rate of the transformer area is calculated as follows:
The minimum line loss rate threshold lies within the range [-2%, 5%].
Step 4, inputting the optimal input parameter into the user-transformer relationship correction model to obtain the users in the transformer area whose user-transformer relationship is wrong.
After the wrong-relationship users in the transformer area are obtained, they are verified manually and the system's user-transformer relationship file is updated.
Fig. 2 shows the overall framework of the ensemble anomaly detection algorithm LSCP. LSCP is a completely unsupervised anomaly detection algorithm that integrates multiple anomaly detection algorithms in parallel. Its characteristics are: ① multiple base anomaly detection models calculate and judge independently and in parallel; ② the base models to be combined are selected dynamically according to their performance; ③ feature subsequences of different dimensions are randomly sampled from the data to construct feature sample subspaces, which emphasizes local anomalies in the data; ④ both heterogeneous and homogeneous base anomaly detection models can be combined. Combining heterogeneous models gives the ensemble diversity, while combining homogeneous models achieves diversity through different hyperparameter settings. The specific flow by which the LSCP algorithm detects abnormal data is as follows.
The data set to be detected, $X \in \mathbb{R}^{n \times d}$, is input; it consists of the voltage readings of all users in the transformer area. The base anomaly detectors are $C = \{C_1, C_2, \ldots, C_R\}$, and training the R base anomaly detectors yields the anomaly score matrix $O(X) \in \mathbb{R}^{n \times R}$:
$O(X) = [C_1(X), C_2(X), \ldots, C_R(X)]$ (1)
For each sample $x_i \in X$, $x_i \in \mathbb{R}^{1 \times d}$, the features are randomly sampled to construct a feature subsequence $x_i'$ of reduced dimension (bounded by the minimum and maximum random sampling dimensions among the adjustable parameters), which yields a feature subspace of the data set X.
In each feature subspace, the Euclidean distances between $x_i'$ and the other samples are calculated, and the K samples nearest to $x_i'$ are obtained with the K-nearest-neighbor (KNN) method. Random sampling is repeated t times to obtain t feature subspaces, and the samples that appear more than t/2 times among the neighbor samples of $x_i$ across these subspaces are taken as the local space $\psi_i$ of $x_i$.
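The local-space construction described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function name `local_region` and the parameters `k`, `t`, `min_dim`, `max_dim` and `seed` are hypothetical names chosen here, and the subspace dimension is drawn uniformly between `min_dim` and `max_dim`, which the text does not specify.

```python
import math
import random
from collections import Counter

def local_region(X, i, k, t, min_dim, max_dim, seed=0):
    """Return the indices of the samples forming the local space of X[i]."""
    rng = random.Random(seed)
    d = len(X[0])
    votes = Counter()
    for _ in range(t):
        # Assumption: the subspace size is drawn uniformly from [min_dim, max_dim].
        dims = rng.sample(range(d), rng.randint(min_dim, max_dim))
        # Euclidean distance from X[i] to every other sample in this subspace.
        dist = [
            (math.sqrt(sum((X[j][f] - X[i][f]) ** 2 for f in dims)), j)
            for j in range(len(X)) if j != i
        ]
        dist.sort()
        for _, j in dist[:k]:
            votes[j] += 1
    # Keep the samples that were neighbours in more than t/2 of the subspaces.
    return sorted(j for j, count in votes.items() if count > t / 2)
```

Because membership depends on a vote across subspaces, the returned list can be shorter or longer than `k`, which is why the size of the local space differs from sample to sample.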
The cardinality b of the local space differs from sample to sample, i.e. the number of samples it contains differs.
After the local space of $x_i$ is obtained, the base anomaly detectors are used to obtain the local anomaly scores $O(\psi_i) \in \mathbb{R}^{b \times R}$ of $x_i$. For example, if the local space of $x_i$ contains 10 data samples, the R base anomaly detectors yield a local anomaly score matrix $O(\psi_i)$ of dimension 10 × R.
The maximum of the R base anomaly detector scores is taken as the pseudo target of each data sample. The pseudo target of $x_i$ is $\mathrm{Target}(x_i)$, calculated as follows:
$\mathrm{Target}(x_i) = f_{\max}(O(\psi_i))$ (3)
The Pearson correlation coefficient between the local anomaly score matrix $O(\psi_i)$ and the pseudo target $\mathrm{Target}(x_i)$ of $x_i$ is calculated. The x sets of local anomaly scores in $O(\psi_i)$ that are most similar to the pseudo target are selected, and the base anomaly detectors that produced those x sets of scores are used as the base anomaly detectors of the user-transformer relationship correction model.
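The selection step can be illustrated with a minimal standard-library sketch. It assumes the local anomaly score matrix is laid out with one row per local sample and one column per detector, and that the pseudo target is the row-wise maximum; the names `pearson` and `select_detectors`, and the choice to treat a constant column as having zero correlation, are assumptions made here for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant column carries no correlation
    return cov / math.sqrt(var_a * var_b)

def select_detectors(local_scores, x):
    """local_scores has b rows (local samples) and R columns (detectors)."""
    # Pseudo target: per-sample maximum over the R detector scores.
    target = [max(row) for row in local_scores]
    num_detectors = len(local_scores[0])
    corr = [
        pearson([row[r] for row in local_scores], target)
        for r in range(num_detectors)
    ]
    # Indices of the x detectors most correlated with the pseudo target.
    ranked = sorted(range(num_detectors), key=lambda r: corr[r], reverse=True)
    return ranked[:x], corr
```

The returned indices identify which of the R detectors would then be used to score the sample itself.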
The selected x base anomaly detectors are used to calculate the anomaly scores of $x_i$, giving the anomaly score vector $O(x_i) \in \mathbb{R}^{x \times 1}$, and the average of the x anomaly scores is taken as the anomaly score s of $x_i$:
$s = f_{\mathrm{avg}}(O(x_i))$ (4)
After the anomaly scores of all data in the data set X have been calculated in this way, giving the anomaly score set $S \in \mathbb{R}^{n \times 1}$, the number Y of abnormal data in the data set is controlled through the input abnormal data proportion c. Y is the number of users in the transformer area whose user-transformer relationship is wrong, and the Y highest-scoring entries of S are regarded as the abnormal data.
$Y = n \times c$ (5)
where n is the number of samples in the data set, i.e. the total number of users in the transformer area, and c is the abnormal data proportion.
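Equations (4) and (5) amount to averaging the selected detectors' scores and flagging the top Y users. The sketch below shows one way this could look; the function name `flag_wrong_users` is hypothetical, and rounding n × c to the nearest integer is an assumption, since the text does not say how a non-integer Y is handled.

```python
def flag_wrong_users(scores_per_user, c):
    """scores_per_user: for each user, the x scores from the selected detectors."""
    n = len(scores_per_user)
    # Equation (4): a user's anomaly score is the mean of the x scores.
    s = [sum(row) / len(row) for row in scores_per_user]
    # Equation (5): Y = n * c; rounding to an integer is an assumption.
    y = int(round(n * c))
    order = sorted(range(n), key=lambda i: s[i], reverse=True)
    flagged = set(order[:y])
    # Output 1 for a suspected wrong relationship and 0 for a normal one.
    return [1 if i in flagged else 0 for i in range(n)], s
```

With 10 users and c = 0.2, for instance, the two users with the highest averaged scores receive the label 1.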
To improve the reliability and stability of the ensemble model, the LSCP algorithm here combines heterogeneous base anomaly detection models. Four classical anomaly detection algorithms are used as base anomaly detectors: Isolation Forest, OC-SVM (One-Class Support Vector Machine), COPOD (Copula-Based Outlier Detection) and LOF (Local Outlier Factor). Together they detect data anomalies from different angles and strengthen the learning of the features that distinguish the data.
Isolation Forest detects anomalies by partitioning: the data set is split repeatedly, and the fewer splits needed to isolate a sample, the higher its degree of anomaly. OC-SVM is an anomaly detection algorithm proposed for unbalanced samples; it maps the original data into a high-dimensional space through a kernel function so that normal and abnormal data differ clearly, and constructs a hyperplane to separate them. COPOD is a statistical method: for multidimensional data with diverse distributions, it calculates the tail probability of each data point and uses the skewness of the distribution to correct that probability, thereby estimating the degree of anomaly. LOF is density based: it detects anomalies by comparing the density of each point with the density of its neighboring points. Because LOF calculates density from the K-neighborhood of each point, regions of differing data density do not cause false anomaly judgments.
To limit the influence of users' random electricity consumption behavior on the correction, and to amplify the difference between the voltage curves of normal users and wrong-relationship users, the application selects two days of voltage data of the users in the transformer area as input.
The voltage data of users in different transformer areas show different trends and characteristics, and the proportion of wrong-relationship users in a transformer area is small, so checking the user-transformer relationship of a transformer area can be defined as anomaly detection on unbalanced samples. For the accuracy and reliability of the results, the LSCP anomaly detection algorithm with its ensemble framework is adopted for the check.
The principle of the LSCP-based correction method is as follows. The voltage data of the users in the transformer area are input; the LSCP algorithm detects abnormal trends in the voltage data, judges their degree of anomaly, and produces an anomaly score for each user's voltage data. The users with the highest anomaly scores are judged to be users whose user-transformer relationship is wrong, and the output for them is 1. The remaining users are judged to be users whose relationship is normal, and the output for them is 0. When the LSCP algorithm is used for user-transformer relationship correction, its adjustable parameters are those listed in Table 1:
Table 1 LSCP algorithm parameter specification
Adjusting the number of KNN neighbor samples, the number of random samplings, the minimum and maximum random sampling dimensions, and the threshold on the number of times a sample appears across feature spaces changes the local space of the data.
The abnormal data proportion controls the number of abnormal data output, i.e. the number of wrong-relationship users, and is the key input parameter of the correction algorithm. When the user-transformer relationship of a transformer area is checked, the proportion of wrong-relationship users in that area is unknown. For a large number of transformer areas, fixing the abnormal data proportion at a single value would impair both the accuracy and the efficiency of the calculation. The key input parameter is therefore determined by optimization from the line loss rate, which avoids the influence of manually set parameters on the correction result, improves the reliability and accuracy of the result, and further improves the practicability of the correction algorithm.
In theory, the electricity relation between the transformer area and its users is:
$P = \sum_{i=1}^{n} P(i)$ (6)
where P is the total electricity consumption of the transformer area, P(i) is the electricity consumption of user i, and n is the number of users in the transformer area.
In practice, because of line loss within the transformer area:
$P > \sum_{i=1}^{n} P(i)$ (7)
According to power supply company regulations, the line loss rate of a transformer area fluctuates within a normal range. If the system file of the transformer area contains users whose user-transformer relationship is wrong, the total electricity consumption of the transformer area recorded in the marketing metering system is smaller than the sum of the users' electricity consumption, as shown in formula (8):
$P < \sum_{i=1}^{n} P(i)$ (8)
Therefore, in this embodiment the optimal input parameter of the user-transformer relationship correction model is determined from the line loss rate of the transformer area. The determination process is shown in Fig. 3:
Step 301, traversing the abnormal data proportion over the interval (0, 0.1) in increments of 1/n to obtain a plurality of abnormal data proportion values, where n is the total number of users in the transformer area;
Step 302, inputting an abnormal data proportion value into the user-transformer relationship correction model to obtain the users in the transformer area whose user-transformer relationship is normal and those whose relationship is wrong;
Step 303, summing the daily electricity consumption of the normal-relationship users in the transformer area, calculating the line loss rate of the transformer area, and setting a minimum line loss rate threshold;
Step 304, judging whether the line loss rate of the transformer area is lower than the minimum line loss rate threshold; if so, a wrong-relationship user is considered to have been missed, the next abnormal data proportion value is taken and step 302 is executed again; if not, step 305 is executed;
Step 305, judging whether the line loss rate of the transformer area is the minimum line loss rate of the transformer area; if not, the next abnormal data proportion value is taken and step 302 is executed again; if so, the abnormal data proportion value corresponding to that line loss rate is taken as the optimal input parameter. The line loss rate of the transformer area is calculated as follows:
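The formula itself does not appear in this text. A minimal reconstruction, assuming the conventional definition of line loss rate and reusing P (total electricity consumption of the transformer area) and P(i) (electricity consumption of user i) from the electricity relation above, might read as follows; the symbol $\eta$ and the index set $\mathcal{N}$ of users judged to have a normal relationship are introduced here only for illustration.

```latex
\eta = \frac{P - \sum_{i \in \mathcal{N}} P(i)}{P} \times 100\%
```

Under this assumed definition, a user who is recorded in the area file but actually supplied by another transformer inflates the sum, which pushes $\eta$ down and can make it negative.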
The minimum line loss rate threshold lies within the range [-2%, 5%].
Different abnormal data proportion values are input and the LSCP algorithm is run to obtain the number of wrong-relationship users and to determine the normal-relationship users of the transformer area. In this embodiment the abnormal data proportion is traversed over the interval (0, 0.1) in steps of 0.01, because the transformer areas in the simulation scenario have about 100 users each, so that with a step of 0.01 the number of wrong-relationship users output increases by about 1 each time.
Because line loss exists in the normal operation of a transformer area, a minimum line loss rate threshold is set; it can be chosen from the system's historical line loss records or from the characteristics of the actual transformer area.
If the line loss rate calculated for a given input value is lower than the minimum line loss rate threshold, wrong-relationship users are considered to have been missed. As the input abnormal data proportion increases, the number of wrong-relationship users output increases, and users whose relationship is normal may be misjudged as wrong; the summed electricity consumption of the users retained in the transformer area then decreases further and the line loss rate of the transformer area rises. Therefore, among the input values whose line loss rate is not below the minimum threshold, the one corresponding to the smallest line loss rate is selected as the optimal parameter.
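The parameter search of steps 301 to 305 can be sketched as a simple sweep. This is an illustration under stated assumptions, not the embodiment's implementation: `best_proportion` is a hypothetical name, `classify(c)` is a hypothetical callback standing in for the correction model (returning one 0/1 label per user, 1 meaning a suspected wrong relationship), and the line loss rate is assumed to be (supply minus the normal users' sum) divided by supply.

```python
def best_proportion(n, supply_kwh, user_kwh, classify, min_loss):
    """Return (c, loss_rate) with the smallest admissible line loss rate, or None."""
    best = None
    k = 1
    while k / n < 0.1:                     # step 301: c = 1/n, 2/n, ... within (0, 0.1)
        c = k / n
        labels = classify(c)               # step 302: run the correction model
        normal_sum = sum(e for e, lab in zip(user_kwh, labels) if lab == 0)
        # Step 303: assumed definition of the line loss rate (as a fraction).
        loss = (supply_kwh - normal_sum) / supply_kwh
        # Step 304: a rate below the threshold means wrong users were missed.
        if loss >= min_loss:
            # Step 305: keep the smallest rate seen among admissible values.
            if best is None or loss < best[1]:
                best = (c, loss)
        k += 1
    return best
```

The outcome depends directly on `min_loss`, which is why the threshold is tied to historical line loss records or to the characteristics of the actual transformer area.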
The specific implementation steps of the transformer-area user-transformer relationship correction method based on the ensemble learning LSCP algorithm in this embodiment can be summarized as follows:
Transformer-area operation data, comprising voltage data and daily total electricity consumption data, are collected from the electricity consumption data acquisition system. The abnormal data proportion is traversed over the interval (0, 0.1) in steps of 0.01; for each value, the LSCP algorithm performs user-transformer relationship correction to obtain the normal-relationship users, the line loss rate of the transformer area is calculated, and the value satisfying the conditions is taken as the optimal input parameter. The optimal parameter is then input and the LSCP algorithm performs the correction again to obtain the wrong-relationship users in the transformer area. Staff check these users on site, and the system's user-transformer relationship file is updated promptly.
On the basis of the above checking method, this embodiment also provides a system for checking the user-transformer relationship of a transformer area based on the LSCP algorithm, which specifically comprises:
a data acquisition module, used for acquiring the transformer-area operation data, namely the daily electricity quantity and voltage readings of all users in the transformer area and the daily electricity quantity and voltage readings of the supply transformer of the transformer area;
a model building module, used for building the user-transformer relationship correction model from the plurality of anomaly detection algorithms and the transformer-area operation data;
a data calculation module, used for calculating the local anomaly scores of the user voltage readings in the transformer area, the line loss rate of the transformer area, and the correlation between the data in the matrix and the maximum local anomaly score;
an algorithm screening module, used for screening the base anomaly detectors according to the calculation results of the data calculation module;
an output module, used for obtaining the wrong-relationship users in the transformer area from the output of the model.
To verify the universality and reliability of the method, the user-transformer relationships of 6 transformer areas were checked in a simulation scenario using two days of voltage data. The test results are shown in Table 2.
Table 2 Test results in the simulation scenario
In Table 2, the recall for transformer area 5 is 80% and the precision is 98.71%, with one wrong-relationship user missed; the recall and precision for the remaining 5 transformer areas are 100%. If more than 2 days of voltage data are used for transformer area 5, however, its recall and precision also reach 100%. In practical applications, when sufficient voltage data are available for the transformer area, more than 2 days of voltage data can be used for higher accuracy and reliability. The method therefore shows high accuracy and applicability in identifying and correcting the user-transformer relationships of transformer areas and is well suited to practical use.
Compared with the prior art, the invention combines several anomaly detection models, so that the data are examined from different angles and their features are learned more thoroughly; this yields better performance and reliability and a more stable and accurate correction result. The invention applies an anomaly detection method with an ensemble framework to the identification and correction of user-transformer relationships. Working from the actual transformer-area operation data of existing systems such as the electricity consumption system and the marketing system, it uses a parallel ensemble anomaly detection algorithm based on local selective combination to check the user-transformer relationship, identifies the users of a given transformer area whose relationship is wrong, and gives on-site checks a clear direction at the system level.
The beneficial effects of the invention also include:
(1) The key input parameter of the algorithm is determined by optimization from the line loss rate, which avoids the influence of manually set parameters on the correction result, increases the reliability and accuracy of the result, and further improves the practicability of the proposed user-transformer relationship correction algorithm.
(2) For a large number of transformer areas, the method can accurately determine the number of abnormal data, i.e. the number of wrong-relationship users, by controlling the abnormal data proportion. Maintenance personnel can then check the specific wrong users one by one on site, which reduces the workload of on-site checks, improves their efficiency and saves checking cost.
Although the applicant has described and illustrated the embodiments of the invention in detail with reference to the drawings, those skilled in the art should understand that the above embodiments are only preferred embodiments of the invention. The detailed description is intended only to help the reader better understand the spirit of the invention and does not limit its scope of protection; any improvement or modification based on the spirit of the invention falls within that scope.