CN115409134B - User electricity utilization safety detection method, system, equipment and storage medium

Info

Publication number: CN115409134B
Application number: CN202211359242.0A
Authority: CN
Inventors: 慕静茹; 刘冬寅
Original assignee: Hunan 123 Intelligent Technology Co ltd
Current assignee: Hunan 123 Intelligent Technology Co ltd
Priority date: 2022-11-02
Filing date: 2022-11-02
Publication date: 2023-02-03
Anticipated expiration: 2042-11-02
Also published as: CN115409134A

Abstract

The application discloses a user electricity safety detection method, system, device and storage medium, applied to the technical field of machine learning. The method includes: constructing an initial random forest model; collecting electricity utilization information when a user side uses electricity normally and when leakage, arc and short-circuit faults occur, extracting features, and setting corresponding labels to obtain initial training samples; filtering the feature items based on information gain, and selecting an optimal feature subset according to the principle of maximizing the classification accuracy of the initial random forest model; obtaining target training samples based on the optimal feature subset and training to obtain a target random forest model; and detecting the current electricity utilization information of the user side, performing feature selection, and inputting the selected features into the target random forest model to obtain a user electricity safety detection result. With this scheme, user electricity safety detection can be carried out efficiently and accurately, the natural semantics of the feature items are retained, and local extrema, overfitting and poor generalization capability are avoided.

Description

User electricity utilization safety detection method, system, equipment and storage medium

Technical Field

The invention relates to the technical field of machine learning, in particular to a user electricity safety detection method, a system, equipment and a storage medium.

Background

With the progress of science and technology, new technologies keep emerging and electrical products have become widespread. Low-voltage electrical apparatus is now found in every field of production and daily life, which brings certain potential safety hazards. Electrical safety accidents caused by aging electrical lines, improper configuration and the like are increasing year by year, and low-voltage fault arcs are one of the main causes of electrical fires. Low-voltage arc faults in the form of short circuits still occur in low-voltage distribution devices; the resulting losses are very serious and pose a great hidden danger to people's lives and property. In addition, gas explosion accidents caused by electric leakage cause serious casualties among workers, bringing great pain to families and a serious impact on society. Therefore, a method for sensing the electricity safety of low-voltage users is needed, so as to detect fault arcs, leakage and short-circuit faults in low-voltage user systems in real time.

At present, some methods perform intelligent sensing of the electricity safety of low-voltage users based on feature selection and artificial intelligence classification models, for example feature selection algorithms based on genetic algorithms, on particle swarm algorithms, on evolutionary computation, and the like. Most current research focuses on optimizing the design of the artificial intelligence classification model. When an electrical short circuit, electric leakage or fault arc occurs, a data acquisition system with a high sampling frequency produces a volume of real-time data that is difficult to estimate, and the feature selection algorithms adopted are computationally expensive. This affects the accuracy and efficiency of fault identification, and accurate and effective feature selection on a large amount of data cannot be completed in a short time.

In addition, some traditional feature selection methods, such as high-dimensional matrix dimension reduction, change the natural semantics of the original feature items, which in some cases affects the accuracy of the results of the artificial intelligence classification model. Moreover, the algorithm models currently used may fall into a local extremum when solving for all extrema of a complex nonlinear function, causing model training to fail. Current artificial intelligence models are also prone to overfitting or poor generalization capability.

In conclusion, how to perform user electricity safety detection efficiently and accurately, retain the natural semantics of the feature items, and avoid the local extrema, overfitting and poor generalization capability to which traditional artificial intelligence models are prone is a technical problem that urgently needs to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a user electricity utilization safety detection method, system, device and storage medium, so that user electricity utilization safety detection can be carried out efficiently and accurately, the natural semantics of the feature items are retained, and the local extrema, overfitting and poor generalization capability to which traditional artificial intelligence models are prone are avoided.

In order to solve the technical problems, the invention provides the following technical scheme:

a user electricity utilization safety detection method comprises the following steps:

constructing an initial random forest model for outputting a user electricity utilization safety detection result aiming at input information;

collecting power utilization information of a user side when the user side has a leakage fault, an arc fault, a short-circuit fault and normal power utilization;

extracting features based on the collected power utilization information, and setting corresponding labels according to the user side state when the power utilization information is collected to obtain an initial training sample;

for each label, determining the information gain between each feature item after feature extraction and the label, filtering each feature item with the information gain lower than a preset threshold value, and selecting an optimal feature subset from the remaining filtered feature items according to the principle of enabling the classification accuracy of the initial random forest model to be maximum;

based on the optimal feature subset corresponding to each label, performing feature selection on the initial training sample to obtain a target training sample;

training the initial random forest model through the target training sample to obtain a trained target random forest model;

and detecting current power utilization information of a user side, selecting characteristics, and inputting the characteristics into the target random forest model to obtain a user power utilization safety detection result output by the target random forest model.

Preferably, the selecting an optimal feature subset from the feature items remaining after filtering according to a principle that the classification accuracy of the initial random forest model is the maximum includes:

for each label, filtering each feature item with the information gain lower than a preset threshold value, and traversing a feature space through an SFS algorithm to obtain a plurality of feature sets;

and respectively determining the classification accuracy of the initial random forest model under the condition of each feature set, and taking the feature set adopted when the classification accuracy of the initial random forest model is the highest as an optimal feature subset.
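The SFS traversal described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the feature names and the `toy_accuracy` scorer are hypothetical, and in practice the scorer would be the classification accuracy of the initial random forest model on the candidate feature set.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy SFS: add one feature per round, remember the best-scoring subset."""
    selected: List[str] = []
    remaining = list(features)
    best_subset: List[str] = []
    best_score = float("-inf")
    while remaining:
        # Score every candidate subset formed by adding one remaining feature.
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        selected.append(top_feature)
        remaining.remove(top_feature)
        # Keep the subset that gave the highest score seen so far.
        if top_score > best_score:
            best_score, best_subset = top_score, list(selected)
    return best_subset, best_score


def toy_accuracy(subset: List[str]) -> float:
    """Hypothetical stand-in for the accuracy of the initial random forest model."""
    useful = {"current_form_factor": 0.30, "residual_current_peak_factor": 0.25}
    return 0.40 + sum(useful.get(name, -0.02) for name in subset)


candidates = [
    "current_form_factor",
    "voltage_energy_index",
    "residual_current_peak_factor",
]
best_subset, best_accuracy = sequential_forward_selection(candidates, toy_accuracy)
```

With this toy scorer the two informative features are kept and the uninformative one is dropped, because adding it lowers the score.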

Preferably, after determining the information gain between each feature item and the tag, the method further includes:

penalizing the information gain of each feature item based on the information entropy of that feature item to obtain an information gain ratio;

correspondingly, the filtering of the feature items whose information gain is lower than the preset threshold includes:

comparing each information gain ratio obtained after penalization with the preset threshold, and filtering out the feature items whose information gain ratio is lower than the preset threshold.
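One way to read the penalization and filtering described above is sketched below. It assumes the feature values have already been discretized into bins (the text does not say how continuous features are handled), and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[str], labels: Sequence[str]) -> float:
    """H(labels) minus the conditional entropy of labels given the feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[str], labels: Sequence[str]) -> float:
    """Information gain penalized by the entropy of the feature itself."""
    h = entropy(feature)
    return information_gain(feature, labels) / h if h > 0 else 0.0


def filter_features(
    table: Dict[str, List[str]], labels: Sequence[str], threshold: float
) -> List[str]:
    """Keep the feature items whose gain ratio is not below the threshold."""
    return [name for name, col in table.items() if gain_ratio(col, labels) >= threshold]


labels = ["normal", "normal", "arc", "arc"]
table = {
    "binned_form_factor": ["low", "low", "high", "high"],  # separates the labels
    "sample_id": ["a", "b", "c", "d"],                     # many values, penalized
    "constant": ["x", "x", "x", "x"],                      # carries no information
}
kept = filter_features(table, labels, threshold=0.8)
```

The `sample_id` column has the same raw information gain as the informative column, but its high entropy halves its gain ratio, which is the effect the penalization is meant to have.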

Preferably, the constructed initial random forest model uses sampling with replacement, the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.

Preferably, the collecting of the electricity consumption information of the user side includes:

collecting the current of the user side, the voltage of the user side and the residual current of the user side.

Preferably, the feature extraction based on the collected power utilization information includes:

based on the collected power utilization information, aiming at each sampling time point, extracting a current wave form factor, a current pulse factor, a current peak value factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage wave form factor, a voltage pulse factor, a voltage peak value factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current wave form factor, a residual current pulse factor, a residual current peak value factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to the sampling time point.

Preferably, for the i-th sampling time point, the extracted current form factor, voltage form factor and residual current form factor corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

for the i-th sampling time point, the extracted current pulse factor, voltage pulse factor and residual current pulse factor corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

for the i-th sampling time point, the extracted current peak factor, voltage peak factor and residual current peak factor corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

for the i-th sampling time point, the extracted current margin factor, voltage margin factor and residual current margin factor corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

for the i-th sampling time point, the extracted current kurtosis factor, voltage kurtosis factor and residual current kurtosis factor corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

for the i-th sampling time point, the extracted current energy index, voltage energy index and residual current energy index corresponding to the i-th sampling time point are respectively expressed as: [formulas not preserved in this text];

wherein the formulas are expressed in terms of the following quantities: the residual current value of the user side at the i-th sampling time point; the current value of the user side at the i-th sampling time point; the voltage value of the user side at the i-th sampling time point; the number of sampling time points in a single power frequency period when collecting the electricity utilization information of the user side; the current peak value, the voltage peak value and the residual current peak value in the most recent power frequency period; and the maximum values of the current energy, the voltage energy and the residual current energy in the most recent power frequency period.

A consumer electricity safety detection system, comprising:

the initial random forest model construction module is used for constructing an initial random forest model for outputting a user electricity utilization safety detection result aiming at input information;

the power consumption information acquisition module is used for acquiring the power consumption information of the user side when the user side has an electric leakage fault, an electric arc fault, a short-circuit fault and normal power consumption;

the characteristic extraction module is used for extracting characteristics based on the collected power utilization information, and setting corresponding labels according to the user side state when the power utilization information is collected to obtain an initial training sample;

the optimal feature subset selection module is used for determining information gain between each feature item after feature extraction and each label, filtering each feature item with the information gain lower than a preset threshold value, and selecting an optimal feature subset from the remaining filtered feature items according to the principle that the classification accuracy of the initial random forest model is the maximum;

the target training sample determining module is used for performing feature selection on the initial training sample based on the optimal feature subset corresponding to each label to obtain a target training sample;

the training module is used for training the initial random forest model through the target training sample to obtain a trained target random forest model;

and the execution module is used for detecting the current power utilization information of the user side, inputting the current power utilization information into the target random forest model after the characteristics are selected, and obtaining the user power utilization safety detection result output by the target random forest model.

A consumer electricity safety detection device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the user electricity safety detection method as described above.

A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the user electricity safety detection method as set forth above.

By applying the technical scheme provided by the embodiment of the invention, when the user side has an electric leakage fault, when the user side has an electric arc fault, when the user side has a short-circuit fault and when the user side normally uses electricity, the electricity utilization information of the user side is collected, the feature extraction is further carried out based on the collected electricity utilization information, and the corresponding label is set according to the state of the user side when the electricity utilization information is collected to obtain the initial training sample.

In addition, the scheme of the application uses a random forest model to detect user electricity safety. The random forest model takes the outputs of all of its decision trees and applies the principle that the minority obeys the majority to obtain the final user electricity sensing result, so the probability of overfitting is greatly reduced, the generalization capability is enhanced, and the model is less likely to fall into a local extremum.
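The majority principle mentioned above can be illustrated with a few lines of Python. This is only a sketch of plain majority voting over per-tree outputs; the function name is hypothetical, and a real library may combine the trees differently (for example by averaging class probabilities).

```python
from collections import Counter
from typing import Sequence


def majority_vote(tree_outputs: Sequence[str]) -> str:
    """Return the class predicted by the largest number of decision trees."""
    # most_common(1) gives the (label, count) pair with the highest count;
    # on a tie, the label encountered first wins.
    return Counter(tree_outputs).most_common(1)[0][0]


# Hypothetical outputs of five decision trees for one input sample.
votes = ["arc fault", "normal power usage", "arc fault", "leakage fault", "arc fault"]
result = majority_vote(votes)
```

A single tree that misclassifies the sample is outvoted by the others, which is the intuition behind the reduced overfitting claimed in the text.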

Furthermore, the application considers that the accuracy of fault identification depends not only on the quality of the model but also, crucially, on the quality of the feature data; proper feature selection can provide the semantics and information of the original feature data, so that a good classification effect can be obtained even with a simple classification model. Therefore, for each label, the information gain between each feature item after feature extraction and the label is determined, and each feature item whose information gain is lower than the preset threshold is filtered out, which effectively reduces the amount of irrelevant and redundant information in the feature set and improves the detection efficiency of the target random forest model in application. Furthermore, the optimal feature subset is selected from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model, which further reduces the amount of irrelevant and redundant information in the feature set, helps ensure that the selected optimal feature subset gives good user electricity safety detection performance, and improves the detection accuracy. Moreover, because the scheme of the application only screens the feature items, the natural semantics of the original feature items are not damaged.

To sum up, the scheme of this application can carry out user electricity safety detection efficiently and accurately, retains the natural semantics of the feature items, and avoids the local extrema, overfitting and poor generalization capability to which traditional artificial intelligence models are prone.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flow chart of an embodiment of a method for detecting user electricity utilization safety in the present invention;

FIG. 2a is a schematic diagram of Bootstrap sampling with replacement in accordance with the present invention;

FIG. 2b is a schematic diagram of a multi-decision tree principle of a random forest model according to the present invention;

FIG. 3 is a schematic structural diagram of a user electricity utilization safety detection system according to the present invention.

Detailed Description

The core of the invention is to provide a user electricity utilization safety detection method that can detect user electricity utilization safety efficiently and accurately, retain the natural semantics of the feature items, and avoid the local extrema, overfitting and poor generalization capability to which traditional artificial intelligence models are prone.

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a user electricity consumption safety detection method according to the present invention, where the user electricity consumption safety detection method may include the following steps:

Step S101: constructing an initial random forest model for outputting the user electricity utilization safety detection result for input information.

Specifically, in the scheme of the application, user electricity utilization safety detection is performed based on a random forest model, and the model which is constructed and not trained is called an initial random forest model.

In a specific case, when constructing the initial random forest model, RandomForestRegressor can be used: the relevant package is imported and the parameters are then set.

When constructing the initial random forest model, specific parameters may be set and adjusted according to actual needs. For example, in a specific case, n_estimators may be set to 15; the n_estimators parameter represents the maximum number of weak learners (decision trees). Generally, if n_estimators is too small the model is prone to under-fitting, and if n_estimators is too large it is prone to over-fitting, so a moderate value is generally selected.

The parameter oob_score is set to False by default. However, the present application considers that this parameter can be set to True, because the out-of-bag score reflects the generalization capability of a model after fitting. The parameter oob_score indicates whether out-of-bag samples are used to evaluate the model. When sampling with replacement, approximately 36.8% of the data is never drawn; this data, called out-of-bag (OOB) data, does not participate in fitting the model on the training set and can therefore be used to check the generalization capability of the model.
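The 36.8% figure follows from the fact that, when n samples are drawn with replacement from n items, a given item is missed by every draw with probability (1 - 1/n)^n, which tends to 1/e as n grows. A small check (the function name is hypothetical):

```python
import math


def oob_fraction(n: int) -> float:
    """Probability that one particular item is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


# The value approaches 1/e (about 0.368) as the sample size grows.
limit = math.exp(-1)
fractions = {n: oob_fraction(n) for n in (10, 100, 10_000)}
```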

The parameter criterion may be set to the Gini coefficient (gini). The parameter criterion specifies the criterion used to evaluate features when a decision tree is split.

The parameter random_state may be set to a fixed value to ensure that the same random forest model is generated each time the algorithm is run. random_state is the random number seed used to select samples at random when each tree is sampled with Bootstrap in the Bagging strategy (i.e., random sampling with replacement).

The parameter max_depth may be left at its default of None; in that case the depth of the subtrees is not limited when a decision tree is built.

The parameter max_features may be left unset so that the default value is used. max_features determines how much randomness each tree has; a smaller value can reduce overfitting. If max_features is large, the trees in the random forest will be very similar. If max_features is small, the trees in the random forest will be very different, and each tree must grow deep in order to fit the data well.

The bootstrap parameter may be left at its default of True, i.e. the constructed initial random forest model uses sampling with replacement. The bootstrap parameter indicates whether sampling is done with replacement: with the bagging method, sub-datasets are drawn from the dataset randomly and with replacement to train the individual decision trees in the forest. This ensures that each decision tree uses a different dataset, so the trained trees are similar but not identical.
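The parameter choices discussed above can be collected as in the sketch below, assuming the parameter names are those of scikit-learn. Two assumptions are made here that the text does not confirm: the fixed seed value 42 is arbitrary, and `RandomForestClassifier` is used instead of the `RandomForestRegressor` named in the text, because the gini criterion and the class labels correspond to scikit-learn's classifier. The import is guarded so the parameter dictionary can be inspected even where scikit-learn is not installed.

```python
# Keyword arguments mirroring the parameter choices described in the text.
rf_params = {
    "n_estimators": 15,    # number of decision trees
    "criterion": "gini",   # split quality measured by Gini impurity
    "max_depth": None,     # tree depth not limited
    "bootstrap": True,     # sample the training set with replacement
    "oob_score": True,     # evaluate on out-of-bag samples after fitting
    "random_state": 42,    # any fixed value; 42 is an arbitrary assumption
}
# max_features is deliberately omitted so the library default is used.

try:
    from sklearn.ensemble import RandomForestClassifier

    initial_model = RandomForestClassifier(**rf_params)
except ImportError:
    # scikit-learn is optional for this sketch.
    initial_model = None
```

After fitting such a model, the out-of-bag estimate would be read from its `oob_score_` attribute.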

Referring to FIG. 2a, a schematic diagram of Bootstrap sampling with replacement is shown. When Bootstrap sampling with replacement is used, the following steps may be performed: (1) extract a certain number n of samples from the original samples M using a resampling technique, where repeated sampling is allowed; (2) calculate a predetermined statistic T from the extracted samples; (3) repeat (1) and (2) N times (generally more than 1000) to obtain N statistics T; (4) calculate the sample variance of the N statistics T to obtain the variance of the statistic; (5) the remaining statistics, such as the population mean, can then be estimated.
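Steps (1) to (4) can be sketched with the Python standard library as follows. The function name, the example data and the seed are assumptions chosen for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def bootstrap_variance(
    data: Sequence[float],
    statistic: Callable[[List[float]], float],
    n_draw: int,
    n_rounds: int,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Estimate the variance of a statistic by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values: List[float] = []
    for _ in range(n_rounds):
        # Step (1): draw n_draw samples, repeats allowed.
        resample = [rng.choice(data) for _ in range(n_draw)]
        # Step (2): compute the statistic T on the resample.
        values.append(statistic(resample))
    # Step (4): sample variance of the n_rounds statistics (step (3) is the loop).
    return statistics.variance(values), values


original = [1.0, 2.0, 3.0, 4.0, 5.0]
variance_of_mean, means = bootstrap_variance(original, statistics.mean, 5, 1000)
```

For this data the population variance is 2, so the variance of the mean of 5 draws should come out near 2 / 5 = 0.4.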

Step S102: when the user side has an electric leakage fault, an arc fault, a short-circuit fault and normal power consumption, the power consumption information of the user side is collected.

The scheme of this application needs to carry out user power consumption safety inspection, consequently, when carrying out the collection of the power consumption information of user side, the power consumption information when needing to gather the normal power consumption of user side to and the power consumption information when the user side trouble. In consideration of the current common fault types of the leakage fault, the arc fault and the short-circuit fault, in the scheme of the application, the power utilization information of the user side is collected when the user side has the leakage fault, the user side has the arc fault, the user side has the short-circuit fault and the user side normally utilizes power.

The leakage fault is that the phase line of the protected line is connected to the earth directly or through an unexpected load, and a residual current which is approximately sinusoidal and has a slowly changing effective value is generated. The fault arc is a gas free discharge phenomenon caused by air breakdown or electrical connection looseness caused by line insulation aging, breakage, air moisture and the like. A short-circuit fault is a fault that occurs when a circuit or a part of a circuit is shorted.

In practical application, the conditions of user side leakage faults, arc faults and short-circuit faults can be simulated through experiments, and then in the experiment process, the power utilization information of the user side when corresponding faults occur is collected through the sensing device. Of course, besides the experiment, the power utilization information of the user terminal when the corresponding fault occurs can be obtained in a field monitoring mode.

In addition, when step S102 is executed, the specific collected power consumption information items may be set and adjusted according to actual needs, and it can be understood that power consumption information capable of reflecting the power consumption fault of the user terminal may be selected for collection.

For example, in an embodiment of the present invention, considering that the current, the voltage and the residual current of the user terminal can effectively reflect whether the user terminal has the power failure, the collecting of the power consumption information of the user terminal described in step S102 may specifically include:

and collecting the current of the user terminal, the voltage of the user terminal and the residual current of the user terminal.

The current of the user side is the total current of the user side; for low-voltage users, this is usually the current on the live wire. For the small portion of users supplied with three-phase power, any one phase may be selected and its phase current detected. Correspondingly, the voltage of the user side is the total voltage of the user, i.e. the voltage between the live wire and the neutral wire. The residual current is the non-zero vector sum of the currents of all phases (including the neutral line) in the low-voltage distribution line. Generally speaking, when an accident occurs on the electricity utilization side and current flows from a live body through a human body to the ground, the effective value of the instantaneous vector sum of the currents is called the residual current, commonly known as leakage current.
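As a rough illustration of the residual-current definition above, the sketch below sums the instantaneous currents of all conductors at one sampling instant. The function name and the numeric values are hypothetical; a real acquisition device would typically measure this sum directly with a residual current transformer.

```python
from typing import Sequence


def residual_current(conductor_currents: Sequence[float]) -> float:
    """Instantaneous sum of the currents in all conductors, neutral included."""
    return sum(conductor_currents)


# Single-phase example at one instant: 10.00 A flows out on the live wire and
# 9.97 A returns on the neutral wire, so 0.03 A is leaking to earth.
leak = residual_current([10.00, -9.97])

# In a healthy circuit the outgoing and returning currents cancel.
healthy = residual_current([10.00, -10.00])
```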

In addition, in other occasions, the collected electricity utilization information can also comprise other types, such as cable temperature and the like, according to actual needs.

Step S103: performing feature extraction based on the collected power utilization information, and setting a corresponding label according to the state of the user side when the power utilization information is collected to obtain an initial training sample.

The power consumption information of the user side acquired in step S102 is the original acquired data, and in order to effectively train the initial random forest model and obtain a high-performance target random forest model, the feature extraction is performed based on the acquired power consumption information. Of course, there are many specific feature extraction items, which can be set according to actual situations. And it can be understood that, in the same way as selecting the specific item of the collected power consumption information, when the feature extraction is performed, the feature capable of effectively reflecting the power consumption fault condition of the user terminal can be extracted.

In addition, it should be noted that the user electricity utilization safety detection result output by the target random forest model can reflect a specific fault type, so that when the electricity utilization information of the user terminal is collected in step S102, the collection of the electricity utilization information needs to be performed when an electric leakage fault, an arc fault, a short circuit fault and normal electricity utilization of the user terminal occur at the user terminal, and similarly, after the feature extraction is performed based on the collected electricity utilization information, a corresponding tag needs to be set according to the state of the user terminal when the electricity utilization information is collected, so as to obtain an initial training sample.

Therefore, the scheme of the present application has 4 kinds of labels, which respectively represent "leakage fault", "arc fault", "short-circuit fault" and "normal power usage". For example, the label 000 indicates "normal power usage", the label 100 indicates "leakage fault", the label 010 indicates "arc fault", and the label 001 indicates "short-circuit fault". It can be understood that the label set for a training sample depends on the conditions under which the power consumption information of that training sample was collected.
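The example label scheme above can be written down as a small lookup table. The dictionary and function names are hypothetical; the codes and their meanings are the ones given in the text.

```python
# Label codes from the example in the text: one position each for
# leakage, arc and short-circuit, with all zeros meaning normal usage.
LABELS = {
    "000": "normal power usage",
    "100": "leakage fault",
    "010": "arc fault",
    "001": "short-circuit fault",
}


def label_for_condition(condition: str) -> str:
    """Return the label code for the condition under which data was collected."""
    for code, name in LABELS.items():
        if name == condition:
            return code
    raise ValueError(f"unknown condition: {condition}")


arc_code = label_for_condition("arc fault")
```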

In addition, when the power consumption information of the user terminal is collected, the sampling frequency can be set according to the requirement, for example, 10kHz is set in one occasion, the power frequency is 50Hz, and each power frequency period comprises 200 sampling time points, which are 200 sampling points for short. In practical application, for each situation that a user side has an electric leakage fault, an arc fault, a short-circuit fault and the user side normally uses electricity, training samples under the situation can be numbered, namely, i in the ith sampling time point is a positive integer.

In some cases, when feature extraction is performed on the power consumption information acquired at the i-th sampling point, data before the i-th sampling point is also used, for example the power consumption information acquired in the power frequency cycle closest to the i-th sampling point. Therefore, in practical applications, feature extraction based on the acquired power consumption information need not start from the 1st sampling point. For example, in one case, feature extraction may start from the second power frequency cycle, that is, from the nth sampling time point. It can also be understood that the power consumption information collected under the 4 different conditions needs to be processed separately by category.
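The sampling arithmetic and the use of the most recent power frequency cycle can be sketched as below. The function names are hypothetical, and the exact window boundaries are an assumption: here the window for sampling point i covers points i-n+1 through i (1-based), so the first usable point is the n-th one.

```python
from typing import Iterator, List, Sequence, Tuple


def samples_per_cycle(sampling_hz: float, mains_hz: float) -> int:
    """Number of sampling time points in one power frequency cycle."""
    return round(sampling_hz / mains_hz)


def trailing_windows(
    signal: Sequence[float], n: int
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (i, window) where window holds the n most recent samples up to point i."""
    for i in range(n, len(signal) + 1):  # i is a 1-based sampling point index
        yield i, list(signal[i - n:i])


n_points = samples_per_cycle(10_000, 50)  # 10 kHz sampling, 50 Hz mains

# Tiny demonstration with a cycle length of 3 instead of 200.
demo = list(trailing_windows([1.0, 2.0, 3.0, 4.0, 5.0], 3))
```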

In one embodiment of the invention, the feature extraction based on the collected power consumption information in step S103 may specifically include:

based on the collected power utilization information, aiming at each sampling time point, extracting a current form factor, a current pulse factor, a current peak factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage form factor, a voltage pulse factor, a voltage peak factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current form factor, a residual current pulse factor, a residual current peak factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to the sampling time point.

In the above feature extraction, the specific calculation method may be set according to actual conditions.

The form factor, also called the form index, is the ratio of the root-mean-square value of the signal to its absolute mean value at the ith sampling time point. It reflects how much the actual waveform differs from, and is distorted relative to, a standard sine wave. The form factor can therefore be calculated as:

$$S_f = \frac{X_{rms}}{|\bar{X}|}$$

where $X_{rms}$ is the root-mean-square value of the signal and $|\bar{X}|$ is the absolute mean value of the signal. In the present application the signal may be the current, the voltage or the residual current; that is, the signal $x_i$ at the ith sampling time point may be the user-side current $Ia_i$, the user-side voltage $U_i$, or the user-side residual current $Id_i$.

Therefore, for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$S_{Iaf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},$$

$$S_{Uf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},$$

$$S_{Idf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|}.$$

Here, $Id_i$ is the residual current value of the user side at the ith sampling time point, $Ia_i$ is the current value of the user side at the ith sampling time point, $U_i$ is the voltage value of the user side at the ith sampling time point, and $N$ is the number of sampling time points in a single power frequency cycle when the power consumption information of the user side is collected. The sums run over the $N$ sampling time points of the power frequency cycle closest to the ith sampling time point. In the above embodiment, each power frequency cycle contains 200 sampling time points, so $N$ is 200.
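As a minimal sketch of the form factor described above (root-mean-square value divided by mean absolute value over one cycle), the following Python function operates on a window of samples. The function name `form_factor` and the test waveform are illustrative assumptions, not part of the original disclosure.

```python
import math


def form_factor(window):
    """Form factor: RMS value divided by the mean absolute value."""
    n = len(window)
    rms = math.sqrt(sum(x * x for x in window) / n)
    abs_mean = sum(abs(x) for x in window) / n
    return rms / abs_mean


# One cycle of a unit sine wave sampled at N = 200 points.
N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
sine_form_factor = form_factor(sine)  # close to pi / (2 * sqrt(2))
```

For an ideal sine wave the form factor is about 1.11, and for a square wave it is exactly 1, so departures from 1.11 indicate distortion relative to a sinusoid.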

The pulse factor, also called the pulse index, is the ratio of the signal peak value to the absolute mean value (rectified mean value) and reflects the impulsiveness of the signal. The pulse factor can therefore be calculated as:

$$C_f = \frac{X_p}{|\bar{X}|}$$

where $X_p$ is the peak value of the signal, that is, the signal peak in the power frequency cycle preceding the ith sampling time point, and $|\bar{X}|$ is the absolute mean value of the signal. Therefore, in an embodiment of the present invention, for the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$C_{Iaf} = \frac{\max(Ia_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},$$

$$C_{Uf} = \frac{\max(U_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},$$

$$C_{Idf} = \frac{\max(Id_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|}.$$

Here, $\max(Ia_i)$ is the current peak value in the most recent power frequency cycle, $\max(U_i)$ is the voltage peak value in the most recent power frequency cycle, $\max(Id_i)$ is the residual current peak value in the most recent power frequency cycle, and the sums run over the $N$ sampling time points of that cycle.

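The peak factor, also called the peak index, is the ratio of the signal peak value to the root-mean-square value and reflects how extreme the peak is relative to the waveform as a whole. When an impact signal occurs, the peak value of the waveform increases and the index increases accordingly. The peak index can be calculated as:

$$I_f = \frac{X_p}{X_{rms}}$$

where $X_p$ is the peak value of the signal and $X_{rms}$ is its root-mean-square value. Therefore, in an embodiment of the present invention, for the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$I_{Iaf} = \frac{\max(Ia_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}},$$

$$I_{Uf} = \frac{\max(U_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}},$$

$$I_{Idf} = \frac{\max(Id_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}}.$$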

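The margin factor, also called the margin index, is the ratio of the signal peak value to the square-root amplitude and reflects the fullness of the waveform. The margin index can be calculated as:

$$CL_f = \frac{X_p}{X_r}$$

where $X_p$ is the peak value of the signal and $X_r$ is its square-root amplitude. Therefore, in an embodiment of the present invention, for the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$CL_{Iaf} = \frac{\max(Ia_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Ia_j|}\right)^{2}},$$

$$CL_{Uf} = \frac{\max(U_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|U_j|}\right)^{2}},$$

$$CL_{Idf} = \frac{\max(Id_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Id_j|}\right)^{2}}.$$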
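The three peak-based factors described above (pulse, peak and margin) share the same numerator and differ only in the denominator. A minimal Python sketch follows; the function name `peak_based_factors` is hypothetical, and taking the peak as the largest absolute sample value in the window is an assumption made here.

```python
import math


def peak_based_factors(window):
    """Pulse, peak (crest) and margin factors over one cycle of samples."""
    n = len(window)
    peak = max(abs(x) for x in window)  # assumption: peak of |x|
    abs_mean = sum(abs(x) for x in window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    root_amplitude = (sum(math.sqrt(abs(x)) for x in window) / n) ** 2
    return {
        "pulse": peak / abs_mean,         # peak / rectified mean
        "peak": peak / rms,               # peak / RMS
        "margin": peak / root_amplitude,  # peak / square-root amplitude
    }


flat = peak_based_factors([1, -1, 1, -1])
spiky = peak_based_factors([0, 0, 0, 4])
```

A single spike raises all three factors, with the margin factor reacting most strongly, which is why these indices are useful for spotting impulsive disturbances.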

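The kurtosis factor, also called the kurtosis index, is defined as the normalized fourth-order central moment and reflects the smoothness of the waveform. The kurtosis index is very sensitive to impact components in the signal: the larger the energy of the impact components, the larger the kurtosis value and the steeper (more sharply peaked) the waveform. The kurtosis index can be calculated as:

$$K_v = \frac{\beta}{X_{rms}^{4}}$$

where $\beta$ denotes the kurtosis value and $X_{rms}$ is the root-mean-square value of the signal.

Therefore, in an embodiment of the present invention, for the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:

$$K_{Iav} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}\right)^{2}},$$

$$K_{Uv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}\right)^{2}},$$

$$K_{Idv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}\right)^{2}}.$$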
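The sensitivity of the kurtosis index to impact components can be seen in a small Python sketch. It assumes the fourth moment is normalized by the squared mean-square value and that the signal has zero mean over a cycle (so raw and central moments coincide); the function name `kurtosis_factor` and the injected spike are illustrative only.

```python
import math


def kurtosis_factor(window):
    """Fourth moment normalized by the squared second moment.

    Assumption: the signal is zero-mean over the window, so the raw
    moments used here equal the central moments.
    """
    n = len(window)
    m2 = sum(x ** 2 for x in window) / n
    m4 = sum(x ** 4 for x in window) / n
    return m4 / (m2 * m2)


N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
spiked = list(sine)
spiked[50] = 5.0  # hypothetical impact component

k_sine = kurtosis_factor(sine)      # 1.5 for a pure sine wave
k_spiked = kurtosis_factor(spiked)  # noticeably larger
```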

Considering that after an arc fault occurs part of the electric energy is converted into other forms of energy and dissipated, the energy of the current waveform is calculated and made dimensionless by dividing by the maximum energy in the current cycle, giving an energy index that tracks how the energy changes. The energy index may be defined as:

$$E_f = \frac{E}{E_{max}}$$

where $E$ is the energy and $E_{max}$ is the maximum energy in the cycle.

In one embodiment of the invention, for the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$E_{Iaf} = \frac{E_{Ia}}{\max(E_{Ia})},$$

$$E_{Uf} = \frac{E_{U}}{\max(E_{U})},$$

$$E_{Idf} = \frac{E_{Id}}{\max(E_{Id})}.$$

Here, $E_{Ia}$, $E_{U}$ and $E_{Id}$ are the energies of the current, voltage and residual current signals, $\max(E_{Ia})$ is the maximum current energy in the most recent power frequency cycle, $\max(E_{U})$ is the maximum voltage energy in the most recent power frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy in the most recent power frequency cycle.
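A minimal Python sketch of the energy index follows. The text does not state how the energy of a sample is computed, so the sketch assumes the squared sample value as the instantaneous energy; this assumption and the function name `energy_index` are not taken from the original.

```python
def energy_index(window):
    """Energy of the latest sample divided by the cycle's maximum energy.

    Assumption: instantaneous energy is the squared sample value.
    """
    energies = [x * x for x in window]
    e_max = max(energies)
    if e_max == 0:
        return 0.0  # no energy in the whole cycle
    return energies[-1] / e_max


at_peak = energy_index([1, 2, 4])    # latest sample carries the maximum
after_peak = energy_index([4, 2, 1])
```

Because the index is normalized by the cycle maximum, it always lies between 0 and 1 regardless of the absolute signal level.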

Step S104: and determining the information gain between each feature item after feature extraction and the label for each label, filtering each feature item with the information gain lower than a preset threshold value, and selecting an optimal feature subset from the filtered remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model.

The present application considers that the accuracy of fault identification and detection depends not only on the quality of the model but also, crucially, on the quality of the feature data. Proper feature selection preserves the semantics and information of the original feature data, so that even a simple classification model can achieve a good classification result.

For example, after feature extraction is performed according to the above embodiment, each group of initial training samples may include 18-dimensional features, and in other situations, when more features are extracted, the dimension may be higher, so that a large amount of data would need to be analyzed in real time when the target random forest model is subsequently used. In the present application, by contrast, the features are screened through step S104 to obtain the optimal feature subset: irrelevant features are removed according to an evaluation criterion and the most effective features are retained. This effectively reduces the fluctuation of the features, preserves the natural semantics of the original feature data, and reduces the amount of irrelevant and redundant information in the feature set, so that the target random forest model can perform user electricity safety detection efficiently and accurately.

Specifically, the present application selects features in a hybrid Filter + Wrapper manner to obtain the optimal feature subset. The Filter method evaluates intrinsic properties of a feature subset, such as the degree of association between a feature item and the class label, the amount of information, and the sample distance, to judge whether the subset expresses and distinguishes the data as well as possible. The Wrapper method uses the target data processing task itself as the evaluation system. Like a black-box test, it does not examine the properties of the selected subset and considers only how well the data processing task performs on that subset. Evaluating a subset therefore requires running the specific data processing task, and the search is an iterative process that is repeatedly improved by feedback from the classification results.

When the Filter method is used, the specific scheme of the present application is to determine the information gain between each extracted feature item and the label, and to filter out each feature item whose information gain is lower than a preset threshold.

Specifically, if the values of the random variables X and Y are denoted by $x$ and $y$ respectively, and $p(x)$ and $p(y)$ are their probability density functions, the entropy H(X) of the random variable X can be defined as:

$$H(X) = -\sum_{x} p(x)\log p(x)$$

Likewise, the entropy H(Y) of the random variable Y can be defined as:

$$H(Y) = -\sum_{y} p(y)\log p(y)$$

The conditional entropy of the random variable X given Y is defined as:

$$H(X|Y) = -\sum_{y} p(y)\sum_{x} p(x|y)\log p(x|y)$$

The information gain (IG) is an amount of information that can be used to measure the correlation between two variables: the larger the value, the stronger the correlation between the variables. The information gain defined in this way is symmetric in the two variables and can measure the correlation between features from a nonlinear perspective. The relationship among the information gain, the entropy and the conditional entropy is:

$$IG(X|Y) = H(X) - H(X|Y)$$

That is, when $IG(X|Y) = 0$, the variable X and the variable Y are not correlated, whereas the more strongly X and Y are correlated, the larger the value of IG(X|Y).
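The entropy, conditional entropy and information gain defined above can be computed for discrete values with a short Python sketch. The function names are hypothetical, base-2 logarithms are chosen here for illustration, and continuous features such as the waveform factors would first need to be discretized (binned), which is an assumption not spelled out in the text.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X|Y): entropy of x within each group of equal y, weighted by p(y)."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(x, y):
    """IG(X|Y) = H(X) - H(X|Y)."""
    return entropy(x) - conditional_entropy(x, y)


label = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # fully determined by the label
uninformative = [0, 1, 0, 1]  # independent of the label
```

With these toy sequences, `information_gain(informative, label)` is 1 bit and `information_gain(uninformative, label)` is 0.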

The information gain can be used to measure how much a feature contributes to the classification of the current system, which helps reduce sensitivity to noise in the samples.

In the scheme of the present application, each feature item whose information gain is lower than a preset threshold needs to be filtered out. The preset threshold may be set as needed; for example, when the threshold is set to 0, a feature item is filtered out only when it is completely unrelated to the label.

It should be noted that, in the present application, filtering is performed separately for each label. For example, if the information gain between feature A and label 1 (say, label 1 is an arc fault) is 0, feature A is filtered out for label 1, whereas if the information gain between feature A and each of the remaining 3 labels exceeds the preset threshold, feature A is not filtered out for those 3 labels. That is, in this example, when feature selection is subsequently performed on the initial training samples to obtain the target training samples, feature A is deleted from the initial training samples labeled as arc fault and retained in the samples under the remaining labels.
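The per-label filtering described above can be sketched in Python as follows. The function name `filter_features_per_label`, the threshold value and the information gain figures are made-up illustrations, not values from the original.

```python
def filter_features_per_label(gains, threshold):
    """Keep, for each label, the features whose gain is not below threshold.

    gains: {label: {feature: information gain with respect to that label}}
    """
    return {
        label: [f for f, g in feature_gains.items() if g >= threshold]
        for label, feature_gains in gains.items()
    }


# Hypothetical gains: feature "A" carries no information about arc faults.
example_gains = {
    "arc fault": {"A": 0.0, "B": 0.3},
    "leakage fault": {"A": 0.2, "B": 0.3},
}
kept = filter_features_per_label(example_gains, threshold=0.05)
```

Here feature "A" is dropped only for the arc fault label and kept for the leakage fault label, mirroring the example in the text.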

Further, in an embodiment of the present invention, it is considered that the information gain is biased toward features with more possible values, which may cause over-fitting. Features with more branches may therefore be penalized.

That is, in an embodiment of the present invention, after determining the information gain between each feature item and the label, the method may further include:

penalizing the information gain of each feature item based on the information entropy of that feature item to obtain an information gain rate, where the penalized information gain is negatively correlated with the information entropy of the feature item.

Correspondingly, filtering each feature item whose information gain is lower than a preset threshold includes:

filtering each feature item whose information gain rate is lower than the preset threshold.

In this embodiment, the information gain of each feature item is penalized, and it will be appreciated that the more branches a feature has, the higher the penalty. For example, in one embodiment of the invention, the information gain is penalized by

$$IGR(X|Y) = \frac{IG(X|Y)}{H(X)}$$

and the calculated $IGR(X|Y)$ is the information gain rate of the feature item X.

Here, $IG(X|Y)$ represents the information gain between the feature item X and the label Y, $H(X)$ represents the penalty factor corresponding to the feature item X under the label Y, namely the entropy of the values of feature X, and $IGR(X|Y)$ represents the information gain rate between the feature item X and the label Y.

It can be seen that the information gain rate of the random variable X is positively correlated with its information gain and negatively correlated with its entropy, that is, with the number of branches of the feature. Therefore, if the random variable X takes more distinct values, the information gain rate of X is reduced, which helps reduce the selection bias.

It can be understood that, without the penalty, each calculated information gain is compared directly with the preset threshold. In this embodiment, after the information gain is calculated, it is penalized according to the number of branches, each resulting information gain rate is compared with the preset threshold, and the feature items whose information gain rate is lower than the preset threshold are filtered out.
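The effect of the penalty can be illustrated with a small Python sketch that divides a given information gain by the entropy of the feature's values. The function names and the toy value lists are assumptions for illustration; base-2 logarithms are used.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(info_gain, feature_values):
    """Information gain rate: information gain divided by the feature's entropy."""
    h = entropy(feature_values)
    if h == 0:
        return 0.0  # a constant feature carries no information
    return info_gain / h


# Same information gain, different number of distinct values (branches).
two_branches = gain_ratio(1.0, [0, 0, 1, 1])
four_branches = gain_ratio(1.0, [0, 1, 2, 3])
```

With equal information gain, the feature with four distinct values receives half the gain rate of the feature with two, which is the intended penalty on many-branched features.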

In the Wrapper stage, the optimal feature subset is selected from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model, and various specific algorithms may be used.

For example, in an embodiment of the present invention, selecting the optimal feature subset from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model may specifically include:

sorting the feature items remaining after filtering in descending order of information gain;

traversing the feature space with the SFS (Sequential Forward Selection) algorithm to obtain a plurality of feature sets;

determining the classification accuracy of the initial random forest model for each feature set, and taking the feature set with the highest classification accuracy as the optimal feature subset.

In this embodiment, the feature items may be sorted in descending order of the information gain calculated in the Filter stage. It should be noted that, if the scheme with the information gain penalty is adopted, the feature items may instead be sorted in descending order of the information gain rate calculated in the Filter stage. A plurality of feature sets can be obtained by traversing the feature space with the SFS algorithm. The classification accuracy of the initial random forest model can then be determined for each feature set, for example by calculating the classification accuracy of each feature set with a random forest algorithm. The feature set that yields the highest classification accuracy is taken as the required optimal feature subset.

The SFS algorithm is a bottom-up method: the first feature selected is the single best feature, the second is the feature from the remainder that is best in combination with the first, and each subsequent feature is the one that is best in combination with the features already selected. Its advantage is that interactions among features are taken into account to some extent.
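The greedy SFS procedure described above can be sketched in Python as follows. The `score` callable is a hypothetical stand-in for the classification accuracy of the initial random forest model on a candidate feature set; the toy scoring function below is invented purely to make the example runnable.

```python
def sequential_forward_selection(features, score):
    """Greedy SFS: grow the subset one feature at a time, keep the best seen.

    features: candidate features, e.g. already sorted by information gain.
    score:    callable mapping a list of features to a quality value
              (stands in for random forest classification accuracy).
    """
    selected, remaining = [], list(features)
    best_subset, best_score = [], float("-inf")
    while remaining:
        # Add the feature that works best together with those already chosen.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [candidate]
        remaining.remove(candidate)
        current = score(selected)
        if current > best_score:
            best_subset, best_score = list(selected), current
    return best_subset, best_score


def toy_score(subset):
    """Hypothetical score: 'a' and 'b' help, any other feature hurts slightly."""
    useful = {"a", "b"}
    return len(useful & set(subset)) - 0.1 * len(set(subset) - useful)


subset, value = sequential_forward_selection(["c", "a", "b", "d"], toy_score)
```

In practice each call to `score` would train and evaluate a model, so the number of evaluations grows roughly quadratically with the number of candidate features.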

Step S105: and performing feature selection on the initial training sample based on the optimal feature subset corresponding to each label to obtain a target training sample.

For each label, through the feature selection of the Filter + Wrapper in the step S104, the optimal feature subset corresponding to the label can be obtained, so as to perform feature selection on the initial training sample, and obtain the target training sample.

Step S106: and training the initial random forest model through the target training sample to obtain a trained target random forest model.

After the target training sample is obtained, the initial random forest model can be trained through the target training sample, and a trained target random forest model is obtained.

In practice, most of the target training samples may be used for training and a small part for testing, for example, 80% of the sample data may be used for training and 20% of the sample data may be used for testing.

The random forest model is a classifier that contains a plurality of decision trees and whose output class is the mode of the classes output by the individual trees. Fig. 2b is a schematic diagram of the multi-decision-tree principle of the random forest model. After the optimal feature subset corresponding to each label is determined, feature selection is performed on the initial training sample to obtain the target training sample, which is split into a training set and a test set. A plurality of training sample subsets are then randomly drawn from the training set by the bootstrap method, a decision tree is built on each subset, and the decision results of the trees are combined by voting to obtain the final model for user electricity safety detection. That is, each decision tree outputs a fault state discrimination result, the results of all decision trees are counted according to the principle that the minority obeys the majority, and the state with the largest share of votes is taken as the final result.
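The two mechanisms named above, bootstrap sampling of the training set and majority voting over the trees' outputs, can be sketched in Python without any machine learning library. The function names are hypothetical and the tree predictions below are invented placeholders rather than outputs of trained trees.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement (one tree's training subset)."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so that the draw is reproducible
subset = bootstrap_sample(list(range(10)), rng)

# Hypothetical outputs of five decision trees for one input.
tree_outputs = ["arc fault", "normal", "arc fault", "leakage fault", "arc fault"]
final_state = majority_vote(tree_outputs)
```

Because sampling is done with replacement, some training samples appear several times in a subset and others not at all; the latter can serve as out-of-bag samples for estimating accuracy.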

Step S107: and detecting the current power utilization information of the user side, selecting the characteristics, and inputting the characteristics into the target random forest model to obtain a user power utilization safety detection result output by the target random forest model.

After the target random forest model is obtained through training, the target random forest model can be used, namely data are input into the target random forest model, the target random forest model can output a user electricity utilization safety detection result, for example, the output 000 represents normal electricity utilization, the output 100 represents leakage fault, the output 010 represents arc fault, and the output 001 represents short-circuit fault.
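The output coding in the example above can be decoded with a simple lookup table. The names `STATE_BY_CODE` and `decode_result` are hypothetical, and the mapping simply restates the four codes given in the text.

```python
STATE_BY_CODE = {
    "000": "normal electricity use",
    "100": "leakage fault",
    "010": "arc fault",
    "001": "short-circuit fault",
}


def decode_result(code):
    """Translate the model's output code into a readable state."""
    try:
        return STATE_BY_CODE[code]
    except KeyError:
        raise ValueError(f"unknown detection code: {code!r}") from None
```

For instance, `decode_result("010")` returns "arc fault", while an unlisted code such as "111" raises a `ValueError`.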

In addition, it can be understood that, when step S107 is executed, the current power consumption information of the user side is detected and, after feature selection, input into the target random forest model. The feature selection here is performed according to the previously determined optimal feature subset corresponding to each label.

Furthermore, the present application considers that the accuracy of fault identification and detection depends not only on the quality of the model but also, crucially, on the quality of the feature data; proper feature selection preserves the semantics and information of the original feature data, so that a good classification result can be obtained even with a simple classification model. Therefore, for each label, the information gain between each extracted feature item and the label is determined, and each feature item whose information gain is lower than the preset threshold is filtered out, which effectively reduces the amount of irrelevant and redundant information in the feature set and improves the detection efficiency of the target random forest model in use. In addition, the optimal feature subset is selected from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model, which further reduces irrelevant and redundant information, ensures that the selected subset performs well in user electricity safety detection, and improves detection accuracy. Because the scheme of the present application only screens the feature items, the natural semantics of the original feature items are not damaged.

Corresponding to the above method embodiment, an embodiment of the invention also provides a user electricity utilization safety detection system, which may be read in conjunction with the method described above.

Referring to fig. 3, a schematic structural diagram of a user electricity utilization safety detection system in the present invention is shown, including:

an initial random forest model construction module 301, configured to construct an initial random forest model that outputs a user power consumption safety detection result for input information;

the power consumption information acquisition module 302 is used for acquiring power consumption information of the user side when the user side has a leakage fault, an arc fault, a short-circuit fault and normal power consumption;

the feature extraction module 303 is configured to perform feature extraction based on the collected power consumption information, and set a corresponding tag according to a user side state when the power consumption information is collected, to obtain an initial training sample;

an optimal feature subset selection module 304, configured to determine, for each type of label, information gains between each feature item after feature extraction and the label, filter each feature item whose information gain is lower than a preset threshold, and select an optimal feature subset from the remaining feature items after filtering according to a principle that the initial random forest model classification accuracy is maximized;

a target training sample determining module 305, configured to perform feature selection on the initial training sample based on the optimal feature subset corresponding to each label, to obtain a target training sample;

the training module 306 is used for training the initial random forest model through the target training sample to obtain a trained target random forest model;

and the execution module 307 is configured to detect current power consumption information of the user side, input the current power consumption information to the target random forest model after the characteristics are selected, and obtain a user power consumption safety detection result output by the target random forest model.

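In a specific embodiment of the present invention, selecting an optimal feature subset from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model includes:

sorting the feature items remaining after filtering in descending order of information gain, traversing the feature space with the SFS algorithm to obtain a plurality of feature sets, determining the classification accuracy of the initial random forest model for each feature set, and taking the feature set with the highest classification accuracy as the optimal feature subset.

In an embodiment of the present invention, after determining the information gain between each feature item and the label, the optimal feature subset selection module 304 is further configured to:

penalize the information gain of each feature item based on the information entropy of that feature item to obtain an information gain rate, where the penalized information gain is negatively correlated with the information entropy of the feature item.

In a specific embodiment of the present invention, the constructed initial random forest model uses sampling with replacement, the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.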
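The parameter names random_state and oob_score coincide with those of scikit-learn's `RandomForestClassifier`, where sampling with replacement corresponds to `bootstrap=True`. The text does not name a library, so the following configuration is a sketch under that assumption; the seed value 42 and the names `RF_PARAMS` and `build_initial_model` are arbitrary illustrative choices.

```python
# Settings described in the text, expressed as scikit-learn keyword arguments.
RF_PARAMS = {
    "bootstrap": True,    # sampling with replacement
    "random_state": 42,   # any fixed value; 42 is an arbitrary choice
    "oob_score": True,    # evaluate on out-of-bag samples
}


def build_initial_model(params=RF_PARAMS):
    """Construct the initial random forest (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier

    return RandomForestClassifier(**params)
```

Fixing `random_state` makes repeated runs comparable during feature selection, and `oob_score=True` yields an accuracy estimate from the samples left out of each bootstrap draw.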

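In a specific embodiment of the present invention, collecting the power consumption information of the user side includes:

collecting the current of the user side, the voltage of the user side and the residual current of the user side.

In a specific embodiment of the present invention, the feature extraction based on the collected power consumption information includes:

based on the collected power consumption information, for each sampling time point, extracting a current form factor, a current pulse factor, a current peak factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage form factor, a voltage pulse factor, a voltage peak factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current form factor, a residual current pulse factor, a residual current peak factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to the sampling time point.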

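In an embodiment of the present invention, for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$S_{Iaf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},\quad S_{Uf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},\quad S_{Idf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|};$$

for the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$C_{Iaf} = \frac{\max(Ia_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},\quad C_{Uf} = \frac{\max(U_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},\quad C_{Idf} = \frac{\max(Id_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|};$$

for the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$I_{Iaf} = \frac{\max(Ia_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}},\quad I_{Uf} = \frac{\max(U_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}},\quad I_{Idf} = \frac{\max(Id_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}};$$

for the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$CL_{Iaf} = \frac{\max(Ia_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Ia_j|}\right)^{2}},\quad CL_{Uf} = \frac{\max(U_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|U_j|}\right)^{2}},\quad CL_{Idf} = \frac{\max(Id_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Id_j|}\right)^{2}};$$

for the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:

$$K_{Iav} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}\right)^{2}},\quad K_{Uv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}\right)^{2}},\quad K_{Idv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}\right)^{2}};$$

for the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$E_{Iaf} = \frac{E_{Ia}}{\max(E_{Ia})},\quad E_{Uf} = \frac{E_{U}}{\max(E_{U})},\quad E_{Idf} = \frac{E_{Id}}{\max(E_{Id})};$$

wherein $Id_i$ is the residual current value of the user side at the ith sampling time point, $Ia_i$ is the current value of the user side at the ith sampling time point, $U_i$ is the voltage value of the user side at the ith sampling time point, $N$ is the number of sampling time points in a single power frequency cycle when the power consumption information of the user side is collected, the sums run over the $N$ sampling time points of the most recent power frequency cycle, $\max(Ia_i)$ is the current peak value in the most recent power frequency cycle, $\max(U_i)$ is the voltage peak value in the most recent power frequency cycle, $\max(Id_i)$ is the residual current peak value in the most recent power frequency cycle, $\max(E_{Ia})$ is the maximum current energy in the most recent power frequency cycle, $\max(E_{U})$ is the maximum voltage energy in the most recent power frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy in the most recent power frequency cycle.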

Corresponding to the above method and system embodiments, embodiments of the present invention further provide a user electricity utilization safety detection device and a computer readable storage medium, which may be read in conjunction with the description above.

The user electricity safety detection device may include:

a memory for storing a computer program;

a processor for executing a computer program to implement the steps of the user electricity safety detection method in any of the above embodiments.

The computer readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the user electricity safety detection method in any of the above embodiments. The computer-readable storage medium referred to herein may include random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, without departing from the principle of the present invention, several improvements and modifications can be made to the present invention, and these improvements and modifications also fall into the protection scope of the present invention.

Claims

1. A user electricity utilization safety detection method is characterized by comprising the following steps:

extracting features based on the collected power utilization information, and setting corresponding labels according to the state of the user side when the power utilization information is collected to obtain an initial training sample;

detecting current power utilization information of a user side, selecting characteristics, and inputting the characteristics to the target random forest model to obtain a user power utilization safety detection result output by the target random forest model;

the collection of the power utilization information of the user side comprises the following steps:

collecting current of a user side, voltage of the user side and residual current of the user side;

the power consumption information based on collection carries out feature extraction, including:

extracting a current form factor, a current pulse factor, a current peak factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage form factor, a voltage pulse factor, a voltage peak factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current form factor, a residual current pulse factor, a residual current peak factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to each sampling time point based on the collected power utilization information;

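for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$S_{Iaf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},\quad S_{Uf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},\quad S_{Idf} = \frac{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|};$$

for the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$C_{Iaf} = \frac{\max(Ia_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Ia_j|},\quad C_{Uf} = \frac{\max(U_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |U_j|},\quad C_{Idf} = \frac{\max(Id_i)}{\frac{1}{N}\sum_{j=i-N+1}^{i} |Id_j|};$$

for the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$I_{Iaf} = \frac{\max(Ia_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}}},\quad I_{Uf} = \frac{\max(U_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}}},\quad I_{Idf} = \frac{\max(Id_i)}{\sqrt{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}}};$$

for the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$CL_{Iaf} = \frac{\max(Ia_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Ia_j|}\right)^{2}},\quad CL_{Uf} = \frac{\max(U_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|U_j|}\right)^{2}},\quad CL_{Idf} = \frac{\max(Id_i)}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} \sqrt{|Id_j|}\right)^{2}};$$

for the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:

$$K_{Iav} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Ia_j^{2}\right)^{2}},\quad K_{Uv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} U_j^{2}\right)^{2}},\quad K_{Idv} = \frac{\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{4}}{\left(\frac{1}{N}\sum_{j=i-N+1}^{i} Id_j^{2}\right)^{2}};$$

for the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:

$$E_{Iaf} = \frac{E_{Ia}}{\max(E_{Ia})},\quad E_{Uf} = \frac{E_{U}}{\max(E_{U})},\quad E_{Idf} = \frac{E_{Id}}{\max(E_{Id})};$$

wherein $Id_i$ is the residual current value of the user side at the ith sampling time point, $Ia_i$ is the current value of the user side at the ith sampling time point, $U_i$ is the voltage value of the user side at the ith sampling time point, $N$ is the number of sampling time points in a single power frequency cycle when the power consumption information of the user side is collected, the sums are taken over the $N$ sampling time points of the most recent power frequency cycle, $\max(Ia_i)$ is the current peak value in the most recent power frequency cycle, $\max(U_i)$ is the voltage peak value in the most recent power frequency cycle, $\max(Id_i)$ is the residual current peak value in the most recent power frequency cycle, $\max(E_{Ia})$ is the maximum current energy in the most recent power frequency cycle, $\max(E_{U})$ is the maximum voltage energy in the most recent power frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy in the most recent power frequency cycle.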

2. The user electricity utilization safety detection method according to claim 1, wherein the extracting an optimal feature subset from the feature items remaining after filtering according to a principle that the classification accuracy of the initial random forest model is the maximum comprises:

3. The method for detecting the power consumption safety of the user according to claim 1, further comprising, after determining the information gain between each feature item and the tag:

4. The user electricity utilization safety detection method according to claim 1, wherein the constructed initial random forest model uses sampling with replacement, the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.

5. A user electricity utilization safety detection system, comprising:

the optimal feature subset selection module is used for determining, for each label, the information gain between each feature item obtained by feature extraction and that label, filtering out each feature item whose information gain is lower than a preset threshold, and selecting an optimal feature subset from the feature items remaining after filtering on the principle of maximizing the classification accuracy of the initial random forest model;

the execution module is used for detecting the current electricity utilization information of the user side, performing feature selection on it, inputting the selected features into the target random forest model, and obtaining the user electricity utilization safety detection result output by the target random forest model;

the collecting of the electricity utilization information of the user side comprises:

；

for the i-th sampling time point, the extracted current pulse factor C_Iaf, voltage pulse factor C_Uf, and residual current pulse factor C_Idf corresponding to the i-th sampling time point are respectively expressed as:

；

；

；

；

；

wherein Id_i is the residual current value at the user side at the i-th sampling time point, Ia_i is the current value at the user side at the i-th sampling time point, U_i is the voltage value at the user side at the i-th sampling time point, N is the number of sampling time points in a single power-frequency cycle when the electricity utilization information of the user side is collected, max(Ia_i) is the current peak value within the most recent power-frequency cycle, max(U_i) is the voltage peak value within the most recent power-frequency cycle, max(Id_i) is the residual current peak value within the most recent power-frequency cycle, max(E_Ia) is the maximum current energy within the most recent power-frequency cycle, max(E_U) is the maximum voltage energy within the most recent power-frequency cycle, and max(E_Id) is the maximum residual current energy within the most recent power-frequency cycle.
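The filtering performed by the optimal feature subset selection module of claim 5 can be sketched as follows. This is a minimal illustration, assuming each feature item has already been discretized into bins (the "low"/"high" bins, the sample data and the threshold value are hypothetical) and computing information gain as the entropy of the labels minus their conditional entropy given the feature. The later wrapper step that selects the subset maximizing random forest accuracy is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for a discrete (pre-binned) feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def filter_features(features, labels, threshold):
    """Keep the names of features whose information gain reaches the threshold."""
    return [name for name, values in features.items()
            if information_gain(values, labels) >= threshold]


# Hypothetical binned samples: two feature items, four labelled records.
labels = ["normal", "normal", "arc", "arc"]
features = {
    "C_Iaf": ["low", "low", "high", "high"],  # separates the two labels
    "K_Uv": ["low", "high", "low", "high"],   # carries no label information
}
print(filter_features(features, labels, threshold=0.1))
```

In this toy data only C_Iaf survives the filter, since K_Uv is distributed identically across both labels and therefore has zero information gain.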

6. A user electricity utilization safety detection device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to implement the steps of the user electricity utilization safety detection method according to any one of claims 1 to 4.

7. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the user electricity utilization safety detection method according to any one of claims 1 to 4.