
CN115409134B - User electricity utilization safety detection method, system, equipment and storage medium - Google Patents


Info

Publication number
CN115409134B
Authority
CN
China
Prior art keywords
factor
current
sampling time
time point
voltage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211359242.0A
Other languages
Chinese (zh)
Other versions
CN115409134A (en)
Inventor
慕静茹
刘冬寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan 123 Intelligent Technology Co ltd
Original Assignee
Hunan 123 Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan 123 Intelligent Technology Co ltd
Priority to CN202211359242.0A
Publication of CN115409134A
Application granted
Publication of CN115409134B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a user electricity consumption safety detection method, system, device and storage medium, applied in the field of machine learning. The method includes: constructing an initial random forest model; collecting electricity consumption information while the user side operates normally and while it has a leakage, arc or short-circuit fault, extracting features, and setting corresponding labels to obtain initial training samples; filtering the feature items based on information gain, and selecting an optimal feature subset according to the principle of maximizing the classification accuracy of the initial random forest model; obtaining target training samples based on the optimal feature subset and training on them to obtain a target random forest model; and detecting the present electricity consumption information of the user side, selecting its features, and inputting them into the target random forest model to obtain a user electricity safety detection result. With this scheme, user electricity safety detection can be carried out efficiently and accurately, the natural semantics of the feature items are retained, and local extrema, overfitting and poor generalization are avoided.

Description

User electricity utilization safety detection method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of machine learning, and in particular to a user electricity safety detection method, system, device and storage medium.
Background
With the progress of science and technology, new technologies keep emerging, electrical products have become widespread, and low-voltage apparatus is distributed throughout production and daily life, which brings potential safety hazards. Electrical safety accidents caused by aging electrical lines, improper configuration and the like have tended to rise year by year, and low-voltage fault arcs are one of the main causes of electrical fires. Low-voltage arc faults in the form of short circuits still occur in low-voltage distribution devices; the resulting losses are severe and pose a serious threat to the safety of life and property. In addition, gas explosions caused by electric leakage have caused serious casualties among workers, bringing great suffering to families and a serious impact on society. There is therefore a need for a power safety sensing method for low-voltage users that detects fault arcs, leakage and short-circuit faults in the low-voltage user system in real time.
At present, some methods perform intelligent sensing of the electricity safety of low-voltage users based on feature selection and artificial intelligence classification models, for example feature selection algorithms based on genetic algorithms, on particle swarm optimization, or on evolutionary computation. Most research focuses on optimizing the design of the classification model. However, when an electrical short circuit, leakage or fault arc occurs, the volume of real-time data produced by a data acquisition system with a high sampling frequency is difficult to estimate, and the feature selection algorithms used have a high computational cost. This affects the accuracy and efficiency of fault identification, and such algorithms cannot accurately and effectively select features from a large amount of data in a short time.
In addition, some traditional feature selection methods, such as high-dimensional matrix dimensionality reduction, change the natural semantics of the original feature items, which in some situations reduces the accuracy of the classification model's results. Moreover, the algorithm models currently used may become trapped in a local extremum when searching for the extrema of a complex nonlinear function, causing model training to fail. Current artificial intelligence models are also prone to overfitting or poor generalization.
In summary, how to perform user electricity safety detection efficiently and accurately, while retaining the natural semantics of the feature items and avoiding the local extrema, overfitting and poor generalization to which traditional artificial intelligence models are prone, is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
The object of the invention is to provide a user electricity utilization safety detection method, system, device and storage medium, so that user electricity safety detection can be carried out efficiently and accurately, the natural semantics of the feature items are retained, and the local extrema, overfitting and poor generalization to which traditional artificial intelligence models are prone are avoided.
In order to solve the technical problems, the invention provides the following technical scheme:
A user electricity utilization safety detection method comprises the following steps:
constructing an initial random forest model that outputs a user electricity safety detection result for given input information;
collecting electricity consumption information of a user side while the user side has a leakage fault, an arc fault or a short-circuit fault, and while it uses electricity normally;
extracting features from the collected electricity consumption information, and setting a corresponding label according to the state of the user side at the time of collection, to obtain initial training samples;
for each label, determining the information gain between each extracted feature item and the label, filtering out each feature item whose information gain is lower than a preset threshold, and selecting an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model;
performing feature selection on the initial training samples based on the optimal feature subset corresponding to each label, to obtain target training samples;
training the initial random forest model with the target training samples to obtain a trained target random forest model;
and detecting the present electricity consumption information of the user side, selecting its features, and inputting them into the target random forest model to obtain the user electricity safety detection result output by that model.
Preferably, selecting the optimal feature subset from the feature items remaining after filtering, according to the principle of maximizing the classification accuracy of the initial random forest model, includes:
for each label, after filtering out each feature item whose information gain is lower than the preset threshold, traversing the feature space with the sequential forward selection (SFS) algorithm to obtain a plurality of feature sets;
and determining the classification accuracy of the initial random forest model with each feature set, and taking the feature set that yields the highest classification accuracy as the optimal feature subset.
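The SFS traversal described above can be sketched in Python as follows. This is a minimal illustration, not the patent's code: it assumes a caller-supplied `evaluate(subset)` function returning the classification accuracy of the initial random forest trained on that feature subset (for example, its out-of-bag or cross-validated accuracy), and the helper name `sfs_best_subset` is hypothetical. The patent does not state how the accuracy is measured.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

def sfs_best_subset(
    features: Sequence[Hashable],
    evaluate: Callable[[List[Hashable]], float],
) -> Tuple[List[Hashable], float, List[Tuple[Tuple[Hashable, ...], float]]]:
    """Sequential forward selection (hypothetical helper).

    Grows the subset one feature at a time, always adding the feature that
    maximizes evaluate(), records every intermediate subset, and returns the
    best subset seen. `features` must be non-empty.
    """
    selected: List[Hashable] = []
    remaining = list(features)
    history = []  # (subset, score) after each forward step
    while remaining:
        best_feature, best_score = None, float("-inf")
        for f in remaining:
            score = evaluate(selected + [f])
            if score > best_score:
                best_feature, best_score = f, score
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append((tuple(selected), best_score))
    # max() keeps the first maximum, so ties favour the smaller subset
    best_subset, best_score = max(history, key=lambda item: item[1])
    return list(best_subset), best_score, history
```

Each forward step costs one model evaluation per remaining feature, so filtering by information gain first (as the method does) directly shrinks the number of random forest fits SFS needs.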
Preferably, after determining the information gain between each feature item and the label, the method further includes:
penalizing the information gain of each feature item by the information entropy of that feature item, to obtain an information gain ratio;
correspondingly, filtering out the feature items whose information gain is lower than the preset threshold includes:
comparing each penalized information gain ratio with the preset threshold, and filtering out the feature items whose information gain ratio is lower than the preset threshold.
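Penalizing the information gain by the feature's own entropy corresponds to the C4.5-style gain ratio, IG(Y; X) / H(X). A minimal Python sketch follows. Because the extracted features are continuous, it assumes equal-width discretization into `n_bins` bins before computing entropies; the binning scheme, the one-vs-rest encoding used to evaluate "each label", and all function names are assumptions not specified in the patent.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def equal_width_bins(x: Sequence[float], n_bins: int = 10) -> List[int]:
    """Assumed discretization: map continuous values to n_bins equal-width bins."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in x]

def info_gain(x_binned: Sequence[int], y: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - H(Y | X)."""
    n = len(y)
    conditional = 0.0
    for value, count in Counter(x_binned).items():
        subset = [yi for xi, yi in zip(x_binned, y) if xi == value]
        conditional += (count / n) * entropy(subset)
    return entropy(y) - conditional

def gain_ratio(x_binned: Sequence[int], y: Sequence[Hashable]) -> float:
    """Information gain penalized by the feature's own entropy H(X)."""
    split_info = entropy(x_binned)
    return 0.0 if split_info == 0 else info_gain(x_binned, y) / split_info

def one_vs_rest(labels: Sequence[Hashable], target: Hashable) -> List[int]:
    """Binary target for evaluating features against one label."""
    return [int(label == target) for label in labels]

def filter_features(columns: Dict[str, Sequence[float]], y: Sequence[Hashable],
                    threshold: float, n_bins: int = 10) -> Dict[str, float]:
    """Keep features whose gain ratio is not below the preset threshold."""
    kept = {}
    for name, x in columns.items():
        ratio = gain_ratio(equal_width_bins(x, n_bins), y)
        if ratio >= threshold:
            kept[name] = ratio
    return kept
```

The entropy penalty counteracts plain information gain's bias toward features that split the data into many distinct values.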
Preferably, the constructed initial random forest model uses sampling with replacement, the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.
Preferably, collecting the electricity consumption information of the user side includes:
collecting the current, the voltage and the residual current of the user side.
Preferably, performing feature extraction based on the collected electricity consumption information includes:
based on the collected electricity consumption information, extracting, for each sampling time point, the current form factor, current pulse factor, current peak factor, current margin factor, current kurtosis factor, current energy index, voltage form factor, voltage pulse factor, voltage peak factor, voltage margin factor, voltage kurtosis factor, voltage energy index, residual current form factor, residual current pulse factor, residual current peak factor, residual current margin factor, residual current kurtosis factor and residual current energy index corresponding to that sampling time point.
Preferably, for the i-th sampling time point, the extracted current form factor, voltage form factor and residual current form factor corresponding to that point are respectively expressed as: [formula images not reproduced]
For the i-th sampling time point, the extracted current pulse factor, voltage pulse factor and residual current pulse factor corresponding to that point are respectively expressed as: [formula images not reproduced]
For the i-th sampling time point, the extracted current peak factor, voltage peak factor and residual current peak factor corresponding to that point are respectively expressed as: [formula images not reproduced]
For the i-th sampling time point, the extracted current margin factor, voltage margin factor and residual current margin factor corresponding to that point are respectively expressed as: [formula images not reproduced]
For the i-th sampling time point, the extracted current kurtosis factor, voltage kurtosis factor and residual current kurtosis factor corresponding to that point are respectively expressed as: [formula images not reproduced]
For the i-th sampling time point, the extracted current energy index, voltage energy index and residual current energy index corresponding to that point are respectively expressed as: [formula images not reproduced]
where the symbols (given as images in the original) denote, in order: the residual current value of the user terminal at the i-th sampling time point; the current value of the user terminal at the i-th sampling time point; the voltage value of the user terminal at the i-th sampling time point; the number of sampling time points in a single power frequency cycle when the electricity consumption information of the user terminal is collected; the current peak value within the most recent power frequency cycle; the voltage peak value within the most recent power frequency cycle; the residual current peak value within the most recent power frequency cycle; the maximum current energy within the most recent power frequency cycle; the maximum voltage energy within the most recent power frequency cycle; and the maximum residual current energy within the most recent power frequency cycle.
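Because the formulas above survive only as images, the following Python sketch does not reproduce them. It computes the common textbook condition-monitoring definitions of these indicators over the most recent power frequency cycle (the N samples ending at point i): form factor = RMS / mean absolute value, pulse factor = peak / mean absolute value, peak factor = peak / RMS, margin factor = peak / (mean of square roots of absolute values) squared, kurtosis factor = mean fourth power / RMS to the fourth. The energy index (squared value at point i divided by the cycle's maximum squared value) is a hypothetical normalization suggested only by the "maximum energy" symbols defined above. All of these definitions and the name `window_indicators` are assumptions.

```python
import math
from typing import Dict, Sequence

def window_indicators(samples: Sequence[float], i: int, n_per_cycle: int) -> Dict[str, float]:
    """Textbook time-domain indicators over the n_per_cycle samples ending at
    0-based index i. Assumed definitions, not the patent's image formulas."""
    if i + 1 < n_per_cycle:
        raise ValueError("point i needs one full power-frequency cycle of history")
    w = samples[i + 1 - n_per_cycle : i + 1]            # most recent cycle
    mean_abs = sum(abs(v) for v in w) / n_per_cycle
    if mean_abs == 0:
        raise ValueError("all-zero window: indicators are undefined")
    rms = math.sqrt(sum(v * v for v in w) / n_per_cycle)
    peak = max(abs(v) for v in w)                        # peak in the cycle
    mean_sqrt_abs = sum(math.sqrt(abs(v)) for v in w) / n_per_cycle
    max_energy = max(v * v for v in w)                   # max instantaneous energy
    return {
        "form": rms / mean_abs,                          # form factor
        "pulse": peak / mean_abs,                        # pulse (impulse) factor
        "peak": peak / rms,                              # peak (crest) factor
        "margin": peak / mean_sqrt_abs ** 2,             # margin (clearance) factor
        "kurtosis": (sum(v ** 4 for v in w) / n_per_cycle) / rms ** 4,
        "energy_index": samples[i] ** 2 / max_energy,    # hypothetical normalization
    }
```

For a clean sine wave these give roughly 1.11 (form), 1.41 (peak) and 1.5 (kurtosis); arcs and short circuits distort the waveform and move these ratios away from the sinusoidal baseline, which is what makes them useful as fault features. The same function would be applied separately to the current, voltage and residual current series.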
A user electricity safety detection system, comprising:
an initial random forest model construction module, configured to construct an initial random forest model that outputs a user electricity safety detection result for given input information;
an electricity consumption information acquisition module, configured to collect the electricity consumption information of the user side while the user side has a leakage fault, an arc fault or a short-circuit fault, and while it uses electricity normally;
a feature extraction module, configured to extract features from the collected electricity consumption information and to set corresponding labels according to the state of the user side at the time of collection, to obtain initial training samples;
an optimal feature subset selection module, configured to determine the information gain between each extracted feature item and each label, filter out each feature item whose information gain is lower than a preset threshold, and select an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model;
a target training sample determination module, configured to perform feature selection on the initial training samples based on the optimal feature subset corresponding to each label, to obtain target training samples;
a training module, configured to train the initial random forest model with the target training samples to obtain a trained target random forest model;
and an execution module, configured to detect the present electricity consumption information of the user side, select its features, input them into the target random forest model, and obtain the user electricity safety detection result output by that model.
A user electricity safety detection device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the user electricity safety detection method as described above.
A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the user electricity safety detection method as set forth above.
By applying the technical scheme provided by the embodiments of the invention, the electricity consumption information of the user side is collected while the user side has a leakage fault, an arc fault or a short-circuit fault, and while it uses electricity normally; features are then extracted from the collected information, and a corresponding label is set according to the state of the user side at the time of collection to obtain the initial training samples.
In addition, the scheme of the application uses a random forest model for user electricity safety detection. The random forest takes the outputs of all its decision trees and applies majority voting (the minority yields to the majority) to obtain the final user electricity sensing result, which greatly reduces the probability of overfitting, enhances generalization, and makes the model less likely to become trapped in a local extremum.
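A minimal Python sketch of hard majority voting over per-tree predictions, using the label strings introduced later in the description ("000", "100", "010", "001"). The function name is hypothetical. Note that scikit-learn's `RandomForestClassifier` combines its trees by averaging their predicted class probabilities (soft voting) rather than by counting hard votes, so this sketch illustrates the principle rather than that library's exact behaviour.

```python
from collections import Counter
from typing import Hashable, Sequence

def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Return the class predicted by the most trees.

    Counter.most_common orders equal counts by first occurrence, so a tie
    goes to the class that appeared first in tree_predictions.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    return Counter(tree_predictions).most_common(1)[0][0]
```

Usage: `majority_vote(["010", "000", "010"])` returns `"010"` (arc fault), because two of the three trees agree.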
Furthermore, the application recognizes that the accuracy of fault identification depends not only on the quality of the model: the quality of the feature data is also crucial, and proper feature selection preserves the semantics and information of the original feature data, so that a good classification effect can be obtained even with a simple classification model. Therefore, for each label, the information gain between each extracted feature item and the label is determined, and each feature item whose information gain is lower than the preset threshold is filtered out. This effectively reduces the irrelevant and redundant information in the feature set and improves the detection efficiency of the target random forest model in application. Furthermore, the optimal feature subset is selected from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model, which further reduces irrelevant and redundant information, helps ensure that the selected subset gives good user electricity safety detection performance, and improves detection accuracy. Because the scheme screens the feature items rather than transforming them, the natural semantics of the original feature items are not damaged.
To sum up, the scheme of the application performs user electricity safety detection efficiently and accurately, retains the natural semantics of the feature items, and avoids the local extrema, overfitting and poor generalization to which traditional artificial intelligence models are prone.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a method for detecting user electricity utilization safety in the present invention;
FIG. 2a is a schematic diagram of Bootstrap sampling with replacement in the present invention;
FIG. 2b is a schematic diagram of the multi-decision-tree principle of a random forest model in the present invention;
FIG. 3 is a schematic structural diagram of a user electricity utilization safety detection system according to the present invention.
Detailed Description
The core of the invention is to provide a user electricity utilization safety detection method that detects user electricity safety efficiently and accurately, retains the natural semantics of the feature items, and avoids the local extrema, overfitting and poor generalization to which traditional artificial intelligence models are prone.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a user electricity consumption safety detection method according to the present invention, where the user electricity consumption safety detection method may include the following steps:
Step S101: Construct an initial random forest model that outputs a user electricity safety detection result for given input information.
Specifically, in the scheme of the application, user electricity safety detection is performed with a random forest model; the model that has been constructed but not yet trained is called the initial random forest model.
In a specific case, the initial random forest model can be constructed with RandomForestRegressor: the relevant package is imported and the model parameters are set.
When constructing the initial random forest model, the specific parameters may be set and adjusted according to actual needs. For example, in a specific case n_estimators may be set to 15; the n_estimators parameter is the number of weak learners (decision trees). Generally, if n_estimators is too small the model tends to underfit, and if it is too large it tends to overfit, so a moderate value is usually chosen.
The parameter oob_score defaults to False. However, the application sets it to True, because the out-of-bag score reflects the generalization ability of the fitted model. oob_score indicates whether out-of-bag samples are used to evaluate the model. With sampling with replacement, approximately 36.8% of the data is never drawn for a given tree; this out-of-bag (OOB) data does not participate in fitting that tree and can therefore be used to estimate the generalization ability of the model.
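The 36.8% figure follows from sampling with replacement: when a bootstrap sample of size n is drawn from n training samples, a given sample is missed by all n draws with probability (1 - 1/n)^n, which tends to 1/e ≈ 0.368 as n grows. A small Python check (the function name is illustrative):

```python
import math

def prob_out_of_bag(n: int) -> float:
    """Probability that a given one of n samples is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

# The probability approaches 1/e quickly as the training set grows.
for n in (10, 100, 10_000):
    print(n, round(prob_out_of_bag(n), 4))
print("limit 1/e =", round(math.exp(-1), 4))
```

Even for n = 10 the value is already about 0.349, so the out-of-bag fraction is close to 36.8% for any realistically sized training set.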
The parameter criterion, the measure used to evaluate features when a decision tree node is split, may be set to the Gini impurity, gini.
The parameter random_state may be set to a fixed value so that the same random forest model is generated each time the algorithm is run. random_state is the random number seed used when the samples for each tree are drawn by Bootstrap in the Bagging strategy (i.e., random sampling with replacement).
The parameter max_depth may be left unset (its default, None); the depth of each decision tree is then not limited when the tree is built.
The parameter max_features may be left unset so that its default value is used. max_features determines how random each tree is, and a smaller value reduces overfitting: if max_features is large, the trees in the random forest will be very similar; if it is small, the trees will differ greatly. Each tree is grown deep so that it fits the data well.
The bootstrap parameter may be left at its default value True, i.e., the constructed initial random forest model samples with replacement. The bootstrap parameter indicates whether sampling is done with replacement: with bagging, sub-datasets are drawn from the dataset randomly and with replacement to train the individual decision trees in the forest. Each decision tree is therefore trained on a different dataset; the datasets are similar but not identical.
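The parameter choices above can be collected into a single keyword-argument dictionary. This sketch assumes scikit-learn. Because the task is classification and criterion is "gini", it targets `RandomForestClassifier`; scikit-learn's `RandomForestRegressor` does not accept "gini" as a criterion. The seed value 42 is a hypothetical fixed value (the description only says random_state is fixed), and max_features is omitted so the library default applies.

```python
# Parameters from the description, as keyword arguments for scikit-learn's
# RandomForestClassifier (assumed estimator).
INITIAL_RF_PARAMS = {
    "n_estimators": 15,   # number of decision trees
    "criterion": "gini",  # Gini impurity for node splits
    "max_depth": None,    # no depth limit when growing trees
    "bootstrap": True,    # sample with replacement for each tree
    "oob_score": True,    # score the model on out-of-bag samples
    "random_state": 42,   # hypothetical fixed seed for reproducibility
}

def build_initial_model():
    """Return an untrained random forest (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier  # imported lazily
    return RandomForestClassifier(**INITIAL_RF_PARAMS)
```

After `model.fit(X, y)`, the fitted model exposes the out-of-bag accuracy as `model.oob_score_`. With only 15 trees, scikit-learn may warn that some training samples have no out-of-bag prediction, since a sample can be drawn into every tree's bootstrap sample.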
Referring to FIG. 2a, a schematic diagram of Bootstrap sampling with replacement is shown. Bootstrap sampling proceeds as follows: (1) draw a number n of samples from the original sample set M by resampling with replacement; (2) calculate a predetermined statistic T from the drawn samples; (3) repeat (1) and (2) N times (generally more than 1000) to obtain N values of the statistic T; (4) calculate the sample variance of the N values of T to obtain the variance of the statistic; (5) other quantities, such as the population mean, can be estimated in the same way.
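Steps (1) to (5) can be sketched in Python with the standard library. This is a minimal illustration under assumed names (`bootstrap_estimate`, the `seed` argument); it returns the mean and sample variance of the resampled statistic T.

```python
import random
import statistics
from typing import Callable, Optional, Sequence, Tuple

def bootstrap_estimate(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_resamples: int = 1000,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> Tuple[float, float]:
    """Resample with replacement, compute T on each resample, and return
    the mean and sample variance of the T values."""
    rng = random.Random(seed)                     # fixed seed, like random_state
    n = sample_size if sample_size is not None else len(data)
    # Steps (1)-(3): draw n samples with replacement and compute T, N times.
    t_values = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # Steps (4)-(5): variance of T, plus its mean as a population estimate.
    return statistics.mean(t_values), statistics.variance(t_values)
```

Usage: `bootstrap_estimate([1.0, 2.0, 3.0, 4.0], statistics.mean)` estimates how much the sample mean would vary across resamples. A random forest uses the same resampling step, but trains one tree per resample instead of computing a statistic.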
Step S102: Collect the electricity consumption information of the user side while it has a leakage fault, an arc fault or a short-circuit fault, and while it uses electricity normally.
Because the scheme of the application performs user electricity safety detection, the collected information must include both information from normal electricity use and information from faults at the user side. Since leakage faults, arc faults and short-circuit faults are the common fault types, the application collects the electricity consumption information of the user side under each of these three fault conditions and during normal use.
A leakage fault occurs when the phase line of the protected line is connected to earth, directly or through an unintended load, producing an approximately sinusoidal residual current whose effective (RMS) value changes slowly. A fault arc is a gas discharge phenomenon caused by air breakdown or loose electrical connections resulting from aging or damaged line insulation, humid air and the like. A short-circuit fault occurs when a circuit or part of a circuit is short-circuited.
In practical applications, leakage, arc and short-circuit faults at the user side can be simulated experimentally, and the electricity consumption information at the time of each fault is collected by sensing devices during the experiment. Besides experiments, this information can also be obtained by field monitoring.
When step S102 is executed, the specific items of electricity consumption information to collect may be set and adjusted according to actual needs; information that reflects electrical faults at the user terminal should be chosen.
For example, in an embodiment of the invention, because the current, voltage and residual current of the user terminal effectively reflect whether a fault has occurred, collecting the electricity consumption information of the user terminal in step S102 may specifically include:
collecting the current, the voltage and the residual current of the user terminal.
The current of the user terminal is its total current; for a low-voltage user this is usually the current in the live wire. In the minority of cases where three-phase power is used, any one phase may be selected and its phase current measured. Correspondingly, the voltage of the user terminal is the user's total voltage, i.e., the voltage between the live wire and the neutral wire. The residual current is the non-zero vector sum of the currents in all phases (including the neutral) of a low-voltage distribution line. Generally, when an accident occurs on the consumption side, current flows from a live body through a human body to earth; the effective value of the instantaneous vector sum of the currents is called the residual current, commonly known as leakage.
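For a single-phase user, the residual current is the instantaneous sum of the live and neutral currents, both measured in the same reference direction (toward the load). The minimal Python sketch below sums sampled conductor currents and returns the RMS (effective) value of the sum. The function name and the sign convention are assumptions; in practice a residual current sensor (core-balance transformer) performs this summation magnetically.

```python
import math
from typing import Sequence

def residual_current_rms(conductor_currents: Sequence[Sequence[float]]) -> float:
    """RMS of the instantaneous sum of all conductor currents (phases + neutral),
    sampled at the same instants and measured in the same reference direction."""
    n = len(conductor_currents[0])
    if any(len(c) != n for c in conductor_currents):
        raise ValueError("all conductors need the same number of samples")
    summed = [sum(c[k] for c in conductor_currents) for k in range(n)]
    return math.sqrt(sum(v * v for v in summed) / n)
```

Without leakage, the neutral current mirrors the live current and the sum is zero. If 1% of the current returns through earth instead of the neutral, the result is 1% of the live current's RMS value.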
In addition, in other occasions, the collected electricity utilization information can also comprise other types, such as cable temperature and the like, according to actual needs.
Step S103: and performing feature extraction based on the collected power utilization information, and setting a corresponding label according to the state of the user side when the power utilization information is collected to obtain an initial training sample.
The power consumption information of the user side acquired in step S102 is the original acquired data, and in order to effectively train the initial random forest model and obtain a high-performance target random forest model, the feature extraction is performed based on the acquired power consumption information. Of course, there are many specific feature extraction items, which can be set according to actual situations. And it can be understood that, in the same way as selecting the specific item of the collected power consumption information, when the feature extraction is performed, the feature capable of effectively reflecting the power consumption fault condition of the user terminal can be extracted.
Note also that the user power consumption safety detection result output by the target random forest model identifies the specific fault type. Therefore, in step S102 the power consumption information of the user side must be collected under each of four conditions: a leakage fault, an arc fault, a short-circuit fault, and normal power consumption. Likewise, after feature extraction based on the collected information, a corresponding label is set according to the state of the user side when the information was collected, to obtain the initial training samples.
The scheme of the present application therefore uses 4 kinds of labels, representing "leakage fault", "arc fault", "short-circuit fault" and "normal power consumption" respectively. For example, label 000 indicates "normal power consumption", label 100 indicates "leakage fault", label 010 indicates "arc fault", and label 001 indicates "short-circuit fault". The label assigned to a training sample depends on the condition under which its power consumption information was collected.
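The 3-bit label scheme amounts to one bit per fault type, with all zeros for normal power consumption. The following Python sketch shows one way to hold the mapping. The state names such as `arc_fault` and the helpers `encode_label` and `decode_label` are hypothetical and only illustrate the codes listed above.

```python
# Hypothetical encoding of the four labels as 3-bit codes:
# one bit per fault type, all zeros for normal power consumption.
LABEL_CODES = {
    "normal": (0, 0, 0),
    "leakage_fault": (1, 0, 0),
    "arc_fault": (0, 1, 0),
    "short_circuit_fault": (0, 0, 1),
}

# Reverse lookup, used to turn a model output code back into a state name.
CODE_TO_LABEL = {code: name for name, code in LABEL_CODES.items()}


def encode_label(state: str) -> tuple[int, int, int]:
    """Return the 3-bit code for the state under which a sample was collected."""
    return LABEL_CODES[state]


def decode_label(code) -> str:
    """Map a 3-bit code (tuple or list) back to its state name; unknown codes raise KeyError."""
    return CODE_TO_LABEL[tuple(code)]
```

The same reverse lookup can be used to interpret the model output in step S107, where for example 001 denotes a short-circuit fault.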
In addition, the sampling frequency used to collect the power consumption information of the user side can be set as required. For example, in one case the sampling frequency is set to 10 kHz and the power frequency is 50 Hz, so each power frequency cycle contains 10000/50 = 200 sampling time points (200 sampling points for short). In practice, the sampling time points collected under each condition (leakage fault, arc fault, short-circuit fault and normal power consumption) can be numbered in sequence, so the index i of the i-th sampling time point is a positive integer.
In some cases, feature extraction for the power consumption information at the i-th sampling point uses data preceding that point, for example the power consumption information collected during the power frequency cycle closest to the i-th sampling point. Therefore, in practical applications, feature extraction need not start from the 1st sampling point. For example, in one case feature extraction may start from the second power frequency cycle, that is, from the N-th sampling time point, where N is the number of sampling time points per power frequency cycle. The power consumption information collected under the 4 different conditions must also be processed separately by class.
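The sampling arithmetic and the per-cycle data window can be sketched in Python as follows. The sketch makes two assumptions: the sampling rate is an integer multiple of the power frequency, and the window for the i-th point covers the N samples ending at, and including, that point. The helper names are hypothetical.

```python
def samples_per_cycle(sampling_hz: float, line_hz: float) -> int:
    """Number of sampling time points N in one power frequency cycle."""
    n = sampling_hz / line_hz
    if n != int(n):
        # Assumption of this sketch: whole number of samples per cycle.
        raise ValueError("sampling rate must be an integer multiple of the line frequency")
    return int(n)


def cycle_window(signal: list[float], i: int, n: int) -> list[float]:
    """Samples of the most recent power frequency cycle ending at the i-th point.

    i is 1-based, as in the text; a full window exists only for i >= n.
    """
    if i < n:
        raise ValueError("not enough history: features start at the N-th sampling time point")
    # 1-based points i-n+1 .. i correspond to 0-based slice [i-n, i).
    return signal[i - n:i]
```

With 10 kHz sampling and a 50 Hz power frequency, `samples_per_cycle(10_000, 50)` gives 200, and the first sampling time point with a full window is i = 200.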
In one embodiment of the invention, the feature extraction based on the collected power consumption information in step S103 may specifically include:
for each sampling time point, extracting from the collected power consumption information the current form factor, current pulse factor, current crest factor, current margin factor, current kurtosis factor, current energy index, voltage form factor, voltage pulse factor, voltage crest factor, voltage margin factor, voltage kurtosis factor, voltage energy index, residual current form factor, residual current pulse factor, residual current crest factor, residual current margin factor, residual current kurtosis factor and residual current energy index corresponding to that sampling time point.
In the above feature extraction, the specific calculation method may be set according to actual conditions.
The form factor, also called the form index, is the ratio of the root-mean-square (RMS) value of the signal to its absolute mean value at the i-th sampling time point. It reflects how much the actual waveform differs from, and is distorted relative to, a standard sine wave. The form factor can therefore be expressed as:
$$S_f=\frac{X_{\mathrm{rms}}}{X_{\mathrm{am}}},\qquad X_{\mathrm{rms}}=\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}x_k^{2}},\qquad X_{\mathrm{am}}=\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert x_k\rvert,$$
where $X_{\mathrm{rms}}$ is the RMS value of the signal and $X_{\mathrm{am}}$ is its absolute mean value, both taken over the most recent power frequency cycle up to the i-th sampling time point. In the present application the signal $x_k$ may specifically be the current $I_k$, the voltage $U_k$ or the residual current $I_{r,k}$ of the user side. Therefore, for the i-th sampling time point, the extracted current form factor $S_I(i)$, voltage form factor $S_U(i)$ and residual current form factor $S_r(i)$ are respectively expressed as:
$$S_I(i)=\frac{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}I_k^{2}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert I_k\rvert}$$
$$S_U(i)=\frac{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}U_k^{2}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert U_k\rvert}$$
$$S_r(i)=\frac{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}I_{r,k}^{2}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert I_{r,k}\rvert}$$
where $I_{r,k}$, $I_k$ and $U_k$ are the residual current, current and voltage values of the user side at the k-th sampling time point, and $N$ is the number of sampling time points in a single power frequency cycle when the power consumption information of the user side is collected. In the above embodiment each power frequency cycle contains 200 sampling time points, so $N$ is 200.
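A minimal Python sketch of the form factor for one cycle of samples follows. The argument `window` is assumed to hold the N samples of the current, voltage or residual current ending at the i-th sampling time point, and the function names are illustrative.

```python
import math


def rms(window: list[float]) -> float:
    """RMS value X_rms of one cycle of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def abs_mean(window: list[float]) -> float:
    """Absolute (rectified) mean X_am of one cycle of samples."""
    return sum(abs(x) for x in window) / len(window)


def form_factor(window: list[float]) -> float:
    """Form factor S_f = X_rms / X_am over one power frequency cycle."""
    return rms(window) / abs_mean(window)


# Example: one cycle of a unit sine sampled at N = 200 points.
N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
print(form_factor(sine))
```

For a pure sine wave the result is close to π/(2√2) ≈ 1.11, and waveform distortion moves it away from that value. An all-zero window (for example, a residual current of exactly zero) would divide by zero, so a practical implementation needs a guard for that case.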
The pulse factor, also called the pulse index, is the ratio of the signal peak value to the absolute mean (rectified mean) value, and reflects the impulsiveness of the signal. The pulse factor can therefore be expressed as:
$$P_f=\frac{X_{\mathrm{p}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert x_k\rvert},\qquad X_{\mathrm{p}}=\max_{i-N<k\le i}\lvert x_k\rvert,$$
where $X_{\mathrm{p}}$ is the signal peak value within the power frequency cycle preceding the i-th sampling time point. Therefore, in an embodiment of the present invention, for the i-th sampling time point, the extracted current pulse factor $P_I(i)$, voltage pulse factor $P_U(i)$ and residual current pulse factor $P_r(i)$ are respectively expressed as:
$$P_I(i)=\frac{I_{\mathrm{p}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert I_k\rvert}$$
$$P_U(i)=\frac{U_{\mathrm{p}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert U_k\rvert}$$
$$P_r(i)=\frac{I_{r,\mathrm{p}}}{\frac{1}{N}\sum_{k=i-N+1}^{i}\lvert I_{r,k}\rvert}$$
where $I_{\mathrm{p}}=\max_{i-N<k\le i}\lvert I_k\rvert$ is the current peak value within the most recent power frequency cycle, $U_{\mathrm{p}}=\max_{i-N<k\le i}\lvert U_k\rvert$ is the voltage peak value within the most recent power frequency cycle, and $I_{r,\mathrm{p}}=\max_{i-N<k\le i}\lvert I_{r,k}\rvert$ is the residual current peak value within the most recent power frequency cycle.
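Under the same assumptions (one cycle of samples per call, illustrative names), the pulse factor divides the cycle peak by the absolute mean:

```python
import math


def abs_mean(window: list[float]) -> float:
    """Absolute (rectified) mean of one cycle of samples."""
    return sum(abs(x) for x in window) / len(window)


def peak(window: list[float]) -> float:
    """Signal peak X_p: largest absolute sample within the cycle."""
    return max(abs(x) for x in window)


def pulse_factor(window: list[float]) -> float:
    """Pulse factor P_f = X_p / X_am over one power frequency cycle."""
    return peak(window) / abs_mean(window)


N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
spiky = [0.1] * (N - 1) + [5.0]  # a quiet cycle with one impulse
print(pulse_factor(sine), pulse_factor(spiky))
```

A sine wave gives about π/2 ≈ 1.57. A single spike in an otherwise quiet cycle drives the value up sharply, which is the impulsiveness this index is meant to capture.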
The crest factor, also called the peak factor or peak index, is the ratio of the signal peak value to the RMS value, and reflects how extreme the peak is compared with the whole waveform. When an impulsive signal occurs, the peak value of the waveform increases and the index increases accordingly. The crest factor can be expressed as:
$$C_f=\frac{X_{\mathrm{p}}}{X_{\mathrm{rms}}}=\frac{\max_{i-N<k\le i}\lvert x_k\rvert}{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}x_k^{2}}}.$$
Therefore, in an embodiment of the present invention, for the i-th sampling time point, the extracted current crest factor $C_I(i)$, voltage crest factor $C_U(i)$ and residual current crest factor $C_r(i)$ are respectively expressed as:
$$C_I(i)=\frac{I_{\mathrm{p}}}{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}I_k^{2}}}$$
$$C_U(i)=\frac{U_{\mathrm{p}}}{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}U_k^{2}}}$$
$$C_r(i)=\frac{I_{r,\mathrm{p}}}{\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}I_{r,k}^{2}}}$$
where $I_{\mathrm{p}}$, $U_{\mathrm{p}}$ and $I_{r,\mathrm{p}}$ are the current, voltage and residual current peak values within the most recent power frequency cycle.
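A corresponding sketch of the crest factor (peak over RMS value) follows, under the same assumptions and with illustrative names:

```python
import math


def rms(window: list[float]) -> float:
    """RMS value X_rms of one cycle of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def peak(window: list[float]) -> float:
    """Signal peak X_p: largest absolute sample within the cycle."""
    return max(abs(x) for x in window)


def crest_factor(window: list[float]) -> float:
    """Crest factor C_f = X_p / X_rms over one power frequency cycle."""
    return peak(window) / rms(window)


N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
print(crest_factor(sine))
```

For a sine wave this is about √2 ≈ 1.414. A square wave gives exactly 1, and impulsive waveforms give larger values.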
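The margin factor, also called the margin index, is the ratio of the signal peak value to the square-root amplitude, and reflects how full the waveform is. The margin index can be expressed as:
$$L_f=\frac{X_{\mathrm{p}}}{X_{\mathrm{r}}},\qquad X_{\mathrm{r}}=\left(\frac{1}{N}\sum_{k=i-N+1}^{i}\sqrt{\lvert x_k\rvert}\right)^{2},$$
where $X_{\mathrm{p}}=\max_{i-N<k\le i}\lvert x_k\rvert$ is the signal peak value and $X_{\mathrm{r}}$ is the square-root amplitude of the signal. Therefore, in an embodiment of the present invention, for the i-th sampling time point, the extracted current margin factor $L_I(i)$, voltage margin factor $L_U(i)$ and residual current margin factor $L_r(i)$ are respectively expressed as:
$$L_I(i)=\frac{I_{\mathrm{p}}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}\sqrt{\lvert I_k\rvert}\right)^{2}}$$
$$L_U(i)=\frac{U_{\mathrm{p}}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}\sqrt{\lvert U_k\rvert}\right)^{2}}$$
$$L_r(i)=\frac{I_{r,\mathrm{p}}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}\sqrt{\lvert I_{r,k}\rvert}\right)^{2}}$$
where $I_{\mathrm{p}}$, $U_{\mathrm{p}}$ and $I_{r,\mathrm{p}}$ are the current, voltage and residual current peak values within the most recent power frequency cycle.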
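The margin factor can be sketched the same way. Here the square-root amplitude is computed as the squared mean of √|x|, and the names are illustrative:

```python
import math


def peak(window: list[float]) -> float:
    """Signal peak X_p: largest absolute sample within the cycle."""
    return max(abs(x) for x in window)


def sqrt_amplitude(window: list[float]) -> float:
    """Square-root amplitude X_r = (mean of sqrt|x|) ** 2."""
    return (sum(math.sqrt(abs(x)) for x in window) / len(window)) ** 2


def margin_factor(window: list[float]) -> float:
    """Margin factor L_f = X_p / X_r over one power frequency cycle."""
    return peak(window) / sqrt_amplitude(window)
```

The square-root amplitude weights small samples heavily. A sparse, spiky window therefore yields a much larger margin factor than a full waveform such as a square wave, which gives 1.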
The kurtosis factor, also called the kurtosis index, is defined as the normalized fourth-order moment of the signal (for a zero-mean AC waveform this equals the normalized fourth-order central moment), and reflects how peaked or flat the waveform is. The kurtosis index is very sensitive to impulsive components in the signal: the greater the energy of the impulsive components, the larger the kurtosis value and the sharper the waveform. The kurtosis index can be expressed as:
$$K_f=\frac{\beta}{X_{\mathrm{rms}}^{4}},\qquad \beta=\frac{1}{N}\sum_{k=i-N+1}^{i}x_k^{4},$$
where $\beta$ is the kurtosis value and $X_{\mathrm{rms}}=\sqrt{\frac{1}{N}\sum_{k=i-N+1}^{i}x_k^{2}}$ is the RMS value of the signal over the same cycle.
Therefore, in an embodiment of the present invention, for the i-th sampling time point, the extracted current kurtosis factor $K_I(i)$, voltage kurtosis factor $K_U(i)$ and residual current kurtosis factor $K_r(i)$ are respectively expressed as:
$$K_I(i)=\frac{\frac{1}{N}\sum_{k=i-N+1}^{i}I_k^{4}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}I_k^{2}\right)^{2}}$$
$$K_U(i)=\frac{\frac{1}{N}\sum_{k=i-N+1}^{i}U_k^{4}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}U_k^{2}\right)^{2}}$$
$$K_r(i)=\frac{\frac{1}{N}\sum_{k=i-N+1}^{i}I_{r,k}^{4}}{\left(\frac{1}{N}\sum_{k=i-N+1}^{i}I_{r,k}^{2}\right)^{2}}$$
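The kurtosis factor as written above (fourth-order moment over the fourth power of the RMS value) can be sketched as follows. The sketch assumes a zero-mean AC window; for signals with a DC offset, subtract the window mean first if a central moment is required. The name is illustrative.

```python
import math


def kurtosis_factor(window: list[float]) -> float:
    """K_f = beta / X_rms**4, with beta = mean(x**4) and X_rms**2 = mean(x**2)."""
    n = len(window)
    beta = sum(x ** 4 for x in window) / n          # kurtosis value
    mean_square = sum(x * x for x in window) / n    # equals X_rms ** 2
    return beta / mean_square ** 2


N = 200
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
print(kurtosis_factor(sine))
```

A sine wave gives about 1.5 and a square wave gives 1, while impulsive windows give larger values.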
After an arc fault occurs, part of the electric energy is converted into other forms of energy and dissipated. The energy of the waveform is therefore calculated and made dimensionless by dividing it by the maximum energy within the current cycle, giving an energy index that tracks how the energy changes. The energy index is defined as:
$$E_f=\frac{W_i}{W_{\max}},\qquad W_i=x_i^{2},\qquad W_{\max}=\max_{i-N<k\le i}x_k^{2},$$
where $W_i$ is the energy of the signal at the i-th sampling time point and $W_{\max}$ is the maximum energy within the cycle.
In one embodiment of the invention, for the i-th sampling time point, the extracted current energy index $E_I(i)$, voltage energy index $E_U(i)$ and residual current energy index $E_r(i)$ are respectively expressed as:
$$E_I(i)=\frac{I_i^{2}}{W_{I,\max}}$$
$$E_U(i)=\frac{U_i^{2}}{W_{U,\max}}$$
$$E_r(i)=\frac{I_{r,i}^{2}}{W_{r,\max}}$$
where $W_{I,\max}=\max_{i-N<k\le i}I_k^{2}$ is the maximum current energy within the most recent power frequency cycle, $W_{U,\max}=\max_{i-N<k\le i}U_k^{2}$ is the maximum voltage energy within the most recent power frequency cycle, and $W_{r,\max}=\max_{i-N<k\le i}I_{r,k}^{2}$ is the maximum residual current energy within the most recent power frequency cycle.
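The next sketch puts the six indices together. It computes the energy index and assembles the 18-dimensional feature vector of step S103 for one sampling time point. As an assumption, the energy index is taken as the squared latest sample divided by the largest squared sample in the cycle. The helper names and the feature ordering are hypothetical. The three signals are assumed to be lists sampled at the same instants, and a real implementation would need to guard against all-zero windows.

```python
import math


def _rms(w):
    return math.sqrt(sum(x * x for x in w) / len(w))


def _abs_mean(w):
    return sum(abs(x) for x in w) / len(w)


def _peak(w):
    return max(abs(x) for x in w)


def energy_index(w):
    """E_f = W_i / W_max: squared latest sample over the cycle's largest squared sample."""
    return w[-1] ** 2 / max(x * x for x in w)


def six_factors(w):
    """Form, pulse, crest, margin, kurtosis factors and energy index of one cycle window."""
    rms, am, pk = _rms(w), _abs_mean(w), _peak(w)
    x_r = (sum(math.sqrt(abs(x)) for x in w) / len(w)) ** 2  # square-root amplitude
    beta = sum(x ** 4 for x in w) / len(w)                    # kurtosis value
    return [rms / am, pk / am, pk / rms, pk / x_r, beta / rms ** 4, energy_index(w)]


# Hypothetical names for the 18 features, in the order produced below.
FEATURE_NAMES = [f"{sig}_{fac}"
                 for sig in ("current", "voltage", "residual_current")
                 for fac in ("form", "pulse", "crest", "margin", "kurtosis", "energy")]


def feature_vector(current, voltage, residual, i, n):
    """18-dimensional feature vector for the 1-based i-th sampling time point (i >= n)."""
    feats = []
    for sig in (current, voltage, residual):
        feats.extend(six_factors(sig[i - n:i]))  # most recent cycle ending at point i
    return feats
```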
Step S104: for each label, determine the information gain between each extracted feature item and the label, filter out the feature items whose information gain is lower than a preset threshold, and select an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model.
The present application recognizes that the accuracy of fault identification and detection depends not only on the quality of the model but also, crucially, on the quality of the feature data. Appropriate feature selection preserves the semantics and information of the original feature data, so that even a simple classification model can achieve a good classification effect.
For example, after feature extraction according to the above embodiment, each initial training sample contains 18-dimensional features (6 features for each of the current, voltage and residual current). In other cases, when more features are extracted, the dimension may be higher, and a large amount of data would then have to be analyzed in real time when the target random forest model is used. By contrast, the present application screens the features through step S104 to obtain the optimal feature subset: irrelevant features are removed according to a given evaluation criterion and the most effective features are retained. This effectively reduces feature fluctuation, retains the natural semantics of the original feature data, and reduces the amount of irrelevant and redundant information in the feature set, so that the target random forest model can detect user power consumption safety efficiently and accurately.
Specifically, the method selects features in a hybrid Filter + Wrapper mode to obtain the optimal feature subset. The Filter method evaluates intrinsic properties of a feature subset, such as the degree of association between feature items and the class label, the amount of information, or the distance between samples, to judge whether the subset best expresses and distinguishes the data. The Wrapper method uses the target data processing task itself as the evaluation criterion, much like black-box testing: it does not examine the properties of the selected subset, only how well the data processing task performs with that subset. Evaluating a subset therefore requires actually running the data processing task, in a loop of continual improvement driven by feedback from the classification results.
In the Filter stage, the specific scheme is to determine the information gain between each extracted feature item and the label, and to filter out each feature item whose information gain is lower than a preset threshold.
Specifically, let the random variables X and Y take the values $\{x_1,x_2,\dots,x_n\}$ and $\{y_1,y_2,\dots,y_m\}$ respectively, with probability distributions $P(x_i)$ and $P(y_j)$. The entropy H(X) of the random variable X can then be defined as:
$$H(X)=-\sum_{i=1}^{n}P(x_i)\log_2 P(x_i).$$
Likewise, the entropy H(Y) of the random variable Y can be defined as:
$$H(Y)=-\sum_{j=1}^{m}P(y_j)\log_2 P(y_j).$$
The conditional entropy of X given Y is defined as:
$$H(X\mid Y)=-\sum_{j=1}^{m}P(y_j)\sum_{i=1}^{n}P(x_i\mid y_j)\log_2 P(x_i\mid y_j).$$
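These definitions are for discrete variables, so the extracted features, which are continuous, would first have to be discretized (for example, by binning). That step is assumed here and not shown. The following is a minimal Python sketch of H(X) and H(X|Y), with probabilities estimated from value frequencies and illustrative function names.

```python
import math
from collections import Counter


def entropy(values):
    """H(X) = -sum P(x) log2 P(x), with P estimated by relative frequency."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = sum_j P(y_j) * H(X | Y = y_j), for paired samples xs, ys."""
    n = len(ys)
    total = 0.0
    for y, count in Counter(ys).items():
        subset = [x for x, yy in zip(xs, ys) if yy == y]  # X values where Y == y
        total += (count / n) * entropy(subset)
    return total
```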
Information gain (IG) measures the amount of information shared by two variables and can be used to measure their correlation: the larger its value, the stronger the correlation between the variables. Information gain is symmetric in X and Y and can measure correlation between features from a nonlinear perspective. Its relationship to entropy and conditional entropy is:
$$IG(X\mid Y)=H(X)-H(X\mid Y).$$
That is, when $IG(X\mid Y)=0$, the variables X and Y are uncorrelated, whereas the more strongly X and Y are correlated, the larger the value of $IG(X\mid Y)$.
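Information gain follows directly from the two entropies. The self-contained sketch below uses illustrative names and assumes discrete values. It can also be used to check numerically that IG(X|Y) equals H(Y) - H(Y|X).

```python
import math
from collections import Counter


def entropy(values):
    """H(X) with probabilities estimated by relative frequency."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = sum_j P(y_j) * H(X | Y = y_j)."""
    n = len(ys)
    total = 0.0
    for y, count in Counter(ys).items():
        total += (count / n) * entropy([x for x, yy in zip(xs, ys) if yy == y])
    return total


def information_gain(xs, ys):
    """IG(X|Y) = H(X) - H(X|Y); 0 means X and Y are unrelated."""
    return entropy(xs) - conditional_entropy(xs, ys)
```

A feature identical to the label gains the label's full entropy, and a feature independent of it gains nothing.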
Information gain can therefore measure how much a feature contributes to the classification of the current system, which helps reduce sensitivity to noise in the samples.
In the scheme of the application, each feature item whose information gain is lower than a preset threshold is filtered out. The preset threshold may be set as needed. For example, with the threshold set to 0, a feature item is filtered out only when it is completely unrelated to the label, i.e. its information gain is 0.
Note that in the present application, filtering is performed separately for each label. For example, if the information gain between feature A and label 1 (say, arc fault) is 0, feature A is filtered out for label 1. If, for the other 3 labels, the information gain between feature A and each label exceeds the preset threshold, feature A is retained for those 3 labels. In this example, when feature selection is later performed on the initial training samples to obtain the target training samples, feature A is deleted from the initial training samples labeled as arc fault, but it is not deleted under the other labels.
Further, in an embodiment of the present invention, it is noted that information gain is biased toward features with more distinct values (more branches), which may cause overfitting. Features with more branches may therefore be penalized.
That is, in an embodiment of the present invention, after determining the information gain between each feature item and the label, the method may further include:
penalizing the information gain of each feature item based on the information entropy of that feature item to obtain an information gain ratio, where the penalized information gain is negatively correlated with the information entropy of the feature item;
correspondingly, filtering out each feature item whose information gain is lower than the preset threshold includes:
comparing each penalized information gain ratio with the preset threshold, and filtering out the feature items whose information gain ratios are lower than the preset threshold.
In this embodiment, the information gain of each feature item is penalized, and the more branches (distinct values) a feature has, the heavier the penalty. For example, in one embodiment of the invention the information gain is penalized by the entropy of the feature, giving
$$IGR(X\mid Y)=\frac{IG(X\mid Y)}{H(X)},$$
which is the information gain ratio of the feature item X.
Here $IG(X\mid Y)$ is the information gain between the feature item X and the label Y, $H(X)$ is the penalty factor for the feature item X under the label Y, namely the entropy of the values of feature X, and $IGR(X\mid Y)$ is the information gain ratio between the feature item X and the label Y.
The information gain ratio of X is thus positively correlated with its information gain and negatively correlated with its entropy, i.e. with the number of branches of the feature. Therefore, if X takes more distinct values, its information gain ratio is reduced, which helps reduce this selection bias.
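A sketch of the penalized score IG(X|Y)/H(X) follows, under the same discrete-value assumption and with illustrative names. Returning 0 for a constant feature (H(X) = 0) is a choice made in this sketch to avoid division by zero. In that case the information gain is also 0.

```python
import math
from collections import Counter


def entropy(values):
    """H(X) with probabilities estimated by relative frequency."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = sum_j P(y_j) * H(X | Y = y_j)."""
    n = len(ys)
    total = 0.0
    for y, count in Counter(ys).items():
        total += (count / n) * entropy([x for x, yy in zip(xs, ys) if yy == y])
    return total


def gain_ratio(xs, ys):
    """IGR(X|Y) = IG(X|Y) / H(X): information gain penalized by the feature's own entropy."""
    h_x = entropy(xs)
    if h_x == 0:
        return 0.0  # constant feature: no information about the label
    return (h_x - conditional_entropy(xs, ys)) / h_x
```

For instance, with labels [0, 0, 1, 1], the features [0, 0, 1, 1] and [0, 1, 2, 3] both have an information gain of 1 bit. Their gain ratios, however, are 1.0 and 0.5 respectively, so the feature with more distinct values is penalized.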
Without the penalty, each calculated information gain would be compared directly with the preset threshold. In this embodiment, after the information gains are calculated, they are penalized according to the number of branches of each feature. Each resulting information gain ratio is then compared with the preset threshold, and the feature items whose information gain ratios are lower than the threshold are filtered out.
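The per-label filtering can be sketched as follows. The text does not spell out how the information gain between a feature and a single label is computed, so this sketch assumes a one-vs-rest encoding (the label versus all other labels). It keeps a feature only if its gain ratio is strictly greater than the threshold, so that a threshold of 0 removes only features completely unrelated to the label, as in the threshold example given earlier. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    n = len(ys)
    return sum((c / n) * entropy([x for x, yy in zip(xs, ys) if yy == y])
               for y, c in Counter(ys).items())


def gain_ratio(xs, ys):
    h_x = entropy(xs)
    if h_x == 0:
        return 0.0  # constant feature carries no information
    return (h_x - conditional_entropy(xs, ys)) / h_x


def filter_features_per_label(columns, labels, threshold):
    """Return {label: [names of features kept for that label]}.

    columns: {feature_name: list of discretized values, one per sample}
    labels:  one label per sample; each label is scored one-vs-rest (assumption).
    """
    kept = {}
    for target in sorted(set(labels)):
        is_target = [lab == target for lab in labels]  # one-vs-rest binary label
        kept[target] = [name for name, xs in columns.items()
                        if gain_ratio(xs, is_target) > threshold]
    return kept
```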
In the Wrapper stage, the optimal feature subset is selected from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model. Various specific algorithms can be used.
For example, in an embodiment of the present invention, selecting the optimal feature subset from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model may specifically include:
for each label, after filtering out the feature items whose information gain is lower than the preset threshold, traversing the feature space with the SFS algorithm to obtain a plurality of feature sets;
determining the classification accuracy of the initial random forest model with each feature set, and taking the feature set that yields the highest classification accuracy as the optimal feature subset.
In this embodiment, the feature items may be sorted in descending order of the information gain calculated in the Filter stage. If the scheme with the information gain penalty is adopted, they may instead be sorted in descending order of the information gain ratio calculated in the Filter stage. Traversing the feature space with the SFS (Sequential Forward Selection) algorithm yields a plurality of feature sets. The classification accuracy of the initial random forest model with each feature set is then determined, for example by computing it with the random forest algorithm, and the feature set giving the highest classification accuracy is taken as the required optimal feature subset.
The SFS algorithm is a bottom-up method. The first selected feature is the single best feature. The second is the feature, among all remaining ones, that performs best in combination with the first. Each subsequent feature is the one that performs best in combination with those already selected. Its advantage is that it takes some of the interactions between features into account.
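Below is a minimal sketch of SFS as described, written against a caller-supplied scoring function. In the scheme above, the score would be the classification accuracy of the random forest trained on the candidate feature set (for example, estimated by cross-validation). That part is left to the caller, and all names are illustrative.

```python
def sequential_forward_selection(features, score):
    """Greedy SFS: grow the subset one feature at a time and keep the best-scoring set seen.

    features: candidate feature names, e.g. sorted by descending information gain (ratio).
    score:    callable mapping a tuple of feature names to a classification accuracy.
    """
    selected = []
    remaining = list(features)
    best_subset, best_score = (), float("-inf")
    while remaining:
        # Try adding each remaining feature to the features already selected.
        candidates = [(score(tuple(selected + [f])), f) for f in remaining]
        step_score, step_feature = max(candidates, key=lambda t: t[0])
        selected.append(step_feature)
        remaining.remove(step_feature)
        # Record the feature set of this step if it beats every earlier step.
        if step_score > best_score:
            best_subset, best_score = tuple(selected), step_score
    return best_subset, best_score
```

Python's `max` returns the first of several equally good candidates. Passing the features in descending order of information gain (ratio) therefore makes that ordering the tie-breaker.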
Step S105: perform feature selection on the initial training samples based on the optimal feature subset corresponding to each label, to obtain target training samples.
For each label, the Filter + Wrapper feature selection of step S104 yields the optimal feature subset corresponding to that label. This subset is then used to select features from the initial training samples to obtain the target training samples.
Step S106: train the initial random forest model with the target training samples to obtain a trained target random forest model.
After the target training samples are obtained, the initial random forest model is trained with them to obtain the trained target random forest model.
In practice, most of the target training samples may be used for training and a small part for testing; for example, 80% of the sample data may be used for training and 20% for testing.
A random forest model is a classifier that contains a plurality of decision trees, and its output class is the mode of the classes output by the individual trees. Referring to fig. 2b, a schematic diagram of the multi-decision-tree principle of the random forest model is shown. After the optimal feature subset corresponding to each label is determined, feature selection is performed on the initial training samples to obtain the target training samples, which are split into a training set and a test set. A plurality of training sample subsets are then randomly drawn from the training set by the bootstrap method, and a decision tree is built on each subset. The decision results of the trees are combined by voting to obtain the final model for user power consumption safety detection. That is, each decision tree outputs a fault-state discrimination result, the results of all trees are tallied on the majority-rule principle, and the state receiving the largest share of votes is taken as the final result.
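The data handling around the forest can be sketched in plain Python: the 80/20 split, bootstrap subsets drawn with replacement, and majority voting over the trees' outputs. Tree training itself is omitted. In practice a library such as scikit-learn's `RandomForestClassifier` handles it, and its `bootstrap`, `random_state` and `oob_score` parameters correspond to the sampling-with-replacement, fixed random_state and oob_score settings described for the initial random forest model in this application. The helper names below are hypothetical.

```python
import random
from collections import Counter


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them, e.g. 80 % training / 20 % testing."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def bootstrap_subsets(train_idx, n_trees, seed=0):
    """Draw one bootstrap subset per decision tree (sampling with replacement)."""
    rng = random.Random(seed)
    return [rng.choices(train_idx, k=len(train_idx)) for _ in range(n_trees)]


def majority_vote(tree_outputs):
    """Final result: the state output by the largest number of trees."""
    return Counter(tree_outputs).most_common(1)[0][0]
```

Each bootstrap subset would train one decision tree. At detection time, `majority_vote` combines the trees' outputs, for example the 3-bit fault codes, into the final result.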
Step S107: detect the present power consumption information of the user side, perform feature selection on it, and input it into the target random forest model to obtain the user power consumption safety detection result output by the target random forest model.
Once trained, the target random forest model can be used: data are input into it, and it outputs a user power consumption safety detection result. For example, output 000 represents normal power consumption, 100 represents a leakage fault, 010 represents an arc fault, and 001 represents a short-circuit fault.
When step S107 is executed, the present power consumption information of the user side is detected and, after feature selection, input into the target random forest model. The feature selection here is performed according to the optimal feature subset determined for each label.
By applying the technical solution provided by the embodiment of the invention, the power consumption information of the user side is collected when the user side has a leakage fault, an arc fault or a short-circuit fault, and when it consumes power normally. Features are then extracted from the collected information, and a corresponding label is set according to the state of the user side at the time of collection to obtain the initial training samples.
In addition, the scheme of the application uses a random forest model to detect user power consumption safety. The random forest takes the outputs of all its decision trees and uses majority voting to determine the final user power consumption sensing result. This greatly reduces the probability of overfitting, enhances generalization, and makes the model less prone to becoming trapped in local extrema.
Furthermore, the method recognizes that the accuracy of fault identification and detection depends not only on the quality of the model but crucially on the quality of the feature data, and that proper feature selection preserves the semantics and information of the original feature data, so that even a simple classification model can classify well. Therefore, for each label, the information gain between each extracted feature item and the label is determined, and feature items whose information gain is lower than the preset threshold are filtered out. This effectively reduces the amount of irrelevant and redundant information in the feature set and improves the detection efficiency of the target random forest model in use. Furthermore, the optimal feature subset is selected from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model. This further reduces irrelevant and redundant information in the feature set, ensures that the selected optimal feature subset gives good user power consumption safety detection performance, and improves detection accuracy. Because the scheme screens feature items rather than transforming them, the natural semantics of the original feature items are not damaged.
In summary, the scheme of the present application detects user power consumption safety efficiently and accurately and preserves the natural semantics of the feature items. It also avoids the tendency of traditional artificial intelligence models to fall into local extrema, overfit, and generalize poorly.
Corresponding to the above method embodiment, an embodiment of the invention further provides a user power consumption safety detection system; the two may be referred to in correspondence with each other.
Referring to fig. 3, a schematic structural diagram of a user power consumption safety detection system according to the present invention is shown, including:
an initial random forest model construction module 301, configured to construct an initial random forest model that outputs a user power consumption safety detection result for input information;
a power consumption information acquisition module 302, configured to collect the power consumption information of the user side when the user side has a leakage fault, an arc fault or a short-circuit fault, and when it consumes power normally;
a feature extraction module 303, configured to perform feature extraction based on the collected power consumption information and set a corresponding label according to the state of the user side when the information was collected, to obtain initial training samples;
an optimal feature subset selection module 304, configured to determine, for each label, the information gain between each extracted feature item and the label, filter out the feature items whose information gain is lower than a preset threshold, and select an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model;
a target training sample determining module 305, configured to perform feature selection on the initial training samples based on the optimal feature subset corresponding to each label, to obtain target training samples;
a training module 306, configured to train the initial random forest model with the target training samples to obtain a trained target random forest model;
and an execution module 307, configured to detect the present power consumption information of the user side, input it into the target random forest model after feature selection, and obtain the user power consumption safety detection result output by the target random forest model.
In a specific embodiment of the present invention, selecting the optimal feature subset from the feature items remaining after filtering according to the principle of maximizing the classification accuracy of the initial random forest model includes:
for each label, after filtering out the feature items whose information gain is lower than the preset threshold, traversing the feature space with the SFS algorithm to obtain a plurality of feature sets;
determining the classification accuracy of the initial random forest model with each feature set, and taking the feature set that yields the highest classification accuracy as the optimal feature subset.
In an embodiment of the present invention, after determining the information gain between each feature item and the label, the optimal feature subset selection module 304 is further configured to:
penalize the information gain of each feature item based on the information entropy of that feature item to obtain an information gain ratio;
correspondingly, filtering out each feature item whose information gain is lower than the preset threshold includes:
comparing each penalized information gain ratio with the preset threshold, and filtering out the feature items whose information gain ratios are lower than the preset threshold.
In a specific embodiment of the present invention, the constructed initial random forest model draws samples with replacement, the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.
In a specific embodiment of the present invention, collecting the power consumption information of the user side includes:
collecting the current, the voltage and the residual current of the user side.
In a specific embodiment of the present invention, the feature extraction based on the collected power consumption information includes:
for each sampling time point, extracting from the collected power consumption information the current form factor, current pulse factor, current crest factor, current margin factor, current kurtosis factor, current energy index, voltage form factor, voltage pulse factor, voltage crest factor, voltage margin factor, voltage kurtosis factor, voltage energy index, residual current form factor, residual current pulse factor, residual current crest factor, residual current margin factor, residual current kurtosis factor and residual current energy index corresponding to that sampling time point.
In an embodiment of the present invention, for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $S_{Iaf}$, $S_{Uf}$ and $S_{Idf}$: figure images not reproduced in this text]
For the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $C_{Iaf}$, $C_{Uf}$ and $C_{Idf}$: figure images not reproduced in this text]
For the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $I_{Iaf}$, $I_{Uf}$ and $I_{Idf}$: figure images not reproduced in this text]
For the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $CL_{Iaf}$, $CL_{Uf}$ and $CL_{Idf}$: figure images not reproduced in this text]
For the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $K_{Iav}$, $K_{Uv}$ and $K_{Idv}$: figure images not reproduced in this text]
For the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $E_{Iaf}$, $E_{Uf}$ and $E_{Idf}$: figure images not reproduced in this text]
where $Id_i$ is the residual current value at the user side at the ith sampling time point, $Ia_i$ is the current value at the user side at the ith sampling time point, $U_i$ is the voltage value at the user side at the ith sampling time point, $N$ is the number of sampling time points in a single power-frequency cycle when collecting the user-side power consumption information, $\max(Ia_i)$ is the current peak value within the most recent power-frequency cycle, $\max(U_i)$ is the voltage peak value within the most recent power-frequency cycle, $\max(Id_i)$ is the residual current peak value within the most recent power-frequency cycle, $\max(E_{Ia})$ is the maximum current energy within the most recent power-frequency cycle, $\max(E_U)$ is the maximum voltage energy within the most recent power-frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy within the most recent power-frequency cycle.
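The patent's formulas for these indicators survive only as figure images, so the following is not a reconstruction of them. It is a minimal sketch that computes the six indicators per signal over the most recent N samples using widely used textbook definitions (form factor = RMS / mean |x|, pulse factor = peak / mean |x|, peak factor = peak / RMS, margin factor = peak / (mean sqrt|x|)^2, kurtosis factor = mean x^4 / RMS^4), and takes the energy index as the instantaneous energy x_i^2 divided by its maximum in that window. These definitions, the use of |x| for the peak, the zero-division convention, the function names and the N = 200 demo are all assumptions and may differ from the patent's exact equations.

```python
import math
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    # Degenerate windows (e.g. zero residual current under normal operation)
    # return 0.0 instead of raising; this convention is an assumption.
    return num / den if den != 0 else 0.0


def window_indicators(window: Sequence[float]) -> Dict[str, float]:
    """Six time-domain indicators over one window (textbook definitions, assumed)."""
    n = len(window)
    abs_vals = [abs(x) for x in window]
    peak = max(abs_vals)
    mean_abs = sum(abs_vals) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    mean_sqrt_abs = sum(math.sqrt(a) for a in abs_vals) / n
    energy = [x * x for x in window]  # instantaneous energy per sample (assumed)
    return {
        "form": _safe_div(rms, mean_abs),               # S: RMS / mean|x|
        "pulse": _safe_div(peak, mean_abs),             # C: peak / mean|x|
        "peak": _safe_div(peak, rms),                   # I: peak / RMS
        "margin": _safe_div(peak, mean_sqrt_abs ** 2),  # CL: peak / (mean sqrt|x|)^2
        "kurtosis": _safe_div(sum(x ** 4 for x in window) / n, rms ** 4),  # K
        "energy": _safe_div(energy[-1], max(energy)),   # E: x_i^2 / window max
    }


def features_at(i: int, n_per_cycle: int, ia: Sequence[float],
                u: Sequence[float], id_: Sequence[float]) -> Dict[str, float]:
    """18 features for sampling point i: 6 indicators x (current, voltage, residual)."""
    if i < n_per_cycle - 1:
        raise ValueError("sampling point i needs a full power-frequency cycle of history")
    feats: Dict[str, float] = {}
    for name, signal in (("Ia", ia), ("U", u), ("Id", id_)):
        window = signal[i - n_per_cycle + 1 : i + 1]  # most recent N samples up to i
        for key, value in window_indicators(window).items():
            feats[f"{key}_{name}"] = value
    return feats


# Hypothetical demo: N = 200 samples per cycle (e.g. 10 kHz sampling of 50 Hz mains).
N = 200
ia = [10.0 * math.sin(2 * math.pi * k / N) for k in range(2 * N)]
u = [311.0 * math.sin(2 * math.pi * k / N) for k in range(2 * N)]
id_ = [0.0] * (2 * N)  # no residual current in this normal-operation example
feats = features_at(2 * N - 1, N, ia, u, id_)
```

For a pure sinusoid the form, peak and kurtosis factors come out near pi/(2*sqrt(2)) ≈ 1.111, sqrt(2) ≈ 1.414 and 1.5, which makes a quick sanity check; waveform distortion generally shifts these values away from that baseline, which is what makes them usable as classifier inputs.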
Corresponding to the above method and system embodiments, the embodiments of the present invention further provide a user electricity utilization safety detection device and a computer-readable storage medium, whose descriptions correspond to those above and may be cross-referenced with them.
The user electricity safety detection device may include:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the user electricity safety detection method in any of the above embodiments.
The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the user electricity safety detection method in any of the above embodiments. The computer-readable storage medium referred to herein may include random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
This application uses specific examples to explain the principle and implementation of the present invention; the above description of the embodiments is only intended to help understand the technical solution and core idea of the present invention. It should be noted that those skilled in the art may make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the present invention.

Claims (7)

1. A user electricity utilization safety detection method is characterized by comprising the following steps:
constructing an initial random forest model for outputting a user electricity utilization safety detection result for input information;
collecting power consumption information of a user side under a leakage fault, an arc fault, a short-circuit fault and normal power consumption, respectively;
extracting features from the collected power consumption information, and setting a corresponding label according to the state of the user side at the time the information was collected, to obtain initial training samples;
for each label, determining the information gain between each extracted feature item and the label, filtering out each feature item whose information gain is lower than a preset threshold, and selecting an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model;
performing feature selection on the initial training samples based on the optimal feature subset corresponding to each label to obtain target training samples;
training the initial random forest model with the target training samples to obtain a trained target random forest model;
collecting the current power consumption information of the user side, performing feature selection on it, and inputting the selected features into the target random forest model to obtain the user electricity utilization safety detection result output by the target random forest model;
wherein the collecting of the power consumption information of the user side comprises:
collecting the user-side current, the user-side voltage and the user-side residual current;
the performing of feature extraction based on the collected power consumption information comprises:
extracting, for each sampling time point and based on the collected power consumption information, a current form factor, a current pulse factor, a current peak factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage form factor, a voltage pulse factor, a voltage peak factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current form factor, a residual current pulse factor, a residual current peak factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to that sampling time point;
for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $S_{Iaf}$, $S_{Uf}$ and $S_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $C_{Iaf}$, $C_{Uf}$ and $C_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $I_{Iaf}$, $I_{Uf}$ and $I_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $CL_{Iaf}$, $CL_{Uf}$ and $CL_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $K_{Iav}$, $K_{Uv}$ and $K_{Idv}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $E_{Iaf}$, $E_{Uf}$ and $E_{Idf}$: figure image not reproduced in this text]
wherein $Id_i$ is the residual current value at the user side at the ith sampling time point, $Ia_i$ is the current value at the user side at the ith sampling time point, $U_i$ is the voltage value at the user side at the ith sampling time point, $N$ is the number of sampling time points in a single power-frequency cycle when collecting the user-side power consumption information, $\max(Ia_i)$ is the current peak value within the most recent power-frequency cycle, $\max(U_i)$ is the voltage peak value within the most recent power-frequency cycle, $\max(Id_i)$ is the residual current peak value within the most recent power-frequency cycle, $\max(E_{Ia})$ is the maximum current energy within the most recent power-frequency cycle, $\max(E_U)$ is the maximum voltage energy within the most recent power-frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy within the most recent power-frequency cycle.
2. The user electricity utilization safety detection method according to claim 1, wherein the selecting of an optimal feature subset from the feature items remaining after filtering, according to the principle of maximizing the classification accuracy of the initial random forest model, comprises:
for each label, filtering out each feature item whose information gain is lower than a preset threshold, and traversing the feature space with the sequential forward selection (SFS) algorithm to obtain a plurality of feature sets;
and determining the classification accuracy of the initial random forest model under each feature set, and taking the feature set that yields the highest classification accuracy as the optimal feature subset.
3. The user electricity utilization safety detection method according to claim 1, further comprising, after determining the information gain between each feature item and the label:
penalizing the information gain of each feature item by the information entropy of that feature item to obtain an information gain ratio;
correspondingly, the filtering out of the feature items whose information gain is lower than the preset threshold comprises:
comparing each penalized information gain ratio with the preset threshold, and filtering out the feature items whose information gain ratio is lower than the preset threshold.
4. The user electricity utilization safety detection method according to claim 1, wherein the constructed initial random forest model uses sampling with replacement (bootstrap sampling), the parameter random_state is set to a fixed value, and the parameter oob_score is set to True.
5. A user electricity utilization safety detection system, characterized by comprising:
an initial random forest model construction module, configured to construct an initial random forest model for outputting a user electricity utilization safety detection result for input information;
a power consumption information acquisition module, configured to collect power consumption information of a user side under a leakage fault, an arc fault, a short-circuit fault and normal power consumption, respectively;
a feature extraction module, configured to extract features from the collected power consumption information and set a corresponding label according to the state of the user side at the time the information was collected, to obtain initial training samples;
an optimal feature subset selection module, configured to, for each label, determine the information gain between each extracted feature item and the label, filter out each feature item whose information gain is lower than a preset threshold, and select an optimal feature subset from the remaining feature items according to the principle of maximizing the classification accuracy of the initial random forest model;
a target training sample determination module, configured to perform feature selection on the initial training samples based on the optimal feature subset corresponding to each label to obtain target training samples;
a training module, configured to train the initial random forest model with the target training samples to obtain a trained target random forest model;
an execution module, configured to collect the current power consumption information of the user side, perform feature selection on it, input the selected features into the target random forest model, and obtain the user electricity utilization safety detection result output by the target random forest model;
wherein the collecting of the power consumption information of the user side comprises:
collecting the user-side current, the user-side voltage and the user-side residual current;
the performing of feature extraction based on the collected power consumption information comprises:
extracting, for each sampling time point and based on the collected power consumption information, a current form factor, a current pulse factor, a current peak factor, a current margin factor, a current kurtosis factor, a current energy index, a voltage form factor, a voltage pulse factor, a voltage peak factor, a voltage margin factor, a voltage kurtosis factor, a voltage energy index, a residual current form factor, a residual current pulse factor, a residual current peak factor, a residual current margin factor, a residual current kurtosis factor and a residual current energy index corresponding to that sampling time point;
for the ith sampling time point, the extracted current form factor $S_{Iaf}$, voltage form factor $S_{Uf}$ and residual current form factor $S_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $S_{Iaf}$, $S_{Uf}$ and $S_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current pulse factor $C_{Iaf}$, voltage pulse factor $C_{Uf}$ and residual current pulse factor $C_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $C_{Iaf}$, $C_{Uf}$ and $C_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current peak factor $I_{Iaf}$, voltage peak factor $I_{Uf}$ and residual current peak factor $I_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $I_{Iaf}$, $I_{Uf}$ and $I_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current margin factor $CL_{Iaf}$, voltage margin factor $CL_{Uf}$ and residual current margin factor $CL_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $CL_{Iaf}$, $CL_{Uf}$ and $CL_{Idf}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current kurtosis factor $K_{Iav}$, voltage kurtosis factor $K_{Uv}$ and residual current kurtosis factor $K_{Idv}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $K_{Iav}$, $K_{Uv}$ and $K_{Idv}$: figure image not reproduced in this text]
for the ith sampling time point, the extracted current energy index $E_{Iaf}$, voltage energy index $E_{Uf}$ and residual current energy index $E_{Idf}$ corresponding to the ith sampling time point are respectively expressed as:
[Equations for $E_{Iaf}$, $E_{Uf}$ and $E_{Idf}$: figure image not reproduced in this text]
wherein $Id_i$ is the residual current value at the user side at the ith sampling time point, $Ia_i$ is the current value at the user side at the ith sampling time point, $U_i$ is the voltage value at the user side at the ith sampling time point, $N$ is the number of sampling time points in a single power-frequency cycle when collecting the user-side power consumption information, $\max(Ia_i)$ is the current peak value within the most recent power-frequency cycle, $\max(U_i)$ is the voltage peak value within the most recent power-frequency cycle, $\max(Id_i)$ is the residual current peak value within the most recent power-frequency cycle, $\max(E_{Ia})$ is the maximum current energy within the most recent power-frequency cycle, $\max(E_U)$ is the maximum voltage energy within the most recent power-frequency cycle, and $\max(E_{Id})$ is the maximum residual current energy within the most recent power-frequency cycle.
6. A user electricity safety detection device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the user electricity safety detection method according to any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the user electricity safety detection method according to any one of claims 1 to 4.
CN202211359242.0A 2022-11-02 2022-11-02 User electricity utilization safety detection method, system, equipment and storage medium Active CN115409134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211359242.0A CN115409134B (en) 2022-11-02 2022-11-02 User electricity utilization safety detection method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115409134A (en) 2022-11-29
CN115409134B (en) 2023-02-03

Family

ID=84169173

Country Status (1)

Country Link
CN (1) CN115409134B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152353B (en) * 2023-08-23 2024-05-28 北京市测绘设计研究院 Real scene three-dimensional model creation method, device, electronic device and readable medium
CN118865587A (en) * 2024-06-27 2024-10-29 国网江苏省电力有限公司南京供电分公司 Method and system for analyzing and monitoring electricity usage habits of elderly people living alone
CN119353342B (en) * 2024-12-24 2025-03-04 奥创动力传动(深圳)有限公司 Electromagnetic brake fault detection method, device, detection terminal and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN114580829A (en) * 2021-12-29 2022-06-03 国网湖南省电力有限公司 Power utilization safety sensing method, equipment and medium based on random forest algorithm

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
CN104330627B (en) * 2014-10-24 2017-04-26 华中科技大学 Automatic power grid resonance detection method based on active injection current
US10685008B1 (en) * 2016-08-02 2020-06-16 Pindrop Security, Inc. Feature embeddings with relative locality for fast profiling of users on streaming data
CN108509996A (en) * 2018-04-03 2018-09-07 电子科技大学 Feature selection approach based on Filter and Wrapper selection algorithms
US11275974B2 (en) * 2018-09-17 2022-03-15 International Business Machines Corporation Random feature transformation forests for automatic feature engineering
CN112748359A (en) * 2019-10-30 2021-05-04 中国电力科学研究院有限公司 Power distribution network ground fault identification method and system based on random forest
CN113076986B (en) * 2021-03-29 2022-12-09 西安交通大学 Photovoltaic fault arc characteristic selection method combining filtering type and packaging type evaluation strategies
CN113239321A (en) * 2021-05-28 2021-08-10 哈尔滨理工大学 Feature selection method based on filtering and packaging type hierarchy progression
CN114169398A (en) * 2022-01-28 2022-03-11 国网天津市电力公司 Photovoltaic DC arc fault identification method and device based on random forest algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant