
CN116956197B - Deep learning-based energy facility fault prediction method and device and electronic equipment - Google Patents


Info

Publication number
CN116956197B
CN116956197B (application CN202311182251.1A)
Authority
CN
China
Prior art keywords
data
fault
sample
prediction
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311182251.1A
Other languages
Chinese (zh)
Other versions
CN116956197A (en)
Inventor
屈道宽
高琰
明玲
陈春廷
秦丁
王涛
付龙
高玉欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Ligong Haoming New Energy Co ltd
Original Assignee
Shandong Ligong Haoming New Energy Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Ligong Haoming New Energy Co ltd filed Critical Shandong Ligong Haoming New Energy Co ltd
Priority to CN202311182251.1A priority Critical patent/CN116956197B/en
Publication of CN116956197A publication Critical patent/CN116956197A/en
Application granted granted Critical
Publication of CN116956197B publication Critical patent/CN116956197B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning


Abstract

The invention relates to the technical field of energy facility fault prediction, in particular to an energy facility fault prediction method and device based on deep learning and electronic equipment, wherein the method comprises the steps of obtaining monitoring data of energy facilities; preprocessing the monitoring data based on a preprocessing rule to obtain first data; and inputting the first data into a preset fault prediction model, and outputting a prediction result. The method provided by the invention inputs the monitoring data of the preprocessed energy facility into a preset fault prediction model to obtain a prediction result comprising a prediction fault type and a confidence coefficient. The fault prediction model is a deep learning-based energy facility fault prediction model obtained based on a gray wolf optimization algorithm and cost sensitive function training, so that the accuracy of data feature extraction and the accuracy of fault classification prediction can be improved, the accuracy of fault prediction is improved, and the efficiency of energy facility management is improved.

Description

Deep learning-based energy facility fault prediction method and device and electronic equipment
Technical Field
The invention relates to the technical field of energy facility fault prediction, in particular to an energy facility fault prediction method and device based on deep learning and electronic equipment.
Background
In current industrial production and daily life, stable and safe operation of energy facilities is of paramount importance. However, various malfunctions of the facilities may occur due to the complexity of the energy facilities and the influence of external environmental factors. If these faults cannot be found and handled in time, they may lead to reduced facility performance and even serious safety accidents. Therefore, the development of an effective energy facility fault early warning method has important practical significance.
Conventional energy facility fault pre-warning methods are generally based on empirical rules or statistical models, which often require expert empirical knowledge and have poor results in handling complex and nonlinear fault modes. In recent years, with the development of machine learning technology, a method for performing fault early warning using a machine learning model has been widely studied. The method can learn the fault mode from a large amount of equipment operation data, thereby realizing automatic fault detection and early warning.
However, these methods often have difficulty in predicting faults of energy equipment and unsatisfactory prediction accuracy due to factors such as large data volume, incomplete data collection, low feature extraction efficiency, inaccuracy and the like.
Disclosure of Invention
In view of the above, the present invention aims to provide a method, a device and an electronic device for predicting failure of an energy facility based on deep learning.
In a first aspect, an embodiment of the present invention provides a method for predicting an energy facility fault based on deep learning, where the method includes:
acquiring monitoring data of energy facilities;
preprocessing the monitoring data based on a preprocessing rule to obtain first data;
and inputting the first data into a preset fault prediction model, and outputting a prediction result, wherein the prediction result comprises a prediction fault type and a confidence coefficient, and the confidence coefficient is used for representing the occurrence probability of the prediction fault type.
The training process of the fault prediction model is as follows:
acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set;
preprocessing the first sample set based on a preprocessing rule to obtain a second sample set;
inputting the second sample set into a first preset model to generate an expanded sample set;
adding the extended sample set to the second sample set to obtain a training sample set;
inputting the training sample set into a data feature extraction model, and training the feature extraction model based on a gray wolf optimization algorithm to obtain a target feature set;
Inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and outputting a predicted fault class set;
and inputting the predicted fault class set into a confidence judgment model, and outputting a predicted result meeting the confidence to obtain a fault prediction model.
With reference to the first aspect, the sample data are readings of at least one sensor at a current time node when the energy facility operates and a current operating state of the energy facility;
labeling the sample data in the sample data set to obtain a first sample set, including:
storing the sample data in a time-series format;
and labeling the sample data according to the current running state of the energy facility at the time node for each time node to obtain a first sample set.
With reference to the first aspect, the first preset model includes a particle filter;
inputting the second sample set into a first preset model, and generating an extended sample set comprises the following steps:
inputting the second sample set into a preset particle filter to obtain equipment failure sample data in the second sample set;
inputting the equipment fault sample data into a preset state transition model to obtain second equipment state data, and generating second equipment fault sample data from the second equipment state data by using an observation model;
And inputting the second equipment fault sample data into a preset generation countermeasure network model to obtain an expansion sample set.
With reference to the first aspect, the step of inputting the training sample set into the data feature extraction model and training the feature extraction model based on the gray wolf optimization algorithm to obtain the target feature set includes:
re-processing the training sample set according to the time sequence characteristics of the training sample set to obtain time window data;
inputting the time window data, after time-delay processing, into a preset stacked self-encoder, and training the stacked self-encoder based on the gray wolf optimization algorithm to obtain a feature set containing at least one feature;
for each feature, calculating the information entropy of the feature;
judging whether the information entropy is larger than an information entropy threshold value or not;
if yes, determining the characteristics corresponding to the information entropy as target characteristics, and obtaining a target characteristic set.
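The entropy-based feature screening above can be sketched as follows. This is a minimal illustration, assuming a histogram-based entropy estimate and an arbitrary threshold of 1.0 bit; the feature names and data are made up, not from the patent.

```python
import numpy as np

def entropy(values, bins=10):
    """Shannon entropy (bits) of one feature, estimated from a histogram."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # drop empty bins before taking logs
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
features = {
    "spread_out": rng.uniform(0, 1, 1000),  # high entropy: spread over all bins
    "constant":   np.zeros(1000),           # zero entropy: one bin only
}
threshold = 1.0
# Keep only features whose information entropy exceeds the threshold.
targets = [name for name, v in features.items() if entropy(v) > threshold]
print(targets)   # → ['spread_out']
```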
With reference to the first aspect, the training sample set includes a plurality of failed training samples and at least one non-failed training sample;
inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and obtaining a predicted fault class, wherein the method comprises the following steps:
Weighting each training sample to update a preset cost sensitive function so as to obtain a target cost sensitive function;
and training the classifier based on the target cost sensitivity function to obtain a predicted fault class set.
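A cost-sensitive loss of the kind described above can be illustrated with a weighted cross-entropy, where each training sample's loss term is scaled by a class-dependent cost. The 5x fault-class weight and the predicted probabilities below are illustrative assumptions, not values from the patent.

```python
import numpy as np

y_true = np.array([0, 0, 0, 1])            # one rare fault sample among non-faults
p_fault = np.array([0.1, 0.2, 0.1, 0.3])   # classifier's predicted fault probability
cost = np.where(y_true == 1, 5.0, 1.0)     # assumed 5x penalty on the fault class

# Weighted cross-entropy: each sample's loss is scaled by its cost weight,
# so errors on the minority (fault) class dominate the training signal.
loss = -np.mean(cost * (y_true * np.log(p_fault)
                        + (1 - y_true) * np.log(1 - p_fault)))
print(round(loss, 3))
```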
In combination with the first aspect, the step of inputting the predicted fault class set into a confidence judgment model, outputting a predicted result satisfying the confidence, and obtaining a fault prediction model includes:
inputting the predicted fault class set into a second preset model, and outputting the confidence degree corresponding to each predicted fault class;
judging whether the confidence coefficient is larger than a set threshold value according to each confidence coefficient;
if yes, determining and outputting a prediction result, wherein the prediction result comprises a prediction fault category and a corresponding confidence level.
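The confidence-judgment step reduces to simple thresholding over the per-class confidences; a sketch with hypothetical fault classes and scores:

```python
# Keep only predicted fault classes whose confidence exceeds the set threshold.
# Class names, scores, and the 0.8 threshold are made-up illustrations.
predictions = {"bearing_wear": 0.92, "overheat": 0.55, "leak": 0.88}
threshold = 0.8

result = {cls: conf for cls, conf in predictions.items() if conf > threshold}
print(result)   # → {'bearing_wear': 0.92, 'leak': 0.88}
```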
In a second aspect, the present application provides an energy facility fault prediction apparatus based on deep learning, the apparatus comprising:
the acquisition module is used for acquiring monitoring data of the energy facility;
the preprocessing module is used for preprocessing the monitoring data to obtain first data;
the prediction result output module is used for inputting the first data into a preset fault prediction model and outputting a prediction result, wherein the prediction result comprises a prediction fault type and a confidence coefficient, and the confidence coefficient is used for representing the occurrence probability of the prediction fault type;
The training process of the fault prediction model is as follows:
acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set;
preprocessing the first sample set based on a preprocessing rule to obtain a second sample set;
inputting the second sample set into a first preset model to generate an expanded sample set;
adding the extended sample set to the second sample set to obtain a training sample set;
inputting the training sample set into a data feature extraction model, and training the feature extraction model based on a gray wolf optimization algorithm to obtain a target feature set;
inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and outputting a predicted fault class set;
and inputting the predicted fault class set into a confidence judgment model, and outputting a predicted result meeting the confidence to obtain a fault prediction model.
In a third aspect, the present application provides an electronic device comprising a memory having a computer program stored therein and a processor executing the computer program to perform a method as described above.
In a fourth aspect, the present application provides a computer readable storage medium having a computer program stored therein, the computer program being executable by a processor to implement a method as described above.
The embodiment of the invention has the following beneficial effects: the invention provides a deep learning-based energy facility fault prediction method, a deep learning-based energy facility fault prediction device and electronic equipment, wherein the method comprises the steps of obtaining monitoring data of an energy facility; preprocessing the monitoring data based on a preprocessing rule to obtain first data; inputting the first data into a preset fault prediction model, and outputting a prediction result, wherein the prediction result comprises a prediction fault type and a confidence coefficient, and the confidence coefficient is used for representing the occurrence probability of the prediction fault type; wherein, the training process of the fault prediction model is as follows: acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set; preprocessing the first sample set based on a preprocessing rule to obtain a second sample set; inputting the second sample set into a first preset model to generate an expanded sample set; adding the extended sample set to the second sample set to obtain a training sample set; inputting the training sample set into a data feature extraction model, and training the feature extraction model based on a wolf optimization algorithm to obtain a target feature set; inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and outputting a predicted fault class set; and inputting the predicted fault class set into a confidence judgment model, and outputting a predicted result meeting the confidence to obtain a fault prediction model. 
The method provided by the invention inputs the monitoring data of the preprocessed energy facility into the preset fault prediction model to obtain the prediction result comprising the prediction fault type and the confidence coefficient, wherein the fault prediction model is the deep learning-based energy facility fault prediction model obtained based on the gray wolf optimization algorithm and the cost sensitive function training, so that the accuracy of data feature extraction and the accuracy of fault classification prediction can be improved, and the accuracy of fault prediction and the efficiency of energy facility management are improved; in addition, the output prediction result comprises the prediction fault type and the confidence coefficient, so that equipment maintenance personnel can conveniently grasp the possible fault type of the energy equipment and the probability of occurrence of the fault of the type, and the efficiency of energy facility management is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an energy facility fault prediction method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training process of an energy facility failure prediction model in the deep learning-based energy facility failure prediction method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an energy facility fault prediction device based on deep learning according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to facilitate understanding of the present embodiment, the technical terms involved in the present application will be briefly described below.
A particle filter (particle filter) is a recursive filter using the monte carlo method (Monte Carlo method), which uses a set of weighted random samples (called particles) to represent the posterior probability of random events, estimates the state of a dynamic system from a noisy or incomplete observation sequence, and can be applied to any state space model.
Particle filters are a generalized method of Kalman filters (Kalman filters) built on a linear state space and gaussian distributed noise; while the state space model of the particle filter may be nonlinear and the noise distribution may be of any type.
The particle filter is capable of estimating the internal state of the dynamic system from a series of observations that contain noise or imperfections. In the analysis of a dynamic system, two models are required, one to describe the change of state with time (system model) and the other to describe the noise observed in each state (observation model), both models being represented by probabilities. In many cases, an estimate must be made of the system every time a new observation is obtained, and this can be effectively achieved using a recursive filter. The recursive filter processes the acquired data continuously, rather than batchwise, and thus does not require storing the complete data, nor reprocessing the existing data when new observations are acquired. The recursive filter comprises two steps:
and (3) predicting: using the system model, the probability density function of the next state is predicted from the information of the previous state.
Updating: the predicted probability density function is modified using the most recent observations.
After technical terms related to the application are introduced, application scenes and design ideas of the embodiment of the application are briefly introduced.
Traditional energy facility fault early-warning methods suffer from factors such as large data volume, incomplete data collection, and low or inaccurate feature extraction, which make faults of energy equipment difficult to predict and leave prediction accuracy unsatisfactory.
Example 1
The embodiment of the application provides an energy facility fault prediction method based on deep learning, which is shown in combination with fig. 1, and comprises the following steps:
s110, acquiring monitoring data of the energy facility.
S120, preprocessing the monitoring data based on the preprocessing rule to obtain first data.
S130, inputting the first data into a preset fault prediction model, and outputting a prediction result, wherein the prediction result comprises a prediction fault type and a confidence coefficient, and the confidence coefficient is used for representing the occurrence probability of the prediction fault type.
The training process of the fault prediction model is as follows:
s210, acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set.
S220, preprocessing the first sample set based on the preprocessing rule to obtain a second sample set.
S230, inputting the second sample set into the first preset model to generate an extended sample set.
S240, adding the extended sample set to the second sample set to obtain a training sample set.
S250, inputting the training sample set into a data feature extraction model, and training the feature extraction model based on a gray wolf optimization algorithm to obtain a target feature set.
S260, inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and outputting a predicted fault class set.
S270, inputting the predicted fault class set into the confidence judgment model, and outputting a predicted result meeting the confidence to obtain a fault prediction model.
In the embodiment, the monitoring data of the energy facility are preprocessed and then input into a preset fault prediction model to obtain a prediction result, wherein the fault prediction model is a deep learning-based energy facility fault prediction model obtained based on a gray wolf optimization algorithm and cost sensitive function training, so that the accuracy of data feature extraction and the accuracy of fault classification prediction can be improved, the accuracy of fault prediction is improved, and the efficiency of energy facility management is improved; in addition, the output prediction result comprises the prediction fault type and the confidence coefficient, so that equipment maintenance personnel can conveniently grasp the possible fault type of the energy equipment and the probability of occurrence of the fault of the type, and the efficiency of energy facility management is improved.
In the present embodiment, the sample data are derived from at least one sensor associated with the energy facility. Specifically, in step S210, sensors such as temperature sensors, pressure sensors and humidity sensors may be deployed to monitor the operating conditions of the equipment and generate data.
With reference to the first aspect, the sample data is a reading of at least one sensor at a current time node when the energy facility is running and a current running state of the energy facility;
labeling the sample data in the sample data set to obtain a first sample set, including:
storing the sample data in a time-series format;
and labeling the sample data according to the current running state of the energy facility at the time node for each time node to obtain a first sample set.
In particular, the acquired sample data are stored in a time-series format, which may be organized as a two-dimensional array X, where X_{i,j} denotes the reading of the j-th sensor at the i-th time point. In this two-dimensional array, the rows represent time points and the columns represent different sensors. For example, for a device with m sensors, the data at the i-th time point can be expressed as an m-dimensional vector X_i = (X_{i,1}, X_{i,2}, ..., X_{i,m}), where X_{i,j} is the reading of the j-th sensor at the i-th time point.
For the data X_i at each time point, various characteristics can be calculated, such as the average reading, the maximum/minimum readings and the standard deviation of the readings across the sensors.
Further, the data X_i for each time point is labeled according to the running state of the energy facility.
For example, if the device operates normally at the i-th time point, it is labeled y_i = 0; if the device fails at the i-th time point, it is labeled y_i = 1. This labeling can be expressed as the pair (X_i, y_i), where X_i is the data and y_i is the label.
In one embodiment, it is assumed that there is an energy facility with three sensors that separately record the temperature, pressure and humidity of the facility, with data recorded for a total of T time points. The data may be represented as a T x 3 matrix X:
where X_{i,1}, X_{i,2} and X_{i,3} respectively represent the temperature, pressure and humidity at the i-th time point. If the device operates normally at the i-th time point, then y_i = 0; if the device fails at the i-th time point, then y_i = 1. The labels may then be represented as a vector y = (y_1, y_2, ..., y_T) of length T.
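The three-sensor storage-and-labeling scheme can be sketched as follows; the temperature/pressure/humidity readings and operating states are made up for illustration.

```python
import numpy as np

# Rows are time points, columns are the three sensors
# (temperature, pressure, humidity); T = 5 time points.
X = np.array([
    [70.1, 1.01, 0.40],
    [70.3, 1.02, 0.41],
    [70.2, 1.00, 0.39],
    [95.7, 1.45, 0.55],   # abnormal readings begin here
    [96.1, 1.47, 0.57],
])
# Label each time point from the recorded operating state: 0 = normal, 1 = fault.
states = ["normal", "normal", "normal", "fault", "fault"]
y = np.array([0 if s == "normal" else 1 for s in states])

print(X.shape, y.tolist())   # → (5, 3) [0, 0, 0, 1, 1]
```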
In a specific step S220, the preprocessing rule provided in the present embodiment includes: data cleaning, data standardization and data smoothing.
Data cleaning is used to remove or repair erroneous, anomalous or inconsistent values in the data. In this embodiment, missing data are replaced by a median or mean value. Specifically:
For the raw data matrix X acquired from the device, data cleaning is performed. Abnormal readings generated by the energy device, such as missing values, are filled using the average of the data at the adjacent time points. Suppose the reading of the j-th sensor is missing at the i-th time point; it is filled as:
X_{i,j} = (X_{i-1,j} + X_{i+1,j}) / 2
where X_{i,j} represents the reading of the j-th sensor at the i-th time point, X_{i-1,j} represents the reading of the j-th sensor at the (i-1)-th time point, and X_{i+1,j} represents the reading of the j-th sensor at the (i+1)-th time point.
Data normalization is used to convert the data into a standard form. In this embodiment, Z-score normalization is adopted so that features of different scales can be compared on the same scale. Specifically:
After data cleaning is completed, the data matrix X is Z-score normalized. The mean mu_j and standard deviation sigma_j of the j-th sensor are calculated; the corresponding mean is then subtracted from each reading of that sensor, and the result is divided by the standard deviation:
X'_{i,j} = (X_{i,j} - mu_j) / sigma_j
where X_{i,j} is the feature value before normalization and X'_{i,j} is the normalized feature value.
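The per-sensor Z-score step can be sketched on a tiny two-sensor matrix (values made up); after normalization each column has mean 0 and standard deviation 1:

```python
import numpy as np

X = np.array([[10.0, 200.0],
              [12.0, 220.0],
              [14.0, 240.0]])
mu = X.mean(axis=0)            # per-sensor (per-column) mean
sigma = X.std(axis=0)          # per-sensor standard deviation
X_norm = (X - mu) / sigma      # Z-score each column
print(X_norm[:, 0])
```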
Data smoothing is used to eliminate noise in the data and improve data quality. In this embodiment, a sliding-window average is used to smooth the data. This eliminates short-term random fluctuations and thus better reflects long-term changes in the state of the device. Specifically, with the sliding-window size set to w, data smoothing can be expressed as:
X'_{k,j} = (1/w) * sum over i = k-w+1, ..., k of X_{i,j}, for each k >= w,
where X_{k,j} represents the reading of the j-th sensor at the k-th time point.
Because the acquisition and labeling of training samples are often difficult in energy-facility monitoring and safety-management tasks, the number and quality of the training samples often fail to meet the training requirements of high-performance models, which reduces the generalization capability of the models. Therefore, in step S230, data expansion techniques are used to increase the diversity and quantity of the data. In this embodiment, a particle-filter improvement strategy based on adaptive weights is introduced in step S230 to optimize the performance of the particle filter, so that the distribution of equipment fault data can be simulated more accurately, further improving the data-enhancement effect.
In this embodiment, the first preset model comprises a particle filter, a generative adversarial network and a self-encoder. The particle filter comprises a state transition model and an observation model, and is used to generate new sample data based on the first sample data set; the generative adversarial network is used to further optimize the new sample data generated by the particle filter; and the self-encoder is used to denoise the generated sample data to obtain the extended samples.
First, the distribution of equipment failure data is modeled using a particle filter. Let x_t denote the device state at time point t and y_t denote the observation at time point t. From the observation data y_{1:t}, the posterior distribution of the device state p(x_t | y_{1:t}) is estimated; this posterior distribution of the device state serves as the source of the extended samples.
Specifically, the particle filter uses a set of particles {(x_t^(i), w_t^(i))}, i = 1, ..., N, to approximate the posterior distribution of the device state, where N is the number of particles, x_t^(i) denotes the i-th particle (i.e. a possible device state), and w_t^(i) denotes the weight of the i-th particle. The weight can be calculated by the following formula:
w_t^(i) = alpha * p(y_t | x_t^(i)) * w_{t-1}^(i)
where p(y_t | x_t^(i)) is the likelihood of the observation y_t given the device state x_t^(i), w_{t-1}^(i) is the weight of the i-th particle in the last iteration, and alpha is an adaptive parameter computed from the current particle set, where N is the number of particles (in the simplest case, alpha normalizes the N weights so that they sum to one).
Further, a new device state is generated using the state transition model, and new equipment failure data is then generated using the observation model. This process can be expressed as:

$$x_{t+1}^{(i)} = f\big(x_t^{(i)}\big), \qquad \bar{x}_{t+1} = \frac{1}{N}\sum_{i=1}^{N} x_{t+1}^{(i)}, \qquad y_{t+1} = h\big(\bar{x}_{t+1}\big)$$

where $f$ is the state transition model; $h$ is the observation model; $\bar{x}_{t+1}$ is the device state at time $t+1$, obtained by averaging the particles $x_{t+1}^{(i)}$; and $y_{t+1}$ is the equipment failure data at time $t+1$.
In this way, the particle filter generates new sample data.
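As an illustrative sketch (not part of the claimed method), the adaptive-weight update and the averaged state estimate described above can be expressed as follows; the Gaussian likelihood, the random-walk transition model and all numeric values are assumptions chosen for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(y, x):
    # assumed Gaussian observation likelihood p(y | x) with unit variance
    return np.exp(-0.5 * (y - x) ** 2)

def particle_filter_step(particles, weights, y_obs):
    # weight update w_t = lambda_t * p(y_t | x_t) * w_{t-1};
    # lambda_t is the adaptive normaliser keeping the weights summing to one
    unnorm = likelihood(y_obs, particles) * weights
    lam = 1.0 / unnorm.sum()
    weights = lam * unnorm
    # state transition f(x) = x + small noise (an illustrative choice),
    # followed by the weighted average standing in for the new state estimate
    particles = particles + rng.normal(0.0, 0.1, size=particles.shape)
    x_est = float(np.sum(weights * particles))
    return particles, weights, x_est

N = 500
particles = rng.normal(0.0, 1.0, N)
weights = np.full(N, 1.0 / N)
particles, weights, x_est = particle_filter_step(particles, weights, y_obs=0.5)
print(round(float(weights.sum()), 6))  # 1.0 -- weights stay normalised
```

The estimated state lands between the prior mean and the observation, as expected for a Bayesian update.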
Further, the generative adversarial network is used to further optimize the new sample data generated by the particle filter. A generative adversarial network is a model that contains two networks, a generator G and a discriminator D. The objective of the generator G is to generate new data that looks like real data, while the objective of the discriminator D is to distinguish real data from the data generated by G.
Specifically, the generator G is trained as a mapping function $G: Z \to X$, where $Z$ is the space of random noise and $X$ is the space of equipment failure data. For each random noise sample $z$, the generator G outputs one item of equipment failure data $G(z)$. The discriminator D is trained as a classifier which, for input equipment failure data, outputs the probability that the data is drawn from the real equipment failure data. The training objectives of the generator G and the discriminator D are as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where $G$ denotes the generator in the adversarial network, whose function is to generate new equipment failure data; $G(z)$ denotes the equipment failure data generated by the network; $D$ denotes the discriminator, whose role is to distinguish real equipment failure data from the equipment failure data generated by the generator; $p_{\text{data}}(x)$ denotes the distribution of real equipment failure data and $p_z(z)$ denotes the distribution of the random noise; $x \sim p_{\text{data}}$ means the real sample $x$ obeys the distribution $p_{\text{data}}$, and $z \sim p_z$ means the noise $z$ obeys the distribution $p_z$; and $\mathbb{E}$ is the expectation operator, representing an average. During training of the generator G, random noise $z$ is used as input, and G generates new equipment failure data from this noise.
Further, an information gain function IG is defined, whose objective is to quantify the information gain that the data generated by the adversarial network provides for the equipment fault classification task.
Let $x$ represent real equipment failure data and $G(z)$ represent the equipment failure data generated by the adversarial network, and let $P_x$ and $P_{G(z)}$ respectively represent their data distributions on the equipment fault classification task. The information gain function $\mathrm{IG}$ is defined as follows:

$$\mathrm{IG}(G) = D_{\mathrm{KL}}\big(P_x \,\|\, P_{G(z)}\big)$$

where $D_{\mathrm{KL}}$ denotes the KL divergence, which measures the similarity of two distributions.
The aim of the algorithm is to optimize the generator $G$ of the adversarial network so that the information gain function $\mathrm{IG}$ is maximized. This is equivalent to minimizing a generator loss function $L_G$, which can be expressed as:

$$L_G = -\frac{1}{n}\sum_{i=1}^{n} \log C\big(G(z_i)\big)$$

where $n$ represents the amount of data, $z_i$ represents random noise sampled from the noise distribution, and $C$ represents a preset classifier. Further, the parameters of the generator G are updated using gradient descent to minimize the loss function $L_G$, which can be expressed as:

$$\theta_G \leftarrow \theta_G - \eta\,\nabla_{\theta_G} L_G$$

where $\theta_G$ denotes the parameters of the generator $G$ and $\eta$ denotes the learning rate.
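The gradient-descent update of the generator can be illustrated with a deliberately simplified stand-in: a linear generator trained by numerical gradient descent against a moment-matching surrogate loss. The classifier-guided loss $L_G$, the network architectures and all target values are replaced here by assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear generator G(z) = theta[0] * z + theta[1]; the target mean/std of the
# "real" failure data below are illustrative assumptions, not patent values.
theta = np.array([0.1, 0.0])
real_mean, real_std = 2.0, 0.5

def gen_loss(theta, z):
    fake = theta[0] * z + theta[1]
    # surrogate loss: penalise mismatch between generated and real data moments
    return (fake.mean() - real_mean) ** 2 + (fake.std() - real_std) ** 2

def num_grad(f, theta, z, eps=1e-5):
    # central-difference numerical gradient, standing in for backpropagation
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        t1, t2 = theta.copy(), theta.copy()
        t1[i] += eps
        t2[i] -= eps
        g[i] = (f(t1, z) - f(t2, z)) / (2 * eps)
    return g

z = rng.normal(size=2048)
lr = 0.1                          # learning rate eta
for _ in range(300):              # theta_G <- theta_G - eta * grad(L_G)
    theta -= lr * num_grad(gen_loss, theta, z)

fake = theta[0] * z + theta[1]
print(round(float(fake.mean()), 2), round(float(fake.std()), 2))  # approaches (2.0, 0.5)
```

After training, the generated samples match the assumed real-data statistics, which is the behaviour the adversarial optimization aims for.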
Further, the self-encoder is utilized to perform noise reduction processing on the generated data, so that the stability of the data is enhanced.
In particular, the trained generator $G$ is used to generate new equipment failure data $G(z)$, where $z$ is drawn from the distribution of random noise.
Further, the generated data is passed through a self-encoder comprising an encoder and a decoder. The encoder encodes the input equipment failure data $y$ into an implicit-space representation $h$, and the decoder then decodes this implicit representation into reproduced data $\hat{y}$, which can be expressed as:

Encoder: $h = f_{\mathrm{enc}}(y)$

Decoder: $\hat{y} = f_{\mathrm{dec}}(h)$

where $f_{\mathrm{enc}}$ represents the encoder and $f_{\mathrm{dec}}$ represents the decoder.
The training goal of the self-encoder is to minimize the reconstruction error, i.e., the difference between the input equipment failure data $y$ and the regenerated equipment failure data $\hat{y}$. In this embodiment the difference is measured using the mean squared error (MSE), which can be expressed as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2$$

where $\mathrm{MSE}$ is the mean squared error, $n$ is the number of data items, $y_i$ is the $i$-th item of equipment failure data, and $\hat{y}_i$ is the regenerated data corresponding to the $i$-th item of equipment failure data.
By optimizing the reconstruction error, the self-encoder can learn how to remove noise in the input data, thereby obtaining clear and stable generated data, and further obtaining an extended sample set generated based on the second sample set.
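A minimal sketch of the reconstruction-error criterion follows; a simple moving-average smoother stands in for the trained encoder/decoder pair (an assumption, since the embodiment's network structure is not fully specified), but the MSE comparison it prints is exactly the quantity the self-encoder minimizes:

```python
import numpy as np

rng = np.random.default_rng(2)

def mse(y, y_hat):
    """Mean squared reconstruction error: (1/n) * sum over i of (y_i - y_hat_i)^2."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))

# Illustrative "denoising": a 9-point moving average replaces the trained network.
clean = np.sin(np.linspace(0, 4 * np.pi, 200))          # stand-in failure signal
noisy = clean + rng.normal(0.0, 0.3, clean.shape)       # additive sensor noise
denoised = np.convolve(noisy, np.ones(9) / 9, mode="same")

print(mse(clean, noisy) > mse(clean, denoised))  # True: smoothing cuts the error
```

Lowering the reconstruction error relative to the clean signal is what "clear and stable generated data" means operationally here.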
In a specific step S240, the extended sample set is added to the second sample set, so as to obtain a training sample set.
In particular, the data of an energy facility often has time-series characteristics: temperature readings, pressure readings and the like all vary over time. For this type of data, the feature extraction model in the embodiment of the application includes a stacked self-encoder, and a time-window feature extraction strategy is proposed and combined with the stacked self-encoder to improve the accuracy of feature extraction. In step S250, inputting the training sample set into the data feature extraction model and training the feature extraction model based on the grey wolf optimization algorithm to obtain the target feature set includes:
S251, processing the training sample set again according to the time sequence characteristics of the training sample set to obtain time window data.
S252, the time window data is input into a preset stacked self-encoder after time-delay processing, and the stacked self-encoder is trained based on the grey wolf optimization algorithm to obtain a feature set containing at least one feature.
S253, for each feature, information entropy of the feature is calculated.
S254, judging whether the information entropy is larger than an information entropy threshold value.
And S255, if yes, determining the feature corresponding to the information entropy as the target feature, and obtaining a target feature set.
Specifically, consider a time window of length $T$: the data $x_t$ at time $t$ and the data of the previous $T-1$ time points are fed into the self-encoder together, so as to better capture the dynamic behavior of the time-series data.
Specifically, in step S251 the input data is subjected to time-delay processing within the window, yielding new input data:

$$\tilde{x}_t = \big[x_{t-T+1}, \ldots, x_{t-1}, x_t\big]$$

where $\tilde{x}_t$ is a vector of all data points within the time window. The new input data $\tilde{x}_t$ is input into the stacked self-encoder to better capture the dynamic behavior of the time-series data.
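The time-window construction can be sketched as follows; the window length and the stand-in sensor series are illustrative choices:

```python
import numpy as np

def time_windows(series, T):
    """Stack each point with its T-1 predecessors: row t = [x_{t-T+1}, ..., x_t]."""
    series = np.asarray(series, float)
    return np.stack([series[t - T + 1 : t + 1] for t in range(T - 1, len(series))])

x = np.arange(10.0)          # stand-in sensor readings x_0 .. x_9
W = time_windows(x, T=4)
print(W.shape)               # (7, 4): 7 windows of length 4
print(W[0].tolist())         # [0.0, 1.0, 2.0, 3.0]
```

Each row is one delayed input vector of the kind fed to the stacked self-encoder.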
In step S252, the new input data $\tilde{x}_t$ is input into the stacked self-encoder. The training process of the stacked self-encoder includes two stages: encoding and decoding.
In the encoding stage, the output of each layer is defined as:

$$h^{(l)} = \mathrm{Sigmoid}\big(W^{(l)} h^{(l-1)} + b^{(l)}\big)$$

where $W^{(l)}$ is the weight of the $l$-th layer, $b^{(l)}$ is the bias of the $l$-th layer, $h^{(l-1)}$ is the output of the previous layer, and $\mathrm{Sigmoid}(\cdot)$ is the Sigmoid activation function. In the decoding stage the input is reconstructed and the reconstruction error is minimized; the objective function $J$ can be expressed as:

$$J = \frac{1}{n_f}\sum_{i=1}^{n_f}\big(\tilde{x}_i - \hat{x}_i\big)^2$$

where $n_f$ is the number of features and $\hat{x}_i$ is the reconstruction of $\tilde{x}_i$.
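The layer-wise encoding $h^{(l)} = \mathrm{Sigmoid}(W^{(l)} h^{(l-1)} + b^{(l)})$ can be sketched as a forward pass; the layer sizes are illustrative and the weights are random stand-ins, since training (covered next via the grey wolf optimizer) is omitted here:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x, layers):
    """Forward pass h = Sigmoid(W @ h + b) through each stacked layer in turn."""
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)
    return h

# Illustrative 8 -> 4 -> 2 encoder; weights are untrained random stand-ins.
layers = [(rng.standard_normal((4, 8)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
code = encode(rng.standard_normal(8), layers)
print(code.shape)                              # (2,)
print(bool(np.all((code > 0) & (code < 1))))   # True: sigmoid outputs lie in (0, 1)
```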
The stacked self-encoder is trained based on the grey wolf optimization algorithm: in each iteration, each candidate parameter position is regarded as one wolf, and each wolf updates its position (parameter values) according to its position relative to the leading wolves.
Let the parameter set of the stacked self-encoder be $P$ and the objective function be $J(P)$. Each parameter position corresponds to the position of one wolf and is randomly initialized in the search space.

In the grey wolf optimization algorithm, the update of the pack is guided by three dominant wolves (the currently best, second-best and third-best wolves), denoted α, β and δ respectively. In the $k$-th iteration, the positions (corresponding parameter values) of the dominant wolves are $P_\alpha$, $P_\beta$ and $P_\delta$. In this embodiment a negative feedback mechanism is introduced, related to the distance between a wolf's position and the dominant wolf α. The negative feedback term is defined as:

$$F = -\gamma_k\,\big|P - P_\alpha\big|$$

where $|P - P_\alpha|$ denotes the absolute value of each element of the vector $P - P_\alpha$, and $\gamma_k$ is the negative feedback coefficient in the $k$-th iteration, which is adaptively adjusted. Assume the current iteration number is $k$ and the initial negative feedback coefficient is $\gamma_0$; then in the $k$-th iteration the negative feedback coefficient $\gamma_k$ can be calculated by:

$$\gamma_k = \gamma_0\, e^{-\mu k}$$

where $\mu$ is a preset positive parameter controlling how fast the coefficient decays.
Further, the degree to which each wolf follows the dominant wolves α, β and δ is calculated via its distance to each dominant wolf, which can be expressed as:

$$D_\alpha = \big|C_1 \cdot P_\alpha - P\big|, \qquad D_\beta = \big|C_2 \cdot P_\beta - P\big|, \qquad D_\delta = \big|C_3 \cdot P_\delta - P\big|$$

where $C_1, C_2, C_3$ are randomly generated coefficient vectors with each element in $[0, 2]$.
Further, the step of each wolf approaching the dominant wolves α, β, δ is calculated as:

$$P_1 = P_\alpha - A_1 \cdot D_\alpha, \qquad P_2 = P_\beta - A_2 \cdot D_\beta, \qquad P_3 = P_\delta - A_3 \cdot D_\delta$$

with $A_i = 2ar - a$, where $a$ is a linearly decreasing constant and $r$ is a randomly generated vector with each element in $[0, 1]$.
Further, the position of the pack is updated according to these steps: the updated position is the average of the three dominant-wolf-guided positions plus the negative feedback term, which can be expressed as:

$$P' = \frac{P_1 + P_2 + P_3}{3} + F$$

where $F$ is the negative feedback term.
The pack positions (the parameter values of the stacked self-encoder) are iteratively updated until the preset number of iterations is reached or the loss function falls below a preset threshold; iteration then stops, and the current pack position P' is output as the optimized parameters of the stacked self-encoder, which are used to extract at least one feature and obtain a feature set containing at least one feature.
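The iterative update, including the dominant-wolf guidance and the decaying negative-feedback term, can be sketched as follows on a toy sphere objective; the population size, iteration count, $\gamma_0$ and $\mu$ are illustrative defaults rather than values fixed by the embodiment:

```python
import numpy as np

rng = np.random.default_rng(3)

def gwo_minimize(f, dim, n_wolves=20, iters=200, gamma0=0.5, mu=0.05):
    """Grey wolf optimizer with an exponentially decaying negative-feedback term."""
    P = rng.uniform(-5, 5, (n_wolves, dim))
    for k in range(iters):
        idx = np.argsort([f(p) for p in P])
        alpha, beta, delta = P[idx[0]], P[idx[1]], P[idx[2]]
        a = 2.0 * (1 - k / iters)                 # linearly decreasing constant
        gamma_k = gamma0 * np.exp(-mu * k)        # adaptive negative-feedback coeff.
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                C = rng.uniform(0, 2, dim)        # C in [0, 2]
                A = a * (2 * rng.uniform(0, 1, dim) - 1)   # A = 2*a*r - a
                D = np.abs(C * leader - P[i])
                new += (leader - A * D) / 3.0     # average of the three steps
            F = -gamma_k * np.abs(P[i] - alpha)   # negative feedback term
            P[i] = new + F
    return min(P, key=f)

best = gwo_minimize(lambda p: float(np.sum(p ** 2)), dim=3)
print(float(np.sum(best ** 2)) < 0.5)  # True: converges toward the sphere minimum
```

In the embodiment, the objective would be the stacked self-encoder's reconstruction loss rather than this toy function.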
Specifically, in step S253 the information entropy of each extracted feature is calculated to further select valuable features. Entropy is an important concept in information theory, used to measure the uncertainty of information. For a discrete random variable $X$ with probability distribution $p(x)$, the information entropy is defined as:

$$H(X) = -\sum_{x} p(x)\,\log p(x)$$

Each feature is regarded as a random variable, and its information entropy is calculated. For the feature set $\{f_1, \ldots, f_{n_f}\}$, the information entropy $H(f_j)$ of each feature is calculated, and the features whose information entropy is greater than a threshold $\theta$ are selected as the final features, which can be expressed as:

$$F^{*} = \{\, f_j \mid H(f_j) > \theta \,\}$$
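The entropy-based selection of steps S253 to S255 can be sketched as follows; the binning scheme, the toy feature matrix and the threshold are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(values, bins=5):
    """Shannon entropy H = -sum p_i * log2(p_i) of a feature, after binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    labels = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def select_features(feature_matrix, threshold):
    """Keep the indices of columns whose entropy exceeds the threshold."""
    cols = list(zip(*feature_matrix))
    return [j for j, col in enumerate(cols) if entropy(col) > threshold]

# Toy feature set: column 0 is nearly constant (low entropy), column 1 varies.
X = [[0.0, 0.1], [0.0, 0.9], [0.0, 0.4], [0.01, 0.7], [0.0, 0.2], [0.0, 0.55]]
print(select_features(X, threshold=1.0))  # [1]: only the varying feature survives
```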
The invention provides a method of extracting features after optimizing the parameters of the stacked self-encoder based on the grey wolf optimization algorithm: the stacked self-encoder first converts the input data into a low-dimensional code through the encoding process, and then attempts to restore the input data through the decoding process. For the characteristics of energy facility monitoring data, the invention optimizes the parameters of the stacked self-encoder based on the grey wolf optimization algorithm so as to improve the accuracy and efficiency of feature extraction.
In a specific step S260, the training sample set includes a plurality of failure training samples and at least one non-failure training sample; step S260 inputs the training sample set and the target feature set into a classifier, trains the classifier based on a cost sensitive function, and outputs a predicted fault class set specifically comprising:
s261, inputting a training sample set and a target feature set into a classifier, training the classifier based on a cost sensitive function to obtain a predicted fault class, wherein the step comprises the following steps:
s262, weighting each training sample to update a preset cost sensitive function so as to obtain a target cost sensitive function;
and S263, training the classifier based on the target cost sensitive function to obtain a predicted fault class set.
In step S270, the classifier in this embodiment is a Bayesian classifier, a machine-learning classification algorithm based on Bayes' theorem whose main advantages are fewer data assumptions and stronger generalization capability. Let $D = \{x_1, \ldots, x_m\}$ be the data set obtained by feature extraction, containing $m$ samples, and let the category set be expressed as $C = \{c_1, \ldots, c_K\}$. At the same time, every sample $x_i$ has a label $y_i \in C$, i.e., its corresponding category, so the label set can be expressed as $Y = \{y_1, \ldots, y_m\}$.
A cost function $\mathrm{Cost}(y_i, c_j)$ is defined to represent the cost of misclassifying a sample with true label $y_i$ into category $c_j$. $\mathrm{Cost}$ is non-negative, and $\mathrm{Cost}(y_i, c_j) = 0$ when $y_i = c_j$. The goal is then to minimize the overall cost, which can be expressed as:

$$\min \sum_{i=1}^{m} \mathrm{Cost}\big(y_i, \hat{y}_i\big)$$

where $m$ is the total number of samples and $\hat{y}_i$ is the category predicted by the classifier.
Further, in a conventional Bayesian classifier, for each sample $x_i$ the posterior probability $P(c_j \mid x_i)$ of each class $c_j$ is calculated, and the sample is assigned to the class with the highest posterior probability. In the cost-sensitive Bayesian classifier of the invention, the cost-sensitive posterior risk of each class, $R(c_j \mid x_i) = \sum_{k} \mathrm{Cost}(c_k, c_j)\,P(c_k \mid x_i)$, is calculated instead, and the sample is assigned to the class with the minimum cost-sensitive posterior risk, resulting in a set of predicted fault categories comprising one or more predicted fault categories.
In the initial stage of training the Bayesian classifier, the performance and stability of the model are further improved through random weight initialization.
Specifically, a random matrix $R$ is generated in which every element $R_{ij}$ is drawn independently from the standard normal distribution. The initialization weight matrix is computed from the random matrix $R$ and a predefined weight matrix $W$: the elements of $R$ and $W$ are multiplied element-wise to obtain the initialized weight matrix:

$$W_{\mathrm{init}} = R \odot W$$

where $\odot$ denotes the element-wise (Hadamard) product of the matrices.
Under such an initialization strategy, different weights receive different initial values, which makes the training of the model more stable and helps it avoid falling into a local optimum.
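The element-wise random initialization can be sketched as follows; the matrix shape and the predefined weight value are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

W = np.full((3, 3), 0.5)             # predefined weight matrix (illustrative value)
R = rng.standard_normal((3, 3))      # random matrix, standard normal entries
W_init = R * W                       # element-wise (Hadamard) product

print(W_init.shape)                  # (3, 3)
print(bool(np.allclose(W_init, R * 0.5)))  # True: each weight perturbed independently
```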
Further, during training, gradient descent is used to minimize the overall cost. For each sample $x_i$, the cost $\mathrm{Cost}(y_i, \hat{y}_i)$ is calculated from its label $y_i$ and the class $\hat{y}_i$ predicted by the model, and the parameters of the model are then updated so that the overall cost is minimized.
In the embodiment of the application, the self-adaptive learning rate is introduced in the training process to replace the setting of the fixed learning rate in the traditional gradient descent method so as to achieve the purpose of rapid convergence.
Specifically, let the parameters of the model be $\theta$ and the learning rate be $\eta$; then in each step of gradient descent, the update of the parameters can be expressed as:

$$\theta \leftarrow \theta - \eta\,\nabla_\theta J(\theta)$$

Here the learning rate $\eta$ is made a function of the gradient, $\eta = \eta\big(\nabla_\theta J(\theta)\big)$, so the parameter update mode becomes:

$$\theta \leftarrow \theta - \eta\big(\nabla_\theta J(\theta)\big)\,\nabla_\theta J(\theta)$$
Based on this, when the gradient is large the learning rate is small, avoiding oscillation caused by an overly large step; when the gradient is small the learning rate is larger, accelerating convergence.
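A gradient-dependent learning rate of this kind can be sketched as follows; the specific form $\eta_0/(1+\lVert g\rVert)$ is an illustrative choice, since the embodiment does not fix the function:

```python
import numpy as np

def adaptive_lr(grad, eta0=0.5):
    # learning rate shrinks when the gradient is large and grows when it is small
    return eta0 / (1.0 + float(np.linalg.norm(grad)))

def gradient_descent(theta, grad_f, steps=100):
    for _ in range(steps):
        g = grad_f(theta)
        theta = theta - adaptive_lr(g) * g
    return theta

# Minimise f(t) = t^2 (gradient 2t) from a deliberately large starting point.
theta = gradient_descent(np.array([10.0]), lambda t: 2 * t)
print(bool(abs(float(theta[0])) < 1e-3))  # True: converges despite the steep start
```

Early on the large gradient caps the effective step; near the optimum the small gradient lets the rate grow, matching the behaviour described above.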
Further, in energy facility safety-management and early-warning tasks, failure samples are often fewer than non-failure samples, yet more important. Minority-class samples therefore need to be given greater weight so that the model pays more attention to them during training. Thus, this embodiment adjusts sample weights during training as follows:
specifically, the present invention is for each sampleIntroducing a weight +.>So that the cost function becomes:
the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>The representation belongs to the sample->The number of samples in a category is such that when the number of samples in a category is small, the weight of samples belonging to the category is greater;Is regularized item based on entropy value, wherein +.>Is a super-parameter balancing cost and entropy, +.>The measure of the entropy of the input samples can be expressed as:
further, in training using gradient descent, it is necessary to calculate weighted cost functions with respect to parametersCan be expressed as:
the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>The difference value of the weight of the training sample and the weight of the last training sample is>To add sample->Misclassification into category->At the expense of (a) the number of (c) is,for the training, sample +.>Misclassification into category- >Cost of (1) and sample +.>Misclassification into category->The difference in cost of (2);
further, in parameter updating, a weighted gradient needs to be used, which can be expressed as:
wherein,is a momentum vector +.>Is initially set to zero and then updated at each iteration.
The updating mode is as follows:
the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is a momentum vector gradient.
Momentum vector gradientThe calculation method can be expressed as:
the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>The weight of the dynamic term can be set and adjusted according to the actual working requirement.
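The inverse-frequency sample weighting and the momentum-based update can be sketched together as follows; the labels, learning rate and momentum weight are illustrative values:

```python
import numpy as np
from collections import Counter

def sample_weights(labels):
    """w_i = 1 / (number of samples sharing sample i's class): rare classes weigh more."""
    counts = Counter(labels)
    return np.array([1.0 / counts[y] for y in labels])

def momentum_step(theta, grad, v, lr=0.1, beta=0.9):
    # g_v = beta * v + grad ; theta <- theta - lr * g_v (illustrative momentum form)
    g_v = beta * v + grad
    return theta - lr * g_v, g_v

labels = ["fault", "ok", "ok", "ok", "fault", "ok"]   # 2 fault vs 4 non-fault samples
w = sample_weights(labels)
print(bool(w[0] > w[1]))          # True: the minority (fault) sample gets a larger weight

theta, v = momentum_step(np.array([1.0]), np.array([2.0]), np.zeros(1))
print(round(float(theta[0]), 6))  # 0.8 = 1.0 - 0.1 * (0.9 * 0 + 2.0)
```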
In this embodiment, the classifier is trained in combination with a cost-sensitive function. However, conventional bayesian classifiers may ignore the cost sensitivity of the samples, e.g., in energy facility fault pre-warning, the costs of different types of faults may vary significantly. Therefore, the invention adopts an improved Bayesian classifier and combines cost sensitivity functions.
Step S270 is to input the predicted fault class set into a confidence judging model, output the predicted result meeting the confidence, and obtain a fault predicting model, and specifically comprises the following steps:
s271, inputting the predicted fault class set into a second preset model, and outputting the confidence degree corresponding to each predicted fault class.
S272, judging whether the confidence coefficient is larger than a set threshold value for each confidence coefficient.
If the confidence is greater than the set threshold, step S273 is performed.
S273, determining and outputting a prediction result, wherein the prediction result comprises a prediction fault category and a corresponding confidence level.
When fault early warning is performed, attention is paid not only to the model's prediction result (i.e., the predicted fault type) but also to the model's confidence in that result.
In step S271 of the present embodiment, let $x$ be the new input data and $c$ a possible fault type. The model predicts the probability that $x$ belongs to $c$, and the confidence is defined as this probability of the model's prediction result, namely:

$$\mathrm{Conf}(c) = P(c \mid x)$$
in step S272 of this embodiment, the confidence coefficient threshold calculated in S271 is compared, and when the confidence coefficient is greater than the threshold, the confidence coefficient is added to the fault early warning result and output.
For example, suppose the predicted fault classes are class A and class B, the confidence calculated for class A is 0.6, the confidence for class B is 0.9, and the set threshold is 0.8. The probability of a class-A fault occurring is then below the set threshold, while the probability of a class-B fault occurring is above it. The class-A fault is therefore not displayed in the fault early-warning result, and the final prediction result is: class-B fault, confidence 0.9.
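The worked example above can be reproduced directly:

```python
def filter_predictions(pred_conf, threshold):
    """Keep only the fault classes whose confidence exceeds the threshold."""
    return {c: p for c, p in pred_conf.items() if p > threshold}

# Mirrors the worked example: class A at 0.6, class B at 0.9, threshold 0.8.
result = filter_predictions({"A": 0.6, "B": 0.9}, threshold=0.8)
print(result)  # {'B': 0.9}
```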
Typically, before step S272, the failure prediction model loss needs to be calculated according to the output result, and the model is optimized according to the model loss until the model loss converges to a desired degree, so as to obtain a final failure prediction model. The method is a conventional technical means of model training, and is not limited herein.
Further, the confidence threshold of the invention may be a dynamically adjusted threshold $\tau$: a fault warning is issued only when the confidence is above $\tau$. Specifically, an initial threshold $\tau_0$ is first set, and the threshold is then adjusted dynamically according to the model's early-warning performance on historical data. Let $\mathrm{TP}_k$ be the number of true fault cases among the first $k$ warnings, with $k$ the number of warnings issued; the early-warning precision is defined as:

$$A_k = \frac{\mathrm{TP}_k}{k}$$

Further, the threshold is dynamically adjusted according to the early-warning precision:

$$\tau \leftarrow \tau + \beta\,\big(A^{*} - A_k\big)$$

where $\beta$ is a learning-rate parameter set manually according to actual demand, and $A^{*}$ is the target precision.
Based on the above, if the early warning effect of the model is poor (i.e. the early warning precision is low), the threshold value is increased to reduce the false warning rate; otherwise, if the early warning effect of the model is better (namely, the early warning precision is higher), the threshold value is reduced to improve the fault detection rate.
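The dynamic threshold adjustment can be sketched as follows; the target precision and learning-rate parameter are illustrative, as the embodiment leaves them operator-chosen:

```python
def update_threshold(tau, precision, target=0.9, beta=0.05):
    # raise tau when precision is below target, lower it when above;
    # target and beta are illustrative, not fixed by the embodiment
    tau = tau + beta * (target - precision)
    return min(max(tau, 0.0), 1.0)    # keep the threshold a valid probability

tau = 0.8
tau = update_threshold(tau, precision=0.5)    # poor precision -> threshold rises
print(round(tau, 3))                          # 0.82
tau = update_threshold(tau, precision=0.95)   # good precision -> threshold falls
print(round(tau, 4))                          # 0.8175
```

Clamping to [0, 1] keeps the threshold comparable with the confidence probabilities it gates.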
In a second aspect, the present application provides an energy facility fault prediction apparatus based on deep learning, as shown in conjunction with fig. 3, the apparatus including: the device comprises an acquisition module 10, a preprocessing module 20 and a prediction result output module 30.
The acquisition module 10 is used for acquiring monitoring data of the energy facility.
The preprocessing module 20 is configured to preprocess the monitoring data to obtain first data.
The prediction result output module 30 is configured to input the first data into a preset fault prediction model, and output a prediction result, where the prediction result includes a prediction fault type and a confidence level, and the confidence level is used to characterize a probability of occurrence of the prediction fault type;
the training process of the fault prediction model is as follows:
acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set;
preprocessing the first sample set based on a preprocessing rule to obtain a second sample set;
inputting the second sample set into a first preset model to generate an expanded sample set;
adding the extended sample set to the second sample set to obtain a training sample set;
inputting the training sample set into a data feature extraction model, and training the feature extraction model based on a wolf optimization algorithm to obtain a target feature set;
Inputting the training sample set and the target feature set into a classifier, training the classifier based on a cost sensitive function, and outputting a predicted fault class set;
and inputting the predicted fault class set into a confidence judgment model, and outputting a predicted result meeting the confidence to obtain a fault prediction model.
In a third aspect, the present application provides an electronic device, as shown in connection with fig. 4, comprising a memory 40 and a processor 41, the memory 40 storing a computer program, the processor 41 executing the computer program to implement a method as described above. As shown in connection with fig. 4, the electronic device further comprises a bus 42 and a communication interface 43, wherein the processor 41, the communication interface 43 and the memory 40 are connected by means of the bus 42. The memory 40 may include a high-speed random access memory (RAM, random Access Memory), and may further include a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. The communication connection between the system network element and the at least one other network element is achieved via at least one communication interface 43 (which may be wired or wireless), which may use the internet, a wide area network, a local network, a metropolitan area network, etc. The Bus 42 may be an ISA (Industry Standard Architecture ) Bus, a PCI (Peripheral Component Interconnect, peripheral component interconnect standard) Bus, an EISA (Extended Industry Standard Architecture ) Bus, or the like, and may be an AMBA (Advanced Microcontroller Bus Architecture, standard for on-chip buses) Bus, where AMBA defines three types of buses, including an APB (Advanced Peripheral Bus) Bus, an AHB (Advanced High-performance Bus) Bus, and a AXI (Advanced eXtensible Interface) Bus. The bus 42 may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, only one bi-directional arrow is shown in FIG. 4, but not only one bus or type of bus.
The processor 41 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 41 or by instructions in the form of software. The processor 41 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory and the processor 41 reads the information in the memory and in combination with its hardware performs the method as shown in any of the foregoing figures 1 to 2.
Fourth aspect the present application provides a computer readable storage medium having a computer program stored therein, the computer program being executable by a processor to implement a method as described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In addition, in the description of embodiments of the present invention, unless explicitly stated and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood by those skilled in the art in specific cases.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention for illustrating the technical solution of the present invention, but not for limiting the scope of the present invention, and although the present invention has been described in detail with reference to the foregoing examples, it will be understood by those skilled in the art that the present invention is not limited thereto: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (5)

1. An energy facility fault prediction method based on deep learning, which is characterized by comprising the following steps:
acquiring monitoring data of energy facilities;
preprocessing the monitoring data based on a preprocessing rule to obtain first data;
inputting the first data into a preset fault prediction model, and outputting a prediction result, wherein the prediction result comprises a prediction fault type and a confidence coefficient, and the confidence coefficient is used for representing the occurrence probability of the prediction fault type;
the training process of the fault prediction model is as follows:
acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set;
preprocessing the first sample set based on a preprocessing rule to obtain a second sample set;
inputting the second sample set into a particle filter preset in a first preset model to obtain equipment fault sample data in the second sample set;
inputting the equipment fault sample data into a preset state transition model to obtain second equipment state data, and generating new second equipment fault sample data by using an observation model;
inputting the second equipment fault sample data into a preset generative adversarial network model to obtain an expanded sample set;
adding the expanded sample set to the second sample set to obtain a training sample set, wherein the training sample set includes a plurality of faulty training samples and at least one non-faulty training sample;
re-processing the training sample set according to its time-series characteristics to obtain time window data;
inputting the time window data, after a time delay, into a preset stacked autoencoder, and training the stacked autoencoder based on a grey wolf optimization algorithm to obtain a feature set containing at least one feature;
for each feature, calculating the information entropy of the feature;
judging whether the information entropy is greater than an information entropy threshold;
if yes, determining the feature corresponding to the information entropy as a target feature, so as to obtain a target feature set;
weighting each training sample to update a preset cost-sensitive function, so as to obtain a target cost-sensitive function;
training a classifier based on the target cost-sensitive function to obtain a predicted fault category set;
inputting the predicted fault category set into a second preset model, and outputting a confidence corresponding to each predicted fault category;
for each confidence, judging whether the confidence is greater than a set threshold;
if yes, determining and outputting a prediction result, wherein the prediction result comprises the predicted fault category and its corresponding confidence.
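The entropy-based feature screening in claim 1 (compute each feature's information entropy, keep the features whose entropy exceeds a threshold) can be sketched as follows. The histogram binning and the threshold value are illustrative assumptions; the patent does not fix either.

```python
import numpy as np

def feature_entropy(values, bins=10):
    """Shannon entropy (bits) of one feature, estimated from a histogram."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                     # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

def select_target_features(feature_matrix, threshold):
    """Keep the column indices whose information entropy exceeds the threshold."""
    return [j for j in range(feature_matrix.shape[1])
            if feature_entropy(feature_matrix[:, j]) > threshold]
```

A constant feature carries zero entropy and is dropped, while a feature that spreads over many histogram bins is retained as a target feature.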
2. The method of claim 1, wherein the sample data comprises readings of at least one sensor at a current time node during operation of the energy facility and the current operating state of the energy facility;
labeling the sample data in the sample data set to obtain a first sample set, including:
storing the sample data in a time-series format;
and, for each time node, labeling the sample data according to the current operating state of the energy facility at that time node, so as to obtain the first sample set.
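The time-series storage and per-time-node labeling of claim 2 might look like the sketch below. The `Sample` structure and the fault-state names are hypothetical, since the patent does not specify concrete data formats or state names.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: int      # time node of the reading
    readings: dict      # sensor name -> value at this time node
    state: str          # current operating state of the energy facility

def label_samples(samples, fault_states=("overheat", "vibration_fault")):
    """Sort samples into time-series order and attach a binary fault label
    derived from the recorded operating state at each time node."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    return [(s, 1 if s.state in fault_states else 0) for s in ordered]
```

Sorting by timestamp realizes the "time-series format" of the claim, and the label is read off the operating state recorded at the same time node as the sensor readings.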
3. An energy facility fault prediction apparatus based on deep learning, the apparatus comprising:
the acquisition module is used for acquiring monitoring data of the energy facility;
the preprocessing module is used for preprocessing the monitoring data to obtain first data;
the prediction result output module is used for inputting the first data into a preset fault prediction model and outputting a prediction result, wherein the prediction result comprises a predicted fault category and a confidence, and the confidence is used for representing the probability that the predicted fault category occurs;
The training process of the fault prediction model is as follows:
acquiring a sample data set, and labeling sample data in the sample data set to obtain a first sample set;
preprocessing the first sample set based on a preprocessing rule to obtain a second sample set;
inputting the second sample set into a particle filter preset in a first preset model to obtain equipment fault sample data in the second sample set;
inputting the equipment fault sample data into a preset state transition model to obtain second equipment state data, and generating new second equipment fault sample data by using an observation model;
inputting the second equipment fault sample data into a preset generative adversarial network model to obtain an expanded sample set;
adding the expanded sample set to the second sample set to obtain a training sample set, wherein the training sample set includes a plurality of faulty training samples and at least one non-faulty training sample;
re-processing the training sample set according to its time-series characteristics to obtain time window data;
inputting the time window data, after a time delay, into a preset stacked autoencoder, and training the stacked autoencoder based on a grey wolf optimization algorithm to obtain a feature set containing at least one feature;
for each feature, calculating the information entropy of the feature;
judging whether the information entropy is greater than an information entropy threshold;
if yes, determining the feature corresponding to the information entropy as a target feature, so as to obtain a target feature set;
weighting each training sample to update a preset cost-sensitive function, so as to obtain a target cost-sensitive function;
training a classifier based on the target cost-sensitive function to obtain a predicted fault category set;
inputting the predicted fault category set into a second preset model, and outputting a confidence corresponding to each predicted fault category;
for each confidence, judging whether the confidence is greater than a set threshold;
if yes, determining and outputting a prediction result, wherein the prediction result comprises the predicted fault category and its corresponding confidence.
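The cost-sensitive weighting step (weight each training sample to obtain a target cost-sensitive function) can be illustrated with a weighted cross-entropy. The 5:1 fault-to-normal cost ratio below is an assumed example, not a value taken from the patent.

```python
import numpy as np

def cost_sensitive_weights(labels, fault_cost=5.0):
    """Per-sample weights: misclassifying a fault costs more than a non-fault.
    `fault_cost` is an illustrative ratio."""
    labels = np.asarray(labels)
    return np.where(labels == 1, fault_cost, 1.0)

def weighted_log_loss(y_true, y_prob, weights):
    """Cost-sensitive cross-entropy: one candidate form of the target cost
    function that the classifier is trained to minimize."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-12, 1 - 1e-12)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.average(losses, weights=weights))
```

With such weights, missing a fault (a false negative) raises the training loss far more than a false alarm of the same probability error, which counteracts the class imbalance between faulty and non-faulty samples.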
4. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method according to any one of claims 1-2.
5. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, which is executed by a processor to implement the method according to any of claims 1-2.
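Claims 1 and 3 train the stacked autoencoder with a grey wolf optimization algorithm. The following is a minimal sketch of the standard grey wolf optimizer applied to a toy objective; the population size, iteration count, and bounds are illustrative, and in the patented method the objective would instead be the autoencoder's reconstruction loss over its weights.

```python
import numpy as np

def grey_wolf_optimize(loss, dim, n_wolves=12, iters=50, bounds=(-1.0, 1.0), seed=0):
    """Minimal grey wolf optimizer: each wolf moves toward the three best
    solutions (alpha, beta, delta) under a linearly shrinking coefficient `a`."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(iters):
        fitness = np.array([loss(w) for w in wolves])
        order = np.argsort(fitness)
        alpha, beta, delta = wolves[order[:3]]   # three best wolves (copies)
        a = 2.0 - 2.0 * t / iters                # decreases from 2 to 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new / 3.0, lo, hi)
    fitness = np.array([loss(w) for w in wolves])
    return wolves[int(np.argmin(fitness))]

# Toy usage: minimize a sphere function centered at 0.5 in each dimension.
best = grey_wolf_optimize(lambda w: float(np.sum((w - 0.5) ** 2)), dim=3)
```

The shrinking coefficient trades exploration for exploitation over the run, which is why the algorithm is often used for non-differentiable hyperparameter or weight searches such as the one claimed here.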
CN202311182251.1A 2023-09-14 2023-09-14 Deep learning-based energy facility fault prediction method and device and electronic equipment Active CN116956197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182251.1A CN116956197B (en) 2023-09-14 2023-09-14 Deep learning-based energy facility fault prediction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN116956197A CN116956197A (en) 2023-10-27
CN116956197B true CN116956197B (en) 2024-01-19

Family

ID=88462277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182251.1A Active CN116956197B (en) 2023-09-14 2023-09-14 Deep learning-based energy facility fault prediction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116956197B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117909717B (en) * 2024-01-22 2024-10-11 广东电网有限责任公司 Engineering quantity auxiliary acceptance settlement method based on deep learning and data mining
CN118332388B (en) * 2024-06-13 2024-09-13 华能山东发电有限公司众泰电厂 Fault detection method and device of photovoltaic cleaning equipment and electronic equipment
CN118734193B (en) * 2024-08-27 2024-12-24 鲁班源集团有限公司 Vacuum dry pump fault prediction method and device based on CMS data cost-sensitive learning
CN118762791B (en) * 2024-09-05 2025-01-07 吉林大学 Blood purification equipment data monitoring method and system

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2055608A2 (en) * 2007-11-03 2009-05-06 GM Global Technology Operations, Inc. Method for monitoring an auxiliary pump for a hybrid powertrain
CN106896716A (en) * 2017-04-17 2017-06-27 华北电力大学(保定) Micro-capacitance sensor alternating current-direct current section transverter pid parameter optimization method based on grey wolf algorithm
CN107480913A (en) * 2017-09-06 2017-12-15 东北大学 A kind of distributed power source addressing constant volume system and method based on improvement grey wolf algorithm
CN207442268U (en) * 2017-11-30 2018-06-01 山东理工昊明新能源有限公司 Photovoltaic generation substation casing
WO2019153388A1 (en) * 2018-02-12 2019-08-15 大连理工大学 Power spectral entropy random forest-based aeroengine rolling bearing fault diagnosis method
WO2019177951A1 (en) * 2018-03-11 2019-09-19 President And Fellows Of Harvard College Hybrid quantum-classical generative modes for learning data distributions
WO2020244134A1 (en) * 2019-06-05 2020-12-10 华南理工大学 Multi-task feature sharing neural network-based intelligent fault diagnosis method
CN112650204A (en) * 2020-12-30 2021-04-13 中南大学 Intelligent track unmanned vehicle fault gene identification method and system
CN113884290A (en) * 2021-09-28 2022-01-04 江南大学 Voltage regulator fault diagnosis method based on self-training semi-supervised generation countermeasure network
WO2022088602A1 (en) * 2020-11-02 2022-05-05 北京妙医佳健康科技集团有限公司 Method and apparatus for predicting similar pair problems, and electronic device
WO2022141213A1 (en) * 2020-12-30 2022-07-07 中南大学 Gene prediction method and system for fault of autonomous rail rapid transit vehicle in smart city
US11487273B1 (en) * 2021-04-30 2022-11-01 Dalian University Of Technology Distributed industrial energy operation optimization platform automatically constructing intelligent models and algorithms
CN115373879A (en) * 2022-08-29 2022-11-22 南京邮电大学 A disk failure prediction method for intelligent operation and maintenance of large-scale cloud data centers
CN115481658A (en) * 2022-08-30 2022-12-16 大连理工大学 A Pulse Echo State Network Model for Aeroengine Fault Prediction
CN115656666A (en) * 2022-10-18 2023-01-31 国网湖南省电力有限公司 Fault detection method and system for UHV converter valve based on random forest
WO2023045278A1 (en) * 2021-09-27 2023-03-30 西安交通大学 Data dual-drive method, apparatus, and device for predicting power grid failure during typhoon
CN116089870A (en) * 2022-12-16 2023-05-09 清华大学 Industrial equipment fault prediction method and device based on meta-learning under small sample condition
CN116111951A (en) * 2023-04-13 2023-05-12 山东中科泰阳光电科技有限公司 Data monitoring system based on photovoltaic power generation
CN116520799A (en) * 2023-04-23 2023-08-01 中南大学 Complex industrial process fault detection method based on space-time variation diagram attention self-encoder
CN116596044A (en) * 2023-07-18 2023-08-15 华能山东发电有限公司众泰电厂 Power generation load prediction model training method and device based on multi-source data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8078255B2 (en) * 2006-03-29 2011-12-13 University Of Georgia Research Foundation, Inc. Virtual surgical systems and methods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on an automobile fault prediction method based on an improved LightGBM model; 颜诗旋; 朱平; 刘钊; Automotive Engineering (Issue 06); full text *
Research on fault prediction technology based on deep learning; 吴立金; 夏冉; 詹红燕; 韩新宇; Computer Measurement & Control (Issue 02); full text *


Similar Documents

Publication Publication Date Title
CN116956197B (en) Deep learning-based energy facility fault prediction method and device and electronic equipment
CN107941537B (en) A method for evaluating the health status of mechanical equipment
CN116757534B (en) A reliability analysis method for smart refrigerators based on neural training network
CN113255848B (en) Identification method of hydraulic turbine cavitation acoustic signal based on big data learning
CN113673346B (en) A method for motor vibration data processing and state recognition based on multi-scale SE-Resnet
CN110598851A (en) Time series data abnormity detection method fusing LSTM and GAN
CN112766342A (en) Abnormity detection method for electrical equipment
CN116910493B (en) Construction method and device of equipment fault diagnosis model based on multi-source feature extraction
CN117407797B (en) Equipment fault diagnosis method and model construction method based on incremental learning
CN116881832B (en) Construction method and device of fault diagnosis model of rotary mechanical equipment
CN117291314B (en) Construction method of energy risk identification model, energy risk identification method and device
CN114048468A (en) Intrusion detection method, intrusion detection model training method, device and medium
CN113487223B (en) Risk assessment method and system based on information fusion
CN116738868B (en) Rolling bearing residual life prediction method
CN112418476A (en) Ultra-short-term power load prediction method
CN117269742A (en) Method, device and medium for evaluating health state of circuit breaker in high-altitude environment
CN116993537A (en) Power load abnormality detection method and system based on serial GRU (generic routing unit) self-encoder
CN116595465A (en) High-dimensional sparse data outlier detection method and system based on self-encoder and data enhancement
CN115564155A (en) A distributed wind turbine power prediction method and related equipment
CN113884807A (en) Power distribution network fault prediction method based on random forest and multi-layer architecture clustering
CN117076869A (en) Time-frequency domain fusion fault diagnosis method and system for rotary machine
CN116991137B (en) Concept drift-oriented industrial control system abnormality detection method capable of adapting to interpretation
CN118094436A (en) A method and system for detecting anomalies of fault-sensitive self-supervised equipment
CN118017482A (en) Flexible ramping capacity demand analysis method based on prediction error feature extraction
CN116722548A (en) Photovoltaic power generation prediction method based on time sequence model and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant