Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a device for adaptive combustion parameter control of garbage incineration, which remove the need of the existing garbage incineration control technology to rely on a plurality of strongly coupled independent models, and thereby improve the regulation precision and the regulation efficiency.
In a first aspect, an embodiment of the present application provides a method for controlling a garbage incineration adaptive combustion parameter, including:
Acquiring garbage incineration data related to control parameters in a designated monitoring period;
Decomposing the garbage incineration data into a plurality of Gaussian distribution components, acquiring corresponding data probabilities, and determining a sub-data set of each Gaussian distribution component based on the data probabilities, wherein the data probabilities are used for indicating probability values that any garbage incineration data satisfies the Gaussian distribution components;
performing feature extraction on the sub-data set by using a pre-established feature extraction model, and updating network parameters in the feature extraction model when an error result is detected to be larger than a preset error threshold value so as to output a predicted value of the control parameter, wherein the error result is used for representing an error between a designated label and an output result of the feature extraction model;
and determining a control strategy of the control parameter based on the predicted value, wherein the control strategy is used for instructing regulation of the current garbage incineration data.
In one possible implementation manner, the decomposing the garbage incineration data into a plurality of gaussian distribution components and obtaining corresponding data probabilities, determining a sub-data set of each gaussian distribution component based on the data probabilities, includes:
Constructing the garbage incineration data into a data matrix;
Inputting the data matrix into a Gaussian mixture model containing a plurality of Gaussian distribution components, distributing each element in the data matrix into the Gaussian distribution components, and outputting corresponding data probability, wherein the number of the Gaussian distribution components is consistent with the number of data categories of the garbage incineration data;
based on the elements and the corresponding data probabilities, a sub-dataset is constructed for each of the gaussian distribution components.
In a possible implementation manner, the feature extraction model comprises a first feature extraction model and a second feature extraction model which are sequentially arranged, wherein the first feature extraction model comprises a plurality of multi-layer perceptrons;
The feature extraction of the sub-data set by using a pre-established feature extraction model, and updating network parameters in the feature extraction model when an error result is detected to be greater than a preset error threshold value, so as to output a predicted value of the control parameter, including:
inputting the sub-data sets into the first feature extraction model, performing feature extraction on the corresponding sub-data sets through the multi-layer perceptron, and taking the data feature values output by the multi-layer perceptron as the output of the first feature extraction model;
Vertically splicing the data characteristic values to obtain fusion characteristics;
and extracting the time characteristics of the fusion characteristics through the second characteristic extraction model so as to output the predicted value of the control parameter.
In one possible implementation manner, the multi-layer perceptron includes a plurality of hidden layers, the feature extraction is performed on the sub-data set by using a pre-established feature extraction model, and when an error result is detected to be greater than a preset error threshold, network parameters in the feature extraction model are updated to output a predicted value of the control parameter, and the method further includes:
For any multi-layer perceptron, detecting whether an error result of the multi-layer perceptron is larger than a preset error threshold, wherein the error result is used for indicating the difference between the appointed label and a predicted value determined by taking the output of a current hidden layer as a data characteristic value currently output by the multi-layer perceptron;
When the error result is detected to be larger than the error threshold value and the number of nodes in each hidden layer contained in the multi-layer perceptron is smaller than a node number threshold value, adding a new node in the hidden layer and enabling the added node to meet a constraint condition, wherein the node number threshold value is determined by the data length of the garbage incineration data, and the constraint condition is determined by the error result before adding the node, the output of the last node in the hidden layer before adding the node, the output of the added node and the error result after adding the node;
Updating the output weight of each node in the hidden layer based on the appointed label, the output weight of each node before the node is added and the output of each node after the node is added;
and updating the data characteristic value output by the multi-layer perceptron based on the updated output weight value.
In one possible implementation manner, the feature extracting the sub-data set by using a pre-established feature extracting model, and updating network parameters in the feature extracting model when an error result is detected to be greater than a preset error threshold value, so as to output a predicted value of the control parameter, and further include:
Determining the output of the hidden layer by the following formula (1):
$$H_n(X)=\sum_{j=1}^{n}\beta_j\,g_j\!\left(w_j X+b_j\right),\qquad n\le L \qquad (1)$$
wherein $X$ is the sub-data set, $\beta=[\beta_1,\ldots,\beta_n]$ represents the set of output weights of the nodes in the hidden layer before the node is added, $n$ represents the number of nodes in the hidden layer before the node is added, $L$ is the data length, $\beta_j$ represents the output weight of the j-th node in the hidden layer before the node is added, $g_j$ represents the activation function of the j-th node, $w_j$ represents the input weight of the j-th node, and $b_j$ represents the bias of the j-th node;
determining the error result by the following formula (2):
$$e_n=T-\hat{y}_n \qquad (2)$$
wherein $e_n$ represents the error result before adding the node, $T$ is the specified label, and $\hat{y}_n$ represents the predicted value determined by taking the output of the current hidden layer as the data characteristic value currently output by the multi-layer perceptron;
the constraint condition is determined by the following formula (3):
$$\frac{\left\langle e_n,\,h_{n+1}\right\rangle^{2}}{\left\|h_{n+1}\right\|^{2}}\ge\frac{\left\langle e_n,\,h_{n}\right\rangle^{2}}{\left\|h_{n}\right\|^{2}},\qquad\left\|e_{n+1}\right\|<\left\|e_n\right\| \qquad (3)$$
wherein $e_n$ represents the error result for the sub-data set before adding the node, $h_n$ represents the output of the n-th node, $h_{n+1}$ represents the output of the added node, and $e_{n+1}$ represents the error result after adding the node;
Updating the output weight of each node in the hidden layer through the following formula (4):
$$\beta^{*}=\arg\min_{\beta}\left\|H_{n+1}\beta-T\right\|_{F}=H_{n+1}^{\dagger}T \qquad (4)$$
wherein $\beta^{*}$ represents the set of output weights of the nodes in the hidden layer after the node is added, $H_{n+1}$ represents the output of each node in the hidden layer after the node is added, $\left\|\cdot\right\|_{F}$ is the preset F-norm, $\dagger$ represents the pseudo-inverse operation, and $\arg\min$ represents the value of the argument that minimizes the function.
In one possible implementation manner, the monitoring period is used for indicating a time interval formed by the current moment and a designated time period before the current moment, and the determining the control strategy of the control parameter based on the predicted value comprises the following steps:
obtaining an observation value of the control parameter at the current moment;
Determining a reinforcement learning evaluation value based on the garbage incineration data in the current monitoring period and a control reward value obtained after responding to the regulation action of the garbage incineration data, wherein the control reward value comprises a reward value corresponding to the control parameter when the value of the control parameter at the next moment falls within a control target interval of the control parameter after the regulation action is executed, and the reinforcement learning evaluation value is used for evaluating the expected cumulative reward obtained by the executed regulation action;
A control strategy for the control parameter is determined based on a regulatory action that maximizes the reinforcement learning assessment value.
In a possible implementation manner, the determining the control strategy of the control parameter based on the predicted value further includes:
Processing a sub-data set determined by the garbage incineration data obtained after the regulation and control actions are executed through the feature extraction model to obtain a predicted value of the control parameter;
And iteratively determining the regulation and control action which maximizes the reinforcement learning evaluation value according to the predicted value, the garbage incineration data obtained after the regulation and control action is executed and the corresponding control reward value, and taking the regulation and control action as a control strategy of the control parameter.
In one possible implementation, the control parameter includes a furnace temperature of the incinerator, and the garbage incineration data includes a grate speed, a primary air supply amount, a secondary air supply amount, a fan pressure and a grate temperature of the incinerator, and the furnace temperature at the current time.
In a second aspect, an embodiment of the present application provides a garbage incineration adaptive combustion parameter control device, including:
the data acquisition module is used for acquiring the garbage incineration data related to the control parameters in the appointed monitoring period;
The data decomposition module is used for decomposing the garbage incineration data into a plurality of Gaussian distribution components, acquiring corresponding data probabilities, and determining a sub-data set of each Gaussian distribution component based on the data probabilities, wherein the data probabilities are used for indicating probability values that any garbage incineration data satisfies the Gaussian distribution components;
The prediction module is used for carrying out feature extraction on the sub-data set by utilizing a pre-established feature extraction model, and updating network parameters in the feature extraction model when an error result is detected to be larger than a preset error threshold value so as to output a predicted value of the control parameter, wherein the error result is used for representing an error between a designated label and an output result of the feature extraction model;
And the strategy generation module is used for determining a control strategy of the control parameter based on the predicted value, wherein the control strategy is used for instructing regulation of the current garbage incineration data.
According to the garbage incineration adaptive combustion parameter control method and device, the garbage incineration data related to the control parameter are decomposed into Gaussian distribution components, and the sub-data set of each Gaussian distribution component is determined based on the acquired corresponding data probabilities, so that the relevance among the data can be kept. The sub-data sets are then subjected to feature extraction through the feature extraction model, the network parameters in the feature extraction model are updated when the error result is detected to be larger than the error threshold value, and the predicted value of the control parameter is output, so that the control strategy of the control parameter is determined by means of the predicted value. In this way, the complexity of data processing is reduced without depending on a plurality of strongly coupled independent models, adaptive configuration of the network parameters of the model is realized, and the accuracy of model processing is improved, thereby improving the regulation precision and the regulation efficiency.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a schematic flow chart of a method for controlling adaptive combustion parameters of garbage incineration according to an embodiment of the present application, where the method includes steps S101 to S104.
S101, acquiring the garbage incineration data related to the control parameters in a designated monitoring period.
In the application, the monitoring period is used for indicating a time interval formed by the current time and a designated time period before the current time, so that the garbage incineration data in the time interval is acquired.
It should be noted that the objective of the garbage incineration adaptive combustion parameter control method provided by the application may be to reduce, as much as possible, the emission of harmful substances during incineration, such as dioxin compounds (generated by incomplete combustion of chlorine-containing organic substances), carbon monoxide (generated when most substances are incompletely combusted), nitrogen oxides (generated when the incineration temperature is too high), and heavy metal vapors (metals such as mercury, lead and cadmium are released into the flue gas in the form of vapor or particles when the incineration temperature is too high). The key factors for reducing the emission of harmful substances during incineration are sufficient oxygen and a proper temperature: an incineration temperature of 850°C to 950°C provides complete combustion while avoiding the generation of nitrogen oxides or heavy metal vapor. According to the application, the prediction result of the control parameter (such as the incineration temperature or the furnace temperature of the incinerator) is obtained from the current garbage incineration data, and the control strategy is determined by combining the prediction result and the garbage incineration data.
In a specific embodiment, the control parameter includes the furnace temperature of the incinerator, and the garbage incineration data includes the grate speed, the primary air supply amount, the secondary air supply amount, the fan pressure, the grate temperature and the current furnace temperature of the incinerator.
For example, the control parameter may be the furnace temperature, and the garbage incineration data may be the grate speed, the primary air supply amount, the secondary air supply amount, the fan pressure, the grate temperature and the furnace temperature of the incinerator at the current time within the time interval formed by the current time and a designated time period before the current time; a prediction result about the furnace temperature is determined from the above garbage incineration data, and a control strategy is determined by combining the prediction result and the garbage incineration data, so that the furnace temperature is kept at 850°C to 950°C.
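Illustratively, one record of the garbage incineration data and the control target interval of the furnace temperature described above may be represented as in the following Python sketch; the field names, the dataclass representation and the Celsius unit are illustrative assumptions rather than limitations of the embodiment.

```python
# A minimal sketch of one garbage incineration data record sampled within the
# monitoring period; field names are illustrative, not taken from the original text.
from dataclasses import dataclass

@dataclass
class IncinerationRecord:
    grate_speed: float           # grate speed of the incinerator
    primary_air_supply: float    # primary air supply amount
    secondary_air_supply: float  # secondary air supply amount
    fan_pressure: float          # fan pressure
    grate_temperature: float     # grate temperature
    furnace_temperature: float   # furnace temperature at the current time

# Control target interval of the controlled parameter (furnace temperature), assumed in degrees Celsius.
FURNACE_TEMP_TARGET = (850.0, 950.0)
```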
S102, decomposing the garbage incineration data into a plurality of Gaussian distribution components, acquiring corresponding data probabilities, and determining a sub-data set of each Gaussian distribution component based on the data probabilities, wherein the data probabilities are used for indicating probability values of any garbage incineration data meeting the Gaussian distribution components.
It should be noted that the gaussian mixture model is composed of a plurality of gaussian distributions (i.e., gaussian distribution components), each of which represents a cluster in the data space, and a set of mixture coefficients, which reflect the weights of the gaussian components in the overall model.
According to the application, the garbage incineration data is decomposed into a plurality of Gaussian distributions, so that the Gaussian mixture model can accurately describe the garbage incineration data; a sub-data set of each Gaussian distribution is then constructed according to the data probabilities that different garbage incineration data satisfy different Gaussian distributions, so that the decomposition of the data is realized while the relevance among the data is retained.
S103, carrying out feature extraction on the sub-data set by utilizing a pre-established feature extraction model, and updating network parameters in the feature extraction model when an error result is detected to be larger than a preset error threshold value so as to output a predicted value of the control parameter, wherein the error result is used for representing an error between a designated label and an output result of the feature extraction model.
In the present application, the feature extraction model may be constituted by one or more neural network models. In the feature extraction process, data features, data correlations and the like in the sub-data set may be extracted, and the time features characterized by the data may also be extracted, so that the predicted value of the control parameter is obtained from the extracted features.
It should be noted that the present application determines the error result of the feature extraction model according to the difference between the output result (such as the predicted value of the control parameter) of the feature extraction model and the specified label. Whether the error result is larger than the error threshold is then detected; if so, the updating of the network parameters of the feature extraction model is triggered, adaptive adjustment of the model parameters is realized, the errors of the model and of the model output result are reduced, and the prediction accuracy is improved.
For example, the feature extraction model may include a multi-layer perceptron, and the residual (i.e., error result) represents the difference between the network output and the target value (i.e., the specified label in the present application) during the training of the multi-layer perceptron, particularly during the back propagation phase, and directs the update of the weight parameters through the network back propagation.
S104, determining a control strategy of the control parameter based on the predicted value, wherein the control strategy is used for instructing regulation of the current garbage incineration data.
In the present application, a control strategy for regulating the garbage incineration data may be determined by reinforcement learning or other means in combination with the predicted value of the control parameter, so that the value of the control parameter falls within the control target interval. For example, a corresponding control strategy is generated for the furnace temperature by regulating the grate speed, the primary air supply amount, the secondary air supply amount, the fan pressure and the grate temperature of the incinerator at the current moment, so that the furnace temperature is kept at 850°C to 950°C.
According to the garbage incineration adaptive combustion parameter control method and device, the garbage incineration data related to the control parameter are decomposed into Gaussian distribution components, and the sub-data set of each Gaussian distribution component is determined based on the acquired corresponding data probabilities, so that the relevance among the data can be kept. The sub-data sets are then subjected to feature extraction through the feature extraction model, the network parameters in the feature extraction model are updated when the error result is detected to be larger than the error threshold value, and the predicted value of the control parameter is output, so that the control strategy of the control parameter is determined by means of the predicted value. In this way, the complexity of data processing is reduced without depending on a plurality of strongly coupled independent models, adaptive configuration of the network parameters of the model is realized, and the accuracy of model processing is improved, thereby improving the regulation precision and the regulation efficiency.
In some embodiments, the decomposing the garbage incineration data into a plurality of gaussian distribution components and obtaining corresponding data probabilities, determining a sub-data set of each of the gaussian distribution components based on the data probabilities, comprises:
Constructing the garbage incineration data into a data matrix;
Inputting the data matrix into a Gaussian mixture model containing a plurality of Gaussian distribution components, distributing each element in the data matrix into the Gaussian distribution components, and outputting corresponding data probability, wherein the number of the Gaussian distribution components is consistent with the number of data categories of the garbage incineration data;
based on the elements and the corresponding data probabilities, a sub-dataset is constructed for each of the gaussian distribution components.
In this embodiment, since the garbage incineration data may include a plurality of indexes and parameters, the value length (i.e., data length) of each index/parameter is set to L, and a data matrix is constructed according to the data length and the number of indexes/parameters in the garbage incineration data. For example, the indexes/parameters include the grate speed, the primary air supply amount, the secondary air supply amount, the fan pressure, the grate temperature of the incinerator and the current furnace temperature of the incinerator, i.e. 6 in total, so that an L×6 matrix M is generated.
The matrix is converted into a linear combination of a plurality of Gaussian distributions (i.e., Gaussian distribution components) by using a Gaussian mixture model, expressed as
$$p(M)=\sum_{k=1}^{K}\pi_k\,N_k(M)$$
wherein $p(M)$ represents the Gaussian mixture model for the matrix M, $K$ represents the number of Gaussian distribution components, $\pi_k$ represents the mixing coefficient of the k-th Gaussian distribution component, which reflects the weight of the corresponding Gaussian distribution component in the overall model and satisfies $0\le\pi_k\le1$ and $\sum_{k=1}^{K}\pi_k=1$, and $N_k(M)$ represents the probability density function of the k-th Gaussian distribution component.
In this embodiment, the number of gaussian distribution components corresponds to the number of data categories of the waste incineration data. For example, the garbage incineration data includes 6 data types including grate speed, primary air supply amount, secondary air supply amount, fan pressure, grate temperature, and current furnace temperature, and the number of gaussian distribution components is 6.
More specifically, the probability density function of the k-th Gaussian distribution component is expressed as follows:
$$N_k(M)=\frac{\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left[V_k^{-1}(M-M_k)^{\mathrm{T}}U_k^{-1}(M-M_k)\right]\right)}{(2\pi)^{Lp/2}\,|U_k|^{p/2}\,|V_k|^{L/2}}$$
wherein $M_k$ represents the matrix mean of the k-th Gaussian distribution component, $U_k$ represents the row covariance matrix of the k-th Gaussian distribution component, $V_k$ represents the column covariance matrix of the k-th Gaussian distribution component, $\mathrm{tr}(\cdot)$ represents the trace of a matrix, and $p$ denotes the number of columns of the data matrix (6 in this example).
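Illustratively, the matrix-variate density reconstructed above may be evaluated as in the following NumPy sketch; the function name and the assumption that M is stored as an L×p array are illustrative.

```python
import numpy as np

def matrix_normal_pdf(M, M_k, U_k, V_k):
    """Density of an L x p matrix M under the k-th matrix-normal component with
    mean M_k, row covariance U_k (L x L) and column covariance V_k (p x p)."""
    L, p = M.shape
    D = M - M_k
    exponent = -0.5 * np.trace(np.linalg.inv(V_k) @ D.T @ np.linalg.inv(U_k) @ D)
    norm = ((2 * np.pi) ** (L * p / 2)
            * np.linalg.det(U_k) ** (p / 2)
            * np.linalg.det(V_k) ** (L / 2))
    return np.exp(exponent) / norm
```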
Further, each element in the data matrix is substituted into each Gaussian distribution component in sequence, so that probability values of different elements meeting different Gaussian distributions are obtained, and corresponding data probabilities are obtained. Then, each element is multiplied by a corresponding data probability (e.g., a probability value that the element satisfies a certain gaussian distribution component) to obtain a sub-data set of the gaussian distribution component. Wherein the sub-data sets are in one-to-one correspondence with the gaussian distribution components, i.e. the number of sub-data sets corresponds to the number of gaussian distribution components.
Therefore, in this embodiment, by distributing the elements in the data matrix to the gaussian distribution components, and determining the sub-data set of each gaussian distribution component based on the elements and the corresponding data probabilities, the decomposition of the data is implemented, and at the same time, the relevance between the data can be maintained.
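Illustratively, the decomposition into probability-weighted sub-data sets may be sketched as follows; for simplicity the sketch fits a vector-variate Gaussian mixture over the rows of the data matrix with scikit-learn as a stand-in for the matrix-variate formulation above, which is an illustrative simplification rather than the exact model of the embodiment.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def decompose_into_subdatasets(M: np.ndarray, n_components: int = 6) -> list[np.ndarray]:
    """Fit a Gaussian mixture to the rows of the L x 6 data matrix M and weight every
    element by the row's probability under each component, giving one sub-data set
    per Gaussian distribution component (same shape as M)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full", random_state=0)
    gmm.fit(M)                      # rows of M are treated as samples
    resp = gmm.predict_proba(M)     # shape (L, K): data probabilities per component
    return [M * resp[:, k:k + 1] for k in range(n_components)]

# Usage: sub_sets = decompose_into_subdatasets(M)  # one L x 6 sub-data set per component
```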
In some embodiments, the feature extraction model comprises a first feature extraction model and a second feature extraction model which are arranged in sequence, wherein the first feature extraction model comprises a plurality of multi-layer perceptrons;
The feature extraction of the sub-data set by using a pre-established feature extraction model, and updating network parameters in the feature extraction model when an error result is detected to be greater than a preset error threshold value, so as to output a predicted value of the control parameter, including:
inputting the sub-data sets into the first feature extraction model, performing feature extraction on the corresponding sub-data sets through the multi-layer perceptron, and taking the data feature values output by the multi-layer perceptron as the output of the first feature extraction model;
Vertically splicing the data characteristic values to obtain fusion characteristics;
and extracting the time characteristics of the fusion characteristics through the second characteristic extraction model so as to output the predicted value of the control parameter.
In this embodiment, the first feature extraction model is used to extract data features, and the second feature extraction model is used to extract temporal features. It should be noted that each multi-layer perceptron configured in the first feature extraction model corresponds to one sub-data set, i.e., the number of multi-layer perceptrons corresponds to the number of sub-data sets (i.e., the number of Gaussian distribution components). In this case, inputting the sub-data sets into the first feature extraction model means that each sub-data set is input into the corresponding multi-layer perceptron for feature extraction, and the data feature values of the respective sub-data sets output by the multi-layer perceptrons are used as the output of the first feature extraction model.
It should be noted that, in this embodiment, a plurality of multi-layer perceptrons are configured to process the respective sub-data sets, but the multi-layer perceptrons are not independent of each other. The output of the hidden layer of any multi-layer perceptron is used as the data characteristic value output by that multi-layer perceptron and, together with the data characteristic values output by the other multi-layer perceptrons, forms the output of the first feature extraction model; this output is processed by the second feature extraction model to output the predicted value, and the error result is determined from the difference between the specified label and the predicted value. Since the predicted value is determined by the outputs of the plurality of multi-layer perceptrons, judging whether the error result is larger than the error threshold triggers the updating of the network parameters of the multi-layer perceptrons, so that the parameter configuration of the model is adjusted adaptively. Therefore, by sharing the loss (namely the error result) and adaptively adjusting the model parameter configuration, this embodiment overcomes the defect that the existing deep learning method needs to rely on a plurality of strongly coupled independent models, reduces the error and data-processing complexity of the model, and improves the accuracy and efficiency of the model, so that the regulation precision and efficiency of the parameters can be effectively improved.
Further, the data characteristic values output by the multi-layer perceptrons of the first feature extraction model are vertically spliced to obtain the fusion feature. Feature extraction is then performed on the fusion feature through the second feature extraction model to obtain the time features in the data, thereby obtaining the predicted value of the control parameter.
Illustratively, the second feature extraction model may be an LSTM (Long Short-Term Memory) network, which is a special recurrent neural network (RNN) designed to alleviate the vanishing-gradient and exploding-gradient problems encountered by conventional RNNs when dealing with long-term dependencies.
Therefore, in the embodiment, the first feature extraction model with a plurality of multi-layer perceptrons is configured to perform feature extraction on the sub-data set, so that the problem that the existing garbage incineration control technology needs to rely on a plurality of independent models with strong coupling is solved, sharing loss and model network parameter self-adaptive configuration are realized, and the accuracy and efficiency of model processing are improved. And then, extracting time features by feature stitching and configuring a second feature extraction model, capturing time dependency relationship between data, and improving the accuracy of prediction, so that the regulation precision and the regulation efficiency are improved.
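Illustratively, a minimal PyTorch sketch of the two-stage feature extraction model (one multi-layer perceptron per sub-data set, vertical splicing, then an LSTM) is given below; the layer sizes, the ReLU activation, the splicing axis and the use of the last LSTM step for the prediction head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStageExtractor(nn.Module):
    """Sketch: per-sub-data-set MLPs (first feature extraction model) whose outputs are
    spliced and fed to an LSTM (second feature extraction model) that emits a predicted
    value of the control parameter."""
    def __init__(self, n_subsets: int, in_dim: int, feat_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
            for _ in range(n_subsets)
        ])
        self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # predicted value of the control parameter

    def forward(self, subsets):                   # subsets: list of (L, in_dim) tensors
        feats = [mlp(x) for mlp, x in zip(self.mlps, subsets)]  # each (L, feat_dim)
        fused = torch.cat(feats, dim=0).unsqueeze(0)            # vertical splice -> (1, K*L, feat_dim)
        out, _ = self.lstm(fused)                               # temporal feature extraction
        return self.head(out[:, -1, :])                         # scalar prediction
```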
In some embodiments, the multi-layer perceptron includes a plurality of hidden layers, fig. 2 is a flow chart of network parameter updating provided in the embodiment of the application, and the embodiment provides a method for updating network parameters, which includes steps S201 to S204.
S201, aiming at any multi-layer perceptron, detecting whether an error result of the multi-layer perceptron is larger than a preset error threshold value, wherein the error result is used for indicating the difference between the appointed label and a predicted value determined by taking the output of a current hidden layer as a data characteristic value currently output by the multi-layer perceptron;
S202, when the error result is detected to be larger than the error threshold value and the number of nodes in each hidden layer contained in the multi-layer perceptron is smaller than a node number threshold value, adding a new node in the hidden layer and enabling the added node to meet a constraint condition, wherein the node number threshold value is determined by the data length of the garbage incineration data, and the constraint condition is determined by the error result before adding the node, the output of the last node in the hidden layer before adding the node, the output of the added node and the error result after adding the node;
s203, updating the output weight of each node in the hidden layer based on the appointed label, the output weight of each node before the node is added and the output of each node after the node is added;
S204, updating the data characteristic value output by the multi-layer perceptron based on the updated output weight.
In this embodiment, the multi-layer perceptron includes N hidden layers, where N is greater than or equal to 1, each hidden layer includes a plurality of nodes, and the maximum number of nodes is the data length L. Specifically, if the number of nodes in the hidden layer before a node is added is n, the hidden layer performs the following operation, namely, the output of the hidden layer is determined by the following formula (1):
$$H_n(X)=\sum_{j=1}^{n}\beta_j\,g_j\!\left(w_j X+b_j\right),\qquad n\le L \qquad (1)$$
wherein $X$ is the sub-data set, $\beta=[\beta_1,\ldots,\beta_n]$ represents the set of output weights of the nodes in the hidden layer before the node is added, $n$ represents the number of nodes in the hidden layer before the node is added, $L$ is the data length, $\beta_j$ represents the output weight of the j-th node in the hidden layer before the node is added, $g_j$ represents the activation function of the j-th node, $w_j$ represents the input weight of the j-th node, and $b_j$ represents the bias of the j-th node.
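Illustratively, formula (1) may be sketched in NumPy as follows, assuming the sub-data set X is arranged as an L×d array and a tanh activation shared by all nodes; these array shapes and the activation choice are illustrative assumptions.

```python
import numpy as np

def hidden_layer_output(X, W, b, beta, g=np.tanh):
    """Sketch of formula (1): H_n(X) = sum_j beta_j * g(X @ w_j + b_j).
    X: (L, d) sub-data set; W: (d, n) input weights; b: (n,) biases;
    beta: (n,) output weights; g: activation function (tanh assumed)."""
    G = g(X @ W + b)   # (L, n): output of every hidden node
    return G @ beta    # (L,): weighted sum over the n hidden nodes
```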
Then, whether the error result of the multi-layer perceptron is greater than the error threshold is detected, wherein the error result is determined by the following formula (2):
$$e_n=T-\hat{y}_n \qquad (2)$$
wherein $e_n$ represents the error result before adding the node, $T$ is the specified label, and $\hat{y}_n$ represents the predicted value determined by taking the output of the current hidden layer as the data characteristic value currently output by the multi-layer perceptron.
It should be noted that the output of the hidden layer in any multi-layer perceptron serves as the data characteristic value of that multi-layer perceptron within the output of the first feature extraction model; the output of the first feature extraction model is input into the second feature extraction model to output the predicted value of the control parameter, and the residual (i.e., the error result) is then determined from the predicted value and the specified label.
For example, the output of the hidden layer in the 1st multi-layer perceptron is taken as one component of the output of the first feature extraction model; the output of the first feature extraction model is thereby obtained, the predicted value is obtained through the second feature extraction model, and the error result is determined from the predicted value and the specified label.
Further, when the residual error is greater than an error threshold and the number of current nodes of the hidden layer of the multi-layer perceptron does not reach a maximum value L, new nodes are added in the hidden layer according to constraint conditions.
Specifically, the constraint condition is determined by the following formula (3):
$$\frac{\left\langle e_n,\,h_{n+1}\right\rangle^{2}}{\left\|h_{n+1}\right\|^{2}}\ge\frac{\left\langle e_n,\,h_{n}\right\rangle^{2}}{\left\|h_{n}\right\|^{2}},\qquad\left\|e_{n+1}\right\|<\left\|e_n\right\| \qquad (3)$$
wherein $e_n$ represents the error result for the sub-data set before adding the node, $h_n$ represents the output of the n-th node (i.e. the output of the last node in the hidden layer before the node is added), $h_{n+1}$ represents the output of the added node, and $e_{n+1}$ represents the error result after adding the node.
Further, the output weight of each node in the hidden layer is updated based on the specified label, the output weight of each node before the node is added, and the output of each node after the node is added. Specifically, the output weight of each node in the hidden layer is updated through the following formula (4):
$$\beta^{*}=\arg\min_{\beta}\left\|H_{n+1}\beta-T\right\|_{F}=H_{n+1}^{\dagger}T \qquad (4)$$
wherein $\beta^{*}=[\beta_1,\ldots,\beta_n,\beta_{n+1}]$ represents the set of output weights of the nodes in the hidden layer after the node is added, $\beta_{n+1}$ represents the output weight of the node added in the hidden layer, $H_{n+1}$ represents the output of each node in the hidden layer after the node is added, $\left\|\cdot\right\|_{F}$ is the preset F-norm, $\dagger$ represents the pseudo-inverse operation on a matrix, and $\arg\min$ represents the value of the argument that minimizes the function, i.e. the value of $\beta$ for which the function takes its minimum.
Further, based on the updated output weight, the data characteristic value output by the multi-layer perceptron is updated.
Therefore, in this embodiment, detecting whether the error result exceeds the error threshold triggers the addition of nodes to the hidden layer in the multi-layer perceptron and the updating of the output weights of the nodes in the hidden layer and of the data characteristic value output by the multi-layer perceptron, so that a plurality of strongly coupled independent models are not needed, the adaptability and accuracy of the model are improved, the model is applicable to complex and real-time changing data environments, and the accuracy and efficiency of data processing are improved.
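Illustratively, the node-addition procedure of steps S201 to S204 may be sketched as follows under the reconstruction of formulas (1) to (4) above; the random generation of candidate nodes, the tanh activation and the simplified residual-decrease test used in place of the exact constraint condition (3) are illustrative assumptions.

```python
import numpy as np

def grow_hidden_layer(X, T, W, b, beta, err_thresh, max_nodes, g=np.tanh, rng=None, max_tries=50):
    """Sketch: while the residual norm exceeds the error threshold and the node count is
    below the data-length cap, add a candidate node that reduces the residual, then
    refresh all output weights with the pseudo-inverse of the hidden-node output matrix."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = X.shape[1]
    while True:
        G = g(X @ W + b)                  # (L, n) hidden-node outputs
        e = T - G @ beta                  # residual, cf. formula (2)
        if np.linalg.norm(e) <= err_thresh or W.shape[1] >= max_nodes:
            return W, b, beta
        for _ in range(max_tries):        # look for a candidate node that helps
            w_new, b_new = rng.standard_normal(d), rng.standard_normal()
            G_new = np.column_stack([G, g(X @ w_new + b_new)])
            beta_new = np.linalg.pinv(G_new) @ T              # cf. formula (4)
            if np.linalg.norm(T - G_new @ beta_new) < np.linalg.norm(e):
                W = np.column_stack([W, w_new])               # accept the added node
                b = np.append(b, b_new)
                beta = beta_new
                break
        else:
            return W, b, beta             # no acceptable candidate found
```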
In some embodiments, the monitoring period is used for indicating a time interval formed by the current moment and a designated time period before, and the determining the control strategy of the control parameter based on the predicted value comprises the following steps:
obtaining an observation value of the control parameter at the current moment;
Determining a reinforcement learning evaluation value based on the garbage incineration data in the current monitoring period and a control reward value obtained after responding to the regulation action of the garbage incineration data, wherein the control reward value comprises a reward value corresponding to the control parameter when the value of the control parameter at the next moment falls within a control target interval of the control parameter after the regulation action is executed, and the reinforcement learning evaluation value is used for evaluating the expected cumulative reward obtained by the executed regulation action;
A control strategy for the control parameter is determined based on a regulatory action that maximizes the reinforcement learning assessment value.
In the present embodiment, the control strategy may be determined by means of reinforcement learning. Specifically, reinforcement learning is performed by the following formula:
$$Q(s_t,a_t)\leftarrow Q(s_t,a_t)+\alpha\left[r_{t+1}+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\right]$$
wherein $s_t$ represents the state at time $t$, namely the observation value of the control parameter at time $t$, for example the furnace temperature at time $t$; $a_t$ represents the regulation action performed at time $t$, which is used to characterize the regulation of the garbage incineration data related to the control parameter, for example $a_t=(v_t,\,q^{1}_t,\,q^{2}_t,\,p_t,\,T^{g}_t)$, wherein $v_t$ represents the grate speed of the incinerator at time $t$, $q^{1}_t$ represents the primary air supply amount at time $t$, $q^{2}_t$ represents the secondary air supply amount at time $t$, $p_t$ represents the fan pressure at time $t$, and $T^{g}_t$ represents the grate temperature at time $t$; $\gamma$ is the decay rate, $\alpha$ is the learning rate, and $r$ is the reward function (used for determining the control reward value), for example the reward value may be 1 when the furnace temperature at the next moment satisfies 850°C to 950°C and 0 otherwise; $Q(s_t,a_t)$ represents the reinforcement learning evaluation value, namely the total expected reward obtained in the state at time $t$ under the performed action.
Then, each regulation action is traversed, the regulation action with the maximized reinforcement learning evaluation value is selected, the control strategy corresponding to the regulation action is determined, and the control strategy enables the control parameter to be kept in the control target interval.
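Illustratively, the reinforcement learning update, the reward function and the action-selection rule described above may be sketched in tabular form as follows; representing states and actions as hashable discretised values, and the default learning rate and decay rate, are illustrative assumptions.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular step of Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); states and actions are assumed hashable."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
    return Q

def reward(furnace_temp_next, target=(850.0, 950.0)):
    # Reward value 1 when the furnace temperature at the next moment falls inside the
    # control target interval, 0 otherwise (as described above; degrees Celsius assumed).
    return 1.0 if target[0] <= furnace_temp_next <= target[1] else 0.0

def best_action(Q, s, actions):
    # Traverse the candidate regulation actions and pick the one maximising Q(s, a).
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```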
Therefore, in this embodiment, reinforcement learning is performed by using the observation value of the control parameter at the current moment and the current garbage incineration data, and the regulation action that maximizes the reinforcement learning evaluation value is selected to generate the corresponding control strategy for instructing the regulation of the garbage incineration data, so that an effective balance between regulation efficiency and regulation accuracy and reasonable scheduling are achieved, the regulation efficiency and accuracy are effectively improved, and the regulation stability is maintained.
In some embodiments, the determining the control strategy of the control parameter based on the predicted value further includes:
Processing a sub-data set determined by the garbage incineration data obtained after the regulation and control actions are executed through the feature extraction model to obtain a predicted value of the control parameter;
And iteratively determining the regulation and control action which maximizes the reinforcement learning evaluation value according to the predicted value, the garbage incineration data obtained after the regulation and control action is executed and the corresponding control reward value, and taking the regulation and control action as a control strategy of the control parameter.
In the present embodiment, the regulation action at the next time is output based on the reinforcement learning, namely the grate speed, the primary air supply amount, the secondary air supply amount, the fan pressure and the grate temperature at time t+1, denoted $a_{t+1}=(v_{t+1},\,q^{1}_{t+1},\,q^{2}_{t+1},\,p_{t+1},\,T^{g}_{t+1})$. In order to make the value of the control parameter at the next time fall within the control target interval, $a_{t+1}$ is taken as the operation at time t+1 (that is, the actual operation at time t+1 is set to $a_{t+1}$), the garbage incineration data obtained after this operation is performed are sequentially input into the Gaussian mixture model and the feature extraction model for processing, and the prediction result is output and taken as the state $s_{t+1}$ at the next time. Then, the prediction result is iterated through the reinforcement learning so that the value of $Q$ is maximized, and the resulting regulation action is the regulation mode of the incinerator parameters, namely the output control strategy.
Therefore, the embodiment predicts by combining the gaussian mixture model, the feature extraction model and the reinforcement learning result, and iterates the prediction result through reinforcement learning, so that the regulation and control action which maximizes the reinforcement learning evaluation value is screened, the accuracy and the efficiency of the regulation and control action are improved as a control strategy, the numerical value of the control parameter is ensured to fall in the control target interval, and the reliability and the accuracy of regulation and control are improved.
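Illustratively, the iterative determination of the regulation action may be sketched as the following closed loop, reusing the helpers from the previous sketch; apply_action, collect_data, predict_temp and discretise are hypothetical caller-supplied callables standing in for the plant interface, the Gaussian mixture decomposition and the feature extraction model.

```python
def control_loop(Q, actions, s0, apply_action, collect_data, predict_temp, discretise, n_iters=100):
    """Hypothetical closed loop: apply the action maximising the evaluation value, pass the
    resulting incineration data through the Gaussian mixture and feature extraction models
    to predict the furnace temperature (next state), and update the Q-table with the reward.
    best_action, reward and q_update are the helpers from the previous sketch."""
    s = s0
    for _ in range(n_iters):
        a = best_action(Q, s, actions)    # regulation action maximising Q(s, a)
        apply_action(a)                   # regulate grate speed, air supply, fan pressure, ...
        data = collect_data()             # garbage incineration data after the action
        temp_pred = predict_temp(data)    # GMM decomposition + feature extraction prediction
        r = reward(temp_pred)             # 1 if the prediction lies in the target interval
        s_next = discretise(temp_pred)    # next state
        Q = q_update(Q, s, a, r, s_next, actions)
        s = s_next
    return Q
```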
Fig. 3 is a schematic structural diagram of a device for controlling adaptive combustion parameters of garbage incineration according to an embodiment of the present application, where the device 300 for controlling adaptive combustion parameters of garbage incineration includes:
A data acquisition module 301, configured to acquire garbage incineration data related to the control parameter in a specified monitoring period;
A data decomposition module 302, configured to decompose the garbage incineration data into a plurality of gaussian distribution components, and obtain corresponding data probabilities, and determine a sub-data set of each gaussian distribution component based on the data probabilities, where the data probabilities are used to indicate probability values that any garbage incineration data satisfies the gaussian distribution components;
a prediction module 303, configured to perform feature extraction on the sub-data set by using a pre-established feature extraction model, and update a network parameter in the feature extraction model when an error result is detected to be greater than a preset error threshold value, so as to output a predicted value of the control parameter, where the error result is used to characterize an error between a specified label and an output result of the feature extraction model;
The policy generation module 304 is configured to determine a control policy of the control parameter based on the predicted value, where the control policy is used to instruct to regulate current garbage incineration data.
In some embodiments, the data decomposition module 302 includes:
the matrix construction unit is used for constructing the garbage incineration data into a data matrix;
the distribution unit is used for inputting the data matrix into a Gaussian mixture model containing a plurality of Gaussian distribution components, distributing each element in the data matrix into the Gaussian distribution components, and outputting corresponding data probability, wherein the number of the Gaussian distribution components is consistent with the number of data categories of the garbage incineration data;
And a sub-data set construction unit, configured to construct a sub-data set of each gaussian distribution component based on the element and the corresponding data probability.
In some embodiments, the feature extraction model comprises a first feature extraction model and a second feature extraction model which are arranged in sequence, wherein the first feature extraction model comprises a plurality of multi-layer perceptrons;
the prediction module 303 includes:
the first feature extraction unit is used for inputting the sub-data sets into the first feature extraction model, carrying out feature extraction on the corresponding sub-data sets through the multi-layer perceptron, and taking the data feature values output by the multi-layer perceptron as the output of the first feature extraction model;
The splicing unit is used for vertically splicing the data characteristic values to obtain fusion characteristics;
and the second feature extraction unit is used for extracting the time features of the fusion features through the second feature extraction model so as to output the predicted value of the control parameter.
In some embodiments, the multi-layered perceptron comprises a plurality of hidden layers;
The prediction module 303 further includes:
The detection unit is used for detecting whether an error result of the multi-layer perceptron is larger than a preset error threshold value or not according to any multi-layer perceptron, wherein the error result is used for indicating the difference between the appointed label and a predicted value determined by taking the output of a current hidden layer as a data characteristic value currently output by the multi-layer perceptron;
A node adding unit, configured to add a new node in the hidden layer when the error result is detected to be greater than the error threshold and the number of nodes in each hidden layer included in the multi-layer perceptron is less than the node number threshold, and make the added node meet a constraint condition, where the node number threshold is determined by the data length of the garbage incineration data, and the constraint condition is determined by the error result before adding the node, the output of the last node in the hidden layer before adding the node, the output of the added node, and the error result after adding the node;
The network parameter updating unit is used for updating the output weight of each node in the hidden layer based on the designated label, the output weight of each node before the node is added and the output of each node after the node is added;
And the output updating unit is used for updating the data characteristic value output by the multi-layer perceptron based on the updated output weight.
In some embodiments, the prediction module 303 further comprises:
A data characteristic value calculation unit, configured to determine the output of the hidden layer by the following formula (1):
$$H_n(X)=\sum_{j=1}^{n}\beta_j\,g_j\!\left(w_j X+b_j\right),\qquad n\le L \qquad (1)$$
wherein $X$ is the sub-data set, $\beta=[\beta_1,\ldots,\beta_n]$ represents the set of output weights of the nodes in the hidden layer before the node is added, $n$ represents the number of nodes in the hidden layer before the node is added, $L$ is the data length, $\beta_j$ represents the output weight of the j-th node in the hidden layer before the node is added, $g_j$ represents the activation function of the j-th node, $w_j$ represents the input weight of the j-th node, and $b_j$ represents the bias of the j-th node;
an error result calculation unit, configured to determine the error result by the following formula (2):
$$e_n=T-\hat{y}_n \qquad (2)$$
wherein $e_n$ represents the error result before adding the node, $T$ is the specified label, and $\hat{y}_n$ represents the predicted value determined by taking the output of the current hidden layer as the data characteristic value currently output by the multi-layer perceptron;
A constraint condition calculation unit, configured to determine the constraint condition by the following formula (3):
$$\frac{\left\langle e_n,\,h_{n+1}\right\rangle^{2}}{\left\|h_{n+1}\right\|^{2}}\ge\frac{\left\langle e_n,\,h_{n}\right\rangle^{2}}{\left\|h_{n}\right\|^{2}},\qquad\left\|e_{n+1}\right\|<\left\|e_n\right\| \qquad (3)$$
wherein $e_n$ represents the error result for the sub-data set before adding the node, $h_n$ represents the output of the n-th node, $h_{n+1}$ represents the output of the added node, and $e_{n+1}$ represents the error result after adding the node;
An output weight calculation unit, configured to update the output weight of each node in the hidden layer through the following formula (4):
$$\beta^{*}=\arg\min_{\beta}\left\|H_{n+1}\beta-T\right\|_{F}=H_{n+1}^{\dagger}T \qquad (4)$$
wherein $\beta^{*}$ represents the set of output weights of the nodes in the hidden layer after the node is added, $H_{n+1}$ represents the output of each node in the hidden layer after the node is added, $\left\|\cdot\right\|_{F}$ is the preset F-norm, $\dagger$ represents the pseudo-inverse operation, and $\arg\min$ represents the value of the argument that minimizes the function.
In some embodiments, policy generation module 304 includes:
An observation value obtaining unit, configured to obtain an observation value of the control parameter at a current time;
A reinforcement learning unit, configured to determine a reinforcement learning evaluation value based on the garbage incineration data in the current monitoring period and a control reward value obtained after a regulation action for responding to the garbage incineration data, where the control reward value includes a reward value corresponding to when a value of the control parameter for a next time after the regulation action is performed falls within a control target interval of the control parameter, and the reinforcement learning evaluation value is used to evaluate the expected cumulative reward obtained by the performed regulation action;
and a strategy determining unit for determining a control strategy of the control parameter based on the regulation action maximizing the reinforcement learning evaluation value.
In some embodiments, policy generation module 304 further includes:
the model processing unit is used for processing a sub-data set determined by the garbage incineration data obtained after the regulation and control actions are executed through the feature extraction model to obtain a predicted value of the control parameter;
And the iteration unit is used for determining the regulation and control action which maximizes the reinforcement learning evaluation value as a control strategy of the control parameter according to the predicted value, the garbage incineration data obtained after the regulation and control action is executed and the corresponding control rewarding value in an iterated manner.
In some embodiments, the control parameter includes a furnace temperature of the incinerator, and the garbage incineration data includes a grate speed, a primary air supply amount, a secondary air supply amount, a fan pressure and a grate temperature of the incinerator, and the furnace temperature at the current time.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.