Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, which improves energy utilization efficiency, reduces unnecessary energy loss, enhances the stability and reliability of the whole heat supply system, ensures heat supply quality and optimizes energy distribution.
In order to solve the technical problems, the invention provides the following technical scheme:
An intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control comprises the following steps: S1, collecting outdoor environment data and heat load data of each heating power station, and preprocessing the data;
S2, training a deep reinforcement learning model based on the outdoor environment data and the heat load data;
S3, formulating a heating strategy based on the deep reinforcement learning model;
and S4, executing the heating strategy through an adaptive control algorithm.
As a preferable scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the outdoor environment data comprises temperature, humidity, wind speed and solar radiation intensity;
The heat load data represent the amount of heat that each heating station needs to provide to maintain the indoor temperature of all its users at the heating standard temperature.
The preprocessing comprises missing value filling, abnormal value detection and replacement, data normalization and data synchronous integration, wherein the data synchronous integration matches the heat load data and the outdoor environment data according to time, so as to form a time series in which the heat load data and the outdoor environment data correspond to each other.
As an optimal scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the method for training the deep reinforcement learning model comprises the following steps:
S100, setting a time period for heat supply regulation, constructing a state space, an action set and a reward function, constructing a strategy network and initializing parameters;
S200, selecting an action from the action set and executing it;
S300, calculating a cumulative reward value and updating the parameters of the strategy network;
S400, entering the next time period, generating a new action set and updating the feature vector of the state space;
S500, repeating steps S200-S400 until the cumulative reward value converges, thereby completing the training of the strategy network; storing the strategy network and deploying the deep reinforcement learning model for application.
As a preferable scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the time period is the minimum time unit for heat supply regulation and control;
the state space is composed of the outdoor environment data and heat load data;
the action set is dynamically generated based on the state space and the time period, and the method comprises the following steps:
For any time period A, the heat load data of each heating station in the time period corresponding to time period A in each of the previous n years are extracted, where n is a positive integer, and the n heat load data of the ith heating station are averaged to obtain EAi, where i ranges over 1, 2, ......, m, and m represents the number of heating stations; an arithmetic sequence Q containing k elements is set, Q={q1,q2,......,qk}, where q1 is the smallest element in the sequence Q with a value range of (0, 1) and qk is the largest element in the sequence Q with a value range of (1, 2); for time period A, the action set Aa has the following form:
Aa={a1,a2,......,ak};
the specific form of each action is as follows:
aj={qj·EA1,qj·EA2,......,qj·EAm};
Where aj represents the jth action in the action set, j has a value range of 1, 2, ......, k, and qj represents the jth element in the arithmetic sequence Q.
As an optimal scheme of the intelligent heat supply regulation method based on deep reinforcement learning and self-adaptive control, the calculation formula of the reward function is as follows:
Wherein sj represents the feature vector of the state space in the jth time period, aj represents the action performed in the jth time period, R(sj,aj) represents the reward function value for performing action aj when the feature vector of the state space is sj, α is a weight parameter, β is a proportionality coefficient, and Eij represents the heat load supply amount allocated to the ith heating station in the jth time period; Ej represents the total heat generation amount of the thermal power plant in the jth time period, and its calculation formula is as follows:
Where ηi denotes the heat loss coefficient when the thermal power plant delivers the heat load to the ith heating station.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that the feature vector of the state space consists of historical outdoor environment data, real-time outdoor environment data and historical heat load data of each heating station; the updating method is as follows: the time period currently entered is denoted as B, the heat load data and outdoor environment data of each heating station in the time period corresponding to time period B in each of the previous n years are extracted, the real-time outdoor environment data are acquired, and together they form the feature vector of the state space corresponding to time period B.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that, when each heat supply regulation time period starts, real-time outdoor environment parameters are collected and input into the deep reinforcement learning model, the model automatically generates the action set and outputs the action with the highest selection probability, the heat load supply amount delivered by the thermal power plant to each heating station in the current regulation time period is determined based on that action, and the total heat generation amount of the thermal power plant in the current regulation time period is calculated.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that executing the heating strategy specifically comprises adaptively controlling the total heat generation amount of the thermal power plant and adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station, wherein the method for adaptively controlling the total heat generation amount of the thermal power plant comprises the following steps:
Collecting fuel consumption in real time;
A PID controller is designed for the thermal power plant, and an error term e0(t) is calculated by the following formula:
e0(t)=Ej-E(t);
wherein E(t) represents the actual heat generation amount at the current moment, and the calculation formula is as follows:
E(t)=η0·m(t)·Hv;
Wherein η0 denotes the boiler thermal efficiency, m(t) denotes the mass of fuel consumed so far, and Hv denotes the heating value of the fuel;
Calculating a control output u0(t) at the current moment through the PID algorithm, wherein u0(t) represents the adjusted fuel supply amount;
Based on u0(t), adjusting the fuel supply amount through the hardware device;
the actual heat generation amount is continuously monitored, and the fuel supply amount is adjusted until the actual heat generation amount reaches the total heat generation amount.
As an optimal scheme of the intelligent heat supply regulation method based on deep reinforcement learning and self-adaptive control, the method for adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station comprises the following steps:
Collecting the primary supply water temperature, the primary return water temperature and the instantaneous primary supply water flow of each heating station in real time;
A PID controller is designed for each heating station, and the error term ei(t) of the PID controller of the ith heating station is calculated as follows:
ei(t)=Eij-Ei(t);
Wherein Ei(t) represents the heat load value actually obtained by the ith heating station, and the calculation formula is as follows:
Ei(t)=c·(Tri-Tsi)·Qi(t);
Wherein c is the specific heat capacity of water, Tri is the primary return water temperature of the ith heating station, Tsi is the primary supply water temperature of the ith heating station, and Qi(t) is the total primary supply water amount of the ith heating station from the start of the current time period to the current moment, obtained by integrating the instantaneous primary supply water flow over time;
Calculating a control output ui(t) of each heating station at the current moment through the PID algorithm, wherein ui(t) represents the opening degree of the branch valve and the circulating pump frequency of the ith heating station;
And continuously monitoring the heat load value actually acquired by each heating power station, and adjusting the opening degree of the branch valve and the frequency of the circulating pump until the heat load value actually acquired by each heating power station reaches the distributed heat load supply quantity.
Compared with the prior art, the invention has the following beneficial effects:
Applying deep reinforcement learning to the heat supply regulation scenario realizes the transition from passive, on-demand heat supply to active prediction and optimization of the heating strategy, and remarkably improves the intelligence level of the heat supply system. Macroscopic heating strategy planning is completed through deep reinforcement learning, which takes into account complex factors such as seasonality, weather changes and predicted user demand, while self-adaptive control responds rapidly to the current actual conditions. This overcomes the limitations of any single technology in heat supply regulation, reduces the complexity of the deep reinforcement learning model, relieves its computational burden and accelerates network convergence. By organically combining the two, the long-term stable and efficient operation of the heat supply system is achieved while short-term flexible regulation demands are also satisfied.
The heating strategy based on deep reinforcement learning can match the actual heat demand more accurately and avoid excessive or insufficient heat supply, thereby improving energy utilization efficiency, reducing unnecessary energy loss and facilitating energy conservation and emission reduction. The self-adaptive control algorithm executes the heating strategy and dynamically adjusts the total heat generation amount of the thermal power plant and the heat load supply amount received by each heating station according to actual conditions, which enhances the stability and reliability of the whole heat supply system, ensures heat supply quality and optimizes energy distribution.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments and their specific features are a detailed description of the technical solutions of the present invention rather than a limitation thereof, and that the embodiments and the technical features therein may be combined with each other without conflict.
Example 1
This embodiment describes an intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control. Referring to FIG. 1, the method comprises the following steps:
S1, collecting outdoor environment data and heat load data of each heating power station, and preprocessing the data;
The outdoor environment data comprise temperature, humidity, wind speed and solar radiation intensity, and can be used to predict the heat load demand of each heating station. Temperature is one of the most significant factors affecting heat load demand: lower outdoor temperatures lead to a higher heating load. In the heating season, high humidity makes the human body feel colder, which increases the demand for heating. Wind affects the convective heat exchange at the outer surface of a building; higher wind speeds accelerate heat dissipation from the building envelope and thus increase heat loss. Solar radiation directly affects the heat gain of a building; especially for buildings with large glass curtain walls or south-facing windows, sunlight can significantly raise the indoor temperature and reduce the heating demand.
The heat load data represent the amount of heat that each heating station needs to provide to maintain the indoor temperature of all its users at the heating standard temperature.
The preprocessing comprises missing value filling, abnormal value detection and replacement, data normalization and data synchronous integration;
the data synchronous integration matches the heat load data and the outdoor environment data according to time, so as to form a time series in which the heat load data and the outdoor environment data correspond to each other.
The method for filling the missing values is one of linear interpolation, polynomial interpolation and prediction based on a time series model;
The method of data normalization is one of min-max normalization and Z-score normalization. Because outdoor environment parameters and heat load data tend to have different dimensions and numerical ranges, they need to be converted to the same scale so that the deep reinforcement learning model can process all input features. After the preprocessing step is completed, the collected data can be used as effective input to the deep reinforcement learning model, so as to construct an intelligent model capable of adjusting the heating strategy according to real-time environmental conditions.
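As an illustration of the preprocessing described above, the following minimal Python sketch (function and variable names are hypothetical; pandas is assumed to be available) fills missing values by linear interpolation, applies min-max normalization, and aligns the heat load data with the outdoor environment data on a common time index:

```python
import pandas as pd

def preprocess(env_df: pd.DataFrame, load_df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing: interpolation, min-max scaling, time alignment."""
    # Missing value filling by linear interpolation along the time index.
    env_df = env_df.interpolate(method="linear").ffill().bfill()
    load_df = load_df.interpolate(method="linear").ffill().bfill()

    # Min-max normalization so that all features share the same numeric range.
    def min_max(df: pd.DataFrame) -> pd.DataFrame:
        return (df - df.min()) / (df.max() - df.min())

    # Data synchronous integration: join the two sources on their timestamps
    # to form one aligned time series.
    return min_max(env_df).join(min_max(load_df), how="inner")
```

Abnormal value detection and replacement would slot in before the normalization step, for example by clipping readings that fall outside a plausible physical range.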
S2, training a deep reinforcement learning model based on the outdoor environment data and the heat load data, wherein referring to FIG. 2, the method comprises the following steps:
S100, setting a time period for heat supply regulation, constructing a state space, an action set and a reward function, constructing a strategy network and initializing parameters;
The time period is the minimum time unit for heat supply regulation and control, for example, one day is set as one time period;
the state space is composed of the outdoor environment data and heat load data;
the action set is dynamically generated based on the state space and the time period, and the method comprises the following steps:
For any time period A, the heat load data of each heating station in the time period corresponding to time period A in each of the previous n years are extracted, where n is a positive integer, and the n heat load data of the ith heating station are averaged to obtain EAi, where i ranges over 1, 2, ......, m, and m represents the number of heating stations; an arithmetic sequence Q containing k elements is set, Q={q1,q2,......,qk}, where q1 is the smallest element in the sequence Q with a value range of (0, 1) and qk is the largest element in the sequence Q with a value range of (1, 2); for time period A, the action set Aa has the following form:
Aa={a1,a2,......,ak};
the specific form of each action is as follows:
aj={qj·EA1,qj·EA2,......,qj·EAm};
Wherein aj represents the jth action in the action set, the value range of j is 1, 2, ......, k, and qj represents the jth element in the arithmetic sequence Q;
In this scheme, the actions are set with the historical heat load demand of each heating station as a reference; for example, the selectable actions of the ith heating station are set to 0.8, 0.9, 1.0, 1.1 and 1.2 times its historical average heat load demand. By comparing the real-time outdoor environment parameters with the historical outdoor environment parameters, the strategy network refines the actions set on the basis of the historical data, which improves the training speed of the deep reinforcement learning model; a sketch of this action set generation is given below.
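A minimal sketch of the action set generation, assuming numpy and illustrative default values (k = 5, q1 = 0.8, qk = 1.2, matching the example above):

```python
import numpy as np

def build_action_set(hist_loads: np.ndarray, k: int = 5,
                     q_min: float = 0.8, q_max: float = 1.2) -> np.ndarray:
    """hist_loads: shape (n_years, m) -- heat load of each of the m heating
    stations in the time period corresponding to period A over the past n years."""
    # E_Ai: average of the n historical heat loads of station i.
    e_a = hist_loads.mean(axis=0)                      # shape (m,)
    # Arithmetic sequence Q with k elements, q1 in (0, 1), qk in (1, 2).
    q = np.linspace(q_min, q_max, k)                   # shape (k,)
    # Action a_j = {q_j * E_A1, ..., q_j * E_Am}.
    return q[:, None] * e_a[None, :]                   # shape (k, m)

# Example: 3 years of history for 4 heating stations.
actions = build_action_set(np.random.rand(3, 4) * 100.0)
```

Each row of the returned array is one action aj, i.e., one candidate heat load allocation across the m heating stations.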
The calculation formula of the reward function is as follows:
Wherein sj represents the feature vector of the state space in the jth time period, aj represents the action performed in the jth time period, R(sj,aj) represents the reward function value for performing action aj when the feature vector of the state space is sj, α is a weight parameter and β is a proportionality coefficient, both set by a person skilled in the art according to actual requirements, and Eij represents the heat load supply amount allocated to the ith heating station in the jth time period, which is determined by the action selected for execution in the jth time period; Ej represents the total heat generation amount of the thermal power plant in the jth time period, is calculated from the heat load supply amounts of the heating stations, and its calculation formula is as follows:
Wherein ηi represents the heat loss coefficient of the thermal power plant when delivering the heat load to the ith heating station, which is determined experimentally by one skilled in the art;
the reward function balances the total heat generation amount of the thermal power plant against the supply-demand balance of each heating station, saving total energy consumption while keeping the heat load supply amounts of the heating stations as uniform as possible.
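The exact reward formula is not reproduced above. Purely as an illustration of a reward with this character (penalizing total heat generation while rewarding uniform allocation), and not necessarily the formula used by the invention, one could write something like the following; the loss-compensated total used here is likewise an assumption:

```python
import numpy as np

def reward_example(e_ij: np.ndarray, eta: np.ndarray,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical reward: e_ij are the heat load supply amounts allocated to
    the m heating stations, eta are assumed per-station heat loss coefficients,
    alpha is the weight parameter and beta the proportionality coefficient."""
    e_total = float(np.sum(e_ij * (1.0 + eta)))   # assumed form of total heat generation
    imbalance = float(np.std(e_ij))               # spread of the allocated heat loads
    return -alpha * e_total - beta * imbalance    # less energy and more uniformity => higher reward
```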
The strategy network comprises an input layer, a hidden layer and an output layer. The input layer receives the feature vector of the state space, the hidden layer further extracts features of the state space, and the output layer generates the selection probability of each action in the action set under the current state space. A softmax function converts the output into a probability distribution, ensuring that the selection probabilities of all actions sum to 1;
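A minimal strategy network of this shape, written here as a numpy-only forward pass with illustrative layer sizes (a real implementation would typically use a deep learning framework):

```python
import numpy as np

class PolicyNetwork:
    """Illustrative input -> hidden -> output network with a softmax head."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, state: np.ndarray) -> np.ndarray:
        h = np.tanh(state @ self.w1 + self.b1)   # hidden layer extracts features
        logits = h @ self.w2 + self.b2
        exp = np.exp(logits - logits.max())      # numerically stable softmax
        return exp / exp.sum()                   # selection probabilities sum to 1
```

For example, PolicyNetwork(state_dim=32, n_actions=5).forward(np.zeros(32)) returns a length-5 probability vector summing to 1.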
S200, selecting an action from the action set and executing it;
The method for selecting an action is as follows:
inputting the feature vector of the current state space into the strategy network to obtain the selection probability of each action in the action set under the current state space;
setting a threshold parameter epsilon, wherein the value range is (0,0.15);
Generating a random number r in the value range [0, 1]; if r is greater than or equal to ε, executing the action with the highest selection probability, and if r is less than ε, randomly selecting an action from the action set and executing it, as sketched below;
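The threshold-based selection above amounts to ε-greedy exploration; a sketch with illustrative names:

```python
import numpy as np

def select_action(probs: np.ndarray, epsilon: float = 0.1, rng=None) -> int:
    """probs: selection probabilities over the k actions; epsilon in (0, 0.15)."""
    rng = rng or np.random.default_rng()
    r = rng.random()                        # random number r in [0, 1)
    if r >= epsilon:
        return int(np.argmax(probs))        # exploit: action with highest probability
    return int(rng.integers(len(probs)))    # explore: uniformly random action
```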
S300, calculating a cumulative reward value and updating the parameters of the strategy network;
The cumulative reward value is calculated as follows:
RN=β·R(s1,a1)+β^2·R(s2,a2)+......+β^N·R(sN,aN);
Wherein RN represents the current cumulative reward value, N represents the number of actions that have been performed, β represents the discount factor, β^j represents the discount factor β raised to the power of j, R(sj,aj) represents the reward value obtained by performing action aj in state sj, and j has a value range of 1, 2, ......, N.
The calculation formula for updating the parameters of the policy network is as follows:
δ←δ+η·∇δ(LN);
wherein δ represents any parameter in the strategy network, ∇δ(·) represents the gradient of the bracketed function with respect to δ, η is the learning rate, and LN is the objective function, whose calculation formula is as follows:
LN=ln(p(sN,aN)·RN);
Wherein p(sN,aN) represents the selection probability of action aN in the environment state sN.
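Following the definitions above, the cumulative reward and a single parameter update step can be sketched as follows (in practice the gradient of LN with respect to each network parameter would be computed by an automatic differentiation framework; only the scalar bookkeeping is shown, with illustrative names):

```python
import math

def cumulative_reward(rewards, beta: float = 0.95) -> float:
    """R_N = sum over j of beta**j * R(s_j, a_j); rewards[0] corresponds to j = 1."""
    return sum(beta ** (j + 1) * r for j, r in enumerate(rewards))

def objective(p_sN_aN: float, R_N: float) -> float:
    """L_N = ln(p(s_N, a_N) * R_N); defined when the product is positive."""
    return math.log(p_sN_aN * R_N)

def update_parameter(delta: float, grad_LN_wrt_delta: float, eta: float = 1e-3) -> float:
    """Gradient ascent step: delta <- delta + eta * dL_N/d(delta)."""
    return delta + eta * grad_LN_wrt_delta
```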
S400, entering the next time period, generating a new action set and updating the feature vector of the state space;
The feature vector of the state space consists of historical outdoor environment data, real-time outdoor environment data and historical heat load data of each heating station. The updating method is as follows: the time period currently entered is denoted as B, the heat load data and outdoor environment data of each heating station in the time period corresponding to time period B in each of the previous n years are extracted, the real-time outdoor environment data are acquired, and together they form the feature vector of the state space corresponding to time period B.
During the training stage of the deep reinforcement learning model, the real-time outdoor environment data are obtained by reading the recorded outdoor environment data of the corresponding time period; after model training is completed, they are collected in real time by sensors and other detection equipment during actual deployment and application. A sketch of assembling the feature vector is given below.
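A sketch of assembling the state feature vector for a time period B (the arrays stand for data already retrieved from the historical records and from the sensors; names are illustrative):

```python
import numpy as np

def build_state_vector(hist_env: np.ndarray, hist_loads: np.ndarray,
                       realtime_env: np.ndarray) -> np.ndarray:
    """hist_env: outdoor environment data of the period corresponding to B in each
    of the previous n years; hist_loads: heat load data of each heating station over
    the same periods; realtime_env: current outdoor readings (during training, the
    recorded data of the corresponding period)."""
    # Flatten and concatenate into a single feature vector for the strategy network.
    return np.concatenate([hist_env.ravel(), hist_loads.ravel(), realtime_env.ravel()])
```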
S500, repeating steps S200-S400 until the cumulative reward value converges, thereby completing the training of the strategy network; storing the strategy network and deploying the deep reinforcement learning model for application.
After multiple iterations, the cumulative reward value tends to stabilize without significant fluctuation, i.e., the cumulative reward value is considered to have converged, and the strategy network is then able to make decisions that maximize the cumulative reward value.
S3, formulating a heating strategy based on the deep reinforcement learning model, wherein the method comprises the following steps:
When each heat supply regulation time period starts, the real-time outdoor environment parameters are collected and input into the deep reinforcement learning model, which automatically generates the action set and outputs the action with the highest selection probability; the heat load supply amount delivered by the thermal power plant to each heating station in the current regulation time period is determined based on this action, and the total heat generation amount of the thermal power plant in the current regulation time period is calculated, as sketched below.
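Putting the earlier sketches together, strategy formulation at the start of a regulation period might look as follows; since the exact formula for the total heat generation Ej is not reproduced above, the loss-compensated sum used here is only an assumption:

```python
import numpy as np

def formulate_strategy(policy, state_vec, action_set, eta_loss):
    """policy: trained PolicyNetwork sketch; action_set: (k, m) array from
    build_action_set; eta_loss: assumed per-station heat loss coefficients."""
    probs = policy.forward(state_vec)                  # selection probabilities
    best = int(np.argmax(probs))                       # deployment: greedy choice
    e_ij = action_set[best]                            # heat load supply per heating station
    e_total = float(np.sum(e_ij * (1.0 + eta_loss)))   # assumed loss-compensated total
    return e_ij, e_total
```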
And S4, executing the heating strategy through a self-adaptive control algorithm, which comprises adaptively controlling the total heat generation amount of the thermal power plant and adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station.
The method for adaptively controlling the total heat generation amount of the thermal power plant comprises the following steps:
Collecting fuel consumption in real time;
A PID controller is designed for the thermal power plant, and an error term e0(t) is calculated by the following formula:
e0(t)=Ej-E(t);
wherein E(t) represents the actual heat generation amount at the current moment, and the calculation formula is as follows:
E(t)=η0·m(t)·Hv;
Wherein η0 denotes the boiler thermal efficiency, i.e., the efficiency with which the heat released by fuel combustion is converted into heat of the boiler feed water; m(t) denotes the mass of fuel consumed so far; Hv denotes the heating value of the fuel, i.e., the heat released by complete combustion of a unit mass of fuel, which is generally given in the fuel's product manual;
Calculating a control output u0(t) at the current moment through the PID algorithm, wherein u0(t) represents the adjusted fuel supply amount;
Mapping the control output signal u0(t) into the adjustment range of the actual fuel supply amount, and adjusting the combustion controller or the servo motor that drives the fuel valve to achieve accurate control of the fuel supply amount;
Continuously monitoring the actual heat generation amount, and adjusting the fuel supply amount until the actual heat generation amount reaches the total heat generation amount;
The parameters of the PID controller are initially tuned through on-site commissioning and empirical formulas according to the response characteristics of the boiler and the stability requirements of the system, and are adjusted periodically. A minimal sketch of this control loop is given below.
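A minimal discrete PID loop for the total heat generation amount; the gains, the heating value and the actuator/sensor functions (measure_fuel_mass, set_fuel_supply) are illustrative placeholders rather than part of the original method:

```python
class PID:
    """Textbook discrete PID controller with illustrative gains."""
    def __init__(self, kp: float = 1.0, ki: float = 0.1, kd: float = 0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error: float, dt: float = 1.0) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def regulate_total_heat(E_target, measure_fuel_mass, set_fuel_supply,
                        eta0: float = 0.9, Hv: float = 4.2e7, steps: int = 100):
    """Drive the actual heat generation E(t) = eta0 * m(t) * Hv toward E_target (Ej)."""
    pid = PID()
    for _ in range(steps):
        E_t = eta0 * measure_fuel_mass() * Hv   # actual heat generated so far
        u0 = pid.step(E_target - E_t)           # e0(t) = Ej - E(t)
        set_fuel_supply(u0)                     # map u0(t) onto the fuel supply hardware
```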
The method for adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station comprises the following steps:
Collecting the primary supply water temperature, the primary return water temperature and the instantaneous primary supply water flow of each heating station in real time; the heat supply network data are acquired in real time by installing flowmeters and thermometers on the primary pipe network.
A PID controller is designed for each heating station, and the error term ei(t) of the PID controller of the ith heating station is calculated as follows:
ei(t)=Eij-Ei(t);
Wherein Ei(t) represents the heat load value actually obtained by the ith heating station, and the calculation formula is as follows:
Ei(t)=c·(Tri-Tsi)·Qi(t);
Wherein c is the specific heat capacity of water, Tri is the primary return water temperature of the ith heating station, Tsi is the primary supply water temperature of the ith heating station, and Qi(t) is the total primary supply water amount of the ith heating station from the start of the current time period to the current moment, obtained by integrating the instantaneous primary supply water flow over time;
Calculating a control output ui(t) of each heating station at the current moment through the PID algorithm, wherein ui(t) represents the opening degree of the branch valve and the circulating pump frequency of the ith heating station;
By adjusting the opening of the branch valve and the frequency of the circulating pump, the distribution proportion of the primary supply water delivered by the thermal power plant to each heating station can be adjusted, and thereby the distribution proportion of the heat load supplied to each heating station is adjusted.
And continuously monitoring the heat load value actually acquired by each heating power station, and adjusting the opening degree of the branch valve and the frequency of the circulating pump until the heat load value actually acquired by each heating power station reaches the distributed heat load supply quantity.
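For the per-station loop, the heat load actually obtained and the corresponding PID correction can be sketched as follows (the PID class from the previous sketch is reused; read_sensors and set_valve_and_pump are hypothetical placeholders). The temperature difference is written as supply minus return so that the obtained heat is positive, which is how the formula Ei(t)=c·(Tri-Tsi)·Qi(t) is interpreted here:

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), specific heat capacity c of water

def obtained_heat_load(T_supply: float, T_return: float, Q_total: float) -> float:
    """Heat actually obtained by a station from the primary circuit."""
    return WATER_SPECIFIC_HEAT * (T_supply - T_return) * Q_total

def regulate_station(E_target, read_sensors, set_valve_and_pump, steps: int = 100):
    """read_sensors() -> (T_supply, T_return, Q_total); Q_total is the time-integrated
    instantaneous primary supply water flow since the start of the period."""
    pid = PID()                                  # PID class from the previous sketch
    for _ in range(steps):
        E_i = obtained_heat_load(*read_sensors())
        u_i = pid.step(E_target - E_i)           # e_i(t) = E_ij - E_i(t)
        set_valve_and_pump(u_i)                  # adjust branch valve opening and pump frequency
```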
In deep reinforcement learning, the strengths of self-adaptive control help improve the performance and convergence speed of the algorithm. By monitoring and adjusting system parameters in real time, self-adaptive control handles part of the low-level control problems; in particular, it performs local optimization faster when the dynamic characteristics of the system change, thus maintaining system stability.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many forms may be made by those of ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.