Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, which improves energy utilization efficiency, reduces unnecessary energy loss, enhances the stability and reliability of the whole heat supply system, ensures heat supply quality and optimizes energy distribution.
In order to solve the technical problems, the invention provides the following technical scheme:
An intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control comprises the following steps: S1, collecting outdoor environment data and heat load data of each heating power station, and preprocessing the data;
S2, training a deep reinforcement learning model based on the outdoor environment data and the heat load data;
S3, formulating a heating strategy based on the deep reinforcement learning model;
and S4, executing the heating strategy through an adaptive control algorithm.
As a preferable scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the outdoor environment data comprises temperature, humidity, wind speed and solar radiation intensity;
The heat load data represent the amount of heat that each heating station needs to provide to maintain the indoor temperature of all its users at the heating standard temperature.
The preprocessing comprises missing value filling, abnormal value detection and replacement, data normalization and data synchronous integration, wherein the data synchronous integration matches the heat load data and the outdoor environment data according to time, so as to form a time series in which the heat load data and the outdoor environment data correspond to each other.
As an optimal scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the method for training the deep reinforcement learning model comprises the following steps:
S100, setting a time period for heat supply regulation, constructing a state space, an action set and a reward function, constructing a strategy network and initializing parameters;
S200, selecting an action from the action set and executing it;
S300, calculating a cumulative reward value and updating the parameters of the strategy network;
S400, entering the next time period, generating a new action set and updating the feature vector of the state space;
S500, repeating steps S200-S400 until the cumulative reward value converges, thereby completing the training of the strategy network; storing the strategy network and deploying the deep reinforcement learning model for application.
As a preferable scheme of the intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control, the time period is the minimum time unit for heat supply regulation and control;
the state space is composed of the outdoor environment data and heat load data;
the action set is dynamically generated based on the state space and the time period, and the method comprises the following steps:
For any time period A, the heat load data of each heating station in the time period corresponding to time period A in each of the previous n years are extracted, where n is a positive integer, and the n heat load data of the ith heating station are averaged to obtain EAi, where i ranges over 1, 2, ......, m, and m represents the number of heating stations; an arithmetic sequence Q containing k elements is set, Q={q1,q2,......,qk}, where q1 is the smallest element in the sequence Q with a value range of (0, 1) and qk is the largest element in the sequence Q with a value range of (1, 2); for time period A, the action set Aa has the following form:
Aa={a1,a2,......,ak};
the specific form of each action is as follows:
aj={qj·EA1,qj·EA2,......,qj·EAm};
Where aj represents the jth action in the action set, j has a value range of 1, 2, ......, k, and qj represents the jth element in the arithmetic sequence Q.
As an optimal scheme of the intelligent heat supply regulation method based on deep reinforcement learning and self-adaptive control, the calculation formula of the reward function is as follows:
Wherein sj represents the feature vector of the state space in the jth time period, aj represents the action performed in the jth time period, R(sj,aj) represents the reward function value for performing action aj when the feature vector of the state space is sj, α is a weight parameter, β is a proportionality coefficient, and Eij represents the heat load supply amount allocated to the ith heating station in the jth time period; Ej represents the total heat generation amount of the thermal power plant in the jth time period, and its calculation formula is as follows:
Where ηi denotes the heat loss coefficient when the thermal power plant delivers the heat load to the ith heating station.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that the feature vector of the state space consists of historical outdoor environment data, real-time outdoor environment data and historical heat load data of each heating station; the updating method is as follows: the time period currently entered is denoted as B, the heat load data and outdoor environment data of each heating station in the time period corresponding to time period B in each of the previous n years are extracted, the real-time outdoor environment data are acquired, and together they form the feature vector of the state space corresponding to time period B.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that, when each heat supply regulation time period starts, real-time outdoor environment parameters are collected and input into the deep reinforcement learning model, the model automatically generates the action set and outputs the action with the highest selection probability, the heat load supply amount delivered by the thermal power plant to each heating station in the current regulation time period is determined based on that action, and the total heat generation amount of the thermal power plant in the current regulation time period is calculated.
The intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control is characterized in that executing the heating strategy specifically comprises adaptively controlling the total heat generation amount of the thermal power plant and adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station, wherein the method for adaptively controlling the total heat generation amount of the thermal power plant comprises the following steps:
Collecting fuel consumption in real time;
A PID controller is designed for the thermal power plant, and an error term e0(t) is calculated by the following formula:
e0(t)=Ej-E(t);
wherein E(t) represents the actual heat generation amount at the current moment, and the calculation formula is as follows:
E(t)=η0·m(t)·Hv;
Wherein η0 denotes the boiler thermal efficiency, m(t) denotes the mass of fuel consumed so far, and Hv denotes the heating value of the fuel;
Calculating a control output u0(t) at the current moment through the PID algorithm, wherein u0(t) represents the adjusted fuel supply amount;
Based on u0(t), adjusting the fuel supply amount through the hardware device;
the actual heat generation amount is continuously monitored, and the fuel supply amount is adjusted until the actual heat generation amount reaches the total heat generation amount.
As an optimal scheme of the intelligent heat supply regulation method based on deep reinforcement learning and self-adaptive control, the method for adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station comprises the following steps:
Collecting the primary supply water temperature, the primary return water temperature and the instantaneous primary supply water flow of each heating station in real time;
A PID controller is designed for each heating station, and the error term ei(t) of the PID controller of the ith heating station is calculated as follows:
ei(t)=Eij-Ei(t);
Wherein Ei(t) represents the heat load value actually obtained by the ith heating station, and the calculation formula is as follows:
Ei(t)=c·(Tri-Tsi)·Qi(t);
Wherein c is the specific heat capacity of water, Tri is the primary return water temperature of the ith heating station, Tsi is the primary supply water temperature of the ith heating station, and Qi(t) is the total primary supply water amount of the ith heating station from the start of the current time period to the current moment, obtained by integrating the instantaneous primary supply water flow over time;
Calculating a control output ui(t) of each heating station at the current moment through the PID algorithm, wherein ui(t) represents the opening degree of the branch valve and the circulating pump frequency of the ith heating station;
And continuously monitoring the heat load value actually acquired by each heating power station, and adjusting the opening degree of the branch valve and the frequency of the circulating pump until the heat load value actually acquired by each heating power station reaches the distributed heat load supply quantity.
Compared with the prior art, the invention has the following beneficial effects:
Applying deep reinforcement learning to the heat supply regulation scenario realizes the transition from passive, on-demand heat supply to active prediction and optimization of the heating strategy, and remarkably improves the intelligence level of the heat supply system. Macroscopic heating strategy planning is completed through deep reinforcement learning, which takes into account complex factors such as seasonality, weather changes and predicted user demand, while self-adaptive control responds rapidly to the current actual conditions. This overcomes the limitations of any single technology in heat supply regulation, reduces the complexity of the deep reinforcement learning model, relieves its computational burden and accelerates network convergence. By organically combining the two, the long-term stable and efficient operation of the heat supply system is achieved while short-term flexible regulation demands are also satisfied.
The heating strategy based on deep reinforcement learning can match the actual heat demand more accurately and avoid excessive or insufficient heat supply, thereby improving energy utilization efficiency, reducing unnecessary energy loss and facilitating energy conservation and emission reduction. The self-adaptive control algorithm executes the heating strategy and dynamically adjusts the total heat generation amount of the thermal power plant and the heat load supply amount received by each heating station according to actual conditions, which enhances the stability and reliability of the whole heat supply system, ensures heat supply quality and optimizes energy distribution.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the embodiments and their specific features are a detailed description of the technical solutions of the present invention rather than a limitation thereof, and that the embodiments and the technical features therein may be combined with each other without conflict.
Example 1
This embodiment describes an intelligent heat supply regulation and control method based on deep reinforcement learning and self-adaptive control. Referring to FIG. 1, the method comprises the following steps:
S1, collecting outdoor environment data and heat load data of each heating power station, and preprocessing the data;
The outdoor environment data comprise temperature, humidity, wind speed and solar radiation intensity, and can be used to predict the heat load demand of each heating station. Temperature is one of the most significant factors affecting heat load demand: lower outdoor temperatures lead to a higher heating load. In the heating season, high humidity makes the human body feel colder, which increases the demand for heating. Wind affects the convective heat exchange at the outer surface of a building; higher wind speeds accelerate heat dissipation from the building envelope and thus increase heat loss. Solar radiation directly affects the heat gain of a building; especially for buildings with large glass curtain walls or south-facing windows, sunlight can significantly raise the indoor temperature and reduce the heating demand.
The heat load data represent the amount of heat that each heating station needs to provide to maintain the indoor temperature of all its users at the heating standard temperature.
The preprocessing comprises missing value filling, abnormal value detection and replacement, data normalization and data synchronous integration;
the data synchronous integration matches the heat load data and the outdoor environment data according to time, so as to form a time series in which the heat load data and the outdoor environment data correspond to each other.
The method for filling the missing values is one of linear interpolation, polynomial interpolation and prediction based on a time series model;
The method of data normalization is one of min-max normalization and Z-score normalization. Because outdoor environment parameters and heat load data tend to have different dimensions and numerical ranges, they need to be converted to the same scale so that the deep reinforcement learning model can process all input features. After the preprocessing step is completed, the collected data can be used as effective input to the deep reinforcement learning model, so as to construct an intelligent model capable of adjusting the heating strategy according to real-time environmental conditions.
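As an illustration of the preprocessing described above, the following minimal Python sketch (function and variable names are hypothetical; pandas is assumed to be available) fills missing values by linear interpolation, applies min-max normalization, and aligns the heat load data with the outdoor environment data on a common time index:

```python
import pandas as pd

def preprocess(env_df: pd.DataFrame, load_df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative preprocessing: interpolation, min-max scaling, time alignment."""
    # Missing value filling by linear interpolation along the time index.
    env_df = env_df.interpolate(method="linear").ffill().bfill()
    load_df = load_df.interpolate(method="linear").ffill().bfill()

    # Min-max normalization so that all features share the same numeric range.
    def min_max(df: pd.DataFrame) -> pd.DataFrame:
        return (df - df.min()) / (df.max() - df.min())

    # Data synchronous integration: join the two sources on their timestamps
    # to form one aligned time series.
    return min_max(env_df).join(min_max(load_df), how="inner")
```

Abnormal value detection and replacement would slot in before the normalization step, for example by clipping readings that fall outside a plausible physical range.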
S2, training a deep reinforcement learning model based on the outdoor environment data and the heat load data, wherein referring to FIG. 2, the method comprises the following steps:
S100, setting a time period for heat supply regulation, constructing a state space, an action set and a reward function, constructing a strategy network and initializing parameters;
The time period is the minimum time unit for heat supply regulation and control, for example, one day is set as one time period;
the state space is composed of the outdoor environment data and heat load data;
the action set is dynamically generated based on the state space and the time period, and the method comprises the following steps:
For any time period A, the heat load data of each heating station in the time period corresponding to time period A in each of the previous n years are extracted, where n is a positive integer, and the n heat load data of the ith heating station are averaged to obtain EAi, where i ranges over 1, 2, ......, m, and m represents the number of heating stations; an arithmetic sequence Q containing k elements is set, Q={q1,q2,......,qk}, where q1 is the smallest element in the sequence Q with a value range of (0, 1) and qk is the largest element in the sequence Q with a value range of (1, 2); for time period A, the action set Aa has the following form:
Aa={a1,a2,......,ak};
the specific form of each action is as follows:
aj={qj·EA1,qj·EA2,......,qj·EAm};
Wherein aj represents the jth action in the action set, the value range of j is 1, 2, ......, k, and qj represents the jth element in the arithmetic sequence Q;
In this scheme, the actions are set with the historical heat load demand of each heating station as a reference; for example, the selectable actions of the ith heating station are set to 0.8, 0.9, 1.0, 1.1 and 1.2 times its historical average heat load demand. By comparing the real-time outdoor environment parameters with the historical outdoor environment parameters, the strategy network refines the actions set on the basis of the historical data, which improves the training speed of the deep reinforcement learning model; a sketch of this action set generation is given below.
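A minimal sketch of the action set generation, assuming numpy and illustrative default values (k = 5, q1 = 0.8, qk = 1.2, matching the example above):

```python
import numpy as np

def build_action_set(hist_loads: np.ndarray, k: int = 5,
                     q_min: float = 0.8, q_max: float = 1.2) -> np.ndarray:
    """hist_loads: shape (n_years, m) -- heat load of each of the m heating
    stations in the time period corresponding to period A over the past n years."""
    # E_Ai: average of the n historical heat loads of station i.
    e_a = hist_loads.mean(axis=0)                      # shape (m,)
    # Arithmetic sequence Q with k elements, q1 in (0, 1), qk in (1, 2).
    q = np.linspace(q_min, q_max, k)                   # shape (k,)
    # Action a_j = {q_j * E_A1, ..., q_j * E_Am}.
    return q[:, None] * e_a[None, :]                   # shape (k, m)

# Example: 3 years of history for 4 heating stations.
actions = build_action_set(np.random.rand(3, 4) * 100.0)
```

Each row of the returned array is one action aj, i.e., one candidate heat load allocation across the m heating stations.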
The calculation formula of the reward function is as follows:
Wherein sj represents the feature vector of the state space in the jth time period, aj represents the action performed in the jth time period, R(sj,aj) represents the reward function value for performing action aj when the feature vector of the state space is sj, α is a weight parameter and β is a proportionality coefficient, both set by a person skilled in the art according to actual requirements, and Eij represents the heat load supply amount allocated to the ith heating station in the jth time period, which is determined by the action selected for execution in the jth time period; Ej represents the total heat generation amount of the thermal power plant in the jth time period, is calculated from the heat load supply amounts of the heating stations, and its calculation formula is as follows:
Wherein ηi represents the heat loss coefficient of the thermal power plant when delivering the heat load to the ith heating station, which is determined experimentally by one skilled in the art;
the reward function balances the total heat generation amount of the thermal power plant against the supply-demand balance of each heating station, saving total energy consumption while keeping the heat load supply amounts of the heating stations as uniform as possible.
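The exact reward formula is not reproduced above. Purely as an illustration of a reward with this character (penalizing total heat generation while rewarding uniform allocation), and not necessarily the formula used by the invention, one could write something like the following; the loss-compensated total used here is likewise an assumption:

```python
import numpy as np

def reward_example(e_ij: np.ndarray, eta: np.ndarray,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical reward: e_ij are the heat load supply amounts allocated to
    the m heating stations, eta are assumed per-station heat loss coefficients,
    alpha is the weight parameter and beta the proportionality coefficient."""
    e_total = float(np.sum(e_ij * (1.0 + eta)))   # assumed form of total heat generation
    imbalance = float(np.std(e_ij))               # spread of the allocated heat loads
    return -alpha * e_total - beta * imbalance    # less energy and more uniformity => higher reward
```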
The strategy network comprises an input layer, a hidden layer and an output layer. The input layer receives the feature vector of the state space, the hidden layer further extracts features of the state space, and the output layer generates the selection probability of each action in the action set under the current state space. A softmax function converts the output into a probability distribution, ensuring that the selection probabilities of all actions sum to 1;
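A minimal strategy network of this shape, written here as a numpy-only forward pass with illustrative layer sizes (a real implementation would typically use a deep learning framework):

```python
import numpy as np

class PolicyNetwork:
    """Illustrative input -> hidden -> output network with a softmax head."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, state: np.ndarray) -> np.ndarray:
        h = np.tanh(state @ self.w1 + self.b1)   # hidden layer extracts features
        logits = h @ self.w2 + self.b2
        exp = np.exp(logits - logits.max())      # numerically stable softmax
        return exp / exp.sum()                   # selection probabilities sum to 1
```

For example, PolicyNetwork(state_dim=32, n_actions=5).forward(np.zeros(32)) returns a length-5 probability vector summing to 1.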
S200, selecting an action from the action set and executing it;
The method for selecting an action is as follows:
inputting the feature vector of the current state space into the strategy network to obtain the selection probability of each action in the action set under the current state space;
setting a threshold parameter epsilon, wherein the value range is (0,0.15);
Generating a random number r in the value range [0, 1]; if r is greater than or equal to ε, executing the action with the highest selection probability, and if r is less than ε, randomly selecting an action from the action set and executing it, as sketched below;
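The threshold-based selection above amounts to ε-greedy exploration; a sketch with illustrative names:

```python
import numpy as np

def select_action(probs: np.ndarray, epsilon: float = 0.1, rng=None) -> int:
    """probs: selection probabilities over the k actions; epsilon in (0, 0.15)."""
    rng = rng or np.random.default_rng()
    r = rng.random()                        # random number r in [0, 1)
    if r >= epsilon:
        return int(np.argmax(probs))        # exploit: action with highest probability
    return int(rng.integers(len(probs)))    # explore: uniformly random action
```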
S300, calculating a cumulative reward value and updating the parameters of the strategy network;
The cumulative reward value is calculated as follows:
RN=β·R(s1,a1)+β^2·R(s2,a2)+......+β^N·R(sN,aN);
Wherein RN represents the current cumulative reward value, N represents the number of actions that have been performed, β represents the discount factor, β^j represents the discount factor β raised to the power of j, R(sj,aj) represents the reward value obtained by performing action aj in state sj, and j has a value range of 1, 2, ......, N.
The calculation formula for updating the parameters of the policy network is as follows:
δ←δ+η·∇δ(LN);
wherein δ represents any parameter in the strategy network, ∇δ(·) represents the gradient of the bracketed function with respect to δ, η is the learning rate, and LN is the objective function, whose calculation formula is as follows:
LN=ln(p(sN,aN)·RN);
Wherein p(sN,aN) represents the selection probability of action aN in the environment state sN.
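Following the definitions above, the cumulative reward and a single parameter update step can be sketched as follows (in practice the gradient of LN with respect to each network parameter would be computed by an automatic differentiation framework; only the scalar bookkeeping is shown, with illustrative names):

```python
import math

def cumulative_reward(rewards, beta: float = 0.95) -> float:
    """R_N = sum over j of beta**j * R(s_j, a_j); rewards[0] corresponds to j = 1."""
    return sum(beta ** (j + 1) * r for j, r in enumerate(rewards))

def objective(p_sN_aN: float, R_N: float) -> float:
    """L_N = ln(p(s_N, a_N) * R_N); defined when the product is positive."""
    return math.log(p_sN_aN * R_N)

def update_parameter(delta: float, grad_LN_wrt_delta: float, eta: float = 1e-3) -> float:
    """Gradient ascent step: delta <- delta + eta * dL_N/d(delta)."""
    return delta + eta * grad_LN_wrt_delta
```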
S400, entering the next time period, generating a new action set and updating the feature vector of the state space;
The feature vector of the state space consists of historical outdoor environment data, real-time outdoor environment data and historical heat load data of each heating station. The updating method is as follows: the time period currently entered is denoted as B, the heat load data and outdoor environment data of each heating station in the time period corresponding to time period B in each of the previous n years are extracted, the real-time outdoor environment data are acquired, and together they form the feature vector of the state space corresponding to time period B.
During the training stage of the deep reinforcement learning model, the real-time outdoor environment data are obtained by reading the recorded outdoor environment data of the corresponding time period; after model training is completed, they are collected in real time by sensors and other detection equipment during actual deployment and application. A sketch of assembling the feature vector is given below.
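A sketch of assembling the state feature vector for a time period B (the arrays stand for data already retrieved from the historical records and from the sensors; names are illustrative):

```python
import numpy as np

def build_state_vector(hist_env: np.ndarray, hist_loads: np.ndarray,
                       realtime_env: np.ndarray) -> np.ndarray:
    """hist_env: outdoor environment data of the period corresponding to B in each
    of the previous n years; hist_loads: heat load data of each heating station over
    the same periods; realtime_env: current outdoor readings (during training, the
    recorded data of the corresponding period)."""
    # Flatten and concatenate into a single feature vector for the strategy network.
    return np.concatenate([hist_env.ravel(), hist_loads.ravel(), realtime_env.ravel()])
```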
S500, repeating steps S200-S400 until the cumulative reward value converges, thereby completing the training of the strategy network; storing the strategy network and deploying the deep reinforcement learning model for application.
After multiple iterations, the cumulative reward value tends to stabilize without significant fluctuation, i.e., the cumulative reward value is considered to have converged, and the strategy network is then able to make decisions that maximize the cumulative reward value.
S3, formulating a heating strategy based on the deep reinforcement learning model, wherein the method comprises the following steps:
When each heat supply regulation time period starts, the real-time outdoor environment parameters are collected and input into the deep reinforcement learning model, which automatically generates the action set and outputs the action with the highest selection probability; the heat load supply amount delivered by the thermal power plant to each heating station in the current regulation time period is determined based on this action, and the total heat generation amount of the thermal power plant in the current regulation time period is calculated, as sketched below.
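Putting the earlier sketches together, strategy formulation at the start of a regulation period might look as follows; since the exact formula for the total heat generation Ej is not reproduced above, the loss-compensated sum used here is only an assumption:

```python
import numpy as np

def formulate_strategy(policy, state_vec, action_set, eta_loss):
    """policy: trained PolicyNetwork sketch; action_set: (k, m) array from
    build_action_set; eta_loss: assumed per-station heat loss coefficients."""
    probs = policy.forward(state_vec)                  # selection probabilities
    best = int(np.argmax(probs))                       # deployment: greedy choice
    e_ij = action_set[best]                            # heat load supply per heating station
    e_total = float(np.sum(e_ij * (1.0 + eta_loss)))   # assumed loss-compensated total
    return e_ij, e_total
```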
And S4, executing the heating strategy through a self-adaptive control algorithm, which comprises adaptively controlling the total heat generation amount of the thermal power plant and adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station.
The method for adaptively controlling the total heat generation amount of the thermal power plant comprises the following steps:
Collecting fuel consumption in real time;
A PID controller is designed for the thermal power plant, and an error term e0(t) is calculated by the following formula:
e0(t)=Ej-E(t);
wherein E(t) represents the actual heat generation amount at the current moment, and the calculation formula is as follows:
E(t)=η0·m(t)·Hv;
Wherein η0 denotes the boiler thermal efficiency, i.e., the efficiency with which the heat released by fuel combustion is converted into heat of the boiler feed water; m(t) denotes the mass of fuel consumed so far; Hv denotes the heating value of the fuel, i.e., the heat released by complete combustion of a unit mass of fuel, which is generally given in the fuel's product manual;
Calculating a control output u0(t) at the current moment through the PID algorithm, wherein u0(t) represents the adjusted fuel supply amount;
Mapping the control output signal u0(t) into the adjustment range of the actual fuel supply amount, and adjusting the combustion controller or the servo motor that drives the fuel valve to achieve accurate control of the fuel supply amount;
Continuously monitoring the actual heat generation amount, and adjusting the fuel supply amount until the actual heat generation amount reaches the total heat generation amount;
The parameters of the PID controller are initially tuned through on-site commissioning and empirical formulas according to the response characteristics of the boiler and the stability requirements of the system, and are adjusted periodically. A minimal sketch of this control loop is given below.
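A minimal discrete PID loop for the total heat generation amount; the gains, the heating value and the actuator/sensor functions (measure_fuel_mass, set_fuel_supply) are illustrative placeholders rather than part of the original method:

```python
class PID:
    """Textbook discrete PID controller with illustrative gains."""
    def __init__(self, kp: float = 1.0, ki: float = 0.1, kd: float = 0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error: float, dt: float = 1.0) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def regulate_total_heat(E_target, measure_fuel_mass, set_fuel_supply,
                        eta0: float = 0.9, Hv: float = 4.2e7, steps: int = 100):
    """Drive the actual heat generation E(t) = eta0 * m(t) * Hv toward E_target (Ej)."""
    pid = PID()
    for _ in range(steps):
        E_t = eta0 * measure_fuel_mass() * Hv   # actual heat generated so far
        u0 = pid.step(E_target - E_t)           # e0(t) = Ej - E(t)
        set_fuel_supply(u0)                     # map u0(t) onto the fuel supply hardware
```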
The method for adaptively controlling the heat load supply amount delivered by the thermal power plant to each heating station comprises the following steps:
Collecting the primary supply water temperature, the primary return water temperature and the instantaneous primary supply water flow of each heating station in real time; the heat supply network data are acquired in real time by installing flowmeters and thermometers on the primary pipe network.
A PID controller is designed for each heating station, and the error term ei(t) of the PID controller of the ith heating station is calculated as follows:
ei(t)=Eij-Ei(t);
Wherein Ei(t) represents the heat load value actually obtained by the ith heating station, and the calculation formula is as follows:
Ei(t)=c·(Tri-Tsi)·Qi(t);
Wherein c is the specific heat capacity of water, Tri is the primary return water temperature of the ith heating station, Tsi is the primary supply water temperature of the ith heating station, and Qi(t) is the total primary supply water amount of the ith heating station from the start of the current time period to the current moment, obtained by integrating the instantaneous primary supply water flow over time;
Calculating a control output ui(t) of each heating station at the current moment through the PID algorithm, wherein ui(t) represents the opening degree of the branch valve and the circulating pump frequency of the ith heating station;
By adjusting the opening of the branch valve and the frequency of the circulating pump, the distribution proportion of the primary supply water delivered by the thermal power plant to each heating station can be adjusted, and thereby the distribution proportion of the heat load supplied to each heating station is adjusted.
And continuously monitoring the heat load value actually acquired by each heating power station, and adjusting the opening degree of the branch valve and the frequency of the circulating pump until the heat load value actually acquired by each heating power station reaches the distributed heat load supply quantity.
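For the per-station loop, the heat load actually obtained and the corresponding PID correction can be sketched as follows (the PID class from the previous sketch is reused; read_sensors and set_valve_and_pump are hypothetical placeholders). The temperature difference is written as supply minus return so that the obtained heat is positive, which is how the formula Ei(t)=c·(Tri-Tsi)·Qi(t) is interpreted here:

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), specific heat capacity c of water

def obtained_heat_load(T_supply: float, T_return: float, Q_total: float) -> float:
    """Heat actually obtained by a station from the primary circuit."""
    return WATER_SPECIFIC_HEAT * (T_supply - T_return) * Q_total

def regulate_station(E_target, read_sensors, set_valve_and_pump, steps: int = 100):
    """read_sensors() -> (T_supply, T_return, Q_total); Q_total is the time-integrated
    instantaneous primary supply water flow since the start of the period."""
    pid = PID()                                  # PID class from the previous sketch
    for _ in range(steps):
        E_i = obtained_heat_load(*read_sensors())
        u_i = pid.step(E_target - E_i)           # e_i(t) = E_ij - E_i(t)
        set_valve_and_pump(u_i)                  # adjust branch valve opening and pump frequency
```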
In deep reinforcement learning, the strengths of self-adaptive control help improve the performance and convergence speed of the algorithm. By monitoring and adjusting system parameters in real time, self-adaptive control handles part of the low-level control problems; in particular, it performs local optimization faster when the dynamic characteristics of the system change, thus maintaining system stability.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many forms may be made by those of ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.