CN120338210B - Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning - Google Patents
- Publication number
- CN120338210B (application CN202510819807.6A)
- Authority
- CN
- China
- Prior art keywords
- reservoir
- reinforcement learning
- flow
- data
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Abstract
The invention discloses a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning, in the technical field of reservoir operation. The method acquires real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features, and fuses the three data sources. The fused data are input into a preset adaptive dynamic Transformer network model to obtain a predicted output comprising a reservoir water-level sequence, an inflow sequence, and an outflow (discharge) sequence. A reinforcement-learning state space is then constructed and input into a preset reinforcement-learning network whose action space comprises flood-discharge volume, power-generation flow, and ecological flow. The reinforcement-learning network is optimized to maximize the cumulative discounted reward, where the reward function is a weighted combination of a flood-control reward, a power-generation reward, and an ecological reward. The invention improves the collaborative management of multiple objectives such as reservoir flood-control safety, power-generation benefit, and ecological protection.
Description
Technical Field
The invention relates to the technical field of reservoir operation, and in particular to a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning.
Background
As global climate change intensifies, extreme weather events occur more frequently, and traditional reservoir operation methods struggle to adapt to complex and variable hydrometeorological conditions. Conventional operation schemes are often based on fixed rules or offline historical experience and make little use of real-time meteorological and water-level data; when facing sudden heavy rain or abrupt drought-flood transitions they are prone to decision delays and water-level control errors, which increase flood-control risk, reduce power-generation efficiency, and harm the stability of the downstream ecosystem. Deep-learning methods currently applied to reservoir operation usually use a static, fixed network structure that lacks real-time adaptivity and cannot adjust the network structure as hydrological forecasting demands change, leading to unstable prediction accuracy and insufficient model generalization.
The reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning described herein is developed to address these problems.
Disclosure of Invention
The invention provides a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning, aiming to solve the problems of unstable prediction accuracy and insufficient generalization in existing reservoir operation methods.
The invention realizes the above purpose through the following technical scheme:
A reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning comprises the following steps:
acquiring real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features, and fusing the three data sources;
inputting the fused data into a preset adaptive dynamic Transformer network model to obtain a predicted output comprising a reservoir water-level sequence, an inflow sequence, and an outflow sequence;
constructing a reinforcement-learning state space comprising the reservoir water-level, inflow, and outflow sequences predicted by the Transformer network model for a preset number of future time steps, together with the latest real-time meteorological data sequence;
inputting the reinforcement-learning state space into a preset reinforcement-learning network whose action space comprises flood-discharge volume, power-generation flow, and ecological flow; the optimization objective of the reinforcement-learning network is to maximize the cumulative discounted reward, the reward function being a weighted combination of a flood-control reward term, a power-generation reward term, and an ecological reward term; through continuous iterative optimization, the reinforcement-learning network converges and outputs an optimal combination of flood-discharge volume, power-generation flow, and ecological flow.
Further, the real-time water-level data and meteorological data are preprocessed before data fusion, the preprocessing comprising the following steps:
performing spatial interpolation on the real-time water-level data, the meteorological data, and the radar image data of regional cloud cover and surface features using a Kriging interpolation algorithm based on spatial-distance weights, to obtain continuous data covering the whole reservoir area;
detecting outliers in the continuous data and eliminating them in real time;
performing smoothing and denoising on the data after outlier removal.
Further, when the error between a predicted output of the adaptive dynamic Transformer network model and the actual observation exceeds a preset upper threshold, a preset number of encoder layers are automatically added to the model; when the error falls below a preset lower threshold, a preset number of encoder layers are automatically removed; and when the error lies between the lower and upper thresholds, the model structure is left unchanged.
Further, the result error is a weighted sum of the mean square errors between the predicted and true values of the reservoir water level, the inflow, and the outflow.
Further, at each prediction the adaptive dynamic Transformer network model also outputs, via a dynamic attention mechanism, the weights of the flood-control, power-generation, and ecological reward terms in the reward function.
Further, the target weights $\alpha_{flood}$, $\alpha_{gen}$, $\alpha_{eco}$ are updated from the output weights $w_{flood}$, $w_{gen}$, $w_{eco}$ of the flood-control, power-generation, and ecological reward terms as follows:

$\alpha_{flood} = \dfrac{e^{w_{flood}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{gen} = \dfrac{e^{w_{gen}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{eco} = \dfrac{e^{w_{eco}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;
After the dynamic attention weights are adjusted in real time, they serve as the target weighting coefficients in the next reinforcement-learning decision, ensuring that the system adapts accurately to the real-time demands of the different operation objectives;
the dynamic attention mechanism uses a trainable target-weight vector and computes the attention weight of each objective in real time with a Softmax function.
Further, $R_t$ is the cumulative discounted reward; $\alpha_{flood}$, $\alpha_{gen}$, and $\alpha_{eco}$ are the target weight coefficients of the flood-control, power-generation, and ecological rewards; $h_{pred,t}$ is the predicted water level; $h_{limit}$ is the flood-control limit water level; $k_{gen}$ is the economic-benefit coefficient of power generation per unit flow; $Q_{gen,t}$ is the power-generation flow; $Q_{eco,t}$ is the current ecological flow; $Q_{eco,ideal}$ is the ideal ecological flow; $R_{flood}$, $R_{gen}$, and $R_{eco}$ are the flood-control, power-generation, and ecological reward terms; and $t$ is the time step.
Further, the opening of each flood-discharge gate of the reservoir and the output scheme of the generating units are determined from the output optimal combination of flood-discharge volume, power-generation flow, and ecological flow, forming specific execution instructions, comprising the following steps:
computing the flood-discharge gate opening in real time from the output flood-discharge volume and the gate flow-opening relation;
determining, in real time from the output power-generation flow, the number of units to start or stop and the load distribution among the units;
constructing the execution instruction from the flood-discharge gate opening, the number of units started or stopped, and the load distribution of each unit;
judging whether the total release from flood discharge and power generation is at least the ecological flow; if so, the ecological water demand is considered covered and no additional operation is needed; otherwise, the shortfall, i.e. the difference between the ecological flow and the total release from flood discharge and power generation, is supplemented through a dedicated ecological gate or a low-load unit.
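The decision-to-instruction steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the gate flow-opening relation, the unit capacity, and all numeric values are hypothetical assumptions.

```python
# Hypothetical sketch of mapping an RL action (flows in m^3/s) to an
# execution instruction; relation f and unit_capacity are assumptions.

def build_instruction(q_flood, q_gen, q_eco,
                      gate_opening_from_flow,
                      unit_capacity=500.0):
    # 1. Gate opening from the flood-discharge flow via the gate's
    #    flow-opening relation (supplied as a callable).
    opening = gate_opening_from_flow(q_flood)
    # 2. Number of generating units (ceil of flow / per-unit capacity)
    #    and even load distribution among them.
    n_units = max(1, -(-int(q_gen) // int(unit_capacity)))
    load_per_unit = q_gen / n_units
    # 3. Ecological check: supplement water only if the total release
    #    falls short of the required ecological flow.
    total_release = q_flood + q_gen
    deficit = max(0.0, q_eco - total_release)
    return {"gate_opening": opening,
            "units_on": n_units,
            "load_per_unit": load_per_unit,
            "eco_supplement": deficit}

# Toy linear flow-opening relation, for illustration only.
instr = build_instruction(1200.0, 800.0, 150.0,
                          gate_opening_from_flow=lambda q: q / 2000.0)
```

Here the ecological flow (150 m³/s) is already covered by the 2000 m³/s total release, so no supplement is scheduled.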
Further, a historical database of long-term running root-mean-square errors is built, the historical trends of result errors and decision errors are statistically analyzed at regular intervals, and the preset error thresholds are dynamically adjusted according to those trends.
Further, acquiring the real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features comprises:
acquiring real-time water-level data monitored by wave-type water-level sensors deployed in the reservoir area and at key sections of the main inflow river channel;
acquiring meteorological data from a meteorological data acquisition system deployed in the reservoir area, comprising a precipitation sensor, a wind-speed sensor, and a temperature-humidity sensor;
acquiring radar image data of regional cloud cover and surface features by satellite remote sensing of the reservoir and upstream basin with a high-resolution radar remote-sensing satellite of 30 m spatial resolution.
The invention has the beneficial effects that:
Compared with the prior art, the reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning provided by the invention comprehensively improves the collaborative management of multiple objectives such as reservoir flood-control safety, power-generation benefit, and ecological protection, improves the stability of the prediction accuracy, and improves the model's generalization capability.
Drawings
FIG. 1 is a flowchart of a reservoir dispatching method based on deep learning adaptive dynamic network and reinforcement learning in an embodiment.
Fig. 2 is a schematic diagram of the adaptive dynamic Transformer network structure in step S2 in the embodiment.
FIG. 3 is a flowchart of a reinforcement learning multi-objective collaborative optimization module in step S3 in an embodiment.
Fig. 4 is a functional framework diagram of the intelligent co-scheduling platform in step S6 in the embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below completely with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Furthermore, the terms "first," "second," and the like, are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
The following describes specific embodiments of the present invention in detail with reference to the drawings.
As shown in Fig. 1, a specific implementation of the reservoir operation method based on the deep-learning adaptive dynamic network and reinforcement learning of the invention is as follows:
Step S1: real-time data fusion and preprocessing.
In a practical implementation of the invention, 8 ultrasonic water-level sensors (model MB7389) are deployed in the reservoir area and at key sections of the main inflow river channel, with a water-level measurement accuracy of ±2 cm; real-time water-level data are acquired every minute and uploaded in real time to the data-fusion center over a LoRa wireless transmission protocol. A meteorological data acquisition system composed of weather stations is installed in the reservoir area, specifically comprising a precipitation sensor (model TB4, accuracy ±0.5 mm), a wind-speed sensor (model WindSonic, accuracy ±0.2 m/s), and a temperature-humidity sensor (model HMP155, temperature accuracy ±0.5 °C, humidity accuracy ±3%); meteorological data are collected every 5 minutes. In addition, a high-resolution radar remote-sensing satellite with 30 m spatial resolution observes the reservoir and the upstream basin once per hour, producing radar image data of regional cloud cover and surface features in GeoTIFF format. The data-fusion module first applies a Kriging interpolation algorithm based on spatial-distance weights to the water-level and meteorological sensor data, obtaining continuous data covering the whole reservoir area. Outlier detection after fusion is based on the Z-score method; the regional cloud-cover data are combined with the meteorological data to anticipate conditions such as sudden heavy rain, and the surface-feature data are used to distinguish different runoff responses under the same rainfall, enabling more accurate operation decisions.
The specific formula is:

$Z = \dfrac{x_t - \mu}{\sigma}$

where $x_t$ is the real-time measurement, $\mu$ is the mean of the past 24 hours of historical data, $\sigma$ is the standard deviation, and $Z$ is the standard score. When $|Z|$ exceeds the preset threshold, the system automatically marks the datum as an outlier and eliminates it in real time. Finally, after smoothing and denoising with a Savitzky-Golay filter, the data are standardized into sequence form and periodically fed, every 10 minutes, to the input of the subsequent network model.
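A minimal sketch of this Z-score outlier check, using only the Python standard library. The 24 h history window and the threshold are parameters here; the threshold value 3.0 is an illustrative assumption, since the patent leaves the cutoff as a preset value.

```python
# Z-score outlier flagging over a rolling history window (sketch only;
# the threshold default is an assumption, not taken from the patent).
import statistics

def zscore_outliers(history, new_values, threshold=3.0):
    """Flag values whose standard score against the history exceeds threshold."""
    mu = statistics.fmean(history)       # mean of past-window data
    sigma = statistics.pstdev(history)   # population standard deviation
    flags = []
    for x in new_values:
        z = (x - mu) / sigma if sigma > 0 else 0.0
        flags.append(abs(z) > threshold)
    return flags

history = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
flags = zscore_outliers(history, [10.05, 14.0])  # second value is an outlier
```

In a real pipeline the flagged samples would be removed before the Savitzky-Golay smoothing step.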
Step S2: accurate prediction with the adaptive dynamic network based on the Transformer structure.
A Transformer network structure is adopted, taking the standardized data sequence from step S1 as network input, to achieve accurate short-term (next 30 minutes) prediction of the reservoir water level, inflow, and outflow. The specific embodiment is as follows:
Step 2.1. As shown in Fig. 2, the initial structure of the Transformer network is built from 4 standard Transformer encoder layers, each comprising a multi-head self-attention sub-layer and a feed-forward sub-layer, with layer normalization and residual connections. Each self-attention sub-layer contains 8 attention heads of dimension 64, so the output dimension of each self-attention sub-layer is 512. A single attention head is computed as:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{\top}}{\sqrt{d_k}}\right)V$

where $Q$, $K$, and $V$ are the query, key, and value matrices, each of head dimension $d_k = 64$. The multi-head attention output is:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_8)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$

where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are linear transformation parameter matrices.
The feed-forward network consists of two linear transformation layers with structure 512-2048-512:

$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$
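The scaled dot-product attention above can be illustrated with a tiny pure-Python sketch (2x2 matrices for readability; in the embodiment each head works at $d_k = 64$). The matrices here are illustrative, not trained parameters.

```python
# Scaled dot-product attention on lists-of-lists matrices (sketch only).
import math

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k)
               for kr in K] for qr in Q]
    weights = [softmax(r) for r in scores]
    return [[sum(w * V[j][c] for j, w in enumerate(wr))
             for c in range(len(V[0]))] for wr in weights]

Q = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, Q, [[1.0, 2.0], [3.0, 4.0]])
```

Each output row is a convex combination of the value rows, which is why the attention weights of every row sum to 1.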
The network's initial input is the standardized data of the past 120 minutes (12 time steps, each containing feature data such as water level, precipitation, wind speed, temperature, and humidity; total input dimension 128). The output is the predicted water level, inflow, and outflow for the next 3 time steps (30 minutes) (output dimension 3), together with the weights of the flood-control, power-generation, and ecological reward terms in the reward function (output dimension 3). Training uses the Adam optimizer with an initial learning rate of 0.001 and a mean square error (MSE) loss:

$\mathrm{MSE} = \dfrac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$

where $y_i$ is the actual observed value, $\hat{y}_i$ is the network's predicted value, and $N$ is the number of samples in a batch. During this training the branch that outputs the reward-term weights does not participate, i.e. its gradients are cut off.
Step 2.2: adaptive dynamic adjustment of the network structure. The network dynamic-adjustment module automatically adjusts the number of Transformer layers according to the result error $E_t$ computed in real time as:

$E_t = w_1\,\mathrm{MSE}_{h} + w_2\,\mathrm{MSE}_{Q_{in}} + w_3\,\mathrm{MSE}_{Q_{out}}$

where $\mathrm{MSE}_{h}$, $\mathrm{MSE}_{Q_{in}}$, and $\mathrm{MSE}_{Q_{out}}$ are the mean square errors between the network's predicted and true water level $h$, inflow $Q_{in}$, and outflow $Q_{out}$; the weights $w_1$, $w_2$, $w_3$ are determined by offline cross-validation; and $t$ denotes the time.
Taking the latest $W$ prediction cycles as a sliding window, the moving average $\mu_E$ and standard deviation $\sigma_E$ of the composite error sequence $E$ are first computed, the standard deviation taking the weighted-covariance form:

$\sigma_E = \sqrt{\mathbf{w}^{\top}\,\Sigma\,\mathbf{w}}$

where $\mathbf{w}$ is the indicator weight vector determined by offline cross-validation and $\Sigma$ is the covariance matrix of the three RMSE components within the window. The thresholds are then constructed dynamically:

$\theta_{up} = \mu_E + \lambda_{up}\,\sigma_E$;

$\theta_{low} = \mu_E - \lambda_{low}\,\sigma_E$;

where the relaxation coefficients $\lambda_{up}$ and $\lambda_{low}$ are determined offline on the historical dataset by Bayesian optimization. As the overall result error gradually decreases, $\mu_E$ decreases and $\theta_{up}$ is lowered correspondingly; conversely, when the result error rises sharply, $\theta_{up}$ is raised dynamically to avoid false triggers, and the layer-adding operation is still triggered once the real-time result error $E_t$ exceeds the new upper threshold. When the error between a prediction and the actual observation exceeds the upper threshold $\theta_{up}$, the number of network layers is increased by 1 (with at most 3 layers added in total); when the result error is below the lower threshold $\theta_{low}$, the number of layers is reduced by 1. The specific rule is:
$\Delta L_t = \begin{cases} +1, & E_t > \theta_{up} \ \text{(cumulative increase at most 3 layers)} \\ -1, & E_t < \theta_{low} \\ 0, & \text{otherwise} \end{cases}$

This dynamic adjustment of the layer count effectively balances computational load against prediction accuracy, maintaining both the real-time performance and the stability of the prediction.
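The layer-adjustment rule can be sketched as a small function. The base depth of 4 encoder layers and the +3 cap follow the text; the assumption that the depth never drops below the initial 4 layers, and the threshold values shown, are illustrative.

```python
# Sketch of the dynamic layer-count rule: deepen the encoder when the
# rolling error exceeds the upper threshold, shed a layer when it falls
# below the lower threshold (floor at the initial depth is an assumption).

def adjust_layers(n_layers, error, theta_up, theta_low,
                  base_layers=4, max_extra=3):
    if error > theta_up and n_layers < base_layers + max_extra:
        return n_layers + 1   # prediction degraded: add an encoder layer
    if error < theta_low and n_layers > base_layers:
        return n_layers - 1   # error comfortably low: remove a layer
    return n_layers           # within the band: leave the structure alone

new_depth = adjust_layers(4, error=0.9, theta_up=0.5, theta_low=0.1)
```

In deployment, `theta_up` and `theta_low` would be recomputed each cycle from the sliding-window statistics described above.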
Step 2.3: implementation of the dynamic attention mechanism, designed to adaptively adjust the predicted weights of the three objectives: flood control, power generation, and ecology. The target weights $\alpha_{flood}$, $\alpha_{gen}$, $\alpha_{eco}$ are updated from the output weights $w_{flood}$, $w_{gen}$, $w_{eco}$ of the flood-control, power-generation, and ecological reward terms as follows:

$\alpha_{flood} = \dfrac{e^{w_{flood}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{gen} = \dfrac{e^{w_{gen}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{eco} = \dfrac{e^{w_{eco}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

After the dynamic attention weights are adjusted in real time, they serve as the target weighting coefficients in the next reinforcement-learning decision, ensuring that the system adapts accurately to the real-time demands of the different operation objectives.

The dynamic attention mechanism uses a trainable target-weight vector and computes the attention weight of each objective in real time with a Softmax function.
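The Softmax normalization of the target-weight vector can be sketched as follows; the raw weight values here stand in for the network's trainable outputs and are purely illustrative.

```python
# Softmax over the three objective weights so that the attention
# coefficients are positive and sum to 1 (sketch; inputs are illustrative).
import math

def target_weights(w_flood, w_gen, w_eco):
    exps = [math.exp(w) for w in (w_flood, w_gen, w_eco)]
    s = sum(exps)
    return tuple(e / s for e in exps)

alpha_flood, alpha_gen, alpha_eco = target_weights(1.2, 0.4, -0.1)
```

A larger raw weight for one objective (here flood control) yields a proportionally larger share of the next decision's weighting.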
Step S3: implementation of the reinforcement-learning multi-objective collaborative optimization module.
Step 3.1: define the state space for multi-objective reinforcement learning; it comprises the meteorological data and the predicted water-level, inflow, and outflow sequences output by the adaptive network. As shown in Fig. 3, the reinforcement-learning state space is constructed. Based on the prediction output of the dynamic Transformer network in step S2, the state vector of the reinforcement-learning module is defined as $S_t$. Specifically, the state vector contains the water-level prediction sequence for the next 3 time steps output by the Transformer network, $H_t = [h_{t+1}, h_{t+2}, h_{t+3}]$, the inflow prediction sequence $Q_{in,t} = [Q_{in,t+1}, Q_{in,t+2}, Q_{in,t+3}]$, the outflow prediction sequence $Q_{out,t} = [Q_{out,t+1}, Q_{out,t+2}, Q_{out,t+3}]$, and the latest real-time meteorological data sequence $M_t$, the meteorological data comprising precipitation, wind speed, temperature, and humidity. After concatenation, the dimension of the state space is kept between 100 and 500, and the state is fed to the reinforcement-learning network after standardization.
Step 3.2: define the reinforcement-learning action space. The action vector $a_t$ contains three variables: flood-discharge flow, power-generation flow, and ecological flow, set as follows:
flood-discharge flow adjustment range 50 to 5000 m³/s, step 50 m³/s;
power-generation flow adjustment range 100 to 2000 m³/s, step 20 m³/s;
ecological flow adjustment range 10 to 500 m³/s, step 10 m³/s.
The discrete combinations of the action space are generated by grid search, and the state vector is mapped to the action space by a three-layer fully connected network so that the reinforcement-learning model can select actions efficiently.
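The grid of discrete actions implied by the three ranges above can be enumerated directly; this sketch simply materializes the Cartesian product of the stated flow levels.

```python
# Enumerate the discrete action grid from the stated ranges and steps
# (flood 50-5000 step 50, generation 100-2000 step 20, eco 10-500 step 10).
from itertools import product

flood = range(50, 5001, 50)    # 100 flood-discharge levels
gen = range(100, 2001, 20)     # 96 generation levels
eco = range(10, 501, 10)       # 50 ecological levels
actions = list(product(flood, gen, eco))  # 100 * 96 * 50 combinations
```

In practice a Q-network scores these combinations rather than searching them exhaustively at every step; the grid only fixes the discrete action set.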
Step 3.3: concrete implementation of the reinforcement-learning network. The network is a deep Q-network with the following structure:
an input layer receiving the state vector $S_t$;
a hidden part composed of a 4-layer convolutional network and a 2-layer fully connected network, where the convolution kernel sizes are [8×8, 4×4, 3×3], the convolution strides are [4, 2, 1, 1], and the output feature dimensions are 128, 64, and 64 respectively;
an output layer giving the Q value of each action combination.
The optimization objective of the reinforcement-learning network is to maximize the cumulative discounted reward $R = \sum_{t} \gamma^{t} r_t$ (with discount factor $\gamma$). The per-step reward $r_t$ is defined as the weighted result of the three objectives (flood control, power generation, ecology):

$r_t = \alpha_{flood}\,R_{flood,t} + \alpha_{gen}\,R_{gen,t} + \alpha_{eco}\,R_{eco,t}$;

where the reward weights $\alpha_{flood}$, $\alpha_{gen}$, and $\alpha_{eco}$ come from the dynamic attention mechanism of step 2.3 and are dynamically updated in real time. Specifically, each objective's reward is defined as follows:
The flood control rewarding item R Flood control gives negative feedback according to the predicted water level exceeding the flood control limit water level, and is defined as: . Wherein, the In order to predict the water level,Is the flood control limit water level.
The electricity generation rewarding item R Generating electricity is calculated according to the economic benefit generated by the current electricity generation flow, and is defined as: . Wherein k Generating electricity is a unit flow power generation economic benefit coefficient, and Q Generating electricity ,t is power generation flow;
The ecological reward term R_eco,t is calculated from the degree to which the downstream ecological flow deviates from the ideal ecological flow, R_eco,t = −|Q_eco,t − Q_eco,ideal|, where Q_eco,t is the current ecological flow and Q_eco,ideal is the ideal ecological flow.
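A minimal sketch of the three reward terms and their weighted combination; the negative-feedback and deviation-penalty forms follow the definitions above, and the numeric state values in the test are illustrative only:

```python
def flood_reward(z_pred, z_limit):
    # Negative feedback only when the predicted level exceeds the flood limit.
    return -max(0.0, z_pred - z_limit)

def generation_reward(q_gen, k_gen):
    # Economic benefit proportional to the generation flow.
    return k_gen * q_gen

def ecological_reward(q_eco, q_eco_ideal):
    # Penalize absolute deviation from the ideal ecological flow.
    return -abs(q_eco - q_eco_ideal)

def total_reward(state, weights):
    """Weighted sum of the three objective rewards; the weights are
    supplied by the dynamic attention mechanism of step 2.3."""
    a_flood, a_gen, a_eco = weights
    return (a_flood * flood_reward(state["z_pred"], state["z_limit"])
            + a_gen * generation_reward(state["q_gen"], state["k_gen"])
            + a_eco * ecological_reward(state["q_eco"], state["q_eco_ideal"]))
```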
During training, the parameters of the dynamic Transformer network are frozen; only the weight output layer remains active and is trained jointly with the reinforcement learning network.
Through continuous iterative optimization, the reinforcement learning network finally converges to the optimal multi-objective collaborative strategy and outputs the optimal combination scheme of the flows.
The learning rate of the reinforcement learning network is between 0.001 and 0.005; the dynamic ranges of the objective weights in the reward function are 0.4 to 0.8 for flood control, 0.1 to 0.5 for power generation and 0.1 to 0.3 for ecology; and the objective weights are updated every 30 minutes.
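The weight ranges above can be enforced with a simple clamp. The final renormalization to a unit sum is an assumption for illustration, not something stated in the text:

```python
def clamp_weights(a_flood, a_gen, a_eco):
    """Clamp each objective weight to the dynamic range given in the text
    (flood 0.4-0.8, generation 0.1-0.5, ecology 0.1-0.3), then renormalize
    so the weights sum to 1 (the renormalization is an assumption)."""
    bounds = {"flood": (0.4, 0.8), "gen": (0.1, 0.5), "eco": (0.1, 0.3)}
    raw = {"flood": a_flood, "gen": a_gen, "eco": a_eco}
    clamped = {k: min(max(v, bounds[k][0]), bounds[k][1]) for k, v in raw.items()}
    total = sum(clamped.values())
    return {k: v / total for k, v in clamped.items()}
```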
Step 4: implementation of the real-time reservoir dispatching decision module.
In this embodiment, the real-time reservoir dispatching decision module uses the optimal strategy output by the reinforcement learning multi-objective collaborative optimization module to determine, in real time, the opening of each flood discharge gate and the power scheme of the generator sets, forming concrete execution instructions. Specifically, the flood gate opening control command G_t is computed in real time from the flood discharge flow Q_flood,t output by the reinforcement learning module through the gate flow–opening relation G_t = f⁻¹(Q_flood,t),
where f is the empirical relationship between flood discharge flow and gate opening, measured according to the hydraulic characteristics of the gate.
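A sketch of inverting an empirical flow–opening curve by piecewise-linear interpolation; the measured table below is hypothetical, standing in for the gate's actual hydraulic characteristics:

```python
from bisect import bisect_left

# Hypothetical measured gate curve f: opening (%) -> discharge (m3/s).
OPENINGS = [0, 25, 50, 75, 100]
FLOWS = [0, 300, 700, 1200, 1800]

def opening_for_flow(q_target):
    """Invert the monotone empirical curve: find the opening that yields
    the target flood discharge flow, clamped to the measured range."""
    q = min(max(q_target, FLOWS[0]), FLOWS[-1])
    i = bisect_left(FLOWS, q)
    if FLOWS[i] == q:
        return float(OPENINGS[i])
    q0, q1 = FLOWS[i - 1], FLOWS[i]
    g0, g1 = OPENINGS[i - 1], OPENINGS[i]
    # Linear interpolation between the two bracketing measurements.
    return g0 + (g1 - g0) * (q - q0) / (q1 - q0)
```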
The output adjustment instruction of the generator sets is based on the generation flow Q_gen,t output by reinforcement learning; the number of started units and the load distribution of each unit are determined in real time according to P_i = ρ · g · Q_i · H_t · η_i, with Σ_{i=1..n} Q_i = Q_gen,t,
where P_i is the output power of the i-th unit, ρ is the water density, g is the gravitational acceleration, H_t is the real-time water head, η_i is the unit efficiency and n is the number of started units.
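A sketch of the unit output computation P_i = ρ·g·Q_i·H·η. Equal flow sharing among the started units is a simplifying assumption for illustration; the patent leaves the load distribution rule to the dispatcher:

```python
RHO = 1000.0  # water density, kg/m3
G = 9.81      # gravitational acceleration, m/s2

def unit_powers(q_gen, head, n_units, efficiency=0.9):
    """Distribute the total generation flow equally over the started units
    and compute each unit's output power in watts."""
    q_i = q_gen / n_units           # flow assigned to each unit, m3/s
    p_i = RHO * G * q_i * head * efficiency
    return [p_i] * n_units
```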
On this basis, the system checks whether the total discharge Q_flood,t + Q_gen,t is not lower than the required ecological flow Q_eco,t. If so, the ecological water demand is already covered and no additional scheduling is needed; otherwise the shortfall Q_supp,t = Q_eco,t − (Q_flood,t + Q_gen,t) is supplemented through a dedicated ecological gate or a low-load unit, ensuring that the downstream ecological flow reaches the standard.
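The ecological supplement check reduces to one expression:

```python
def ecological_supplement(q_flood, q_gen, q_eco_required):
    """Supplemental flow needed so the downstream ecological flow reaches
    the required value; zero when the total discharge already covers it."""
    return max(0.0, q_eco_required - (q_flood + q_gen))
```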
Under extreme rainfall conditions (e.g. when the rainfall intensity in the reservoir area exceeds 30 mm/h), the system automatically raises the flood control objective weight α_flood into the 0.7 to 0.9 band (typically to 0.85); the reinforcement learning network then outputs a high-intensity flood discharge strategy in real time, responds quickly to the storm event and adjusts the flood discharge at the fastest speed, so that the reservoir water level is always kept strictly below the flood control limit water level and flood risk is avoided.
Step 5: implementation of the rolling optimization and feedback update mechanism.
Step 5.1: real-time evaluation of the result error and dynamic network update. The system automatically computes the result error once per hour, calculates the upper and lower thresholds according to step 2.2, and triggers the update process of the Transformer adaptive dynamic network structure and parameters according to the adjustment rules, so as to reduce subsequent errors. The network update is realized through gradient descent with a learning rate of 0.001, and real-time prediction resumes immediately after the parameter adjustment is completed.
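A sketch of the hourly error check and the triggered gradient-descent update (learning rate 0.001, per the text); the threshold band is assumed to be supplied by the step 2.2 computation:

```python
LEARNING_RATE = 0.001

def needs_update(error, lower, upper):
    """Hourly check: trigger a structure/parameter update when the result
    error leaves the [lower, upper] band derived in step 2.2."""
    return not (lower <= error <= upper)

def sgd_step(param, grad, lr=LEARNING_RATE):
    """One gradient-descent parameter update, as used in the triggered retraining."""
    return param - lr * grad
```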
Step 5.2: periodic fine-tuning of the reinforcement learning strategy. At the end of each day, the system automatically summarizes the multi-objective optimization results of the reinforcement learning decisions, including flood control risk reduction, power generation revenue and ecological flow guarantee, and compares them against the targets of the previous cycle. Every week, the reinforcement learning network parameters are fine-tuned on the summarized data using an experience replay strategy: the state-action-reward data of the past week are stored, and the network parameters are updated on randomly sampled mini-batches, ensuring continuous optimization and generalization of the network.
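A minimal experience replay buffer of the kind described, assuming uniform random mini-batch sampling; the capacity and seed values are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward) transitions from the
    past week; fine-tuning samples random mini-batches from it."""
    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def push(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one mini-batch.
        return self.rng.sample(list(self.buffer), batch_size)
```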
Step 5.3: establishment of a long-term error database and periodic optimization. The system maintains a long-term operation error history database and performs a monthly statistical analysis of the historical trends of the result error and the decision error. According to the historical error trend, the prediction error threshold (initially 0.05 m, adjustable within ±0.01 m) and the reinforcement learning objective weights α_flood, α_gen and α_eco are dynamically adjusted, with the adjustment range of the weights controlled within ±10%, ensuring stability and adaptability in long-term operation.
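The monthly threshold and weight adjustments can be bounded as described (threshold 0.05 m ± 0.01 m; per-weight change within ±10% of the current value). Interpreting ±10% as relative to the current weight is an assumption:

```python
def adjust_threshold(proposed, base=0.05, band=0.01):
    """Keep the prediction error threshold within +/-0.01 m of the 0.05 m base."""
    return min(max(proposed, base - band), base + band)

def adjust_weight(current, proposed, max_rel_change=0.10):
    """Limit each objective weight's monthly adjustment to +/-10% of its
    current value."""
    lo = current * (1.0 - max_rel_change)
    hi = current * (1.0 + max_rel_change)
    return min(max(proposed, lo), hi)
```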
Step 6: deployment and implementation of the intelligent collaborative scheduling platform.
As shown in fig. 4, the intelligent collaborative scheduling platform is implemented as an integrated management platform deployed on a high-performance server, with functional modules for real-time data monitoring, automatic execution of prediction decisions, real-time alarms and historical data analysis. The platform adopts a distributed architecture, is deployed on an Intel Xeon high-performance server (32-core CPU, 128 GB memory) and provides disaster recovery backup and a remote Web access interface. The real-time data monitoring module receives and displays sensing data such as water level and weather in real time; the adaptive network prediction module updates its prediction every 10 minutes; the reinforcement learning decision module outputs the optimal scheduling instruction in real time and automatically transmits it to the on-site execution equipment; and the automatic scheduling execution module carries out the gate opening and power generation load adjustments in real time. The alarm mechanism responds within 30 seconds: it automatically detects events such as water level overrun and extreme weather anomalies, triggers a real-time alarm and notifies managers by SMS and e-mail. The visual interface includes the real-time reservoir state (e.g. a real-time water level curve and a gate opening indication diagram), meteorological trend charts and multi-objective optimization decision charts, supports quick query and analysis of historical data, and provides comprehensive technical support for reservoir management through charts and reports.
Step 7: long-term system operation monitoring and performance evaluation.
To ensure long-term stability and adaptability, a complete operation monitoring and performance evaluation mechanism is established and the overall performance of the system is evaluated regularly. A comprehensive operation evaluation report is generated quarterly, with the following evaluation indicators:
Flood control risk reduction rate:
η_flood = (R_before − R_after) / R_before × 100%, where R_before and R_after denote the flood control risk indicator before and after deployment;
Power generation benefit improvement rate:
η_gen = (E_after − E_before) / E_before × 100%, where E_before and E_after denote the power generation benefit before and after deployment;
Ecological flow guarantee rate:
η_eco = T_met / T_total × 100%, where T_met is the time during which the downstream ecological flow meets the requirement and T_total is the total operating time.
The evaluation error of each indicator is controlled within ±5%. According to the quarterly evaluation results, if any indicator shows a significant downward trend (a drop of more than 10%), the update and structure adjustment processes of the adaptive dynamic network and the reinforcement learning network parameters are automatically triggered to restore system performance. In addition, a system operation log and abnormal event recording mechanism is established: changes of the scheduling strategy, abnormal prediction situations, parameter adjustment records and the like are stored and managed over the long term, forming a complete operation database for annual technology upgrades and system maintenance decisions.
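A sketch of the quarterly indicator computation, assuming standard relative-change definitions for the three rates (the exact formulas are not reproduced in this excerpt) and interpreting the 10% drop trigger as percentage points:

```python
def risk_reduction_rate(risk_before, risk_after):
    # Relative reduction of the flood control risk indicator, in percent.
    return (risk_before - risk_after) / risk_before * 100.0

def benefit_improvement_rate(benefit_before, benefit_after):
    # Relative improvement of the power generation benefit, in percent.
    return (benefit_after - benefit_before) / benefit_before * 100.0

def eco_guarantee_rate(hours_met, hours_total):
    # Share of operating time in which the ecological flow requirement is met.
    return hours_met / hours_total * 100.0

def needs_retuning(previous_rate, current_rate, drop_threshold=10.0):
    """Quarterly check: a drop of more than 10 percentage points in any
    indicator triggers the network update process."""
    return previous_rate - current_rate > drop_threshold
```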
Through the real-time data fusion preprocessing technology, the adaptive dynamic network structure design and the multi-objective reinforcement learning decision mechanism, the invention comprehensively improves the collaborative management of multiple objectives such as reservoir flood control safety, power generation benefit and ecological protection. Especially under extreme weather conditions, the invention can respond quickly and dynamically adjust the scheduling objective weights, effectively reducing the decision delay and risk of traditional reservoir scheduling modes. Through the dynamic network structure and the rolling update and feedback optimization mechanism for reinforcement learning decisions, the invention maintains high adaptability and stability over the long term, meeting the strict real-time and accuracy requirements of actual reservoir management. The method can be widely applied to the intelligent and refined management of large and medium-sized reservoirs, significantly improving the level of reservoir scheduling decisions and the capability to guarantee safe operation.
The invention provides a reservoir dispatching method based on a deep learning adaptive dynamic network and reinforcement learning. By fusing and accurately preprocessing multi-source data from real-time water level sensors, a meteorological data acquisition system and high-resolution satellite remote sensing equipment in real time, it effectively improves the accuracy and completeness of the reservoir's real-time state data. By designing the adaptive dynamic network and the dynamic attention mechanism on a Transformer structure, the invention realizes accurate short-term prediction of reservoir water level, inflow and outflow, and can automatically and dynamically adjust the network structure according to the result error, ensuring a good balance between prediction accuracy and computational efficiency. Meanwhile, by combining a deep reinforcement learning method, the invention constructs a multi-objective collaborative optimization decision module that outputs the optimal combined strategy for flood discharge, power generation and ecological flow in real time based on the dynamic prediction results, realizing real-time, dynamic and intelligent collaborative regulation of the reservoir's multiple objectives. In the real-time scheduling execution link, the method quickly converts the optimized strategy into accurate execution instructions for gate opening and generator set output, responds quickly under extreme weather conditions and ensures flood control safety.
The invention further provides a complete and innovative rolling optimization feedback update mechanism, which automatically triggers network parameter fine-tuning and strategy updates through real-time evaluation of the result errors and the reinforcement learning decision effects, effectively improving long-term operational stability and adaptability. The deployment of the intelligent collaborative scheduling platform realizes real-time data monitoring, automatic execution of prediction decisions and rapid alarming of abnormal events, effectively raising the automation, refinement and intelligence of reservoir management. In addition, the invention establishes a monitoring and evaluation mechanism for the long-term operation state and performance of the system, periodically evaluates the achievement of the flood control safety, power generation benefit and ecological protection objectives, and adaptively optimizes the network structure and parameters according to the evaluation results, keeping the long-term operational performance of the system in an optimal state. Experimental verification in an actual reservoir environment shows that the method can significantly reduce the reservoir's flood control risk, improve the economic benefit of power generation and effectively guarantee the ecological flow. It has the outstanding advantages of rapid real-time response, strong multi-objective collaborative optimization capability and high long-term operational stability, and can effectively meet the management needs of a modern reservoir in complex meteorological and hydrological environments.
The reservoir dispatching method based on the deep learning adaptive dynamic network thus solves the above-mentioned problems.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the scope of the invention.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510819807.6A CN120338210B (en) | 2025-06-19 | 2025-06-19 | Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120338210A CN120338210A (en) | 2025-07-18 |
| CN120338210B true CN120338210B (en) | 2025-10-21 |
Family
ID=96370187
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510819807.6A Active CN120338210B (en) | 2025-06-19 | 2025-06-19 | Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120338210B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120745946B (en) * | 2025-08-27 | 2025-11-18 | 水利部珠江水利委员会珠江水利综合技术中心 | Dynamic multi-objective optimization control system and method for reservoir flood period water level |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115952958A (en) * | 2023-03-14 | 2023-04-11 | 珠江水利委员会珠江水利科学研究院 | Reservoir Group Joint Optimal Dispatch Method Based on MADDPG Reinforcement Learning |
| CN117575873A (en) * | 2024-01-15 | 2024-02-20 | 安徽大学 | Flood warning method and system integrating meteorological and hydrological sensitivity |
| CN119721368A (en) * | 2024-12-13 | 2025-03-28 | 河海大学 | A method and system for predicting power generation capacity of a hydropower station |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111369181B (en) * | 2020-06-01 | 2020-09-29 | 北京全路通信信号研究设计院集团有限公司 | Train autonomous scheduling deep reinforcement learning method and device |
| US20250075602A1 (en) * | 2023-08-30 | 2025-03-06 | Saudi Arabian Oil Company | Predicting gas lift equipment failure with deep learning techniques |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||