CN120338210B - Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning - Google Patents
- Publication number
- CN120338210B (application CN202510819807.6A)
- Authority
- CN
- China
- Prior art keywords
- reservoir
- reinforcement learning
- flow
- data
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06312—Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Abstract
The invention discloses a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning, in the technical field of reservoir operation. The method acquires real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features, and fuses the three data sources. The fused data are input into a preset adaptive dynamic Transformer network model to obtain a predicted output comprising a reservoir water-level sequence, an inflow sequence, and an outflow (discharge) sequence. A reinforcement-learning state space is then constructed and input into a preset reinforcement-learning network whose action space comprises flood-discharge volume, power-generation flow, and ecological flow. The reinforcement-learning network is optimized to maximize the cumulative discounted reward, where the reward function is a weighted combination of a flood-control reward, a power-generation reward, and an ecological reward. The invention improves the collaborative management of multiple objectives such as reservoir flood-control safety, power-generation benefit, and ecological protection.
Description
Technical Field
The invention relates to the technical field of reservoir operation, and in particular to a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning.
Background
As global climate change intensifies, extreme weather events occur more frequently, and traditional reservoir operation methods struggle to adapt to complex and variable hydrometeorological conditions. Conventional operation schemes are often based on fixed rules or offline historical experience and make little use of real-time meteorological and water-level data; when facing sudden heavy rain or abrupt drought-flood transitions they are prone to decision delays and water-level control errors, which increase flood-control risk, reduce power-generation efficiency, and harm the stability of the downstream ecosystem. Deep-learning methods currently applied to reservoir operation usually use a static, fixed network structure that lacks real-time adaptivity and cannot adjust the network structure as hydrological forecasting demands change, leading to unstable prediction accuracy and insufficient model generalization.
The reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning described herein is developed to address these problems.
Disclosure of Invention
The invention provides a reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning, aiming to solve the problems of unstable prediction accuracy and insufficient generalization in existing reservoir operation methods.
The invention realizes the above purpose through the following technical scheme:
A reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning comprises the following steps:
acquiring real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features, and fusing the three data sources;
inputting the fused data into a preset adaptive dynamic Transformer network model to obtain a predicted output comprising a reservoir water-level sequence, an inflow sequence, and an outflow sequence;
constructing a reinforcement-learning state space comprising the reservoir water-level, inflow, and outflow sequences predicted by the Transformer network model for a preset number of future time steps, together with the latest real-time meteorological data sequence;
inputting the reinforcement-learning state space into a preset reinforcement-learning network whose action space comprises flood-discharge volume, power-generation flow, and ecological flow; the optimization objective of the reinforcement-learning network is to maximize the cumulative discounted reward, the reward function being a weighted combination of a flood-control reward term, a power-generation reward term, and an ecological reward term; through continuous iterative optimization, the reinforcement-learning network converges and outputs an optimal combination of flood-discharge volume, power-generation flow, and ecological flow.
Further, the real-time water-level data and meteorological data are preprocessed before data fusion, the preprocessing comprising the following steps:
performing spatial interpolation on the real-time water-level data, the meteorological data, and the radar image data of regional cloud cover and surface features using a Kriging interpolation algorithm based on spatial-distance weights, to obtain continuous data covering the whole reservoir area;
detecting outliers in the continuous data and eliminating them in real time;
performing smoothing and denoising on the data after outlier removal.
Further, when the error between a predicted output of the adaptive dynamic Transformer network model and the actual observation exceeds a preset upper threshold, a preset number of encoder layers are automatically added to the model; when the error falls below a preset lower threshold, a preset number of encoder layers are automatically removed; and when the error lies between the lower and upper thresholds, the model structure is left unchanged.
Further, the result error is a weighted sum of the mean square errors between the predicted and true values of the reservoir water level, the inflow, and the outflow.
Further, at each prediction the adaptive dynamic Transformer network model also outputs, via a dynamic attention mechanism, the weights of the flood-control, power-generation, and ecological reward terms in the reward function.
Further, the target weights $\alpha_{flood}$, $\alpha_{gen}$, $\alpha_{eco}$ are updated from the output weights $w_{flood}$, $w_{gen}$, $w_{eco}$ of the flood-control, power-generation, and ecological reward terms as follows:

$\alpha_{flood} = \dfrac{e^{w_{flood}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{gen} = \dfrac{e^{w_{gen}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{eco} = \dfrac{e^{w_{eco}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;
After the dynamic attention weights are adjusted in real time, they serve as the target weighting coefficients in the next reinforcement-learning decision, ensuring that the system adapts accurately to the real-time demands of the different operation objectives;
the dynamic attention mechanism uses a trainable target-weight vector and computes the attention weight of each objective in real time with a Softmax function.
Further, $R_t$ is the cumulative discounted reward; $\alpha_{flood}$, $\alpha_{gen}$, and $\alpha_{eco}$ are the target weight coefficients of the flood-control, power-generation, and ecological rewards; $h_{pred,t}$ is the predicted water level; $h_{limit}$ is the flood-control limit water level; $k_{gen}$ is the economic-benefit coefficient of power generation per unit flow; $Q_{gen,t}$ is the power-generation flow; $Q_{eco,t}$ is the current ecological flow; $Q_{eco,ideal}$ is the ideal ecological flow; $R_{flood}$, $R_{gen}$, and $R_{eco}$ are the flood-control, power-generation, and ecological reward terms; and $t$ is the time step.
Further, the opening of each flood-discharge gate of the reservoir and the output scheme of the generating units are determined from the output optimal combination of flood-discharge volume, power-generation flow, and ecological flow, forming specific execution instructions, comprising the following steps:
computing the flood-discharge gate opening in real time from the output flood-discharge volume and the gate flow-opening relation;
determining, in real time from the output power-generation flow, the number of units to start or stop and the load distribution among the units;
constructing the execution instruction from the flood-discharge gate opening, the number of units started or stopped, and the load distribution of each unit;
judging whether the total release from flood discharge and power generation is at least the ecological flow; if so, the ecological water demand is considered covered and no additional operation is needed; otherwise, the shortfall, i.e. the difference between the ecological flow and the total release from flood discharge and power generation, is supplemented through a dedicated ecological gate or a low-load unit.
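The decision-to-instruction steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the gate flow-opening relation, the unit capacity, and all numeric values are hypothetical assumptions.

```python
# Hypothetical sketch of mapping an RL action (flows in m^3/s) to an
# execution instruction; relation f and unit_capacity are assumptions.

def build_instruction(q_flood, q_gen, q_eco,
                      gate_opening_from_flow,
                      unit_capacity=500.0):
    # 1. Gate opening from the flood-discharge flow via the gate's
    #    flow-opening relation (supplied as a callable).
    opening = gate_opening_from_flow(q_flood)
    # 2. Number of generating units (ceil of flow / per-unit capacity)
    #    and even load distribution among them.
    n_units = max(1, -(-int(q_gen) // int(unit_capacity)))
    load_per_unit = q_gen / n_units
    # 3. Ecological check: supplement water only if the total release
    #    falls short of the required ecological flow.
    total_release = q_flood + q_gen
    deficit = max(0.0, q_eco - total_release)
    return {"gate_opening": opening,
            "units_on": n_units,
            "load_per_unit": load_per_unit,
            "eco_supplement": deficit}

# Toy linear flow-opening relation, for illustration only.
instr = build_instruction(1200.0, 800.0, 150.0,
                          gate_opening_from_flow=lambda q: q / 2000.0)
```

Here the ecological flow (150 m³/s) is already covered by the 2000 m³/s total release, so no supplement is scheduled.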
Further, a historical database of long-term running root-mean-square errors is built, the historical trends of result errors and decision errors are statistically analyzed at regular intervals, and the preset error thresholds are dynamically adjusted according to those trends.
Further, acquiring the real-time reservoir water-level data, meteorological data, and radar image data of regional cloud cover and surface features comprises:
acquiring real-time water-level data monitored by wave-type water-level sensors deployed in the reservoir area and at key sections of the main inflow river channel;
acquiring meteorological data from a meteorological data acquisition system deployed in the reservoir area, comprising a precipitation sensor, a wind-speed sensor, and a temperature-humidity sensor;
acquiring radar image data of regional cloud cover and surface features by satellite remote sensing of the reservoir and upstream basin with a high-resolution radar remote-sensing satellite of 30 m spatial resolution.
The invention has the beneficial effects that:
Compared with the prior art, the reservoir operation method based on a deep-learning adaptive dynamic network and reinforcement learning provided by the invention comprehensively improves the collaborative management of multiple objectives such as reservoir flood-control safety, power-generation benefit, and ecological protection, improves the stability of the prediction accuracy, and improves the model's generalization capability.
Drawings
FIG. 1 is a flowchart of a reservoir dispatching method based on deep learning adaptive dynamic network and reinforcement learning in an embodiment.
Fig. 2 is a schematic diagram of the adaptive dynamic Transformer network structure in step S2 in the embodiment.
FIG. 3 is a flowchart of a reinforcement learning multi-objective collaborative optimization module in step S3 in an embodiment.
Fig. 4 is a functional framework diagram of the intelligent co-scheduling platform in step S6 in the embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below completely with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Furthermore, the terms "first," "second," and the like, are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
The following describes specific embodiments of the present invention in detail with reference to the drawings.
As shown in Fig. 1, a specific implementation of the reservoir operation method based on the deep-learning adaptive dynamic network and reinforcement learning of the invention is as follows:
Step S1: real-time data fusion and preprocessing.
In a practical implementation of the invention, 8 ultrasonic water-level sensors (model MB7389) are deployed in the reservoir area and at key sections of the main inflow river channel, with a water-level measurement accuracy of ±2 cm; real-time water-level data are acquired every minute and uploaded in real time to the data-fusion center over a LoRa wireless transmission protocol. A meteorological data acquisition system composed of weather stations is installed in the reservoir area, specifically comprising a precipitation sensor (model TB4, accuracy ±0.5 mm), a wind-speed sensor (model WindSonic, accuracy ±0.2 m/s), and a temperature-humidity sensor (model HMP155, temperature accuracy ±0.5 °C, humidity accuracy ±3%); meteorological data are collected every 5 minutes. In addition, a high-resolution radar remote-sensing satellite with 30 m spatial resolution observes the reservoir and the upstream basin once per hour, producing radar image data of regional cloud cover and surface features in GeoTIFF format. The data-fusion module first applies a Kriging interpolation algorithm based on spatial-distance weights to the water-level and meteorological sensor data, obtaining continuous data covering the whole reservoir area. Outlier detection after fusion is based on the Z-score method; the regional cloud-cover data are combined with the meteorological data to anticipate conditions such as sudden heavy rain, and the surface-feature data are used to distinguish different runoff responses under the same rainfall, enabling more accurate operation decisions.
The specific formula is:

$Z = \dfrac{x_t - \mu}{\sigma}$

where $x_t$ is the real-time measurement, $\mu$ is the mean of the past 24 hours of historical data, $\sigma$ is the standard deviation, and $Z$ is the standard score. When $|Z|$ exceeds the preset threshold, the system automatically marks the datum as an outlier and eliminates it in real time. Finally, after smoothing and denoising with a Savitzky-Golay filter, the data are standardized into sequence form and periodically fed, every 10 minutes, to the input of the subsequent network model.
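A minimal sketch of this Z-score outlier check, using only the Python standard library. The 24 h history window and the threshold are parameters here; the threshold value 3.0 is an illustrative assumption, since the patent leaves the cutoff as a preset value.

```python
# Z-score outlier flagging over a rolling history window (sketch only;
# the threshold default is an assumption, not taken from the patent).
import statistics

def zscore_outliers(history, new_values, threshold=3.0):
    """Flag values whose standard score against the history exceeds threshold."""
    mu = statistics.fmean(history)       # mean of past-window data
    sigma = statistics.pstdev(history)   # population standard deviation
    flags = []
    for x in new_values:
        z = (x - mu) / sigma if sigma > 0 else 0.0
        flags.append(abs(z) > threshold)
    return flags

history = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1]
flags = zscore_outliers(history, [10.05, 14.0])  # second value is an outlier
```

In a real pipeline the flagged samples would be removed before the Savitzky-Golay smoothing step.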
Step S2: accurate prediction with the adaptive dynamic network based on the Transformer structure.
A Transformer network structure is adopted, taking the standardized data sequence from step S1 as network input, to achieve accurate short-term (next 30 minutes) prediction of the reservoir water level, inflow, and outflow. The specific embodiment is as follows:
Step 2.1. As shown in Fig. 2, the initial structure of the Transformer network is built from 4 standard Transformer encoder layers, each comprising a multi-head self-attention sub-layer and a feed-forward sub-layer, with layer normalization and residual connections. Each self-attention sub-layer contains 8 attention heads of dimension 64, so the output dimension of each self-attention sub-layer is 512. A single attention head is computed as:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\dfrac{QK^{\top}}{\sqrt{d_k}}\right)V$

where $Q$, $K$, and $V$ are the query, key, and value matrices, each of head dimension $d_k = 64$. The multi-head attention output is:

$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_8)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$

where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are linear transformation parameter matrices.
The feed-forward network consists of two linear transformation layers with structure 512-2048-512:

$\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$
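The scaled dot-product attention above can be illustrated with a tiny pure-Python sketch (2x2 matrices for readability; in the embodiment each head works at $d_k = 64$). The matrices here are illustrative, not trained parameters.

```python
# Scaled dot-product attention on lists-of-lists matrices (sketch only).
import math

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k)
               for kr in K] for qr in Q]
    weights = [softmax(r) for r in scores]
    return [[sum(w * V[j][c] for j, w in enumerate(wr))
             for c in range(len(V[0]))] for wr in weights]

Q = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, Q, [[1.0, 2.0], [3.0, 4.0]])
```

Each output row is a convex combination of the value rows, which is why the attention weights of every row sum to 1.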
The network's initial input is the standardized data of the past 120 minutes (12 time steps, each containing feature data such as water level, precipitation, wind speed, temperature, and humidity; total input dimension 128). The output is the predicted water level, inflow, and outflow for the next 3 time steps (30 minutes) (output dimension 3), together with the weights of the flood-control, power-generation, and ecological reward terms in the reward function (output dimension 3). Training uses the Adam optimizer with an initial learning rate of 0.001 and a mean square error (MSE) loss:

$\mathrm{MSE} = \dfrac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$

where $y_i$ is the actual observed value, $\hat{y}_i$ is the network's predicted value, and $N$ is the number of samples in a batch. During this training the branch that outputs the reward-term weights does not participate, i.e. its gradients are cut off.
Step 2.2: adaptive dynamic adjustment of the network structure. The network dynamic-adjustment module automatically adjusts the number of Transformer layers according to the result error $E_t$ computed in real time as:

$E_t = w_1\,\mathrm{MSE}_{h} + w_2\,\mathrm{MSE}_{Q_{in}} + w_3\,\mathrm{MSE}_{Q_{out}}$

where $\mathrm{MSE}_{h}$, $\mathrm{MSE}_{Q_{in}}$, and $\mathrm{MSE}_{Q_{out}}$ are the mean square errors between the network's predicted and true water level $h$, inflow $Q_{in}$, and outflow $Q_{out}$; the weights $w_1$, $w_2$, $w_3$ are determined by offline cross-validation; and $t$ denotes the time.
Taking the latest $W$ prediction cycles as a sliding window, the moving average $\mu_E$ and standard deviation $\sigma_E$ of the composite error sequence $E$ are first computed, the standard deviation taking the weighted-covariance form:

$\sigma_E = \sqrt{\mathbf{w}^{\top}\,\Sigma\,\mathbf{w}}$

where $\mathbf{w}$ is the indicator weight vector determined by offline cross-validation and $\Sigma$ is the covariance matrix of the three RMSE components within the window. The thresholds are then constructed dynamically:

$\theta_{up} = \mu_E + \lambda_{up}\,\sigma_E$;

$\theta_{low} = \mu_E - \lambda_{low}\,\sigma_E$;

where the relaxation coefficients $\lambda_{up}$ and $\lambda_{low}$ are determined offline on the historical dataset by Bayesian optimization. As the overall result error gradually decreases, $\mu_E$ decreases and $\theta_{up}$ is lowered correspondingly; conversely, when the result error rises sharply, $\theta_{up}$ is raised dynamically to avoid false triggers, and the layer-adding operation is still triggered once the real-time result error $E_t$ exceeds the new upper threshold. When the error between a prediction and the actual observation exceeds the upper threshold $\theta_{up}$, the number of network layers is increased by 1 (with at most 3 layers added in total); when the result error is below the lower threshold $\theta_{low}$, the number of layers is reduced by 1. The specific rule is:
$\Delta L_t = \begin{cases} +1, & E_t > \theta_{up} \ \text{(cumulative increase at most 3 layers)} \\ -1, & E_t < \theta_{low} \\ 0, & \text{otherwise} \end{cases}$

This dynamic adjustment of the layer count effectively balances computational load against prediction accuracy, maintaining both the real-time performance and the stability of the prediction.
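The layer-adjustment rule can be sketched as a small function. The base depth of 4 encoder layers and the +3 cap follow the text; the assumption that the depth never drops below the initial 4 layers, and the threshold values shown, are illustrative.

```python
# Sketch of the dynamic layer-count rule: deepen the encoder when the
# rolling error exceeds the upper threshold, shed a layer when it falls
# below the lower threshold (floor at the initial depth is an assumption).

def adjust_layers(n_layers, error, theta_up, theta_low,
                  base_layers=4, max_extra=3):
    if error > theta_up and n_layers < base_layers + max_extra:
        return n_layers + 1   # prediction degraded: add an encoder layer
    if error < theta_low and n_layers > base_layers:
        return n_layers - 1   # error comfortably low: remove a layer
    return n_layers           # within the band: leave the structure alone

new_depth = adjust_layers(4, error=0.9, theta_up=0.5, theta_low=0.1)
```

In deployment, `theta_up` and `theta_low` would be recomputed each cycle from the sliding-window statistics described above.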
Step 2.3: implementation of the dynamic attention mechanism, designed to adaptively adjust the predicted weights of the three objectives: flood control, power generation, and ecology. The target weights $\alpha_{flood}$, $\alpha_{gen}$, $\alpha_{eco}$ are updated from the output weights $w_{flood}$, $w_{gen}$, $w_{eco}$ of the flood-control, power-generation, and ecological reward terms as follows:

$\alpha_{flood} = \dfrac{e^{w_{flood}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{gen} = \dfrac{e^{w_{gen}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

$\alpha_{eco} = \dfrac{e^{w_{eco}}}{e^{w_{flood}} + e^{w_{gen}} + e^{w_{eco}}}$;

After the dynamic attention weights are adjusted in real time, they serve as the target weighting coefficients in the next reinforcement-learning decision, ensuring that the system adapts accurately to the real-time demands of the different operation objectives.

The dynamic attention mechanism uses a trainable target-weight vector and computes the attention weight of each objective in real time with a Softmax function.
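The Softmax normalization of the target-weight vector can be sketched as follows; the raw weight values here stand in for the network's trainable outputs and are purely illustrative.

```python
# Softmax over the three objective weights so that the attention
# coefficients are positive and sum to 1 (sketch; inputs are illustrative).
import math

def target_weights(w_flood, w_gen, w_eco):
    exps = [math.exp(w) for w in (w_flood, w_gen, w_eco)]
    s = sum(exps)
    return tuple(e / s for e in exps)

alpha_flood, alpha_gen, alpha_eco = target_weights(1.2, 0.4, -0.1)
```

A larger raw weight for one objective (here flood control) yields a proportionally larger share of the next decision's weighting.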
Step S3: implementation of the reinforcement-learning multi-objective collaborative optimization module.
Step 3.1: define the state space for multi-objective reinforcement learning; it comprises the meteorological data and the predicted water-level, inflow, and outflow sequences output by the adaptive network. As shown in Fig. 3, the reinforcement-learning state space is constructed. Based on the prediction output of the dynamic Transformer network in step S2, the state vector of the reinforcement-learning module is defined as $S_t$. Specifically, the state vector contains the water-level prediction sequence for the next 3 time steps output by the Transformer network, $H_t = [h_{t+1}, h_{t+2}, h_{t+3}]$, the inflow prediction sequence $Q_{in,t} = [Q_{in,t+1}, Q_{in,t+2}, Q_{in,t+3}]$, the outflow prediction sequence $Q_{out,t} = [Q_{out,t+1}, Q_{out,t+2}, Q_{out,t+3}]$, and the latest real-time meteorological data sequence $M_t$, the meteorological data comprising precipitation, wind speed, temperature, and humidity. After concatenation, the dimension of the state space is kept between 100 and 500, and the state is fed to the reinforcement-learning network after standardization.
Step 3.2: define the reinforcement-learning action space. The action vector $a_t$ contains three variables: flood-discharge flow, power-generation flow, and ecological flow, set as follows:
flood-discharge flow adjustment range 50 to 5000 m³/s, step 50 m³/s;
power-generation flow adjustment range 100 to 2000 m³/s, step 20 m³/s;
ecological flow adjustment range 10 to 500 m³/s, step 10 m³/s.
The discrete combinations of the action space are generated by grid search, and the state vector is mapped to the action space by a three-layer fully connected network so that the reinforcement-learning model can select actions efficiently.
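The grid of discrete actions implied by the three ranges above can be enumerated directly; this sketch simply materializes the Cartesian product of the stated flow levels.

```python
# Enumerate the discrete action grid from the stated ranges and steps
# (flood 50-5000 step 50, generation 100-2000 step 20, eco 10-500 step 10).
from itertools import product

flood = range(50, 5001, 50)    # 100 flood-discharge levels
gen = range(100, 2001, 20)     # 96 generation levels
eco = range(10, 501, 10)       # 50 ecological levels
actions = list(product(flood, gen, eco))  # 100 * 96 * 50 combinations
```

In practice a Q-network scores these combinations rather than searching them exhaustively at every step; the grid only fixes the discrete action set.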
Step 3.3: concrete implementation of the reinforcement-learning network. The network is a deep Q-network with the following structure:
an input layer receiving the state vector $S_t$;
a hidden part composed of a 4-layer convolutional network and a 2-layer fully connected network, where the convolution kernel sizes are [8×8, 4×4, 3×3], the convolution strides are [4, 2, 1, 1], and the output feature dimensions are 128, 64, and 64 respectively;
an output layer giving the Q value of each action combination.
The optimization objective of the reinforcement-learning network is to maximize the cumulative discounted reward $R = \sum_{t} \gamma^{t} r_t$ (with discount factor $\gamma$). The per-step reward $r_t$ is defined as the weighted result of the three objectives (flood control, power generation, ecology):

$r_t = \alpha_{flood}\,R_{flood,t} + \alpha_{gen}\,R_{gen,t} + \alpha_{eco}\,R_{eco,t}$;

where the reward weights $\alpha_{flood}$, $\alpha_{gen}$, and $\alpha_{eco}$ come from the dynamic attention mechanism of step 2.3 and are dynamically updated in real time. Specifically, each objective's reward is defined as follows:
The flood control rewarding item R Flood control gives negative feedback according to the predicted water level exceeding the flood control limit water level, and is defined as: . Wherein, the In order to predict the water level,Is the flood control limit water level.
The electricity generation rewarding item R Generating electricity is calculated according to the economic benefit generated by the current electricity generation flow, and is defined as: . Wherein k Generating electricity is a unit flow power generation economic benefit coefficient, and Q Generating electricity ,t is power generation flow;
The ecological reward term R_eco,t is calculated from the degree to which the downstream ecological flow deviates from the ideal ecological flow, R_eco,t = −|Q_eco,t − Q_eco,ideal|, where Q_eco,t is the current ecological flow and Q_eco,ideal is the ideal ecological flow.
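A minimal sketch of the three reward terms and their weighted combination; the negative-feedback and deviation-penalty forms follow the definitions above, and the numeric state values in the test are illustrative only:

```python
def flood_reward(z_pred, z_limit):
    # Negative feedback only when the predicted level exceeds the flood limit.
    return -max(0.0, z_pred - z_limit)

def generation_reward(q_gen, k_gen):
    # Economic benefit proportional to the generation flow.
    return k_gen * q_gen

def ecological_reward(q_eco, q_eco_ideal):
    # Penalize absolute deviation from the ideal ecological flow.
    return -abs(q_eco - q_eco_ideal)

def total_reward(state, weights):
    """Weighted sum of the three objective rewards; the weights are
    supplied by the dynamic attention mechanism of step 2.3."""
    a_flood, a_gen, a_eco = weights
    return (a_flood * flood_reward(state["z_pred"], state["z_limit"])
            + a_gen * generation_reward(state["q_gen"], state["k_gen"])
            + a_eco * ecological_reward(state["q_eco"], state["q_eco_ideal"]))
```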
During training, the parameters of the dynamic Transformer network are frozen; only the weight output layer remains active and is trained jointly with the reinforcement learning network.
Through continuous iterative optimization, the reinforcement learning network finally converges to the optimal multi-objective collaborative strategy and outputs the optimal combination scheme of the flows.
The learning rate of the reinforcement learning network is between 0.001 and 0.005; the dynamic ranges of the objective weights in the reward function are 0.4 to 0.8 for flood control, 0.1 to 0.5 for power generation and 0.1 to 0.3 for ecology; and the objective weights are updated every 30 minutes.
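The weight ranges above can be enforced with a simple clamp. The final renormalization to a unit sum is an assumption for illustration, not something stated in the text:

```python
def clamp_weights(a_flood, a_gen, a_eco):
    """Clamp each objective weight to the dynamic range given in the text
    (flood 0.4-0.8, generation 0.1-0.5, ecology 0.1-0.3), then renormalize
    so the weights sum to 1 (the renormalization is an assumption)."""
    bounds = {"flood": (0.4, 0.8), "gen": (0.1, 0.5), "eco": (0.1, 0.3)}
    raw = {"flood": a_flood, "gen": a_gen, "eco": a_eco}
    clamped = {k: min(max(v, bounds[k][0]), bounds[k][1]) for k, v in raw.items()}
    total = sum(clamped.values())
    return {k: v / total for k, v in clamped.items()}
```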
Step 4: implementation of the real-time reservoir dispatching decision module.
In this embodiment, the real-time reservoir dispatching decision module uses the optimal strategy output by the reinforcement learning multi-objective collaborative optimization module to determine, in real time, the opening of each flood discharge gate and the power scheme of the generator sets, forming concrete execution instructions. Specifically, the flood gate opening control command G_t is computed in real time from the flood discharge flow Q_flood,t output by the reinforcement learning module through the gate flow–opening relation G_t = f⁻¹(Q_flood,t),
where f is the empirical relationship between flood discharge flow and gate opening, measured according to the hydraulic characteristics of the gate.
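A sketch of inverting an empirical flow–opening curve by piecewise-linear interpolation; the measured table below is hypothetical, standing in for the gate's actual hydraulic characteristics:

```python
from bisect import bisect_left

# Hypothetical measured gate curve f: opening (%) -> discharge (m3/s).
OPENINGS = [0, 25, 50, 75, 100]
FLOWS = [0, 300, 700, 1200, 1800]

def opening_for_flow(q_target):
    """Invert the monotone empirical curve: find the opening that yields
    the target flood discharge flow, clamped to the measured range."""
    q = min(max(q_target, FLOWS[0]), FLOWS[-1])
    i = bisect_left(FLOWS, q)
    if FLOWS[i] == q:
        return float(OPENINGS[i])
    q0, q1 = FLOWS[i - 1], FLOWS[i]
    g0, g1 = OPENINGS[i - 1], OPENINGS[i]
    # Linear interpolation between the two bracketing measurements.
    return g0 + (g1 - g0) * (q - q0) / (q1 - q0)
```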
The output adjustment instruction of the generator sets is based on the generation flow Q_gen,t output by reinforcement learning; the number of started units and the load distribution of each unit are determined in real time according to P_i = ρ · g · Q_i · H_t · η_i, with Σ_{i=1..n} Q_i = Q_gen,t,
where P_i is the output power of the i-th unit, ρ is the water density, g is the gravitational acceleration, H_t is the real-time water head, η_i is the unit efficiency and n is the number of started units.
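A sketch of the unit output computation P_i = ρ·g·Q_i·H·η. Equal flow sharing among the started units is a simplifying assumption for illustration; the patent leaves the load distribution rule to the dispatcher:

```python
RHO = 1000.0  # water density, kg/m3
G = 9.81      # gravitational acceleration, m/s2

def unit_powers(q_gen, head, n_units, efficiency=0.9):
    """Distribute the total generation flow equally over the started units
    and compute each unit's output power in watts."""
    q_i = q_gen / n_units           # flow assigned to each unit, m3/s
    p_i = RHO * G * q_i * head * efficiency
    return [p_i] * n_units
```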
On this basis, the system checks whether the total discharge Q_flood,t + Q_gen,t is not lower than the required ecological flow Q_eco,t. If so, the ecological water demand is already covered and no additional scheduling is needed; otherwise the shortfall Q_supp,t = Q_eco,t − (Q_flood,t + Q_gen,t) is supplemented through a dedicated ecological gate or a low-load unit, ensuring that the downstream ecological flow reaches the standard.
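The ecological supplement check reduces to one expression:

```python
def ecological_supplement(q_flood, q_gen, q_eco_required):
    """Supplemental flow needed so the downstream ecological flow reaches
    the required value; zero when the total discharge already covers it."""
    return max(0.0, q_eco_required - (q_flood + q_gen))
```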
Under extreme rainfall conditions (e.g. when the rainfall intensity in the reservoir area exceeds 30 mm/h), the system automatically raises the flood control objective weight α_flood into the 0.7 to 0.9 band (typically to 0.85); the reinforcement learning network then outputs a high-intensity flood discharge strategy in real time, responds quickly to the storm event and adjusts the flood discharge at the fastest speed, so that the reservoir water level is always kept strictly below the flood control limit water level and flood risk is avoided.
Step 5: implementation of the rolling optimization and feedback update mechanism.
Step 5.1: real-time evaluation of the result error and dynamic network update. The system automatically computes the result error once per hour, calculates the upper and lower thresholds according to step 2.2, and triggers the update process of the Transformer adaptive dynamic network structure and parameters according to the adjustment rules, so as to reduce subsequent errors. The network update is realized through gradient descent with a learning rate of 0.001, and real-time prediction resumes immediately after the parameter adjustment is completed.
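A sketch of the hourly error check and the triggered gradient-descent update (learning rate 0.001, per the text); the threshold band is assumed to be supplied by the step 2.2 computation:

```python
LEARNING_RATE = 0.001

def needs_update(error, lower, upper):
    """Hourly check: trigger a structure/parameter update when the result
    error leaves the [lower, upper] band derived in step 2.2."""
    return not (lower <= error <= upper)

def sgd_step(param, grad, lr=LEARNING_RATE):
    """One gradient-descent parameter update, as used in the triggered retraining."""
    return param - lr * grad
```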
Step 5.2: periodic fine-tuning of the reinforcement learning strategy. At the end of each day, the system automatically summarizes the multi-objective optimization results of the reinforcement learning decisions, including flood control risk reduction, power generation revenue and ecological flow guarantee, and compares them against the targets of the previous cycle. Every week, the reinforcement learning network parameters are fine-tuned on the summarized data using an experience replay strategy: the state-action-reward data of the past week are stored, and the network parameters are updated on randomly sampled mini-batches, ensuring continuous optimization and generalization of the network.
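A minimal experience replay buffer of the kind described, assuming uniform random mini-batch sampling; the capacity and seed values are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward) transitions from the
    past week; fine-tuning samples random mini-batches from it."""
    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out first
        self.rng = random.Random(seed)

    def push(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def sample(self, batch_size):
        # Uniform sampling without replacement within one mini-batch.
        return self.rng.sample(list(self.buffer), batch_size)
```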
Step 5.3: establishment of a long-term error database and periodic optimization. The system maintains a long-term operation error history database and performs a monthly statistical analysis of the historical trends of the result error and the decision error. According to the historical error trend, the prediction error threshold (initially 0.05 m, adjustable within ±0.01 m) and the reinforcement learning objective weights α_flood, α_gen and α_eco are dynamically adjusted, with the adjustment range of the weights controlled within ±10%, ensuring stability and adaptability in long-term operation.
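The monthly threshold and weight adjustments can be bounded as described (threshold 0.05 m ± 0.01 m; per-weight change within ±10% of the current value). Interpreting ±10% as relative to the current weight is an assumption:

```python
def adjust_threshold(proposed, base=0.05, band=0.01):
    """Keep the prediction error threshold within +/-0.01 m of the 0.05 m base."""
    return min(max(proposed, base - band), base + band)

def adjust_weight(current, proposed, max_rel_change=0.10):
    """Limit each objective weight's monthly adjustment to +/-10% of its
    current value."""
    lo = current * (1.0 - max_rel_change)
    hi = current * (1.0 + max_rel_change)
    return min(max(proposed, lo), hi)
```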
Step 6: deployment and implementation of the intelligent collaborative scheduling platform.
As shown in fig. 4, the intelligent collaborative scheduling platform is implemented as an integrated management platform deployed on a high-performance server, with functional modules for real-time data monitoring, automatic execution of prediction decisions, real-time alarms and historical data analysis. The platform adopts a distributed architecture, is deployed on an Intel Xeon high-performance server (32-core CPU, 128 GB memory) and provides disaster recovery backup and a remote Web access interface. The real-time data monitoring module receives and displays sensing data such as water level and weather in real time; the adaptive network prediction module updates its prediction every 10 minutes; the reinforcement learning decision module outputs the optimal scheduling instruction in real time and automatically transmits it to the on-site execution equipment; and the automatic scheduling execution module carries out the gate opening and power generation load adjustments in real time. The alarm mechanism responds within 30 seconds: it automatically detects events such as water level overrun and extreme weather anomalies, triggers a real-time alarm and notifies managers by SMS and e-mail. The visual interface includes the real-time reservoir state (e.g. a real-time water level curve and a gate opening indication diagram), meteorological trend charts and multi-objective optimization decision charts, supports quick query and analysis of historical data, and provides comprehensive technical support for reservoir management through charts and reports.
Step 7: long-term system operation monitoring and performance evaluation.
To ensure long-term stability and adaptability, a complete operation monitoring and performance evaluation mechanism is established and the overall performance of the system is evaluated regularly. A comprehensive operation evaluation report is generated quarterly, with the following evaluation indicators:
Flood control risk reduction rate:
η_flood = (R_before − R_after) / R_before × 100%, where R_before and R_after denote the flood control risk indicator before and after deployment;
Power generation benefit improvement rate:
η_gen = (E_after − E_before) / E_before × 100%, where E_before and E_after denote the power generation benefit before and after deployment;
Ecological flow guarantee rate:
η_eco = T_met / T_total × 100%, where T_met is the time during which the downstream ecological flow meets the requirement and T_total is the total operating time.
The evaluation error of each indicator is controlled within ±5%. According to the quarterly evaluation results, if any indicator shows a significant downward trend (a drop of more than 10%), the update and structure adjustment processes of the adaptive dynamic network and the reinforcement learning network parameters are automatically triggered to restore system performance. In addition, a system operation log and abnormal event recording mechanism is established: changes of the scheduling strategy, abnormal prediction situations, parameter adjustment records and the like are stored and managed over the long term, forming a complete operation database for annual technology upgrades and system maintenance decisions.
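A sketch of the quarterly indicator computation, assuming standard relative-change definitions for the three rates (the exact formulas are not reproduced in this excerpt) and interpreting the 10% drop trigger as percentage points:

```python
def risk_reduction_rate(risk_before, risk_after):
    # Relative reduction of the flood control risk indicator, in percent.
    return (risk_before - risk_after) / risk_before * 100.0

def benefit_improvement_rate(benefit_before, benefit_after):
    # Relative improvement of the power generation benefit, in percent.
    return (benefit_after - benefit_before) / benefit_before * 100.0

def eco_guarantee_rate(hours_met, hours_total):
    # Share of operating time in which the ecological flow requirement is met.
    return hours_met / hours_total * 100.0

def needs_retuning(previous_rate, current_rate, drop_threshold=10.0):
    """Quarterly check: a drop of more than 10 percentage points in any
    indicator triggers the network update process."""
    return previous_rate - current_rate > drop_threshold
```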
Through the real-time data fusion preprocessing technology, the adaptive dynamic network structure design and the multi-objective reinforcement learning decision mechanism, the invention comprehensively improves the collaborative management of multiple objectives such as reservoir flood control safety, power generation benefit and ecological protection. Especially under extreme weather conditions, the invention can respond quickly and dynamically adjust the scheduling objective weights, effectively reducing the decision delay and risk of traditional reservoir scheduling modes. Through the dynamic network structure and the rolling update and feedback optimization mechanism for reinforcement learning decisions, the invention maintains high adaptability and stability over the long term, meeting the strict real-time and accuracy requirements of actual reservoir management. The method can be widely applied to the intelligent and refined management of large and medium-sized reservoirs, significantly improving the level of reservoir scheduling decisions and the capability to guarantee safe operation.
The invention provides a reservoir dispatching method based on a deep learning adaptive dynamic network and reinforcement learning. By fusing and accurately preprocessing multi-source data from real-time water level sensors, a meteorological data acquisition system and high-resolution satellite remote sensing equipment in real time, it effectively improves the accuracy and completeness of the reservoir's real-time state data. By designing the adaptive dynamic network and the dynamic attention mechanism on a Transformer structure, the invention realizes accurate short-term prediction of reservoir water level, inflow and outflow, and can automatically and dynamically adjust the network structure according to the result error, ensuring a good balance between prediction accuracy and computational efficiency. Meanwhile, by combining a deep reinforcement learning method, the invention constructs a multi-objective collaborative optimization decision module that outputs the optimal combined strategy for flood discharge, power generation and ecological flow in real time based on the dynamic prediction results, realizing real-time, dynamic and intelligent collaborative regulation of the reservoir's multiple objectives. In the real-time scheduling execution link, the method quickly converts the optimized strategy into accurate execution instructions for gate opening and generator set output, responds quickly under extreme weather conditions and ensures flood control safety.
The invention further provides a complete and innovative rolling optimization feedback update mechanism, which automatically triggers network parameter fine-tuning and strategy updates through real-time evaluation of the result errors and the reinforcement learning decision effects, effectively improving long-term operational stability and adaptability. The deployment of the intelligent collaborative scheduling platform realizes real-time data monitoring, automatic execution of prediction decisions and rapid alarming of abnormal events, effectively raising the automation, refinement and intelligence of reservoir management. In addition, the invention establishes a monitoring and evaluation mechanism for the long-term operation state and performance of the system, periodically evaluates the achievement of the flood control safety, power generation benefit and ecological protection objectives, and adaptively optimizes the network structure and parameters according to the evaluation results, keeping the long-term operational performance of the system in an optimal state. Experimental verification in an actual reservoir environment shows that the method can significantly reduce the reservoir's flood control risk, improve the economic benefit of power generation and effectively guarantee the ecological flow. It has the outstanding advantages of rapid real-time response, strong multi-objective collaborative optimization capability and high long-term operational stability, and can effectively meet the management needs of a modern reservoir in complex meteorological and hydrological environments.
The reservoir dispatching method based on the deep learning adaptive dynamic network thus solves the above-mentioned problems.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the scope of the invention.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510819807.6A CN120338210B (en) | 2025-06-19 | 2025-06-19 | Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120338210A CN120338210A (en) | 2025-07-18 |
| CN120338210B true CN120338210B (en) | 2025-10-21 |
Family
ID=96370187
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510819807.6A Active CN120338210B (en) | 2025-06-19 | 2025-06-19 | Reservoir operation method based on deep learning adaptive dynamic network and reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120338210B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120745946B (en) * | 2025-08-27 | 2025-11-18 | 水利部珠江水利委员会珠江水利综合技术中心 | Dynamic multi-objective optimization control system and method for reservoir flood period water level |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115952958A (en) * | 2023-03-14 | 2023-04-11 | 珠江水利委员会珠江水利科学研究院 | Reservoir Group Joint Optimal Dispatch Method Based on MADDPG Reinforcement Learning |
| CN117575873A (en) * | 2024-01-15 | 2024-02-20 | 安徽大学 | Flood warning method and system integrating meteorological and hydrological sensitivity |
| CN119721368A (en) * | 2024-12-13 | 2025-03-28 | 河海大学 | A method and system for predicting power generation capacity of a hydropower station |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111369181B (en) * | 2020-06-01 | 2020-09-29 | 北京全路通信信号研究设计院集团有限公司 | Train autonomous scheduling deep reinforcement learning method and device |
| US20250075602A1 (en) * | 2023-08-30 | 2025-03-06 | Saudi Arabian Oil Company | Predicting gas lift equipment failure with deep learning techniques |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||