CN118157143A

CN118157143A - GA-MADRL-PPO combination-based distributed photovoltaic optimal scheduling strategy method, device and system

Info

Publication number: CN118157143A
Application number: CN202410315490.8A
Authority: CN
Inventors: 区伟潮; 邱桂华; 汤志锐; 吴树鸿; 聂家荣; 欧阳卫年; 彭飞进; 黄斐
Original assignee: Guangdong Power Grid Co Ltd; Foshan Power Supply Bureau of Guangdong Power Grid Corp
Current assignee: Guangdong Power Grid Co Ltd; Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date: 2024-03-19
Filing date: 2024-03-19
Publication date: 2024-06-07

Abstract

The invention discloses a distributed photovoltaic optimal scheduling strategy method, device and system based on a GA-MADRL-PPO combination. The method comprises the following steps: clustering a plurality of nodes in a distributed photovoltaic distribution network according to a modularity index and an active power balance index of the network to obtain a cluster division result; constructing, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning framework; performing proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network to obtain a target scheduling model; and inputting real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy. The invention solves the technical problem of poor safety and reliability of distributed photovoltaic systems.

Description

A distributed photovoltaic optimization scheduling strategy method, device and system based on GA-MADRL-PPO combination

Technical Field

The present invention relates to the field of electric power technology, and in particular to a distributed photovoltaic optimal scheduling strategy method, device and system based on a GA-MADRL-PPO combination, which is applicable to the scheduling of distribution networks with a high proportion of distributed photovoltaics.

Background Art

The major policy decisions on carbon peaking and carbon neutrality have provided policy support and a favorable environment for adjusting China's energy structure, and promoting clean energy substitution and renewable energy development has become a global consensus. Distributed photovoltaics offer advantages such as clean, pollution-free generation, flexible scale, system safety and peak-shaving capability, and have become an important part of China's renewable energy system. However, the rising penetration of distributed photovoltaics affects the safety and stability of the system, causing problems such as difficult local consumption, reverse power flow, increased network losses and voltage limit violations, and also reduces the operational reliability and utilization of new energy. How to schedule distributed photovoltaic systems with high penetration so as to improve system safety and stability has therefore become an important technical problem in the related art.

No effective solution to the above problem has yet been proposed.

Summary of the Invention

The embodiments of the present invention provide a distributed photovoltaic optimal scheduling strategy method, device and system based on a GA-MADRL-PPO combination, so as to solve at least the technical problem that distributed photovoltaic scheduling schemes in the related art adapt poorly to uncertainty in distributed photovoltaic output and load, which results in poor safety and reliability of the distributed photovoltaic system.

According to one aspect of the embodiments of the present invention, a distributed photovoltaic optimal scheduling strategy method based on a GA-MADRL-PPO combination is provided, comprising: clustering a plurality of nodes in a distributed photovoltaic distribution network according to a modularity index and an active power balance index of the network to obtain a cluster division result; constructing, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning framework; performing proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network to obtain a target scheduling model; and inputting real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network.

In the above method, GA stands for Genetic Algorithm, MADRL stands for Multi-Agent Deep Reinforcement Learning, and PPO stands for Proximal Policy Optimization.

Optionally, the method further comprises: calculating the electrical distance between any two of the plurality of nodes using a target sensitivity matrix corresponding to the distributed photovoltaic distribution network together with the voltage magnitude changes and reactive power changes of the plurality of nodes, wherein the target sensitivity matrix describes the sensitivity of node voltage to reactive power at the plurality of nodes; constructing the modularity index based on the electrical distance and the edge weight between any two of the plurality of nodes; and constructing the active power balance index based on net power data of the distributed photovoltaic distribution network.
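The passage above names the inputs to the electrical distance but does not give a formula at this point. As a minimal sketch, assuming a formulation that is common in cluster-partition work (not confirmed by this text), the voltage-to-reactive-power sensitivity matrix S (with ΔV = S·ΔQ) can be turned into pairwise distances as follows. The function name `electrical_distance` and the requirement that all entries of S are positive are illustrative assumptions.

```python
import numpy as np

def electrical_distance(S):
    """Pairwise electrical distance from a sensitivity matrix (assumed form).

    S[i, j] approximates dV_i / dQ_j; every entry is assumed to be > 0.
    """
    S = np.asarray(S, dtype=float)
    # d[i, j] = log10(S[j, j] / S[i, j]): how much weaker the effect of a
    # reactive power change at node j is on node i than on node j itself.
    d = np.log10(np.diag(S)[None, :] / S)
    # e[i, j] = Euclidean distance between rows i and j of d, so that two nodes
    # are close when they relate to all other nodes in a similar way.
    diff = d[:, None, :] - d[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical 3-node sensitivity matrix: nodes 1 and 2 are tightly coupled.
S_example = [[0.10, 0.02, 0.01],
             [0.02, 0.12, 0.05],
             [0.01, 0.05, 0.15]]
e_example = electrical_distance(S_example)
```

The result is symmetric with a zero diagonal, so it can serve directly as the pairwise quantity from which edge weights for a modularity index are derived.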

Optionally, clustering the plurality of nodes in the distributed photovoltaic distribution network according to the modularity index and the active power balance index to obtain the cluster division result comprises: computing a weighted combination of the modularity index and the active power balance index to obtain a comprehensive performance index; and clustering the plurality of nodes using the comprehensive performance index and a target genetic algorithm to obtain the cluster division result.
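To make the weighted combination concrete, the sketch below scores a candidate partition. It assumes the standard weighted (Newman) modularity for the structural term and a simple, hypothetical balance measure (1 minus the ratio of a cluster's net power magnitude to the sum of its nodes' power magnitudes); the text above does not define either index in detail, and the weights `w_mod` and `w_bal` are placeholders.

```python
import numpy as np

def modularity(A, labels):
    """Weighted Newman modularity of a partition; A is a symmetric weight matrix."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)            # weighted degree of each node
    two_m = A.sum()              # twice the total edge weight
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

def active_balance(p_net, labels):
    """Hypothetical balance score in [0, 1]; 1 means generation offsets load in every cluster."""
    p_net = np.asarray(p_net, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for c in np.unique(labels):
        p = p_net[labels == c]
        total = np.abs(p).sum()
        scores.append(1.0 if total == 0 else 1.0 - abs(p.sum()) / total)
    return float(np.mean(scores))

def composite_index(A, p_net, labels, w_mod=0.5, w_bal=0.5):
    """Weighted sum of the two indices (weights are assumptions)."""
    return w_mod * modularity(A, labels) + w_bal * active_balance(p_net, labels)

# Four nodes: {0, 1} and {2, 3} are strongly linked and each pair balances out.
A_example = [[0, 1, 0.1, 0], [1, 0, 0, 0.1], [0.1, 0, 0, 1], [0, 0.1, 1, 0]]
p_example = [2.0, -2.0, 1.0, -1.0]
good = composite_index(A_example, p_example, [0, 0, 1, 1])
bad = composite_index(A_example, p_example, [0, 1, 0, 1])
```

In this toy case the partition that follows the strong links and pairs surplus with deficit scores higher, which is the behavior a fitness function for the clustering step needs.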

Optionally, clustering the plurality of nodes using the comprehensive performance index and the target genetic algorithm to obtain the cluster division result comprises: encoding the adjacency matrix of the plurality of nodes to obtain an encoding result; performing a fitness evaluation on the plurality of nodes based on the comprehensive performance index and the encoding result to obtain an evaluation result; in response to the evaluation result not meeting a preset fitness condition, adjusting the node crossover probability and the node mutation probability corresponding to the encoding result according to the evaluation result and a preset adjustment strategy, updating the encoding result and repeating the fitness evaluation; and in response to the evaluation result meeting the preset fitness condition, determining the cluster division result based on the encoding result.
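The loop described above (evaluate, adapt the operator probabilities, update, re-evaluate) can be sketched as a small genetic algorithm. Several simplifications here are assumptions rather than the patent's scheme: individuals are encoded as cluster-label vectors instead of an encoded adjacency matrix, the "preset adjustment strategy" is replaced by a simple rule that widens the search when the best fitness stalls, and the names `adaptive_ga`, `pc` and `pm` are illustrative.

```python
import random

def adaptive_ga(fitness, n_nodes, n_clusters, pop_size=30, generations=60,
                target=None, pc=0.8, pm=0.05, seed=0):
    """Tiny GA with adaptive crossover (pc) and mutation (pm) probabilities."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_clusters) for _ in range(n_nodes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        if target is not None and fitness(best) >= target:
            break  # preset fitness condition met
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < pc:  # one-point crossover
                cut = rng.randrange(1, n_nodes)
                child = a[:cut] + b[cut:]
            # per-gene mutation: reassign a node to a random cluster
            child = [rng.randrange(n_clusters) if rng.random() < pm else g
                     for g in child]
            children.append(child)
        pop = children
        new_best = max(pop, key=fitness)
        if fitness(new_best) > fitness(best):
            best = new_best
            pc, pm = min(0.9, pc * 1.05), max(0.01, pm * 0.9)  # improving: exploit
        else:
            pc, pm = max(0.5, pc * 0.95), min(0.3, pm * 1.2)   # stalled: explore
    return best

# Toy fitness: agreement with a reference partition of six nodes.
reference = [0, 0, 0, 1, 1, 1]
def toy_fitness(labels):
    return sum(1 for x, y in zip(labels, reference) if x == y)

partition = adaptive_ga(toy_fitness, n_nodes=6, n_clusters=2, target=6)
```

In practice the fitness callback would be the comprehensive performance index of the candidate partition; the toy fitness only keeps the example self-contained.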

Optionally, constructing the initial scheduling model corresponding to the distributed photovoltaic distribution network based on the cluster division result comprises: determining a plurality of regional agents corresponding to the plurality of nodes according to the cluster division result, wherein the regional agents have a Markov decision control function; constructing a deep reinforcement learning framework based on the regional agents, wherein the framework includes at least a communication layer that enables collaborative decision-making among the regional agents; and constructing the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework.

Optionally, the initial scheduling model includes a state space, and constructing the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework comprises: for any target regional agent in the deep reinforcement learning framework, determining the state space corresponding to that agent in the initial scheduling model according to the target generation power, target load power, interruptible load amount and energy storage level in the autonomous region corresponding to the agent, together with the scheduling period.

Optionally, the method further comprises: obtaining the actual generation power, actual load power, interruptible load amount and energy storage level of the autonomous region corresponding to the target regional agent during the scheduling period; and superimposing a preset prediction deviation on the actual generation power to obtain the target generation power, and superimposing a preset prediction deviation on the actual load power to obtain the target load power, wherein the preset prediction deviation follows a normal distribution.
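A minimal sketch of how one regional agent's state vector could be assembled with the normally distributed prediction deviation added to generation and load. The additive, zero-mean form of the deviation, the standard deviations `sigma_pv` and `sigma_load`, the ordering of the vector, and the function name `build_state` are assumptions made for illustration.

```python
import numpy as np

def build_state(p_pv_actual, p_load_actual, interruptible, storage_level, period,
                sigma_pv=0.1, sigma_load=0.05, rng=None):
    """Return [target PV power, target load power, interruptible load,
    storage level, scheduling period] for one regional agent."""
    rng = np.random.default_rng() if rng is None else rng
    # Zero-mean Gaussian deviation superimposed on the actual values (assumed form).
    p_pv_target = p_pv_actual + rng.normal(0.0, sigma_pv)
    p_load_target = p_load_actual + rng.normal(0.0, sigma_load)
    return np.array([p_pv_target, p_load_target, interruptible,
                     storage_level, float(period)])

# With zero deviation the state reduces to the actual measurements.
exact_state = build_state(3.0, 2.0, 0.5, 0.6, 14, sigma_pv=0.0, sigma_load=0.0)
noisy_state = build_state(3.0, 2.0, 0.5, 0.6, 14, rng=np.random.default_rng(1))
```

Training on states perturbed in this way exposes the policy to forecast error, which is how the method aims to cope with uncertain photovoltaic output and load.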

Optionally, the initial scheduling model includes an action space containing a plurality of decision variables, and constructing the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework comprises: for any target regional agent in the deep reinforcement learning framework, calculating the decision variables corresponding to that agent using the state information in its state space and target constraints, wherein the target constraints include preset energy storage constraints and interruptible load constraints, and the decision variables include the interruptible load power in the agent's autonomous region, the active power output of the energy storage and the reactive power output of the energy storage.
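One way to read "decision variables computed under the target constraints" is to project a raw policy output onto the feasible set. The sketch below does this for the three decision variables; the specific constraint forms (power rating, state-of-charge window, inverter apparent-power limit, lossless storage) and all parameter names are assumptions, since the text only states that energy storage and interruptible load constraints exist.

```python
import math

def project_action(p_il, p_ess, q_ess, il_available, soc, capacity_kwh,
                   p_max, s_max, soc_min=0.1, soc_max=0.9, dt_h=1.0):
    """Clip (interruptible load power, storage active power, storage reactive
    power) to assumed limits. Positive p_ess means discharging."""
    # Interruptible load: between zero and the amount currently available.
    p_il = min(max(p_il, 0.0), il_available)
    # Storage active power: rated power limit.
    p_ess = min(max(p_ess, -p_max), p_max)
    # Storage energy limits over one period (conversion losses ignored).
    max_discharge = (soc - soc_min) * capacity_kwh / dt_h
    max_charge = (soc_max - soc) * capacity_kwh / dt_h
    p_ess = min(max(p_ess, -max_charge), max_discharge)
    # Reactive power: whatever apparent-power headroom remains.
    q_limit = math.sqrt(max(s_max ** 2 - p_ess ** 2, 0.0))
    q_ess = min(max(q_ess, -q_limit), q_limit)
    return p_il, p_ess, q_ess

clipped = project_action(5.0, 100.0, 100.0, il_available=2.0, soc=0.5,
                         capacity_kwh=100.0, p_max=30.0, s_max=50.0)
empty = project_action(0.0, 30.0, 100.0, il_available=2.0, soc=0.1,
                       capacity_kwh=100.0, p_max=30.0, s_max=50.0)
```

Applying the projection after the policy network keeps every executed action feasible regardless of what the network outputs during early training.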

Optionally, the initial scheduling model includes a reward function, and constructing the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework comprises: for the plurality of regional agents in the deep reinforcement learning framework, determining the reward function based on a periodic operating cost constraint and a node voltage constraint, wherein the periodic operating cost is determined by the target operating costs of the regional agents within a target scheduling cycle, the target operating cost includes the energy storage operating cost, the interruptible load cost and the electricity purchase cost, and the node voltage constraint is determined by the rated system voltage of the distributed photovoltaic distribution network and the target range of node voltage magnitudes within the regional agents.
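The reward described above combines an operating cost with a voltage constraint. A minimal per-period sketch, assuming a negative-cost reward with a linear penalty on per-unit voltage violations; the cost coefficients, the 0.95 to 1.05 p.u. band, the penalty weight and the function name `step_reward` are illustrative values, not figures from the patent.

```python
def step_reward(p_ess_kw, p_il_kw, p_grid_kw, voltages_pu, price,
                c_ess=0.05, c_il=0.4, v_min=0.95, v_max=1.05,
                penalty=100.0, dt_h=1.0):
    """Reward for one scheduling period: negative cost minus voltage penalty."""
    # Storage wear cost + interruptible load compensation + purchased energy.
    cost = (c_ess * abs(p_ess_kw)
            + c_il * p_il_kw
            + price * max(p_grid_kw, 0.0)) * dt_h
    # Sum of how far each node voltage lies outside the allowed band.
    violation = sum(max(v_min - v, 0.0) + max(v - v_max, 0.0)
                    for v in voltages_pu)
    return -cost - penalty * violation

r_ok = step_reward(10.0, 0.0, 20.0, [1.00, 1.02], price=0.5)
r_overvoltage = step_reward(10.0, 0.0, 20.0, [1.00, 1.08], price=0.5)
```

Summing this quantity over all periods of a scheduling cycle gives a return whose maximization corresponds to minimizing the cycle operating cost while keeping node voltages inside the target range.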

Optionally, performing proximal policy optimization training on the initial scheduling model using the scheduling experience data of the distributed photovoltaic distribution network to obtain the target scheduling model comprises: sampling historical scheduling data of the distributed photovoltaic distribution network over a plurality of historical scheduling cycles to obtain the scheduling experience data; and performing multiple rounds of offline training on the initial scheduling model based on the scheduling experience data and the proximal policy optimization algorithm, so as to update the policy network parameters and value network parameters corresponding to the regional agents in the initial scheduling model and obtain the target scheduling model.
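For orientation, the core of proximal policy optimization is the clipped surrogate objective, shown below in NumPy as a loss to be minimized. This is the standard PPO formulation rather than anything specific to this patent; the clip range `eps=0.2` is a commonly used default and an assumption here, and the value-network loss and the per-agent training loop are omitted.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Negative clipped surrogate objective averaged over sampled experience."""
    ratio = np.exp(np.asarray(logp_new, dtype=float)
                   - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    # Taking the minimum removes any incentive to move the policy far from
    # the one that generated the experience data.
    return float(-np.minimum(unclipped, clipped).mean())

# Unchanged policy: ratio is 1, so the loss is minus the mean advantage.
loss_same = ppo_clip_loss([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, -1.0, 2.0])
# Probability doubled on a good action: the gain is capped at 1 + eps.
loss_capped = ppo_clip_loss([np.log(2.0)], [0.0], [1.0])
```

Each regional agent's policy network would be updated by minimizing a loss of this kind over mini-batches of the sampled scheduling experience, alongside a regression loss for its value network.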

Optionally, inputting the real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate the scheduling strategy comprises: obtaining the real-time operation data, which includes at least the photovoltaic generation power, real-time load power, interruptible load amount and energy storage level of the distributed photovoltaic distribution network in the current scheduling period; inputting the real-time operation data into the target scheduling model and obtaining decision information output by the target scheduling model, wherein the decision information is used to determine the target scheduling power corresponding to a plurality of autonomous regions of the distributed photovoltaic distribution network, and the target scheduling power includes the target interruptible load power, the target active power output of the energy storage and the target reactive power output of the energy storage; and generating the scheduling strategy according to the target scheduling power.

According to another aspect of the embodiments of the present invention, a distributed photovoltaic optimal scheduling strategy device based on a GA-MADRL-PPO combination is also provided, comprising: a division module, configured to cluster a plurality of nodes in a distributed photovoltaic distribution network according to a modularity index and an active power balance index of the network to obtain a cluster division result; a construction module, configured to construct, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning framework; a training module, configured to perform proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network to obtain a target scheduling model; and a scheduling module, configured to input real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network.

Optionally, the device further comprises an index module, configured to: calculate the electrical distance between any two of the plurality of nodes using a target sensitivity matrix corresponding to the distributed photovoltaic distribution network together with the voltage magnitude changes and reactive power changes of the plurality of nodes, wherein the target sensitivity matrix describes the sensitivity of node voltage to reactive power at the plurality of nodes; construct the modularity index based on the electrical distance and the edge weight between any two of the plurality of nodes; and construct the active power balance index based on net power data of the distributed photovoltaic distribution network.

Optionally, the division module is further configured to: compute a weighted combination of the modularity index and the active power balance index to obtain a comprehensive performance index; and cluster the plurality of nodes using the comprehensive performance index and a target genetic algorithm to obtain the cluster division result.

Optionally, the division module is further configured to: encode the adjacency matrix of the plurality of nodes to obtain an encoding result; perform a fitness evaluation on the plurality of nodes based on the comprehensive performance index and the encoding result to obtain an evaluation result; in response to the evaluation result not meeting a preset fitness condition, adjust the node crossover probability and the node mutation probability corresponding to the encoding result according to the evaluation result and a preset adjustment strategy, update the encoding result and repeat the fitness evaluation; and in response to the evaluation result meeting the preset fitness condition, determine the cluster division result based on the encoding result.

Optionally, the construction module is further configured to: determine a plurality of regional agents corresponding to the plurality of nodes according to the cluster division result, wherein the regional agents have a Markov decision control function; construct a deep reinforcement learning framework based on the regional agents, wherein the framework includes at least a communication layer that enables collaborative decision-making among the regional agents; and construct the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework.

Optionally, the initial scheduling model includes a state space, and the construction module is further configured to: for any target regional agent in the deep reinforcement learning framework, determine the state space corresponding to that agent in the initial scheduling model according to the target generation power, target load power, interruptible load amount and energy storage level in the autonomous region corresponding to the agent, together with the scheduling period.

Optionally, the device further comprises a deviation module, configured to: obtain the actual generation power, actual load power, interruptible load amount and energy storage level of the autonomous region corresponding to the target regional agent during the scheduling period; and superimpose a preset prediction deviation on the actual generation power to obtain the target generation power, and superimpose a preset prediction deviation on the actual load power to obtain the target load power, wherein the preset prediction deviation follows a normal distribution.

Optionally, the initial scheduling model includes an action space containing a plurality of decision variables, and the construction module is further configured to: for any target regional agent in the deep reinforcement learning framework, calculate the decision variables corresponding to that agent using the state information in its state space and target constraints, wherein the target constraints include preset energy storage constraints and interruptible load constraints, and the decision variables include the interruptible load power in the agent's autonomous region, the active power output of the energy storage and the reactive power output of the energy storage.

Optionally, the initial scheduling model includes a reward function, and the construction module is further configured to: for the plurality of regional agents in the deep reinforcement learning framework, determine the reward function based on a periodic operating cost constraint and a node voltage constraint, wherein the periodic operating cost is determined by the target operating costs of the regional agents within a target scheduling cycle, the target operating cost includes the energy storage operating cost, the interruptible load cost and the electricity purchase cost, and the node voltage constraint is determined by the rated system voltage of the distributed photovoltaic distribution network and the target range of node voltage magnitudes within the regional agents.

Optionally, the training module is further configured to: sample historical scheduling data of the distributed photovoltaic distribution network over a plurality of historical scheduling cycles to obtain the scheduling experience data; and perform multiple rounds of offline training on the initial scheduling model based on the scheduling experience data and the proximal policy optimization algorithm, so as to update the policy network parameters and value network parameters corresponding to the regional agents in the initial scheduling model and obtain the target scheduling model.

Optionally, the scheduling module is further configured to: obtain the real-time operation data, which includes at least the photovoltaic generation power, real-time load power, interruptible load amount and energy storage level of the distributed photovoltaic distribution network in the current scheduling period; input the real-time operation data into the target scheduling model and obtain decision information output by the target scheduling model, wherein the decision information is used to determine the target scheduling power corresponding to a plurality of autonomous regions of the distributed photovoltaic distribution network, and the target scheduling power includes the target interruptible load power, the target active power output of the energy storage and the target reactive power output of the energy storage; and generate the scheduling strategy according to the target scheduling power.

According to another aspect of the embodiments of the present invention, a distributed photovoltaic optimal scheduling strategy system based on a GA-MADRL-PPO combination is also provided, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute any one of the above distributed photovoltaic optimal scheduling strategy methods based on the GA-MADRL-PPO combination.

In the embodiments of the present invention, a plurality of nodes in a distributed photovoltaic distribution network are clustered according to a modularity index and an active power balance index of the network to obtain a cluster division result; an initial scheduling model corresponding to the distributed photovoltaic distribution network is constructed based on the cluster division result, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning framework; proximal policy optimization training is performed on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network to obtain a target scheduling model; and real-time operation data of the distributed photovoltaic distribution network is input into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network. The present invention thus builds the scheduling model by combining multi-agent deep reinforcement learning with proximal policy optimization and uses it to schedule the distributed photovoltaic distribution network. This achieves the technical effect of enhancing the adaptability of the distributed photovoltaic system to uncertainty and improving its safety and reliability, and thereby solves the technical problem that distributed photovoltaic scheduling schemes in the related art adapt poorly to uncertainty in distributed photovoltaic output and load, which results in poor safety and reliability of the distributed photovoltaic system.

Brief Description of the Drawings

The drawings described herein are provided for a further understanding of the present invention and constitute a part of it. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

FIG. 1 is a hardware structure block diagram of an optional terminal device for the distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination according to an embodiment of the present invention;

FIG. 2 is a flow chart of a distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an optional distributed photovoltaic optimal scheduling process based on the GA-MADRL-PPO combination according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an optional MADRL-based distributed photovoltaic optimal scheduling framework according to an embodiment of the present invention;

FIG. 5 is a structural block diagram of a distributed photovoltaic optimal scheduling strategy device based on the GA-MADRL-PPO combination according to an embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

根据本发明实施例，提供了一种基于GA-MADRL-PPO结合的分布式光伏优化调度策略方法的实施例，需要说明的是，在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行，并且，虽然在流程图中示出了逻辑顺序，但是在某些情况下，可以按照不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present invention, an embodiment of a distributed photovoltaic optimization scheduling strategy method based on the combination of GA-MADRL-PPO is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system such as a set of computer executable instructions, and although the logical order is shown in the flowchart, in some cases, the steps shown or described can be executed in an order different from that shown here.

FIG. 1 is a hardware structure block diagram of an optional terminal device for the distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination according to an embodiment of the present invention. As shown in FIG. 1, the terminal device 10 may include one or more processors 102 (a processor 102 may include, but is not limited to, a processing device such as a microcontroller unit (MCU) or a programmable logic device such as a field-programmable gate array (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, it may include a display device 110, an input/output (I/O) device 108, a Universal Serial Bus (USB) port (which may be included as one of the ports of the computer bus, not shown), a network interface (not shown), a power supply (not shown) and/or a camera (not shown). A person of ordinary skill in the art will understand that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the terminal device 10. For example, the terminal device 10 may include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.

It should be noted that the one or more processors 102 and/or other data processing circuits may be embodied, in whole or in part, as software, hardware, firmware or any combination thereof. In addition, a data processing circuit may be a single independent processing module, or may be incorporated, in whole or in part, into any of the other components of the terminal device 10 (or a mobile device).

The memory 104 may be used to store software programs and modules of application software, such as the program instructions and data corresponding to the distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination in the embodiments of the present invention. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal device 10 via a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.

The transmission device 106 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the terminal device 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.

In the above operating environment, an embodiment of the present invention provides the distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination shown in FIG. 2. FIG. 2 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:

Step S201: according to a modularity index and an active power balance index of the distributed photovoltaic distribution network, perform cluster division on a plurality of nodes in the distributed photovoltaic distribution network to obtain a cluster division result;

Step S202: based on the cluster division result, construct an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning (MADRL) framework;

Step S203: using scheduling experience data of the distributed photovoltaic distribution network, perform proximal policy optimization (PPO) training on the initial scheduling model to obtain a target scheduling model;

Step S204: input real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network.

One effective way to ensure that large-scale distributed photovoltaics are connected to the grid safely and reliably is to configure energy storage. Existing research on the flexibility of distribution networks shows that: installing energy storage can, to a certain extent, mitigate the voltage and power fluctuations caused by fluctuating photovoltaic output and reduce the accommodation problems caused by reverse power flow from distributed photovoltaics; energy storage capacity optimization methods that take photovoltaic accommodation into account effectively raise the photovoltaic accommodation rate; charging and discharging energy storage at different times mitigates the volatility of distributed generation output; and dynamic reactive power optimization of the distribution network using the reactive power regulation capability of energy storage markedly reduces the active power loss of the distribution network.

Another effective approach is to divide the distributed photovoltaics in the distribution network into clusters and to schedule them in a unified manner, so that high-penetration distributed generation interacts with the grid in a friendly way and is accommodated by the grid to the greatest possible extent. Cluster-based management of high-penetration renewable energy helps to smooth fluctuations in renewable output and to increase the local accommodation rate, addressing at the root the problems caused by rising distributed photovoltaic penetration. The above method steps provided by the embodiment of the present invention adapt to the uncertainty of distributed photovoltaic output and load during scheduling, produce optimized scheduling results in real time as sources and loads vary randomly, enable the coordinated operation of multiple autonomous regions, and improve economic efficiency.

Some optional implementations of the above method are described further below.

Optionally, the distributed photovoltaic optimal scheduling strategy method based on the GA-MADRL-PPO combination may further include the following steps:

Step S251: using a target sensitivity matrix corresponding to the distributed photovoltaic distribution network, together with the voltage magnitude changes and reactive power changes of the plurality of nodes, calculate the electrical distance between any two of the nodes, wherein the target sensitivity matrix is used to determine the sensitivity relationship between the node voltages and the reactive power of the plurality of nodes;

Step S252: construct the modularity index based on the electrical distance and the edge weight between any two of the nodes;

Step S253: construct the active power balance index based on net power data of the distributed photovoltaic distribution network.

Optionally, in step S201, performing cluster division on the plurality of nodes according to the modularity index and the active power balance index to obtain the cluster division result may further include the following steps:

Step S211: compute a weighted combination of the modularity index and the active power balance index to obtain a comprehensive performance index;

Step S212: using the comprehensive performance index and a target genetic algorithm, perform cluster division on the plurality of nodes to obtain the cluster division result.

Optionally, in step S212, performing cluster division on the plurality of nodes using the comprehensive performance index and the target genetic algorithm may further include the following steps:

Step S2121: encode the adjacency matrix of the plurality of nodes to obtain an encoding result;

Step S2122: based on the comprehensive performance index and the encoding result, perform a fitness evaluation on the plurality of nodes to obtain an evaluation result;

Step S2123: in response to the evaluation result not satisfying a preset fitness condition, adjust the crossover probability and the mutation probability corresponding to the encoding result according to the evaluation result and a preset adjustment strategy, update the encoding result, and repeat the fitness evaluation;

Step S2124: in response to the evaluation result satisfying the preset fitness condition, determine the cluster division result based on the encoding result.

Optionally, in step S202, constructing the initial scheduling model corresponding to the distributed photovoltaic distribution network based on the cluster division result may further include the following steps:

Step S221: according to the cluster division result, determine a plurality of regional agents corresponding to the plurality of nodes, wherein the regional agents have a Markov decision control function;

Step S222: construct a deep reinforcement learning framework based on the plurality of regional agents, wherein the framework includes at least a communication layer used to realize collaborative decision-making among the regional agents;

Step S223: based on the deep reinforcement learning framework, construct the initial scheduling model for the distributed photovoltaic distribution network.

Optionally, the initial scheduling model includes a state space. In step S223, constructing the initial scheduling model based on the deep reinforcement learning framework may further include the following step:

Step S2231: for any target regional agent in the deep reinforcement learning framework, determine the state space of that agent in the initial scheduling model according to the target generation power, the target load power, the interruptible load amount, the stored energy of the electric energy storage, and the scheduling period within the autonomous region corresponding to the agent.

Step S261: obtain the actual generation power, the actual load power, the interruptible load amount and the stored energy of the electric energy storage of the autonomous region corresponding to the target regional agent within the scheduling period;

Step S262: superimpose a preset forecast deviation on the actual generation power to obtain the target generation power, and superimpose a preset forecast deviation on the actual load power to obtain the target load power, wherein the preset forecast deviation follows a normal distribution.
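A minimal sketch of step S262, assuming a zero-mean normal deviation whose standard deviation is a fixed fraction of each value. The function name, the `sigma_ratio` parameter and the sample figures are hypothetical and are not taken from the embodiment.

```python
import random


def add_forecast_deviation(actual_values, sigma_ratio=0.1, rng=None):
    """Superimpose a zero-mean Gaussian deviation on each actual value.

    sigma_ratio is a hypothetical parameter: the standard deviation of the
    deviation is taken as sigma_ratio * |value|.
    """
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma_ratio * abs(v)) for v in actual_values]


rng = random.Random(0)                  # fixed seed so the sketch is repeatable
actual_pv = [0.0, 1.2, 3.5, 2.1]        # illustrative PV output per period
actual_load = [2.0, 2.4, 3.0, 2.8]      # illustrative load per period
target_pv = add_forecast_deviation(actual_pv, 0.1, rng)
target_load = add_forecast_deviation(actual_load, 0.05, rng)
```

Scaling the deviation with the value keeps night-time photovoltaic output at exactly zero; a constant standard deviation would be an equally valid reading of the step.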

Optionally, the initial scheduling model includes an action space comprising a plurality of decision variables. In step S223, constructing the initial scheduling model based on the deep reinforcement learning framework may further include the following step:

Step S2232: for any target regional agent in the deep reinforcement learning framework, calculate the decision variables of that agent from the state information in its state space and from target constraints, wherein the target constraints include preset electric energy storage constraints and interruptible load constraints, and the decision variables include the interruptible load power within the autonomous region of the agent, the active power output by the electric energy storage, and the reactive power output by the electric energy storage.
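The embodiment does not state the form of the storage and interruptible-load constraints at this point. The sketch below shows one plausible way to project a raw action onto such limits. The state-of-charge bounds, the rated apparent power, the sign convention (positive storage power means discharging) and all parameter names are assumptions made for illustration.

```python
import math


def project_action(raw_p_il, raw_p_es, raw_q_es, il_available, soc,
                   soc_min, soc_max, capacity, s_rated, dt=1.0):
    """Clip a raw action to hypothetical storage and interruptible-load limits."""
    # Curtailed interruptible load cannot exceed what is available.
    p_il = min(max(raw_p_il, 0.0), il_available)
    # Discharge (p_es > 0) is limited by stored energy, charge (p_es < 0)
    # by the remaining headroom, and both by the rated power.
    max_discharge = min(s_rated, (soc - soc_min) * capacity / dt)
    max_charge = min(s_rated, (soc_max - soc) * capacity / dt)
    p_es = min(max(raw_p_es, -max_charge), max_discharge)
    # Reactive power is limited by the remaining apparent power capability.
    q_limit = math.sqrt(max(s_rated ** 2 - p_es ** 2, 0.0))
    q_es = min(max(raw_q_es, -q_limit), q_limit)
    return p_il, p_es, q_es


p_il, p_es, q_es = project_action(
    0.8, 2.0, 1.0, il_available=0.5, soc=0.5,
    soc_min=0.2, soc_max=0.9, capacity=2.0, s_rated=1.0)
```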

Optionally, the initial scheduling model includes a reward function. In step S223, constructing the initial scheduling model based on the deep reinforcement learning framework may further include the following step:

Step S2233: for the plurality of regional agents in the deep reinforcement learning framework, determine the reward function based on a periodic operating cost constraint and a node voltage constraint, wherein the periodic operating cost is determined by the target operating cost of the regional agents within a target scheduling cycle, the target operating cost includes the electric energy storage operating cost, the interruptible load cost and the electricity purchase cost, and the node voltage constraint is determined by the rated system voltage of the distributed photovoltaic distribution network and the target range of node voltage magnitudes within the regional agents.
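One way to combine the cost terms and the voltage constraint of step S2233 into a per-period reward is sketched below. The penalty form, the voltage band of 0.05 per unit and the penalty coefficient are assumptions; the embodiment states only which quantities the reward depends on.

```python
def step_reward(es_cost, il_cost, purchase_cost, voltages,
                v_rated=1.0, v_band=0.05, penalty=100.0):
    """Negative operating cost minus a penalty for voltage-band violations."""
    operating_cost = es_cost + il_cost + purchase_cost
    # Sum of how far each node voltage lies outside the allowed band.
    violation = sum(max(0.0, abs(v - v_rated) - v_band * v_rated)
                    for v in voltages)
    return -(operating_cost + penalty * violation)


reward_ok = step_reward(1.0, 2.0, 3.0, [1.0, 1.02, 0.97])
reward_violation = step_reward(1.0, 2.0, 3.0, [1.0, 1.10, 0.97])
```

Because the agents maximize reward, lower cost and smaller voltage excursions both raise the value returned.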

Optionally, in step S203, performing proximal policy optimization training on the initial scheduling model using the scheduling experience data to obtain the target scheduling model may further include the following steps:

Step S231: sample historical scheduling data of the distributed photovoltaic distribution network over a plurality of historical scheduling cycles to obtain the scheduling experience data;

Step S232: based on the scheduling experience data and the proximal policy optimization algorithm, train the initial scheduling model offline for a plurality of rounds so as to update the policy network parameters and value network parameters of the regional agents in the initial scheduling model, thereby obtaining the target scheduling model.
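Step S232 does not spell out the loss used to update the policy network. Proximal policy optimization is commonly trained with the clipped surrogate objective, which the following sketch evaluates for a batch of sampled transitions. The function name and the clipping range of 0.2 are assumptions, and the advantage estimates are taken as given.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled transitions."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


unchanged = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -1.0])
doubled = ppo_clip_objective([math.log(2.0)], [0.0], [1.0])
```

The policy parameters are updated to increase this objective; clipping removes the incentive to move the ratio far from 1 within a single round of offline training.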

Optionally, in step S204, inputting the real-time operation data into the target scheduling model to generate the scheduling strategy may further include the following steps:

Step S241: obtain the real-time operation data, which includes at least the photovoltaic generation power, the real-time load power, the interruptible load amount and the stored energy of the electric energy storage of the distributed photovoltaic distribution network in the current scheduling period;

Step S242: input the real-time operation data into the target scheduling model and obtain the decision information output by the model, wherein the decision information is used to determine the target scheduling power of each of the autonomous regions of the distributed photovoltaic distribution network, and the target scheduling power includes the target interruptible load power, the target active power output by the electric energy storage and the target reactive power output by the electric energy storage;

Step S243: generate the scheduling strategy according to the target scheduling power.

Based on the above method steps, a more specific implementation is described below in combination with an application scenario.

To solve the scheduling problem of distribution networks with high-penetration distributed photovoltaics, the present invention proposes a distributed photovoltaic optimal scheduling strategy based on the GA-MADRL-PPO combination. Specifically, in the application scenario, the online optimal scheduling process shown in FIG. 3 is carried out according to this strategy. As shown in FIG. 3, first, following the cluster division principle, a genetic algorithm is used to divide the distributed photovoltaics in the distribution network into clusters, with a comprehensive performance index built from the electrical-distance-based modularity index and the active power balance index serving as the basis for division. Second, according to the cluster division result, the distribution network with high distributed photovoltaic penetration is divided into several regional agents that exercise autonomous control, and a partitioned distributed photovoltaic optimal scheduling framework based on multi-agent deep reinforcement learning is constructed using the CommNet architecture. Then, with the minimum daily operating cost as the objective and subject to the corresponding constraints, a day-ahead optimal scheduling model comprising a state space, an action space and a reward function is constructed. Finally, the model is trained offline with the proximal policy optimization algorithm, and the trained model is used to make online optimal scheduling decisions. This technical solution addresses the problem that fluctuations in distributed photovoltaic output and load disturb the normal operation of the distribution network, and improves the economy of system operation.

The method for constructing the comprehensive performance index from the electrical-distance-based modularity index and the active power balance index includes the following steps A01 to A06.

Step A01: modularity measures whether a cluster division is reasonable; the larger its value, the more reasonable the division. When the whole network is assigned to a single cluster, the modularity is 0; when every node is assigned to its own cluster, the modularity is negative and no community structure exists.

Specifically, the modularity is defined by formulas (1) to (3).

In formulas (1) to (3), A_ij denotes the weight of the edge connecting nodes i and j, taking 1 when nodes i and j are directly connected and 0 otherwise; k_i denotes the sum of the weights of all edges connected to node i; m denotes the sum of all edge weights in the network; and δ is a 0-1 indicator, with δ(i,j) = 1 if nodes i and j are in the same cluster and δ(i,j) = 0 otherwise.
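The symbol definitions above match the standard Newman modularity, in which k_i is the row sum of A, m is half the sum of all entries of A, and the modularity is (1/2m) Σ_i Σ_j [A_ij - k_i k_j / (2m)] δ(i,j). The sketch below assumes that formulas (1) to (3) take this standard form; the function name and the four-node example graph are illustrative only.

```python
def modularity(weights, labels):
    """Newman modularity of a weighted undirected graph.

    weights: symmetric edge-weight matrix A as a list of lists.
    labels:  labels[i] is the cluster that node i belongs to.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]      # k_i
    two_m = sum(degree)                         # 2m
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:          # delta(i, j) = 1
                total += weights[i][j] - degree[i] * degree[j] / two_m
    return total / two_m


# Four nodes joined in two separate pairs: 0-1 and 2-3.
pairs = [[0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
by_pair = modularity(pairs, [0, 0, 1, 1])
one_cluster = modularity(pairs, [0, 0, 0, 0])
singletons = modularity(pairs, [0, 1, 2, 3])
```

The three calls reproduce the behaviour described in step A01: a single cluster gives 0, one cluster per node gives a negative value, and the natural division gives the largest value.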

Step A02: the electrical distance is calculated from sensitivities, as shown in formulas (4) and (5).

ΔV = S_VQ ΔQ    Formula (4)

In formulas (4) and (5), ΔV denotes the change in voltage magnitude; ΔQ denotes the change in reactive power; S_VQ denotes the voltage-reactive power sensitivity matrix of the system; and d_ij denotes the ratio of the voltage magnitude change at node j to that at node i when the reactive power at node j changes. The smaller the ratio, the greater the influence of node j on node i, that is, the closer the two nodes are electrically.

Assuming the network has n nodes in total, the electrical distance e_ij between node j and node i is calculated as a Euclidean distance according to formula (6).

Further, the edge weight A_ij based on the electrical distance between node i and node j is defined according to formula (7).

In formula (7), e denotes the electrical distance matrix composed of the electrical distances e_ij between every pair of nodes in the network. Substituting formula (7) into formula (1) yields the modularity index based on electrical distance.
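Step A02 can be sketched as follows. The forms used here are assumptions inferred from the descriptions of the symbols, not reproductions of formulas (5) to (7): d_ij = S_VQ,jj / S_VQ,ij, e_ij as the Euclidean distance between rows i and j of d, and A_ij = 1 - e_ij / max(e). The sketch also assumes every entry of S_VQ is non-zero, as is typical for a connected network, and the sensitivity values are invented for illustration.

```python
import math


def electrical_distance(s_vq):
    """Electrical distances and edge weights from a sensitivity matrix.

    Assumed forms: d[i][j] = S_VQ[j][j] / S_VQ[i][j],
    e[i][j] = Euclidean distance between rows i and j of d,
    a[i][j] = 1 - e[i][j] / max(e).
    """
    n = len(s_vq)
    d = [[s_vq[j][j] / s_vq[i][j] for j in range(n)] for i in range(n)]
    e = [[math.sqrt(sum((d[i][k] - d[j][k]) ** 2 for k in range(n)))
          for j in range(n)] for i in range(n)]
    e_max = max(max(row) for row in e)
    a = [[1.0 - e[i][j] / e_max for j in range(n)] for i in range(n)]
    return e, a


# Illustrative 3-node sensitivity matrix.
s_vq = [[0.10, 0.05, 0.02],
        [0.05, 0.12, 0.04],
        [0.02, 0.04, 0.15]]
e, a = electrical_distance(s_vq)
```

With these forms the edge weights lie between 0 and 1, and the electrically most distant pair of nodes receives weight 0.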

Step A03: the active power balance is defined according to formulas (8) and (9).

In formulas (8) and (9), N_c denotes the number of clusters; M denotes the set of all clusters; P_{c,t} denotes the net power of cluster c at time t; and T is the time length of the specified scenario. The formulas define the active power balance of each cluster c and the active power balance of the network as a whole.

Step A04: the comprehensive performance index is defined according to formula (10).

In formula (10), w₁ and w₂ denote the weights of the modularity index and the active power balance index, respectively.
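Steps A03 and A04 can be sketched as below. The cluster balance is assumed to take a form common in the cluster-division literature, 1 - (1/T) Σ_t |P_{c,t}| / max_t |P_{c,t}|, with the network balance taken as the mean over clusters and the comprehensive index as the weighted sum w₁ × modularity + w₂ × balance. These forms, the function names and the equal default weights are assumptions rather than reproductions of formulas (8) to (10).

```python
def cluster_balance(net_power):
    """Active power balance of one cluster from its net power series."""
    peak = max(abs(p) for p in net_power)
    if peak == 0.0:
        return 1.0        # net power is zero in every period
    return 1.0 - sum(abs(p) for p in net_power) / (len(net_power) * peak)


def composite_index(modularity_value, clusters_net_power, w1=0.5, w2=0.5):
    """Weighted sum of modularity and network-wide active power balance."""
    balances = [cluster_balance(series) for series in clusters_net_power]
    network_balance = sum(balances) / len(balances)
    return w1 * modularity_value + w2 * network_balance


series_a = [1.0, -1.0, 0.0, 0.0]   # surplus and deficit largely offset
series_b = [2.0, 2.0]              # constant surplus, no self-balancing
index = composite_index(0.5, [series_a, series_b])
```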

Step A05: the comprehensive performance index is taken as the fitness function, its value measuring the quality of a division result, and individuals are encoded using the adjacency matrix between distribution network nodes. First, a 0-1 adjacency matrix is obtained from the connections between distribution network nodes, where 0 denotes disconnected and 1 denotes connected. During encoding, the genetic algorithm forms a new adjacency matrix by searching for and modifying elements equal to 1; the modified matrix represents one division of the network into clusters, that is, a new encoded individual.
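To illustrate how a modified adjacency matrix corresponds to a cluster division, the sketch below labels the connected components of a 0-1 matrix. Reading clusters as connected components is an interpretation of step A05, and the four-node feeder is a made-up example.

```python
def clusters_from_adjacency(adj):
    """Label the connected components of a 0-1 adjacency matrix."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue                      # node already assigned to a cluster
        labels[start] = current
        stack = [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            for other in range(n):
                if adj[node][other] == 1 and labels[other] == -1:
                    labels[other] = current
                    stack.append(other)
        current += 1
    return labels


# Four-node feeder 0-1-2-3; a new individual is formed by changing the
# element for edge 1-2 from 1 to 0.
base = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
individual = [row[:] for row in base]
individual[1][2] = individual[2][1] = 0
labels = clusters_from_adjacency(individual)
```

Here `labels` assigns nodes 0 and 1 to one cluster and nodes 2 and 3 to another, and this label vector is what a fitness evaluation would consume.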

Step A06: the crossover probability and the mutation probability are adjusted adaptively according to the following principle: if the fitness is less than the average fitness, larger crossover and mutation probabilities are assigned; if the fitness is greater than the average fitness, the crossover and mutation probabilities are assigned according to the iteration state. The adjustment may use formulas (11) and (12).

In formulas (11) and (12), P_c denotes the crossover probability; P_{c,max} and P_{c,min} denote the maximum and minimum values of the crossover probability, respectively; P_m denotes the mutation probability; P_{m,max} and P_{m,min} denote the maximum and minimum values of the mutation probability, respectively; I denotes the current iteration number; I_max denotes the preset maximum number of iterations; f_c denotes the larger fitness of the two individuals undergoing crossover; f_m denotes the fitness of the individual undergoing mutation; and f_avg denotes the average fitness of the population.
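The adjustment principle of step A06 can be sketched with a simple schedule. The linear decrease from the maximum to the minimum probability over the iterations is an assumption standing in for formulas (11) and (12); only the two-branch rule itself is taken from the text. The same function serves for crossover (pass f_c with the crossover bounds) and mutation (pass f_m with the mutation bounds).

```python
def adaptive_probability(fitness, avg_fitness, iteration, max_iterations,
                         p_max, p_min):
    """Crossover or mutation probability for one individual."""
    if fitness < avg_fitness:
        # Below-average individuals get the largest probability so that
        # they are changed more aggressively.
        return p_max
    # Assumed schedule: shrink linearly from p_max to p_min as the
    # iteration count approaches its maximum.
    progress = iteration / max_iterations
    return p_max - (p_max - p_min) * progress


p_weak = adaptive_probability(0.2, 0.5, 10, 100, p_max=0.9, p_min=0.5)
p_strong = adaptive_probability(0.8, 0.5, 50, 100, p_max=0.9, p_min=0.5)
```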

According to the above method steps provided by the embodiment of the present invention, constructing the partitioned distributed photovoltaic optimal scheduling framework based on multi-agent deep reinforcement learning includes the following steps B01 and B02.

Step B01: MADRL takes the multi-agent Markov decision process as its basic framework, which can be expressed as a tuple <N, {S^n}, {A^n}, P, R, γ>. Here, N is the number of agents. S^n is the state set of agent n, and s_t^n ∈ S^n denotes the state of that agent in period t; the states of all agents together form the joint state space S = S^1 × S^2 × ... × S^N, with s_t ∈ S. A^n is the action set of agent n, and a_t^n ∈ A^n denotes the action selected by that agent in period t; the actions of all agents together form the joint action space A = A^1 × A^2 × ... × A^N, with a_t ∈ A. P: S × A × S → [0, 1] is the state transition probability, that is, the probability that the environment moves to state s_{t+1} after action a_t is executed in state s_t. R is the reward function, giving the reward r_t ∈ R returned by the environment after action a_t is executed in state s_t. γ is the discount factor. In period t, each agent acts on the environment according to its observed state, receives the reward r_t, and causes the environment state to change to s_{t+1}. By interacting with the environment in this cycle, the agents use the generated data to update the joint policy, that is, the mapping from joint state to joint action π: S → A, a ~ π(a|s), so as to maximize the cumulative return.
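The cumulative return that the joint policy maximizes is conventionally the discounted sum of rewards, Σ_t γ^t r_t. A minimal sketch of that sum follows; the function name and the sample rewards are illustrative.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one episode, accumulated backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Accumulating from the last period backwards avoids computing powers of γ explicitly and yields the return from every intermediate period as a by-product.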

步骤B02，以CommNet作为多智能体深度强化学习神经网络架构。根据该网络结构，每个含有分布式光伏的智能体都有结构相同的神经网络进行决策，在时段t，对于第n个智能体，为保护隐私，其网络输入为自身的观测状态输出为决策动作编码层中，编码函数,将输入信息转化为隐层状态信息输入通信层。其中通信层是实现各智能体协作的关键，每个智能体将隐层状态信息h_t,m-1(m代表第m次通信)送入通信层网络f_m，并对相邻智能体通信层网络的输出h_t,m做均值池化处理，将所得的结果和h_t,m作为各相邻智能体下一层神经网络的输入。Step B02, CommNet is used as the multi-agent deep reinforcement learning neural network architecture. According to the network structure, each agent with distributed photovoltaics has a neural network with the same structure to make decisions. In time period t, for the nth agent, in order to protect privacy, its network input is its own observation state Output is decision action In the encoding layer, the encoding function is The input information is converted into hidden state information and input into the communication layer. The communication layer is the key to realize the cooperation of each intelligent agent. Each intelligent agent sends the hidden state information h _t,m-1 (m represents the mth communication) to the communication layer network f _m , and performs mean pooling on the output h _t,m of the communication layer network of the adjacent intelligent agent. The obtained result and h _t,m are used as the input of the next layer of neural network of each adjacent intelligent agent.

Specifically, the iterative relationship between the input and output of each communication layer of the agents can be expressed by formulas (13) and (14) below.

In formulas (13) and (14), $H_m$ and $C_m$ denote the trainable parameters of the $m$-th communication layer; $\sigma$ denotes a nonlinear activation function; $N(n)$ denotes the set of agents adjacent to agent $n$. The final layer of the network is the decoding layer, which converts the hidden state information into actual actions.
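The communication round described above can be sketched as follows. This is a minimal illustration, not the patent's formulas (13) and (14), whose exact form is not reproduced in this text: it assumes the commonly used CommNet update in which each agent combines its own hidden state, weighted by $H_m$, with the mean hidden state of its neighbors, weighted by $C_m$, and it assumes `tanh` as the activation $\sigma$. The names `commnet_step`, `hidden`, and `neighbors` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def commnet_step(hidden, neighbors, H, C):
    """One communication round for all agents.

    hidden:    dict agent_id -> hidden state vector h_{t,m-1}
    neighbors: dict agent_id -> list of adjacent agent ids, N(n)
    H, C:      weight matrices of the m-th communication layer (assumed shapes d x d)
    Returns a dict agent_id -> h_{t,m}.
    """
    new_hidden = {}
    for n, h_n in hidden.items():
        adjacent = neighbors[n]
        dim = len(h_n)
        if adjacent:
            # Mean pooling over the hidden states of adjacent agents.
            comm = [sum(hidden[j][k] for j in adjacent) / len(adjacent)
                    for k in range(dim)]
        else:
            comm = [0.0] * dim
        pre_activation = [a + b for a, b in zip(matvec(H, h_n), matvec(C, comm))]
        new_hidden[n] = [math.tanh(v) for v in pre_activation]
    return new_hidden
```

Because every agent shares `H` and `C`, the same network can be applied regardless of how many regions the cluster division produces; only the `neighbors` map changes.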

According to the above method steps provided by the embodiment of the present invention, the day-ahead optimal scheduling model comprises a state space, an action space and a reward function.

Specifically, the state space is the environmental information perceived by an agent. The state space of each agent comprises the photovoltaic generation power, the load power, the interruptible load amount and the stored energy of the electric energy storage within its region, together with the current scheduling period $t$. On this basis, the state of the $n$-th regional agent in period $t$ is expressed as formula (15) below.

In formula (15), $P_t^{n,PV}$ and $P_t^{n,load}$ denote, respectively, the photovoltaic generation power and the load power in autonomous region $n$ in period $t$; the remaining two terms denote, respectively, the stored energy of the electric energy storage and the interruptible load amount in autonomous region $n$ in period $t-1$.

To cope with the uncertainty of distributed photovoltaic output and load, a random forecast deviation is superimposed on the historical photovoltaic and load forecast data, and the result is used as the photovoltaic and load input in formula (15). The forecast deviation is assumed to follow the normal distribution given by formulas (16) and (17) below.

$\sigma = \varepsilon P_t^{n}$    Formula (17)

In formulas (16) and (17), $\Delta P_t^{n}$ denotes the forecast deviation of photovoltaic output and load in autonomous region $n$ in period $t$; $\mu$ and $\sigma$ denote the expectation and standard deviation of the forecast deviation; $\varepsilon$ is the forecast deviation expressed as a percentage of the forecast value.
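A minimal sketch of this perturbation, assuming the deviation is drawn independently for each period from a normal distribution with mean $\mu$ and standard deviation $\sigma = \varepsilon P_t^{n}$. The function name `perturb_forecast` and its parameters are hypothetical, and taking the absolute value of the forecast when computing $\sigma$ is an added safeguard, not something stated in the source.

```python
import random


def perturb_forecast(forecast, epsilon, mu=0.0, rng=None):
    """Superimpose a normally distributed deviation on a forecast series.

    forecast: list of forecast powers P_t^n (one value per period)
    epsilon:  deviation as a fraction of the forecast value, e.g. 0.1 for 10 %
    mu:       expectation of the deviation (assumed 0 by default)
    rng:      optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    noisy = []
    for p in forecast:
        sigma = epsilon * abs(p)  # sigma = epsilon * P_t^n, kept non-negative
        noisy.append(p + rng.gauss(mu, sigma))
    return noisy
```

Passing a seeded `random.Random` makes the sampled training scenarios repeatable, which helps when comparing training runs.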

Specifically, the action space consists of the relevant decision variables. The action space of each regional agent is the output of the controllable devices within its control region, and can be expressed as formula (18) below.

In formula (18), $P_t^{n,L}$ denotes the interruptible load power in autonomous region $n$ in period $t$; $P_t^{n,es}$ and the accompanying reactive-power term denote, respectively, the active power and the reactive power output by the electric energy storage in autonomous region $n$ in period $t$.

Further, the electric energy storage constraints considered in the action space can be expressed as formulas (19) to (22) below.

In formulas (19) to (22), the two power limits denote, respectively, the maximum charging power and the maximum discharging power of the electric energy storage; the two energy limits denote, respectively, the maximum and minimum stored energy; $\eta^{n,ch}$ and $\eta^{n,dis}$ denote, respectively, the charging efficiency and the discharging efficiency of the electric energy storage; $S^{n,es}$ denotes the maximum apparent power of the inverter to which the electric energy storage in region $n$ is connected.

Further, the interruptible load constraint considered in the action space is expressed as formula (23) below.

$P_t^{n,L} < P_t^{n,L\max}$    Formula (23)
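One way to enforce such constraints is to project a raw action onto the feasible set before applying it. The sketch below is illustrative only: the exact form of formulas (19) to (22) is not reproduced in this text, so it assumes charge and discharge power limits, stored-energy limits with charging and discharging efficiencies, an inverter limit of the form $P^2 + Q^2 \le S^2$, and a sign convention in which positive storage power means discharging. It also clips to the limit itself rather than strictly below it. The function `project_action` and the keys of `lim` are hypothetical.

```python
import math


def project_action(p_es, q_es, p_il, energy_prev, lim, dt=1.0):
    """Clip a raw action to assumed storage and interruptible-load limits.

    p_es > 0 means discharging, p_es < 0 means charging (assumed convention).
    Returns (p_es, q_es, p_il, energy_after).
    """
    # Active power limits, also bounded by the inverter rating.
    p_hi = min(lim["p_dis_max"], lim["s_max"])
    p_lo = -min(lim["p_ch_max"], lim["s_max"])
    p = max(p_lo, min(p_hi, p_es))

    # Stored-energy limits, accounting for conversion efficiency.
    if p >= 0:
        max_dis = (energy_prev - lim["e_min"]) * lim["eta_dis"] / dt
        p = min(p, max(0.0, max_dis))
        energy = energy_prev - p * dt / lim["eta_dis"]
    else:
        max_ch = (lim["e_max"] - energy_prev) / (lim["eta_ch"] * dt)
        p = max(p, -max(0.0, max_ch))
        energy = energy_prev + (-p) * lim["eta_ch"] * dt

    # Inverter apparent-power limit: P^2 + Q^2 <= S^2.
    q_cap = math.sqrt(max(0.0, lim["s_max"] ** 2 - p ** 2))
    q = max(-q_cap, min(q_cap, q_es))

    # Interruptible load cannot exceed its maximum or go negative.
    p_l = max(0.0, min(lim["p_il_max"], p_il))
    return p, q, p_l, energy
```

Projection keeps every executed action feasible; an alternative design is to leave actions unclipped and penalize violations in the reward.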

Specifically, regarding the reward function, the training goal of multi-agent deep reinforcement learning is to find the optimal policy $\pi^*$ that maximizes the expected cumulative return, which is expressed as formula (24) below.

In formula (24), $\rho_\pi$ denotes the trajectory generated by policy $\pi$; $T$ denotes the total number of scheduling periods in one day.

Further, in the above method steps provided by the embodiment of the present invention, the operating cost is considered according to the following design concept: the optimization goal is to minimize the operating cost of the active distribution network, and the objective function is expressed as formula (25) below.

In formula (25), $N$ denotes the number of divided regions; the two regional cost terms denote, respectively, the operating cost of the electric energy storage and the interruptible load cost in autonomous region $n$ in period $t$; $c_{grid}(t)$ denotes the cost of purchasing electricity from the upper-level grid in period $t$.

For the electric energy storage, the cost per kilowatt-hour is considered. With the per-kilowatt-hour cost coefficient defined as $\rho$, the operating cost of the electric energy storage can be expressed as formula (26) below.

The cost of purchasing electricity from the upper-level grid can be expressed as formulas (27) and (28) below.

In formulas (27) and (28), $\lambda_{buy}(t)$ and $\lambda_{sell}(t)$ denote, respectively, the price of purchasing electricity from and the price of selling electricity to the upper-level grid in period $t$; $P_t^{grid}$ denotes the electric power exchanged with the upper-level grid, where $P_t^{grid} > 0$ indicates purchasing and $P_t^{grid} < 0$ indicates selling.
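The sign convention above suggests a piecewise cost. The following sketch assumes that purchases are charged at the buy price and that sales are credited at the sell price, so that selling yields a negative cost; formulas (27) and (28) themselves are not reproduced in this text, and the function name `grid_cost` and the period length `dt` are hypothetical.

```python
def grid_cost(p_grid, price_buy, price_sell, dt=1.0):
    """Cost of exchanging power with the upper-level grid in one period.

    p_grid > 0: purchasing, cost is positive.
    p_grid < 0: selling, cost is negative (i.e. revenue).
    """
    if p_grid >= 0:
        return price_buy * p_grid * dt
    return price_sell * p_grid * dt
```

For example, `grid_cost(10.0, 0.5, 0.25)` gives a cost of 5.0, while `grid_cost(-10.0, 0.5, 0.25)` gives -2.5.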

The interruptible load cost can be expressed as formula (29) below.

In formula (29), $c_{cu}$ is the compensation cost per unit of interruptible load.

In summary, the total operating cost considered can be expressed as formula (30) below.
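As a rough illustration of how the three cost components combine in one period, the sketch below assumes that the storage cost is the coefficient $\rho$ times the absolute energy throughput and that the interruptible load cost is $c_{cu}$ times the interrupted energy. These forms are assumptions, since formulas (26), (29) and (30) are not reproduced in this text; `period_cost` and its parameters are hypothetical names.

```python
def period_cost(p_es, p_il, c_grid, rho, c_cu, dt=1.0):
    """Total operating cost for one scheduling period.

    p_es:   list of storage active powers, one per region (sign ignored for cost)
    p_il:   list of interrupted load powers, one per region
    c_grid: cost of exchanging power with the upper-level grid in this period
    rho:    per-kWh cost coefficient of the electric energy storage
    c_cu:   compensation cost per unit of interruptible load
    """
    storage_cost = sum(rho * abs(p) * dt for p in p_es)
    interrupt_cost = sum(c_cu * p * dt for p in p_il)
    return storage_cost + interrupt_cost + c_grid
```

Summing this value over all periods of a day gives a daily cost that the agents aim to minimize.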

Regarding the node voltage constraint, the voltage magnitude of each node must be kept within a safe range, which can be expressed as formula (31) below.

$0.95\,u_N \le u_{j,t} \le 1.05\,u_N$    Formula (31)

In formula (31), $u_{j,t}$ is the per-unit voltage of node $j$ at time $t$; $u_N$ is the rated voltage of the system.

After the agents make their scheduling decisions, the power flow distribution of the system is calculated. Based on the resulting node voltages, the reward obtained by the agents can be expressed as formula (32) below.

In formula (32), $k$ denotes the penalty function.

The regional agents cooperate with one another, so in each period the environment feeds back the same global reward to every regional agent, which can be expressed as formula (33) below.

$r(t) = F_1(t) + F_2(t)$    Formula (33)
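A hedged sketch of such a global reward follows. It assumes that $F_1(t)$ is the negative of the period operating cost and that $F_2(t)$ is a voltage term that penalizes any excursion outside the range $[0.95, 1.05]$ per unit in proportion to its size; it also treats $k$ as a constant penalty coefficient and takes $u_N = 1$ per unit. None of these specifics is confirmed by the text, and the function names are hypothetical.

```python
def voltage_penalty(voltages_pu, k, v_min=0.95, v_max=1.05):
    """Assumed F2: negative penalty proportional to total voltage violation."""
    violation = 0.0
    for v in voltages_pu:
        violation += max(0.0, v_min - v) + max(0.0, v - v_max)
    return -k * violation


def global_reward(total_cost, voltages_pu, k):
    """Assumed r(t) = F1(t) + F2(t), shared by every regional agent."""
    f1 = -total_cost                      # lower cost means higher reward
    f2 = voltage_penalty(voltages_pu, k)  # zero when all voltages are in range
    return f1 + f2
```

Because all agents receive the same value, no agent can improve its own reward by shifting cost or voltage problems onto a neighboring region.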

Further, offline training of the scheduling model with the PPO algorithm comprises the following steps D01 to D05.

Step D01: determine the total number of periods $T$ in each scheduling cycle and the number of training rounds $M$ for the regional agents' neural networks, and randomly initialize the policy network parameters $\theta_a$ and the value network parameters $\theta_c$ of each regional agent.

Step D02: initialize the environment. Take 00:00 of each day as the initial scheduling time and initialize the device states prior to scheduling at that time.

Step D03: each regional agent interacts with the environment, collecting regional state information as the network input and outputting action information as the scheduling instruction. After scheduling for the period ends, the global reward $r_t$ is calculated, and the experience sampled in each period, $(s_t, a_t, r_t, s_{t+1})$, is stored in the experience pool for updating the network parameters.

Step D04: regional agent training. After sampling for one scheduling cycle is complete, the $T$ experiences in the experience pool are used to update the policy network and the value network of each agent by gradient descent, with the update objective of maximizing the cumulative global reward over one scheduling cycle. The learning rates of the two networks are denoted $l_a$ and $l_c$, respectively.

Step D05: determine whether the preset maximum number of training rounds $M$ has been reached. If so, end training; otherwise, proceed to the next round of network parameter updates.
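The control flow of steps D01 to D05 can be sketched as below. This is a skeleton only: `ToyEnv` is a hypothetical stand-in for the distribution-network environment with a made-up reward, and `policy` and `update` are placeholders for the agents' networks and the PPO update, none of which comes from the source.

```python
import random


class ToyEnv:
    """Hypothetical environment: random states, reward penalizes large actions."""

    def __init__(self, n_agents, seed=0):
        self.n_agents = n_agents
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # D02: start each scheduling cycle at period 0 (00:00).
        self.t = 0
        return [[self.rng.random(), 0.0] for _ in range(self.n_agents)]

    def step(self, actions):
        self.t += 1
        reward = -sum(a * a for a in actions)  # shared global reward (made up)
        next_states = [[self.rng.random(), float(self.t)]
                       for _ in range(self.n_agents)]
        return next_states, reward


def train(env, policy, update, T=24, M=3):
    """Run M training rounds of T periods each; return cumulative rewards."""
    history = []
    for _ in range(M):                       # D05: stop after M rounds
        states = env.reset()                 # D02: initialize the environment
        pool = []                            # experience pool for this cycle
        for _ in range(T):                   # D03: interact and sample
            actions = [policy(s) for s in states]
            next_states, reward = env.step(actions)
            pool.append((states, actions, reward, next_states))
            states = next_states
        update(pool)                         # D04: update policy/value networks
        history.append(sum(exp[2] for exp in pool))
    return history
```

For example, `train(ToyEnv(2), lambda s: 0.5, lambda pool: None, T=4, M=3)` runs three cycles of four periods and returns the cumulative reward of each cycle.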

According to the above method steps provided by the embodiment of the present invention, in one specific application scenario a scheduling model is built for the distributed photovoltaic system following the MADRL-based distributed generation optimal scheduling framework shown in FIG. 4. As shown in FIG. 4, an environment model is constructed in which the devices on the main network of the distributed photovoltaic distribution network are divided into n regions, and the devices in each region include photovoltaic devices, load devices, energy storage devices and interruptible devices.

Further, according to step D03 above and as shown in FIG. 4, the N regional agents interact with the environment model. Each region in the environment model feeds its regional state information into the network, and the network returns to each region the regional action information corresponding to that state as the scheduling instruction. After each scheduling period (denoted, for example, as $t$) ends, the environment model calculates the global reward $r_t$ for that period and stores the sampled experience $(s_t, a_t, r_t, s_{t+1})$ of the current scheduling period in the experience pool for updating the network parameters.

Further, according to steps D04 and D05 above and as shown in FIG. 4, the policy network and the value network of each agent are updated repeatedly by gradient descent, based on the data stored in the experience pool, until training is complete.
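For orientation, the quantities typically computed from the experience pool in a PPO update are sketched below. The source names proximal policy optimization but does not spell out its loss, so this assumes the widely used clipped-surrogate variant; the clipping range `clip=0.2` and both function names are assumptions. The loss is written as a negative objective so that minimizing it by gradient descent maximizes the expected return.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for each period of one cycle."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def ppo_clip_loss(logp_new, logp_old, advantages, clip=0.2):
    """Clipped surrogate loss averaged over the sampled experiences.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the sampling policy; advantages: e.g. return minus value estimate.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clipping keeps each update close to the policy that collected the data, which is what allows the same $T$ experiences to be reused for several gradient steps.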

In summary, the present invention combines multi-agent deep reinforcement learning with proximal policy optimization to construct a scheduling model that schedules the distributed photovoltaic distribution network. This enhances the adaptability of the distributed photovoltaic system to uncertain conditions and improves its safety and reliability, thereby solving the technical problem that the distributed photovoltaic scheduling schemes of the related art adapt poorly to uncertain photovoltaic output and load, which results in poor safety and reliability of the distributed photovoltaic system.

This embodiment further provides a distributed photovoltaic optimization scheduling strategy device based on the GA-MADRL-PPO combination. The device is used to implement the above embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" refers to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 5 is a structural block diagram of a distributed photovoltaic optimization scheduling strategy device based on the GA-MADRL-PPO combination according to an embodiment of the present invention. As shown in FIG. 5, the device comprises: a division module 501, configured to perform cluster division on a plurality of nodes in the distributed photovoltaic distribution network according to a modularity index and an active balance index of the distributed photovoltaic distribution network, to obtain a cluster division result;

a construction module 502, configured to construct, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning model framework;

a training module 503, configured to perform proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network, to obtain a target scheduling model;

a scheduling module 504, configured to input real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network.

Optionally, in addition to all of the above modules, the device further comprises an index module 505 (not shown in the figure), configured to: calculate the electrical distance between any two of the plurality of nodes using a target sensitivity matrix corresponding to the distributed photovoltaic distribution network and the voltage magnitude variations and reactive power variations of the plurality of nodes, wherein the target sensitivity matrix is used to determine the sensitivity relationship between node voltage and reactive power for the plurality of nodes; construct the modularity index based on the electrical distance and the edge weight between any two of the plurality of nodes; and construct the active balance index based on net power data corresponding to the distributed photovoltaic distribution network.

Optionally, the division module 501 is further configured to: perform a weighted calculation on the modularity index and the active balance index to obtain a comprehensive performance index; and perform cluster division on the plurality of nodes using the comprehensive performance index and a target genetic algorithm, to obtain the cluster division result.

Optionally, the division module 501 is further configured to: encode the adjacency matrix of the plurality of nodes to obtain an encoding result; perform fitness evaluation on the plurality of nodes based on the comprehensive performance index and the encoding result, to obtain an evaluation result; in response to the evaluation result not satisfying a preset fitness condition, adjust the node crossover probability and the node mutation probability corresponding to the encoding result according to the evaluation result and a preset adjustment strategy, update the encoding result and perform the fitness evaluation again; and, in response to the evaluation result satisfying the preset fitness condition, determine the cluster division result based on the encoding result.
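The loop handled by the division module can be sketched as a small adaptive genetic algorithm. Everything concrete here is an assumption made for illustration: a chromosome is taken to be one cluster label per node (the source instead encodes the adjacency matrix), the adjustment strategy is a common adaptive rule in which above-average individuals receive lower crossover and mutation probabilities, and the probability ranges, equal weights, and names `composite_index`, `adapt_probability` and `evolve` are hypothetical.

```python
import random


def composite_index(modularity, balance, w_mod=0.5, w_bal=0.5):
    """Weighted combination of the two indices (weights are assumed)."""
    return w_mod * modularity + w_bal * balance


def adapt_probability(p_high, p_low, fit, fit_avg, fit_max):
    """Lower the probability for individuals fitter than the population average."""
    if fit < fit_avg or fit_max == fit_avg:
        return p_high
    return p_high - (p_high - p_low) * (fit - fit_avg) / (fit_max - fit_avg)


def evolve(fitness, n_nodes, n_clusters, pop_size=20, generations=30,
           target=None, seed=0):
    """Return the best cluster labelling found; requires n_nodes >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_clusters) for _ in range(n_nodes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        f_max, f_avg = max(scores), sum(scores) / len(scores)
        if target is not None and f_max >= target:
            break                              # preset fitness condition met
        new_pop = [best[:]]                    # elitism: keep the current best
        while len(new_pop) < pop_size:
            a, b = rng.sample(range(pop_size), 2)
            f_parent = max(scores[a], scores[b])
            child = pop[a][:]
            if rng.random() < adapt_probability(0.9, 0.6, f_parent, f_avg, f_max):
                cut = rng.randrange(1, n_nodes)   # single-point crossover
                child = pop[a][:cut] + pop[b][cut:]
            p_mut = adapt_probability(0.1, 0.01, f_parent, f_avg, f_max)
            child = [rng.randrange(n_clusters) if rng.random() < p_mut else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best
```

In practice the `fitness` callable would compute the modularity index and the active balance index of a candidate partition and pass them through `composite_index`; here it is left to the caller.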

Optionally, the construction module 502 is further configured to: determine, according to the cluster division result, a plurality of regional agents corresponding to the plurality of nodes, wherein the regional agents have a Markov decision control function; construct a deep reinforcement learning framework based on the plurality of regional agents, wherein the framework comprises at least a communication layer used to realize cooperative decision-making among the regional agents; and construct the initial scheduling model for the distributed photovoltaic distribution network based on the deep reinforcement learning framework.

Optionally, the initial scheduling model comprises a state space, and the construction module 502 is further configured to: for any target regional agent in the deep reinforcement learning framework, determine the state space corresponding to that agent in the initial scheduling model according to the target generation power, target load power, interruptible load amount, stored energy of the electric energy storage, and scheduling period within the autonomous region corresponding to that agent.

Optionally, in addition to all of the above modules, the device further comprises a deviation module 506 (not shown in the figure), configured to: obtain the actual generation power, actual load power, interruptible load amount and stored energy of the electric energy storage, within the scheduling period, of the autonomous region corresponding to the target regional agent; and superimpose a preset forecast deviation on the actual generation power to obtain the target generation power, and superimpose a preset forecast deviation on the actual load power to obtain the target load power, wherein the preset forecast deviation follows a normal distribution.

Optionally, the initial scheduling model comprises an action space that includes a plurality of decision variables, and the construction module 502 is further configured to: for any target regional agent in the deep reinforcement learning framework, calculate the plurality of decision variables corresponding to that agent using the state information in its state space and target constraints, wherein the target constraints comprise preset electric energy storage constraints and interruptible load constraints, and the decision variables comprise the interruptible load power, the active power output by the electric energy storage, and the reactive power output by the electric energy storage within the agent's autonomous region.

Optionally, the initial scheduling model comprises a reward function, and the construction module 502 is further configured to: for the plurality of regional agents in the deep reinforcement learning framework, determine the reward function based on a cycle operating cost constraint and a node voltage constraint, wherein the cycle operating cost is determined by the target operating costs of the regional agents within a target scheduling cycle, the target operating cost comprises the electric energy storage operating cost, the interruptible load cost and the electricity purchase cost, and the node voltage constraint is determined by the system rated voltage of the distributed photovoltaic distribution network and the target value range of node voltage magnitudes within the regional agents.

Optionally, the training module 503 is further configured to: sample historical scheduling data of the distributed photovoltaic distribution network over a plurality of historical scheduling cycles to obtain the scheduling experience data; and perform multiple rounds of offline training on the initial scheduling model based on the scheduling experience data and the proximal policy optimization algorithm, so as to update the policy network parameters and value network parameters of the regional agents in the initial scheduling model and obtain the target scheduling model.

Optionally, the scheduling module 504 is further configured to: obtain the real-time operation data, which comprises at least the photovoltaic generation power, real-time load power, interruptible load amount and stored energy of the electric energy storage of the distributed photovoltaic distribution network in the current scheduling period; input the real-time operation data into the target scheduling model and obtain the decision information output by the model, wherein the decision information is used to determine the target scheduling power of the autonomous regions of the distributed photovoltaic distribution network, and the target scheduling power comprises the target interruptible load power, the target active power output by the electric energy storage and the target reactive power output by the electric energy storage; and generate the scheduling strategy according to the target scheduling power.

It should be noted that each of the above modules can be implemented in software or hardware. In the latter case, this may be done in, but is not limited to, the following ways: all of the modules are located in the same processor; or the modules are distributed, in any combination, across different processors.

According to yet another aspect of the embodiments of the present invention, a distributed photovoltaic optimization scheduling strategy system based on the GA-MADRL-PPO combination is further provided, comprising a memory and a processor, wherein a computer program is stored in the memory and the processor is configured to run the computer program to execute any one of the foregoing distributed photovoltaic optimization scheduling strategy methods based on the GA-MADRL-PPO combination.

Optionally, in this embodiment, the processor may be configured to execute the following steps by means of the computer program: performing cluster division on a plurality of nodes in the distributed photovoltaic distribution network according to the modularity index and the active balance index of the distributed photovoltaic distribution network, to obtain a cluster division result; constructing, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning model framework; performing proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic distribution network, to obtain a target scheduling model; and inputting real-time operation data of the distributed photovoltaic distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used to adjust the operating power of the distributed photovoltaic distribution network.

Optionally, for specific examples of this embodiment, reference may be made to the examples described in the above embodiments and their optional implementations, which are not repeated here.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, reference may be made to the relevant descriptions of the other embodiments.

In the several embodiments provided by the present invention, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, ROM, RAM, a removable hard disk, a magnetic disk or an optical disc.

The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A distributed photovoltaic optimization scheduling strategy method based on a GA-MADRL-PPO combination, characterized by comprising the following steps:

performing cluster division on a plurality of nodes in a distributed photovoltaic power distribution network according to a modularity index and an active balance index of the distributed photovoltaic power distribution network, to obtain a cluster division result;

constructing, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic power distribution network, wherein the initial scheduling model adopts a model framework of multi-agent deep reinforcement learning;

performing proximal policy optimization training on the initial scheduling model by using scheduling experience data of the distributed photovoltaic power distribution network, to obtain a target scheduling model;

and inputting real-time operation data of the distributed photovoltaic power distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used for adjusting the operating power of the distributed photovoltaic power distribution network.

2. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 1, further comprising:

calculating an electrical distance between any two nodes of the plurality of nodes using a target sensitivity matrix corresponding to the distributed photovoltaic power distribution network and the voltage amplitude variations and reactive power variations corresponding to the plurality of nodes, wherein the target sensitivity matrix is used to determine the sensitivity relationship between the node voltages and the reactive power of the plurality of nodes;

constructing the modularity index based on the electrical distance and the edge weight between any two nodes of the plurality of nodes; and

constructing the active balance index based on net power data corresponding to the distributed photovoltaic power distribution network.
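
By way of illustration only, the indices of claim 2 can be sketched as follows. The claims do not restate the underlying formulas, so everything specific here is an assumption: the electrical distance is taken as the Euclidean distance between rows of the voltage/reactive-power sensitivity matrix, the edge weight as a linear rescaling of that distance, the modularity as the standard weighted (Newman) modularity, and the active balance index as one minus the ratio of mean to peak absolute net power per cluster. All function names are hypothetical.

```python
import math

def electrical_distance(sens):
    """Assumed definition: Euclidean distance between sensitivity-matrix rows."""
    n = len(sens)
    return [[math.sqrt(sum((sens[i][k] - sens[j][k]) ** 2 for k in range(len(sens[i]))))
             for j in range(n)] for i in range(n)]

def edge_weights(dist):
    """Assumed mapping: closer nodes get larger weights; requires some nonzero distance."""
    n = len(dist)
    d_max = max(max(row) for row in dist)
    return [[0.0 if i == j else 1.0 - dist[i][j] / d_max for j in range(n)]
            for i in range(n)]

def modularity(weights, labels):
    """Standard weighted modularity for a symmetric weight matrix and cluster labels."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

def active_balance(net_power_by_cluster):
    """Assumed index: mean over clusters of 1 - mean(|P_net|) / max(|P_net|)."""
    scores = []
    for series in net_power_by_cluster:
        peak = max(abs(p) for p in series)
        mean_abs = sum(abs(p) for p in series) / len(series)
        scores.append(1.0 - mean_abs / peak if peak > 0 else 1.0)
    return sum(scores) / len(scores)
```

In this sketch a partition that keeps strongly coupled nodes together scores a higher modularity, and a cluster whose net power stays far below its peak for most of the period scores a higher balance value.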

3. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 1, wherein performing cluster division on the plurality of nodes in the distributed photovoltaic power distribution network according to the modularity index and the active balance index of the distributed photovoltaic power distribution network, to obtain the cluster division result, comprises:

performing a weighted calculation on the modularity index and the active balance index to obtain a comprehensive performance index; and

performing cluster division on the plurality of nodes using the comprehensive performance index and a target genetic algorithm to obtain the cluster division result.

4. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 3, wherein performing cluster division on the plurality of nodes using the comprehensive performance index and the target genetic algorithm to obtain the cluster division result comprises:

encoding the adjacency matrix of the plurality of nodes to obtain an encoding result;

performing fitness evaluation on the plurality of nodes based on the comprehensive performance index and the encoding result to obtain an evaluation result;

in response to the evaluation result not satisfying a preset fitness condition, adjusting the node crossover probability and the node mutation probability corresponding to the encoding result according to the evaluation result and a preset adjustment strategy, updating the encoding result, and performing the fitness evaluation again; and

in response to the evaluation result satisfying the preset fitness condition, determining the cluster division result based on the encoding result.
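
By way of illustration only, the weighted fitness of claim 3 and the probability adjustment of claim 4 can be sketched as follows. The claims do not specify the "preset adjustment strategy", so this sketch assumes a common adaptive scheme in which individuals with above-average fitness receive linearly lower crossover and mutation probabilities. The weights, probability bounds, and function names are hypothetical.

```python
def composite_fitness(modularity, balance, w_mod=0.5, w_bal=0.5):
    """Weighted sum of the two indices (weights are illustrative assumptions)."""
    return w_mod * modularity + w_bal * balance

def adaptive_probability(fitness, f_avg, f_max, p_high, p_low):
    """Assumed rule: below-average individuals use p_high; above-average
    individuals are interpolated linearly down to p_low at the best fitness."""
    if f_max == f_avg or fitness < f_avg:
        return p_high
    return p_high - (p_high - p_low) * (fitness - f_avg) / (f_max - f_avg)

def adapt_rates(fitnesses, pc=(0.9, 0.6), pm=(0.1, 0.01)):
    """Return a (crossover, mutation) probability pair for each individual."""
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    return [(adaptive_probability(f, f_avg, f_max, pc[0], pc[1]),
             adaptive_probability(f, f_avg, f_max, pm[0], pm[1]))
            for f in fitnesses]
```

Under this assumed rule the best encodings are disturbed least, while weak encodings are recombined and mutated more aggressively until the fitness condition is met.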

5. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 1, wherein constructing the initial scheduling model corresponding to the distributed photovoltaic power distribution network based on the cluster division result comprises:

determining a plurality of regional agents corresponding to the plurality of nodes according to the cluster division result, wherein the regional agents have a Markov decision control function;

constructing a deep reinforcement learning framework based on the plurality of regional agents, wherein the deep reinforcement learning framework comprises at least a communication layer for implementing collaborative decision-making among the plurality of regional agents; and

constructing the initial scheduling model for the distributed photovoltaic power distribution network based on the deep reinforcement learning framework.

6. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 5, wherein the initial scheduling model comprises a state space, and constructing the initial scheduling model for the distributed photovoltaic power distribution network based on the deep reinforcement learning framework comprises:

for any target regional agent in the deep reinforcement learning framework, determining the state space corresponding to the target regional agent in the initial scheduling model according to the target power generation, the target load power, the interruptible load quantity, the electric energy storage electric quantity and the scheduling period in the autonomous region corresponding to the target regional agent.

7. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 6, further comprising:

acquiring the actual power generation, the actual load power, the interruptible load quantity and the electric energy storage electric quantity of the autonomous region corresponding to the target regional agent in the scheduling period; and

superimposing a preset prediction deviation on the actual power generation to obtain the target power generation, and superimposing the preset prediction deviation on the actual load power to obtain the target load power, wherein the preset prediction deviation follows a normal distribution.
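
By way of illustration only, the superposition of a normally distributed prediction deviation in claim 7 can be sketched as follows. The claim does not state the mean or standard deviation, so a zero-mean deviation with a caller-supplied standard deviation is assumed; the function name and parameters are hypothetical.

```python
import random

def add_prediction_deviation(actual, sigma, rng):
    """Return actual values plus zero-mean Gaussian deviations (assumed parameters).

    actual: list of measured values (e.g. PV power or load power per period)
    sigma:  standard deviation of the deviation, in the same unit as `actual`
    rng:    a random.Random instance, passed in so that results are reproducible
    """
    return [value + rng.gauss(0.0, sigma) for value in actual]

# Usage: build forecast-like "target" series from measured series.
rng = random.Random(42)
target_pv = add_prediction_deviation([0.0, 35.0, 80.0, 60.0], sigma=2.0, rng=rng)
target_load = add_prediction_deviation([50.0, 55.0, 70.0, 65.0], sigma=2.0, rng=rng)
```

Training on such perturbed series exposes the agents to forecast error, which is the apparent purpose of the deviation in the claim.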

8. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 6, wherein the initial scheduling model comprises an action space, the action space comprising a plurality of decision variables, and constructing the initial scheduling model for the distributed photovoltaic power distribution network based on the deep reinforcement learning framework comprises:

for any target regional agent in the deep reinforcement learning framework, calculating the plurality of decision variables corresponding to the target regional agent using state information in the state space corresponding to the target regional agent and target constraints, wherein the target constraints comprise preset electric energy storage constraints and interruptible load constraints, and the plurality of decision variables comprise: the interruptible load power, the active power output by the electric energy storage, and the reactive power output by the electric energy storage in the autonomous region of the target regional agent.
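
By way of illustration only, the way target constraints bound the decision variables of claim 8 can be sketched as a projection of a raw action onto the feasible set. The claim does not enumerate the constraints, so this sketch assumes a power limit, an energy window and an inverter apparent-power rating for the storage, plus an upper bound on interruptible load; it ignores charging efficiency. All names and limits are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class StorageLimits:
    p_max: float  # maximum charge/discharge power, kW (assumed)
    s_max: float  # inverter apparent-power rating, kVA (assumed)
    e_min: float  # minimum stored energy, kWh (assumed)
    e_max: float  # maximum stored energy, kWh (assumed)

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def project_action(p_il, p_es, q_es, energy, il_available, limits, dt=1.0):
    """Clip (interruptible load, storage P, storage Q) to the assumed constraints.

    p_es > 0 means discharging; dt is the scheduling interval in hours.
    """
    p_il = clip(p_il, 0.0, il_available)
    p_dis_max = min(limits.p_max, (energy - limits.e_min) / dt)
    p_ch_max = min(limits.p_max, (limits.e_max - energy) / dt)
    p_es = clip(p_es, -p_ch_max, p_dis_max)
    # Reactive power is limited by the remaining inverter capacity.
    q_cap = math.sqrt(max(limits.s_max ** 2 - p_es ** 2, 0.0))
    q_es = clip(q_es, -q_cap, q_cap)
    return p_il, p_es, q_es
```

Clipping is only one way to enforce such limits; a penalty term in the reward is a common alternative.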

9. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 6, wherein the initial scheduling model comprises a reward function, and constructing the initial scheduling model for the distributed photovoltaic power distribution network based on the deep reinforcement learning framework comprises:

determining the reward function for the plurality of regional agents in the deep reinforcement learning framework based on a periodic operation cost constraint and a node voltage constraint, wherein the periodic operation cost is determined by the target operation costs of the plurality of regional agents in a target scheduling period, the target operation costs comprise an electric energy storage operation cost, an interruptible load cost and a power purchase cost, and the node voltage constraint is determined by the system rated voltage of the distributed photovoltaic power distribution network and a target value range of the node voltage amplitudes in the plurality of regional agents.
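
By way of illustration only, a reward of the kind described in claim 9 can be sketched as the negative operating cost minus a penalty for node voltages outside the permitted band. The claim does not give the functional form, so the per-unit voltage band of 0.95 to 1.05, the linear penalty and the penalty weight are all assumptions, and the function names are hypothetical.

```python
def voltage_penalty(voltages_pu, v_min=0.95, v_max=1.05):
    """Sum of per-unit excursions outside the assumed band [v_min, v_max]."""
    return sum(max(v - v_max, 0.0) + max(v_min - v, 0.0) for v in voltages_pu)

def reward(storage_cost, il_cost, purchase_cost, voltages_pu, penalty_weight=100.0):
    """Negative operating cost minus a weighted voltage-violation penalty."""
    operating_cost = storage_cost + il_cost + purchase_cost
    return -operating_cost - penalty_weight * voltage_penalty(voltages_pu)
```

With all voltages inside the band the reward reduces to the negative of the three cost terms, so maximizing it minimizes operating cost while discouraging voltage violations.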

10. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 1, wherein performing proximal policy optimization training on the initial scheduling model using the scheduling experience data of the distributed photovoltaic power distribution network to obtain the target scheduling model comprises:

sampling historical scheduling data of the distributed photovoltaic power distribution network over a plurality of historical scheduling periods to obtain the scheduling experience data; and

performing multiple rounds of offline training on the initial scheduling model based on the scheduling experience data and a proximal policy optimization algorithm, so as to update the policy network parameters and value network parameters corresponding to a plurality of regional agents in the initial scheduling model and obtain the target scheduling model.
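
By way of illustration only, the quantity that a proximal policy optimization update maximizes when adjusting policy network parameters is the clipped surrogate objective, sketched below in plain Python. This is the standard PPO-Clip form; whether the patent uses this variant, and the clipping range of 0.2, are assumptions. The function name is hypothetical, and a real implementation would compute this on tensors with automatic differentiation.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over sampled steps.

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)          # probability ratio r
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)   # pessimistic bound
    return total / len(advantages)
```

The clipping removes any incentive to move the policy far from the one that generated the experience data, which is what makes repeated offline rounds over the same samples stable.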

11. The distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to claim 1, wherein inputting the real-time operation data of the distributed photovoltaic power distribution network into the target scheduling model to generate the scheduling strategy comprises:

acquiring the real-time operation data, wherein the real-time operation data comprises at least: the photovoltaic power generation, the real-time load power, the interruptible load quantity and the electric energy storage electric quantity of the distributed photovoltaic power distribution network in the current scheduling period;

inputting the real-time operation data into the target scheduling model and acquiring decision information output by the target scheduling model, wherein the decision information is used for determining the target scheduling power corresponding to a plurality of autonomous regions of the distributed photovoltaic power distribution network, and the target scheduling power comprises: a target interruptible load power, a target active power output by the electric energy storage, and a target reactive power output by the electric energy storage; and

generating the scheduling strategy according to the target scheduling power.

12. A distributed photovoltaic optimization scheduling strategy device based on GA-MADRL-PPO combination, characterized by comprising:

a dividing module, configured to perform cluster division on a plurality of nodes in a distributed photovoltaic power distribution network according to a modularity index and an active balance index of the distributed photovoltaic power distribution network, to obtain a cluster division result;

a construction module, configured to construct, based on the cluster division result, an initial scheduling model corresponding to the distributed photovoltaic power distribution network, wherein the initial scheduling model adopts a multi-agent deep reinforcement learning model framework;

a training module, configured to perform proximal policy optimization training on the initial scheduling model using scheduling experience data of the distributed photovoltaic power distribution network, to obtain a target scheduling model; and

a scheduling module, configured to input real-time operation data of the distributed photovoltaic power distribution network into the target scheduling model to generate a scheduling strategy, wherein the scheduling strategy is used for adjusting the operating power of the distributed photovoltaic power distribution network.

13. A distributed photovoltaic optimization scheduling strategy system based on GA-MADRL-PPO combination, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the distributed photovoltaic optimization scheduling strategy method based on GA-MADRL-PPO combination according to any one of claims 1 to 11.