CN107015861A - A multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework
- Publication number: CN107015861A (application CN201611005315.0A)
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06F9/5038 — Allocation of resources (e.g. of the CPU) to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the execution order of a plurality of tasks
- G06F9/5066 — Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
- G06F30/20 — Computer-aided design: design optimisation, verification or simulation
- G06F2209/503 — Indexing scheme relating to resource availability
Abstract
The invention discloses a multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework, comprising the following steps: (1) constructing the Fork/Join parallel framework; (2) implementing the Fork/Join parallel framework; (3) parallelizing typical intelligent algorithms in coarse-grained mode; (4) parallelizing typical dynamic programming methods in fine-grained mode. Test results for the PSCWAGA, PAHPSO, PDP and PDDDP methods show that the Fork/Join multi-core parallel framework fully exploits the parallel performance of multi-core CPUs, greatly reduces computation time and significantly improves algorithm efficiency. The larger the computational scale of a parallel method, the greater the time saving and the more pronounced the advantage of parallel computing; moreover, as the computational scale grows, the speedup and parallel efficiency increase, and the speedup approaches the ideal speedup.
Description
Technical Field
The invention relates to the field of refined operation of cascade reservoir systems, and in particular to a multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework.
Background
In recent years, China's hydropower industry has developed rapidly. By the end of 2015, the total installed hydropower capacity in China had reached 320 million kW, about one quarter of the world total, making China the world's leading hydropower nation. China's hydropower resources are concentrated in thirteen major hydropower bases, whose development is gradually forming a large number of cascade reservoir systems on major rivers, typically characterized by many cascade reservoirs and large installed capacities; in the hydropower-rich basins of the southwest in particular, cascades of ten or even dozens of stations are common. For example, 11 stations are planned in the Wujiang River basin, of which 9 have been built; 15 are planned on the Lancang River, 6 built; 10 on the Hongshui River, 9 built; 12 on the middle and lower Jinsha River, 6 built; and 28 on the Dadu River, more than 10 built. For the joint optimal operation of cascade reservoir systems of this scale, the solution difficulties are mainly reflected in the following three aspects:
(1) Large scale and high dimensionality: high dimensionality is the main difficulty restricting the solution of optimal operation problems for large hydropower station systems. As the computational scale of a hydropower system expands, each additional station rapidly increases the number of calculations, the computer storage and the computation time required; for some methods the growth is exponential, making the problem very difficult to solve, sharply reducing computational efficiency and severely limiting algorithm scalability. In dynamic programming this is known as the "curse of dimensionality". In fact, the constraint that system scale places on solution also exists, to varying degrees, in other algorithms, and has become the main bottleneck restricting the engineering application of optimal hydropower operation.
(2) Multi-stage dynamic optimization: the optimal operation of hydropower station systems is a multi-stage dynamic optimization problem. The problem is usually discretized into multiple periods along the time dimension; adjacent periods must satisfy coupling constraints such as reservoir water balance and limits on water-level and output variation, and the operation in earlier periods directly affects the generating head and energy output of later periods, so all stages are closely linked. Meanwhile, global equality conditions such as the end-of-horizon water level and the total output requirement of the hydropower system make system decomposition and the relaxation of multi-variable coupling constraints very difficult.
(3) Complex hydraulic and electrical coupling: the optimal operation of hydropower station systems must consider the hydraulic links between stations. An upstream station with high regulation capability regulates the natural inflow and thus changes the inflow of downstream stations. When an upstream station releases water, the inflow of the downstream station consists of the upstream release plus the lateral inflow between the two stations; since inflow is the main input to optimal operation, it directly affects the downstream station's operating decisions. The inflow of every station is therefore affected by the operation of its immediate upstream stations, and when a station has several upstream stations the optimization problem becomes still more complicated. In addition, because hydropower undertakes important regulation tasks in the power system, optimal operation must also consider power-system constraints, such as the minimum system output required to balance the load and the safe transmission limits of transmission sections. These create close electrical links within the hydropower system and further increase the difficulty of solving the optimal operation problem.
Facing such large-scale joint operation of reservoir systems in China's hydropower sector, traditional optimization methods show clear limitations and can no longer meet the refined operation requirements of power grids, hydropower companies and basin authorities; exploring reasonable and efficient solution algorithms is therefore an important scientific problem urgently to be solved in hydropower operation.
At present, dynamic programming methods and emerging intelligent algorithms are the two main classes of methods applied to the optimal operation of cascade reservoir systems. However, dynamic programming is limited by problem scale: large problems easily trigger the curse of dimensionality, and the computational cost grows exponentially with the number of stations. For swarm algorithms there is a trade-off between solution accuracy and computational scale: the larger the population size or the number of iterations, the higher the probability of finding the global optimum in the solution space and the better the accuracy, but also the longer the computation and the lower the efficiency; when applied to China's current large-scale cascade hydropower systems, the drop in efficiency is particularly severe. Therefore, to make it possible to solve large-scale optimal operation problems within limited time while guaranteeing the quality of the results, studying how to improve the computational efficiency of these two common classes of optimization methods is of great scientific significance.
Parallel computing has long been a research hotspot in computer science. Multi-core parallel technology based on multi-core processors is widely used in practical engineering thanks to its unique advantages: easy parallel implementation, stable runtime environment and low cost. For hydropower systems, finding the junction between the optimal operation problem, the solution algorithm and parallel technology, and designing parallelization at an appropriate granularity, is a feasible way to improve solution efficiency. There are already many research results on parallel optimization methods in the field of cascade hydropower operation; parallelization is mainly realized through two modes, coarse-grained design and fine-grained design, and existing work generally relies on mature parallel models or frameworks such as OpenMP, MPI or .NET that are well suited to Fortran, C++ or C#, which facilitates the parallelization of serial methods developed in those languages. However, these frameworks cannot effectively support serial methods developed in Java (one of the mainstream programming languages) and are difficult to apply to the parallelization of Java programs.
The Fork/Join parallel framework is a multi-core parallel framework built on Java itself and has been integrated as a standard component into the java.util.concurrent package since Java 7. It greatly simplifies the developer's programming work and facilitates the parallelization of serial methods written in Java. Because the framework is relatively recent, there are few results applying it to the parallel computation of optimal operation methods for cascade reservoir systems, and no systematic account of its implementation or of the factors affecting its performance. The applicant therefore took Java as the development language, applied the Fork/Join multi-core parallel framework in both coarse-grained and fine-grained design modes, realized the parallelization of several methods, and invented a multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework.
Summary of the Invention
The purpose of the present invention is to provide a multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework, which can fully exploit the acceleration offered by a multi-core configuration without changing the existing computer hardware, greatly reducing computation time and improving solution efficiency.
To achieve the above object, the present invention provides the following technical solution:
A multi-core parallel computing design method for the optimal operation of cascade reservoir systems based on the Fork/Join framework, comprising the following steps:
(1) Constructing the Fork/Join parallel framework. The core of the Fork/Join framework inherits the divide-and-conquer principle: the original problem is recursively split into several smaller, mutually independent sub-problems that can be computed in parallel; once the sub-problems have been computed independently, their sub-results are combined to produce the final result of the original problem. The framework provides a dedicated thread pool: when the program starts, it by default creates as many active threads as there are available processors. During the divide-and-conquer process, a freely configurable threshold serves as the upper bound on sub-problem size: when a sub-problem's size is less than or equal to the threshold, splitting stops and the sub-problems are distributed evenly across the threads for parallel computation. In addition, during parallel computation Fork/Join uses a double-ended work queue per thread together with a work-stealing algorithm: when a thread's own queue is empty, it "steals" computation tasks from the tail of the queue of another thread that is still working.
(2) Implementing the Fork/Join parallel framework: 1) the class implementing the algorithm must extend the Fork/Join API class java.util.concurrent.RecursiveAction or java.util.concurrent.RecursiveTask; 2) choose a threshold for dividing tasks; 3) implement the compute() method of the Fork/Join base class; 4) set the task division scheme.
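The four implementation points above can be sketched as a minimal Java example (illustrative only; the `SumTask` class, its data and the threshold value are assumptions for demonstration, not the patent's scheduling code):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal RecursiveTask following the four points: extend the API class,
// pick a threshold, implement compute(), and define the task split.
class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 1_000;   // point 2: sub-problem size limit
    final long[] data;
    final int lo, hi;

    SumTask(long[] data, int lo, int hi) { this.data = data; this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {            // point 3: the divide-and-conquer body
        if (hi - lo <= THRESHOLD) {       // small enough: compute serially
            long s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;        // point 4: split into two sub-tasks
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                      // run the left half asynchronously
        return right.compute() + left.join(); // combine the sub-results
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // The pool defaults to one worker per available processor, as described above.
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // 49995000
    }
}
```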
(3) Parallelization of typical intelligent algorithms in coarse-grained mode. 1) Parallel self-adaptive chaos whole annealing genetic algorithm (PSCWAGA):
Step 1: Parameter initialization. Set the population size m, the number of chaotic sequences d, the maximum number of iterations Kmax, the initial temperature T0 and the adaptive parameters Pc1, Pc2, Pm1, Pm2.
Step 2: Population initialization. Using the Logistic map, randomly generate n groups of chaotic sequences in chaos space and map them into the solution space to produce m distinct individuals forming the population; each individual consists of the water levels of the stations in each period (zi1, zi2, ..., zin).
Step 3: Create the thread pool, whose default number of worker threads equals the number of CPU logical threads, and set the Fork/Join computation threshold.
Step 4: Start the parallel computation flow.
Parallel Step ①: According to the configured threshold, recursively divide the parent population into several smaller sub-populations.
Parallel Step ②: Distribute the sub-populations evenly across the logical threads; to keep the CPU load balanced, ensure that each logical thread receives the same number of sub-tasks.
Parallel Step ③: The sub-population assigned to each thread is computed independently; the main steps are as follows:
A. Evaluate individual fitness.
B. Selection: use the whole annealing selection mechanism, allowing parents to compete, and select the individuals of the next generation.
C. Crossover: perform crossover using the arithmetic crossover method.
D. Mutation: perform mutation using the non-uniform mutation method.
E. Compare parent and offspring fitness: when a parent individual Xi produces an offspring X′i through crossover and mutation, if f(X′i) > f(Xi), replace Xi with X′i; otherwise retain Xi with probability exp[(f(X′i) − f(Xi))/Tk].
F. Parameter update: iteration count k = k + 1, temperature Tk = 1/ln(k/T0 + 1).
G. Check whether the sub-population iteration has finished, using the annealing temperature Tk and the maximum iteration count as convergence conditions: if neither has been reached, return to A; otherwise the sub-population computation has converged.
Parallel Step ④: Collect and merge the optimized solutions of the sub-populations into a result set and return it to the main thread.
Step 5: Screen the optimal solution from the result set, finish the computation and destroy the thread pool.
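The coarse-grained flow of Steps 4 and 5 can be sketched as follows (a minimal illustration: the serial GA loop of steps A–G is abstracted into an `evolve` placeholder that simply returns the best individual, and individuals are plain numbers rather than water-level vectors; all names are assumptions):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Coarse-grained pattern: recursively split the parent population into
// sub-populations no larger than THRESHOLD, "evolve" each independently,
// then merge each sub-population's best individual into the result set.
class SubPopulationTask extends RecursiveTask<List<Double>> {
    static final int THRESHOLD = 4;      // sub-population size limit (assumed)
    final List<Double> population;       // an individual is a plain number here

    SubPopulationTask(List<Double> population) { this.population = population; }

    // Placeholder for the serial GA loop (selection, crossover, mutation,
    // annealing): it simply returns the best individual of the sub-population.
    static double evolve(List<Double> sub) {
        return Collections.max(sub);
    }

    @Override
    protected List<Double> compute() {
        if (population.size() <= THRESHOLD) {
            List<Double> result = new ArrayList<>();
            result.add(evolve(population));   // optimize this sub-population
            return result;
        }
        int mid = population.size() / 2;      // recursive binary split
        SubPopulationTask left = new SubPopulationTask(population.subList(0, mid));
        SubPopulationTask right = new SubPopulationTask(population.subList(mid, population.size()));
        left.fork();
        List<Double> merged = new ArrayList<>(right.compute());
        merged.addAll(left.join());           // merge sub-results into the result set
        return merged;
    }

    public static void main(String[] args) {
        List<Double> parent = new ArrayList<>();
        for (int i = 1; i <= 16; i++) parent.add((double) i);
        List<Double> results = new ForkJoinPool().invoke(new SubPopulationTask(parent));
        // Final step: screen the global optimum from the result set.
        System.out.println(Collections.max(results)); // 16.0
    }
}
```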
2) Parallel adaptive hybrid particle swarm optimization (PAHPSO):
Step 1: Parameter initialization.
Step 2: Population initialization.
Step 3: Create the thread pool, whose default number of worker threads equals the number of CPU logical threads, and set the Fork/Join computation threshold.
Step 4: Start the parallel computation flow.
Parallel Step ①: According to the configured threshold, recursively divide the parent population into several smaller sub-populations.
Parallel Step ②: Distribute the sub-populations evenly across the logical threads; to keep the CPU load balanced, ensure that each logical thread receives the same number of sub-tasks.
Parallel Step ③: The sub-population assigned to each thread is computed independently; the main steps are as follows:
A. Compute each particle's fitness, its personal best and the population's global best: if a particle's fitness is better than its personal best, take the current position as its personal best position; if it is better than the global best, take the current position as the global best position.
B. Compute the particle energies and the particle energy threshold; if a particle's energy is below the current threshold, apply a mutation operation to its current position and velocity.
C. Compute the particle similarities and the similarity threshold; if the similarity of two adjacent particles is below the current threshold, apply a mutation operation to the historical best position of the worse particle.
D. Apply a neighbourhood-based greedy random search strategy to update each particle's personal best: if a position found in the neighbourhood has higher fitness than the pre-search position, it replaces the particle's position; the post-search position is then compared with the particle's historical best and the population's global best, and both are updated accordingly.
E. Update the velocities and positions of the particle population.
F. Check whether the sub-population iteration has finished: if the current iteration count is below the maximum, return to A; otherwise the sub-population computation has converged.
Parallel Step ④: Collect and merge the computation results of the sub-populations into a result set and return it to the main thread.
Step 5: Screen the optimal solution from the result set, finish the computation and destroy the thread pool.
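Step E's velocity and position update can be illustrated with the textbook PSO formulas (the inertia weight w, learning factors c1, c2 and the sampled random numbers r1, r2 are assumed constants here; the patent's adaptive variant would vary them over the iterations):

```java
// Textbook PSO update for step E: v = w*v + c1*r1*(pBest - x) + c2*r2*(gBest - x).
// One particle in one dimension; w, c1, c2 are assumed values, and r1, r2
// (normally drawn uniformly from [0, 1) each iteration) are fixed here
// so the result is reproducible.
class PsoUpdate {
    public static void main(String[] args) {
        double w = 0.7, c1 = 1.5, c2 = 1.5;   // inertia and learning factors (assumed)
        double r1 = 0.5, r2 = 0.5;            // fixed in place of random draws
        double x = 1.0, v = 0.1;              // current position and velocity
        double pBest = 1.2, gBest = 2.0;      // personal / global best positions

        v = w * v + c1 * r1 * (pBest - x) + c2 * r2 * (gBest - x);
        x = x + v;                            // move the particle

        System.out.printf("v=%.3f x=%.3f%n", v, x); // v=0.970 x=1.970
    }
}
```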
3) Coarse-grained parallel design pattern of the Fork/Join framework:
From the computation flows of the PSCWAGA and PAHPSO methods, the coarse-grained parallel design pattern of the Fork/Join framework can be summarized as follows:
Step 1: Initialize the algorithm parameters and the population size.
Step 2: Create the thread pool and set the computation threshold.
Step 3: Use the Fork/Join parallel framework to recursively divide the parent population into several sub-populations and distribute them evenly across the worker threads.
Step 4: Optimize each sub-population following the original serial method until its iterations finish.
Step 5: Collect and merge the computation results of the sub-populations into a result set and return it to the main thread.
Step 6: Screen the optimal solution from the result set, finish the computation and destroy the thread pool.
(4) Parallelization of typical dynamic programming methods in fine-grained mode. 1) Parallel dynamic programming (PDP):
Step 1: Data preparation. Obtain the basic station characteristics and characteristic curves required for the computation, including characteristic water levels, output coefficients, the water level–storage curve and the discharge–tailwater level curve, and determine the discrete state variables St from the upper and lower storage bounds of each period and the number of discretization points.
Step 2: Create the thread pool, whose default number of worker threads equals the number of CPU logical threads, and set the Fork/Join computation threshold.
Step 3: Start the parallel computation flow.
Parallel Step ①: Construct the parent task of the parallel computation: the evaluation of all storage-state combinations over the iteration horizon in the recursion Bt(St, It, Nt) is taken as the parent task, and storage is allocated for intermediate quantities such as water level, output and energy generation.
Parallel Step ②: According to the configured threshold, recursively divide the parent task into several smaller sub-tasks.
Parallel Step ③: Distribute the sub-tasks evenly across the logical threads; to keep the CPU load balanced, ensure that each logical thread receives the same number of sub-tasks.
Parallel Step ④: The sub-tasks assigned to each thread are computed independently, i.e. all discrete state combinations in a sub-task are evaluated through the recursive formula.
Parallel Step ⑤: Collect and merge the computation results of the sub-tasks into a result set and return it to the main thread.
Step 4: Screen the optimal solution from the result set, finish the computation and destroy the thread pool.
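The fine-grained splitting of Parallel Steps ②–④ can be sketched as follows (illustrative only: `stageValue` is a stand-in for the real recursive benefit evaluation Bt(St, It, Nt), and the state grid, threshold and objective are assumptions):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fine-grained pattern: the loop over discretized storage states at one stage
// is recursively split into sub-ranges no larger than THRESHOLD; each sub-range
// is evaluated independently and the best stage value is kept when merging.
class StageTask extends RecursiveTask<Double> {
    static final int THRESHOLD = 16;     // sub-task size limit (assumed)
    final double[] states;               // discretized storage states St
    final int lo, hi;

    StageTask(double[] states, int lo, int hi) {
        this.states = states; this.lo = lo; this.hi = hi;
    }

    // Stand-in for the real benefit evaluation: a concave objective peaking at
    // s = 50 with value 1.0. The real method would evaluate water balance,
    // head, output and the previous stage's cumulative benefit here.
    static double stageValue(double s) {
        return 1.0 - (s - 50.0) * (s - 50.0) / 2500.0;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {      // small enough: evaluate serially
            double best = Double.NEGATIVE_INFINITY;
            for (int i = lo; i < hi; i++) best = Math.max(best, stageValue(states[i]));
            return best;
        }
        int mid = (lo + hi) >>> 1;       // split the state range in half
        StageTask left = new StageTask(states, lo, mid);
        StageTask right = new StageTask(states, mid, hi);
        left.fork();
        return Math.max(right.compute(), left.join()); // merge sub-results
    }

    public static void main(String[] args) {
        double[] states = new double[101];           // states 0, 1, ..., 100
        for (int i = 0; i < states.length; i++) states[i] = i;
        double best = new ForkJoinPool().invoke(new StageTask(states, 0, states.length));
        System.out.println(best); // 1.0, attained at s = 50
    }
}
```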
2) Parallel discrete differential dynamic programming (PDDDP):
Step 1: Data preparation. Obtain the basic station characteristics and characteristic curves required for the computation, including characteristic water levels, output coefficients, the water level–storage curve and the discharge–tailwater level curve, and determine the discrete state variables St from the upper and lower storage bounds of each period and the number of discretization points.
Step 2: Set the number of iteration corridors and generate an initial feasible solution as the initial computation trajectory.
Step 3: Create the thread pool, whose default number of worker threads equals the number of CPU logical threads, and set the Fork/Join computation threshold.
Step 4: Select the maximum corridor width as the current corridor.
Step 5: Start the parallel computation flow.
Parallel Step ①: Construct the parent task of the parallel computation within the current corridor: the evaluation of all storage-state combinations in the corridor in the recursion Bt(St, It, Nt) is taken as the parent task, and storage is allocated for intermediate quantities such as water level, output and energy generation.
Parallel Step ②: According to the configured threshold, recursively divide the parent task into several smaller sub-tasks.
Parallel Step ③: Distribute the sub-tasks evenly across the logical threads; to keep the CPU load balanced, ensure that each logical thread receives the same number of sub-tasks.
Parallel Step ④: The sub-tasks assigned to each thread are computed independently, i.e. all discrete state combinations in a sub-task are evaluated through the recursive formula.
Parallel Step ⑤: Collect and merge the computation results of the sub-tasks into a result set and return it to the main thread.
Step 6: Output the current optimal trajectory from the result set and check whether it is identical to the initial trajectory; if it is, go to Step 7; if not, go to Step 8.
Step 7: Check whether the current corridor is the last one; if not, set the next smaller corridor width as the current corridor width and go to Step 8; if it is, terminate the computation, output the optimal trajectory as the optimal solution and destroy the thread pool.
Step 8: Set the current optimal trajectory as the initial trajectory of the next iteration and return to Step 5.
3) Fine-grained parallel design pattern and method for the Fork/Join framework:
From the computation flows of the PDP and PDDDP methods, the fine-grained parallel design pattern and method for the Fork/Join framework can be summarized as follows:
Step 1: Data preparation, including initializing the parameters and setting the number of discrete state points.
Step 2: Create the thread pool and set the computation threshold.
Step 3: Execute the serial flow until the return values of the discrete state point combinations are to be computed, then enter the parallel computation.
Step 4: Use the Fork/Join parallel framework to recursively divide the parent task into several subtasks and distribute them evenly across the worker threads.
Step 5: Evaluate the discrete state point combinations of each subtask with the dynamic programming recurrence formula until every subtask finishes.
Step 6: Collect and merge the results of all subtasks into a result set and return it to the main thread.
Step 7: Select the current optimal solution from the result set, continue the serial flow of the method until the computation ends, and destroy the thread pool.
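The fine-grained steps 3 through 7 above can be sketched with the Fork/Join classes the document names. This is a minimal illustration, not the patented method itself: the hypothetical `stageValue(i)` stands in for evaluating one discrete state combination in the recurrence, and the task simply keeps the best value over an index range, forking in half until the range falls below the threshold.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch of the fine-grained pattern: the parent task covers all discrete
// state combinations [0, n); below THRESHOLD a range is solved serially,
// otherwise it is forked in half and the partial optima are joined.
public class FineGrainedSketch extends RecursiveTask<Double> {
    static final int THRESHOLD = 64;   // computation threshold, cf. formula (I)
    final int lo, hi;                  // range of combination indices

    FineGrainedSketch(int lo, int hi) { this.lo = lo; this.hi = hi; }

    // Placeholder for the recurrence term of one state combination
    // (toy objective with its maximum 0 at i == 500).
    static double stageValue(int i) {
        return -Math.abs(i - 500);
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {    // small enough: solve serially
            double best = Double.NEGATIVE_INFINITY;
            for (int i = lo; i < hi; i++) best = Math.max(best, stageValue(i));
            return best;
        }
        int mid = (lo + hi) >>> 1;     // fork into two smaller subtasks
        FineGrainedSketch left = new FineGrainedSketch(lo, mid);
        FineGrainedSketch right = new FineGrainedSketch(mid, hi);
        left.fork();
        return Math.max(right.compute(), left.join()); // merge the result set
    }

    public static double solve(int n) {
        ForkJoinPool pool = new ForkJoinPool(); // workers = CPU logical threads
        try { return pool.invoke(new FineGrainedSketch(0, n)); }
        finally { pool.shutdown(); }            // destroy the thread pool
    }
}
```

The split/solve/join structure is exactly what `compute()` must implement; only `stageValue` changes between methods.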
As a further aspect of the invention, the threshold-based task division in step (2) works as follows: if the threshold is set too small, the subproblems become smaller and more numerous, which easily causes excessive resource-management overhead; if it is set too large, the subproblems become larger and fewer, and when the number of subproblems drops below the number of worker threads, some worker threads are left idle. Therefore, to guarantee that every worker thread can be assigned a subproblem, the general threshold-setting formula (I) is:
μ = ⌈Sc / w⌉  (I)
In formula (I): ⌈·⌉ rounds the result up to the next integer; μ is the scale-control threshold; Sc is the size of the original problem; and w is the number of logical threads of the multi-core processor.
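Formula (I) is a one-line integer computation. The sketch below (illustrative helper names, not from the patent) computes the ceiling without floating point and shows how w would typically be obtained at runtime:

```java
// Threshold rule from formula (I): threshold = ceil(Sc / w), so that the
// parent task of size Sc is split until each subtask is no larger than
// the threshold and every one of the w worker threads gets work.
public class ThresholdRule {
    public static int threshold(int problemSize, int workerThreads) {
        // integer ceiling of problemSize / workerThreads
        return (problemSize + workerThreads - 1) / workerThreads;
    }

    public static void main(String[] args) {
        int w = Runtime.getRuntime().availableProcessors(); // logical threads
        System.out.println(threshold(1000, w));
    }
}
```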
As a further aspect of the invention, the task division scheme set in step (2) is as follows: each division splits a parent task into several subtasks, and the number of subtasks per division is determined by the code written by the developer.
Compared with the prior art, the beneficial effects of the present invention are:
Instance tests of the PSCWAGA, PAHPSO, PDP and PDDDP methods show that, with the Fork/Join multi-core parallel framework, the invention fully exploits the parallel performance of multi-core CPUs, greatly reduces computation time and significantly improves algorithm efficiency. The larger the computation scale of a parallel method, the more computation time is saved and the more pronounced the advantage of parallel computing becomes; moreover, as the computation scale grows, the speedup and parallel efficiency increase steadily, and the speedup approaches the ideal speedup.
Description of Drawings
Figure 1 is a schematic diagram of the divide-and-conquer method;
Figure 2 is a schematic diagram of the threshold control scheme;
Figure 3 is a schematic diagram of the work-stealing algorithm;
Figure 4 is a schematic diagram of Fork/Join pseudocode;
Figure 5 shows the coarse-grained parallel design pattern and method of the Fork/Join framework;
Figure 6 shows the fine-grained parallel design pattern and method of the Fork/Join framework;
Figure 7 is the topology of the Hongshui River cascade reservoirs;
Figure 8 is a schematic diagram of a corridor with 3 discrete state points.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
1. Optimal Dispatch Model of a Cascade Hydropower Station Group
(1) Objective function
Take the model maximizing the total generation of the cascade reservoir group as an example. The model can be described as follows: given the inflow process of each station over the dispatch horizon, and subject to constraints on water balance, storage, discharge and output as well as the combined requirements of flood control, irrigation and navigation, determine the optimal dispatch decisions that maximize the generation of the reservoir group over the horizon. The objective function is:
E = max Σ_{i=1}^{N} Σ_{t=1}^{T} P_{i,t} · Δt
where E is the total generation of the hydropower station group (10^8 kW·h); N is the number of stations; i is the station index; T is the number of periods; t is the period index; P_{i,t} is the average output of station i in period t; and Δt is the number of hours in period t.
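The objective is a direct double sum, which the short sketch below evaluates. The array names are illustrative; unit conversion to 10^8 kW·h is omitted for clarity:

```java
// Total generation E = sum over stations i and periods t of the average
// output p[i][t] (MW) times the period length dt[t] (hours).
public class Objective {
    public static double totalEnergy(double[][] p, double[] dt) {
        double e = 0.0;
        for (double[] station : p)           // station index i
            for (int t = 0; t < dt.length; t++) // period index t
                e += station[t] * dt[t];
        return e;
    }
}
```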
(2) Constraints
Water balance: V_{i,t+1} = V_{i,t} + (I_{i,t} − Q_{i,t}) · Δt, with I_{i,t} = q_{i,t} + Σ_{k=1}^{K_i} O_{k,t}
Reservoir storage: V_{i,t}^min ≤ V_{i,t} ≤ V_{i,t}^max
Generation flow: Qp_{i,t}^min ≤ Qp_{i,t} ≤ Qp_{i,t}^max
Outflow: Q_{i,t}^min ≤ Q_{i,t} ≤ Q_{i,t}^max, where Q_{i,t} = Qp_{i,t} + Qs_{i,t}
Output: P_{i,t}^min ≤ P_{i,t} ≤ P_{i,t}^max
Boundary water levels: V_{i,0} = SB_i, V_{i,T} = SE_i
Total system output: N_t^min ≤ Σ_{i=1}^{N} P_{i,t} ≤ N_t^max
where V_{i,t} is the storage of the reservoir of station i at the start of period t (m³); I_{i,t} is the inflow of station i in period t (m³/s); Q_{i,t} is the outflow of station i in period t (m³/s); q_{i,t} is the local (interval) inflow of station i in period t (m³/s); K_i is the number of stations upstream of station i and k their index; O_{k,t} is the outflow of upstream reservoir k of station i in period t (m³/s); Qp_{i,t} is the generation flow of station i in period t (m³/s); Qs_{i,t} is the spill of station i in period t (m³/s); V_{i,t}^max and V_{i,t}^min are the upper and lower storage bounds of station i in period t (m³); Qp_{i,t}^max and Qp_{i,t}^min are the generation-flow bounds of station i in period t (m³/s); Q_{i,t}^max and Q_{i,t}^min are the outflow bounds of station i in period t (m³/s); P_{i,t}^max and P_{i,t}^min are the output bounds of station i in period t (MW); V_{i,0} and V_{i,T} are the storage states of station i at the initial time and at time T; SB_i and SE_i are the specified initial storage state and the expected end-of-horizon storage state of station i; and N_t^max and N_t^min are the upper and lower bounds on the total output of the hydropower system in period t (MW).
2. The Fork/Join Parallel Framework
The interface and implementation classes of the Fork/Join parallel framework are encapsulated in java.util.concurrent. Although the parallel efficiency of Fork/Join is not optimal compared with other parallel frameworks, its greatest advantage is that it offers developers a very simple application interface. Developers no longer need to handle parallelism-related concerns such as synchronization and communication, and can avoid hard-to-debug errors such as deadlock and data races; by following the given interface, only a small amount of code is needed to connect an algorithm to the framework. This greatly simplifies the tedious work of writing concurrent programs, saves considerable developer effort and improves productivity. A pseudocode sketch of Fork/Join is shown in Figure 4. To parallelize an algorithm with the Fork/Join framework, a developer mainly needs to address the following four issues.
1) The class implementing the algorithm must extend the Fork/Join interface class java.util.concurrent.RecursiveAction or java.util.concurrent.RecursiveTask.
2) Choose an appropriate threshold for dividing tasks. As the example in Figure 2 shows, the threshold directly determines the number of subproblems. If the threshold is set too small, the subproblems become smaller and more numerous, which easily causes excessive resource-management overhead; if it is set too large, the subproblems become larger and fewer, and when the number of subproblems drops below the number of worker threads, some worker threads are left idle. Therefore, to guarantee that every worker thread can be assigned a subproblem, the general threshold-setting formula is:
μ = ⌈Sc / w⌉
where ⌈·⌉ rounds the result up to the next integer; μ is the scale-control threshold; Sc is the size of the original problem; and w is the number of logical threads of the multi-core processor (a processor with hyper-threading can run 2 logical threads per core). In addition, the framework's interface classes provide the method interface for splitting subproblems, so developers only need to supply a suitable threshold when implementing that interface; they need not care how the program carries out task division and thread assignment.
3) Implement the void compute() method of the Fork/Join interface class. This method carries out the computation of each subtask; that is, the parallelizable part of the algorithm is written into this method.
4) Choose the task division scheme. Each division splits a parent task into several subtasks, and the number of subtasks per division is determined by the code written by the developer. In general, splitting the parent task in half helps the hardware achieve load balance.
3. Definitions of the Key Parameters of the Example Methods
(1) Self-adaptive chaotic whole annealing genetic algorithm (PSCWAGA)
Logistic map: x_{n+1} = μ·x_n·(1 − x_n)  (3.1)
J_k(f_i) = exp(f_i / T_k)  (3.4)
where x_n, a random number in [0, 1], is the n-th iterate of the variable x; μ is a control parameter with μ ∈ [0, 4]. When μ = 4 the Logistic map is fully chaotic, and the initial value of x must not be 0, 0.25, 0.5, 0.75 or 1. P_c1, P_c2, P_m1 and P_m2 are control parameters in the interval (0, 1); f′ is the larger fitness of the two crossover individuals; f is the fitness of the current individual; f_max and f_avg are the maximum and average fitness of the population; P(X_i) is the selection probability and J_k(f_i) the fitness function; f_i is the objective value of individual X_i; m is the population size; k is the iteration counter of the genetic algorithm; and T_k is the annealing temperature, which tends to 0, decreasing according to T_k = 1/ln(k/T_0 + 1).
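The chaotic initialization of formula (3.1) is straightforward to sketch. The helper names below are illustrative, not from the patent; the map is iterated with μ = 4 (the fully chaotic setting named above) and each chaotic value is then mapped into a decision interval such as a water-level range:

```java
// Chaotic sequence from the Logistic map x_{n+1} = mu * x_n * (1 - x_n)
// with mu = 4; the seed x0 must avoid 0, 0.25, 0.5, 0.75 and 1, which
// fall onto fixed points or short cycles of the map.
public class LogisticInit {
    public static double[] sequence(double x0, int n) {
        double[] xs = new double[n];
        double x = x0;
        for (int i = 0; i < n; i++) {
            x = 4.0 * x * (1.0 - x);    // mu = 4: fully chaotic regime
            xs[i] = x;
        }
        return xs;
    }

    // Map a chaotic value in [0, 1] onto the decision interval [low, high].
    public static double toInterval(double x, double low, double high) {
        return low + x * (high - low);
    }
}
```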
(2) Self-adaptive hybrid particle swarm optimization (PAHPSO)
Particle energy:
where
P_ibest is the historical best position of particle x_i; P_gbest is the global best position of the population; n is the dimension; X_i(k) is the particle position; and V_i(k) is the particle velocity. It follows that e(P_i) ∈ [0, 1]. Particle energy describes a particle's search ability and plays a key role in the algorithm's self-adaptation. Formula (3.6) shows that the particle energy depends on the similarity between the particle's current position and velocity, and on the similarity between the particle's historical best position and the population's global best position.
Particle energy threshold:
where speed(P_i(curG)) = P_ibest(curG) / P_ibest(curG − 1)
and maxG is the maximum number of iterations; curG is the current iteration; e is a preset constant controlling the trend of the energy threshold; eIni is the upper bound of particle energy; and eFin is the lower bound of particle energy.
The particle energy threshold is closely related to the population's degree and speed of evolution. Formula (3.5) shows that the threshold depends on the particle's best position and on the upper and lower energy bounds. The threshold changes continuously during the iterations; when a particle's energy falls below the current threshold, its velocity and position are mutated, see formulas (3.8) and (3.9).
V_i(k) = mutation(V_i(k))  (3.8)
X_i(k) = mutation(X_i(k))  (3.9)
Particle similarity:
where
P_ibest is the historical best position of particle x_i and P_jbest that of particle x_j. Formula (3.10) shows that the similarity of adjacent particles depends on their corresponding historical best positions.
Particle similarity threshold:
where sIni is the upper bound of particle similarity; sFin is the lower bound; and s is a constant controlling the variation range of the similarity threshold.
(3) Dynamic programming (DP)
f_t*(S_t) = max { B_t(S_t, Q_t, N_t) + f_{t+1}*(S_{t+1}) }; t = T, T−1, …, 1  (3.12)
S_t = T_{t+1}(S_{t+1}, Q_t, N_t); t = T, T−1, …, 1  (3.13)
where t and T are the period index and the number of periods; S_t, Q_t and N_t are the M-dimensional (M is the number of stations) vectors of reservoir storage, inflow and output; f_t*(S_t) is the maximum system generation from period t to the final period when the state in period t is S_t (10^8 kW·h); B_t(S_t, Q_t, N_t) is the system generation in period t when the initial storage state is S_t, the inflow is Q_t and the decided output is N_t (10^8 kW·h); and T_{t+1}(S_{t+1}, Q_t, N_t) is the state-transition equation from period t+1 to t, usually the water-balance equation, see formula (1.2).
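The backward recurrence of formulas (3.12)-(3.13) can be sketched for a single reservoir with discrete states. This is a toy illustration, not the patented method: the hypothetical `StageBenefit` callback stands in for B_t, and the state transition is reduced to choosing the next discrete state directly.

```java
// Backward DP over `periods` stages and `states` discrete states:
// f*_t(s) = max over next states sn of { b(t, s, sn) + f*_{t+1}(sn) },
// with the terminal condition f*_T = 0.
public class BackwardDp {
    public interface StageBenefit { double value(int t, int s, int sNext); }

    public static double solve(int periods, int states, StageBenefit b) {
        double[] next = new double[states];          // f*_{t+1}, zero at t = T
        for (int t = periods - 1; t >= 0; t--) {
            double[] cur = new double[states];
            for (int s = 0; s < states; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int sn = 0; sn < states; sn++)  // enumerate next states
                    best = Math.max(best, b.value(t, s, sn) + next[sn]);
                cur[s] = best;
            }
            next = cur;                              // step backward one period
        }
        return next[0];                              // optimum from initial state 0
    }
}
```

In the PDP of the invention, the inner enumeration over state combinations is exactly the part handed to Fork/Join subtasks.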
(4) Discrete differential dynamic programming (DDDP)
Corridor: at the start of each iterative search, a "corridor" of a certain width is constructed around the current initial solution within the feasible state region, and the optimal trajectory is searched within that corridor. Few discrete states are used inside the corridor, typically 3 discrete state points, which greatly reduces the computation scale, see Figure 8. In Figure 8, Δ1 and Δ2 are the upper and lower corridor widths, and the total corridor width is Δ = Δ1 + Δ2. For convenience, Δ1 = Δ2 can be used; in addition, the upper and lower corridor boundaries must not leave the feasible region of the state variable.
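Corridor construction is a small geometric step that can be sketched directly. The helper below is illustrative: around each trajectory point z it takes the 3 discrete states {z − d, z, z + d} (equal half-widths, d = Δ/2) and clips them to the feasible range of the state variable, as the text requires.

```java
// 3-point DDDP corridor around one trajectory value z, clipped to the
// feasible interval [low, high] of the state variable.
public class Corridor {
    public static double[] states(double z, double d, double low, double high) {
        double[] s = { z - d, z, z + d };
        for (int i = 0; i < s.length; i++)
            s[i] = Math.min(high, Math.max(low, s[i])); // stay feasible
        return s;
    }
}
```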
4. Parallel Design of Typical Intelligent Methods in Coarse-Grained Mode
Different modern intelligent optimization algorithms solve problems in different ways: a genetic algorithm searches mainly through selection, crossover and mutation of population individuals; a particle swarm algorithm searches for the global optimum by continually updating individual positions from the individual and global best points; an ant colony algorithm evolves toward the global optimum through the pheromone left by individuals. Their optimization mechanisms, however, all proceed from individual optima toward the global optimum, and they share a common trait: at start-up an initial population of a certain size is generated, each individual in the population is a single independent initial solution, and the iterative computation of each individual during the search is independent of the others. All of them can therefore be parallelized in the coarse-grained mode, in which the initial population is decomposed into smaller subpopulations that are optimized separately and then aggregated. Taking the parallel self-adaptive chaotic whole annealing genetic algorithm and the parallel self-adaptive hybrid particle swarm algorithm as two typical examples, the invention summarizes the coarse-grained parallel design method based on the Fork/Join multi-core parallel framework.
(1) Parallel self-adaptive chaotic whole annealing genetic algorithm (PSCWAGA)
Step 1: Parameter initialization. Set the population size m, the number of chaotic sequences d, the maximum number of population iterations Kmax, the initial temperature T0 and the adaptive parameters Pc1, Pc2, Pm1, Pm2.
Step 2: Population initialization. Using the Logistic map, randomly generate n groups of chaotic sequences in the chaotic space and map them into the solution space to form a population of m distinct individuals; each individual consists of the water levels of the stations in the different periods (zi1, zi2, …, zin).
Step 3: Create the thread pool; by default the number of worker threads equals the number of CPU logical threads. Set the Fork/Join computation threshold at the same time.
Step 4: Start the parallel computation flow.
Parallel Step ①: According to the configured computation threshold, recursively divide the parent population into several smaller subpopulations.
Parallel Step ②: Distribute the subpopulations evenly across the logical threads; to keep the CPU load balanced, each logical thread is assigned the same number of subtasks.
Parallel Step ③: The subpopulation assigned to each thread is computed independently; the main steps are:
(1) Evaluate individual fitness. The objective function is used directly as the fitness function, together with an elitist strategy: the best individual of each generation is recorded and excluded from crossover and mutation, and after the generation's computation it replaces the worst individual, guaranteeing its survival into the next generation.
(2) Selection. The whole annealing selection mechanism allows parents to compete; each individual is selected into the next generation through the relative fitness function, formula (5), and the selection probability, formula (6).
(3) Crossover. Arithmetic crossover is used; the crossover operator is given by formula (2).
(4) Mutation. Non-uniform mutation is used; the mutation operator is given by formula (3).
(5) Compare parent and offspring fitness. Suppose parent X_i produces offspring X′_i through crossover and mutation. If f(X′_i) > f(X_i), X′_i replaces X_i; otherwise X_i is retained with probability exp[(f(X′_i) − f(X_i))/T_k].
(6) Parameter update: iteration counter k = k + 1, temperature T_k = 1/ln(k/T_0 + 1).
(7) Check whether the subpopulation iteration has finished, using the annealing temperature T_k or the maximum number of iterations as the convergence condition. If neither has reached its preset value, return to (1); otherwise the subpopulation computation ends.
Parallel Step ④: Collect and merge the optimized solutions of the subpopulations into a result set and return it to the main thread.
Step 5: Select the optimal solution from the result set; the computation ends and the thread pool is destroyed.
The PSCWAGA algorithm flow is shown in Figure 5.
(2) Parallel self-adaptive hybrid particle swarm optimization (PAHPSO)
Step 1: Parameter initialization. Set the population size m, the number of chaotic sequences d, the maximum number of population iterations Kmax, the accelerations c1 and c2, the inertia factor w, the constants e and s, the particle energy bounds eIni and eFin, and the particle similarity bounds sIni and sFin.
Step 2: Population initialization. Using the Logistic map, randomly initialize the particle positions (zi1, zi2, …, zin) and velocities (Vi1, Vi2, …, Vin) within the allowed water-level range of each period; a position element is a water level and a velocity element is a rate of water-level change.
Step 3: Create the thread pool; by default the number of worker threads equals the number of CPU logical threads. Set the Fork/Join computation threshold at the same time.
Step 4: Start the parallel computation flow.
Parallel Step ①: According to the configured computation threshold, recursively divide the parent population into several smaller subpopulations.
Parallel Step ②: Distribute the subpopulations evenly across the logical threads; to keep the CPU load balanced, each logical thread is assigned the same number of subtasks.
Parallel Step ③: The subpopulation assigned to each thread is computed independently; the main steps are:
(1) Compute particle fitness, the individual best and the global best. If a particle's fitness is better than its individual best, the current position becomes the individual best position; if it is better than the population's global best, the current position becomes the global best position.
(2) Compute the particle energy and the energy threshold. If a particle's energy is below the current threshold, mutate its current position and velocity.
(3) Compute the particle similarity and the similarity threshold. If the similarity of two adjacent particles is below the current threshold, mutate the historical best position of the worse particle.
(4) Apply the neighborhood-based greedy random search to update each particle's individual best position. If the position found in the neighborhood has higher fitness than the particle had before the search, it replaces the particle's position; the new position is then compared with the particle's historical best and the population's global best, and both are updated accordingly.
(5) Update the velocities and positions of the particle population.
(6) Check whether the subpopulation iteration has finished. If the current iteration count is below the maximum, return to (1); otherwise the subpopulation computation ends.
Parallel Step ④: Collect and merge the results of the subpopulations into a result set and return it to the main thread.
Step 5: Select the optimal solution from the result set; the computation ends and the thread pool is destroyed.
The PAHPSO algorithm flow is shown in Figure 5.
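The velocity and position update in step (5) above can be sketched with the standard PSO rule v = w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), x = x + v, clipping each dimension to the allowed water-level range. This is an illustrative sketch only; PAHPSO's adaptive mutation and neighborhood-search steps are omitted, and all names are assumptions.

```java
import java.util.Random;

// One standard PSO update step for a single particle, in place.
public class PsoUpdate {
    public static void step(double[] x, double[] v, double[] pbest, double[] gbest,
                            double w, double c1, double c2,
                            double low, double high, Random rnd) {
        for (int j = 0; j < x.length; j++) {
            v[j] = w * v[j]
                 + c1 * rnd.nextDouble() * (pbest[j] - x[j])   // cognitive pull
                 + c2 * rnd.nextDouble() * (gbest[j] - x[j]);  // social pull
            x[j] = Math.min(high, Math.max(low, x[j] + v[j])); // keep level feasible
        }
    }
}
```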
(3) Coarse-grained parallel design pattern and method for the Fork/Join framework
Generalizing from the computational workflows of the PSCWAGA and PAHPSO methods, the coarse-grained parallel design pattern for the Fork/Join framework is summarized as follows:
Step1 Initialize the algorithm parameters and the population size.
Step2 Create the thread pool and set the computation threshold.
Step3 Use the Fork/Join parallel framework to divide the parent population recursively into multiple subpopulations, distributed evenly among the worker threads.
Step4 Optimize each subpopulation following the original serial workflow until its iterations complete.
Step5 Aggregate and merge the results of all subpopulations into a result set, and return it to the main thread.
Step6 Select the optimal solution from the result set; the computation then ends and the thread pool is destroyed.
The coarse-grained parallel design pattern of the Fork/Join framework is shown in Figure 5.
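The six steps above map directly onto Java's Fork/Join primitives (ForkJoinPool, RecursiveTask). The following is a hedged sketch of the pattern, not the patented code: the threshold value, the evolve() stub, and all names are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Coarse-grained pattern sketch: the parent population is split recursively
// until each slice is at or below the threshold, each subpopulation is
// "optimized" independently (stubbed here), and the results merge back.
public class CoarseGrainedPattern extends RecursiveTask<List<Double>> {
    static final int THRESHOLD = 16;          // Step2: computation threshold
    private final double[] population;
    private final int lo, hi;

    CoarseGrainedPattern(double[] population, int lo, int hi) {
        this.population = population; this.lo = lo; this.hi = hi;
    }

    // Stub for Step4: serial optimization of one subpopulation member.
    private double evolve(double individual) { return individual * individual; }

    @Override
    protected List<Double> compute() {
        if (hi - lo <= THRESHOLD) {           // small enough: run serially
            List<Double> out = new ArrayList<>();
            for (int i = lo; i < hi; i++) out.add(evolve(population[i]));
            return out;
        }
        int mid = (lo + hi) >>> 1;            // Step3: recursive bisection
        CoarseGrainedPattern left = new CoarseGrainedPattern(population, lo, mid);
        CoarseGrainedPattern right = new CoarseGrainedPattern(population, mid, hi);
        left.fork();                          // run the left half asynchronously
        List<Double> merged = new ArrayList<>(right.compute());
        merged.addAll(left.join());           // Step5: merge the result sets
        return merged;
    }

    public static void main(String[] args) {
        double[] pop = new double[100];
        for (int i = 0; i < pop.length; i++) pop[i] = i;
        ForkJoinPool pool = new ForkJoinPool();   // Step2: create the pool
        List<Double> results = pool.invoke(new CoarseGrainedPattern(pop, 0, pop.length));
        pool.shutdown();                          // Step6: destroy the pool
        System.out.println(results.size());       // one result per individual: 100
    }
}
```

Step6's final filtering would scan the merged result set for the best value; it is omitted here to keep the split/merge structure in focus.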
5. Fine-grained parallel design and implementation of typical dynamic programming methods
Classical dynamic programming methods are among the most widely used optimization techniques for multi-reservoir operation, chiefly traditional dynamic programming, discrete differential dynamic programming, and dynamic programming with successive approximation. Although these methods iterate toward the optimum in different ways, they share the same core operation: evaluating combinations of discrete state points in the dynamic programming recurrence, and the evaluations of different state-point combinations are mutually independent. Classical dynamic programming methods are therefore all amenable to fine-grained parallel design. Taking the parallel dynamic programming method and the parallel discrete differential dynamic programming method as two representative examples, the present invention summarizes the fine-grained parallel design approach based on the Fork/Join multi-core framework.
(1) Parallel dynamic programming method (PDP)
Step1 Data preparation. Obtain the basic characteristic values and curves of each plant required for the computation, including characteristic water levels, output coefficients, the water level-storage curve, and the discharge-tailwater level curve, and determine the discrete state variables St from each reservoir's storage bounds and number of discrete points in each period.
Step2 Create the thread pool; by default its number of worker threads equals the number of CPU logical threads. Set the Fork/Join computation threshold at the same time.
Step3 Start the parallel computing process.
Parallel Step① Construct the parent task of the parallel computation. The evaluation of all storage-state-variable combinations in Bt(St, It, Nt) within an iteration cycle forms the parent task, and storage space is created for quantities computed along the way, such as water level, output, and energy production.
Parallel Step② Divide the parent task recursively into multiple smaller subtasks according to the configured computation threshold.
Parallel Step③ Distribute the resulting subtasks evenly across the logical threads. To keep the CPU load balanced, each logical thread is assigned the same number of subtasks.
Parallel Step④ Run the subtasks on each thread independently, i.e., evaluate all discrete state combinations of each subtask in the recurrence formula.
Parallel Step⑤ Aggregate and merge the results of all subtasks into a result set, and return it to the main thread.
Step4 Select the optimal solution from the result set; the computation then ends and the thread pool is destroyed.
The PDP algorithm flow is shown in Figure 6.
(2) Parallel discrete differential dynamic programming method (PDDDP)
Step1 Data preparation. Obtain the basic characteristic values and curves of each plant required for the computation, including characteristic water levels, output coefficients, the water level-storage curve, and the discharge-tailwater level curve, and determine the discrete state variables St from each reservoir's storage bounds and number of discrete points in each period (for the DDDP method the number of discrete states is generally set to 3).
Step2 Set the number of iteration corridors and generate an initial feasible solution as the initial trajectory.
Step3 Create the thread pool; by default its number of worker threads equals the number of CPU logical threads. Set the Fork/Join computation threshold at the same time.
Step4 Select the maximum corridor width as the current corridor.
Step5 Start the parallel computing process.
Parallel Step① Construct the parent task of the parallel computation within the current corridor. The evaluation of all storage-state-variable combinations in Bt(St, It, Nt) within the corridor forms the parent task, and storage space is created for quantities computed along the way, such as water level, output, and energy production.
Parallel Step② Divide the parent task recursively into multiple smaller subtasks according to the configured computation threshold.
Parallel Step③ Distribute the resulting subtasks evenly across the logical threads. To keep the CPU load balanced, each logical thread is assigned the same number of subtasks.
Parallel Step④ Run the subtasks on each thread independently, i.e., evaluate all discrete state combinations of each subtask in the recurrence formula.
Parallel Step⑤ Aggregate and merge the results of all subtasks into a result set, and return it to the main thread.
Step6 Output the current optimal trajectory from the result set, and check whether it is identical to the initial trajectory. If it is, go to Step7; if not, go to Step8.
Step7 Check whether the current corridor is the last one. If not, set the next smaller corridor width as the current corridor width and go to Step8; if it is, the computation terminates, the optimal trajectory is output as the optimal solution, and the thread pool is destroyed.
Step8 Set the current optimal trajectory as the initial trajectory of the next iteration and return to Step5.
The PDDDP algorithm flow is shown in Figure 6.
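The corridor iteration described above can be sketched as follows. A one-dimensional objective stands in for the reservoir model, and bestInCorridor() abstracts the parallel Fork/Join search of Parallel Steps ① to ⑤; every name and constant here is an illustrative assumption, not the patented code.

```java
// Sketch of the DDDP outer loop: the corridor around the current trajectory is
// searched each iteration; once the trajectory stops changing, the corridor
// shrinks, until the smallest corridor has also converged.
public class CorridorLoop {
    // Illustrative 1-D objective standing in for annual energy output.
    static double objective(double x) { return -(x - 2.5) * (x - 2.5); }

    // Evaluate the 3 discrete states {x-w, x, x+w} and keep the best one
    // (stands in for the parallel search over the whole corridor).
    static double bestInCorridor(double x, double w) {
        double best = x;
        for (double cand : new double[]{x - w, x, x + w})
            if (objective(cand) > objective(best)) best = cand;
        return best;
    }

    public static void main(String[] args) {
        double[] widths = {1.0, 0.1, 0.01};   // corridor widths, largest first
        double trajectory = 0.0;              // initial feasible trajectory
        for (double w : widths) {             // move on to smaller corridors
            while (true) {                    // iterate within this corridor
                double next = bestInCorridor(trajectory, w);
                if (next == trajectory) break;  // unchanged: shrink corridor
                trajectory = next;              // becomes next initial trajectory
            }
        }
        System.out.println(Math.abs(trajectory - 2.5) <= 0.01);
    }
}
```

The final trajectory lands within the smallest corridor width of the true optimum at 2.5, which mirrors how DDDP refines an initial feasible solution corridor by corridor.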
(3) Fine-grained parallel design pattern and method for the Fork/Join framework
Generalizing from the computational workflows of the PDP and PDDDP methods, the fine-grained parallel design pattern for the Fork/Join framework is summarized as follows:
Step1 Data preparation, including initializing the parameters and setting the number of discrete state points.
Step2 Create the thread pool and set the computation threshold.
Step3 Execute the serial workflow until the return values of the discrete-state-point combinations are to be computed, then enter the parallel computation.
Step4 Use the Fork/Join parallel framework to divide the parent task recursively into multiple subtasks, distributed evenly among the worker threads.
Step5 Evaluate the discrete-state-point combinations of each subtask in the dynamic programming recurrence until all subtasks complete.
Step6 Aggregate and merge the results of all subtasks into a result set, and return it to the main thread.
Step7 Select the current optimal solution from the result set, continue along the method's serial workflow until the computation ends, and destroy the thread pool.
The fine-grained parallel design pattern of the Fork/Join framework is shown in Figure 6.
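In the fine-grained pattern the split happens inside one DP stage, across the state discrete points. The sketch below illustrates that; stageReturn() is a stand-in for Bt(St, It, Nt), not the patent's actual objective, and the threshold and names are assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fine-grained pattern sketch: the loop over state discrete points within one
// DP stage is split across threads; each subtask evaluates the recurrence for
// its slice and returns the best value it found.
public class FineGrainedPattern extends RecursiveTask<Double> {
    static final int THRESHOLD = 64;
    private final int lo, hi;     // slice of state-discrete-point indices

    FineGrainedPattern(int lo, int hi) { this.lo = lo; this.hi = hi; }

    // Illustrative single-combination evaluation (the DP recurrence formula).
    static double stageReturn(int stateIndex) {
        double s = stateIndex / 1000.0;
        return -(s - 0.37) * (s - 0.37);   // peaks at index 370
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {        // Step5: serial slice evaluation
            double best = Double.NEGATIVE_INFINITY;
            for (int i = lo; i < hi; i++) best = Math.max(best, stageReturn(i));
            return best;
        }
        int mid = (lo + hi) >>> 1;         // Step4: recursive bisection
        FineGrainedPattern left = new FineGrainedPattern(lo, mid);
        left.fork();
        double rightBest = new FineGrainedPattern(mid, hi).compute();
        return Math.max(rightBest, left.join());   // Step7: keep the optimum
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        double best = pool.invoke(new FineGrainedPattern(0, 1000));
        pool.shutdown();
        System.out.println(best == 0.0);   // optimum reached at index 370
    }
}
```

Because the state-point evaluations are mutually independent, the merge step only has to keep the maximum, so no synchronization beyond fork/join is needed.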
(1) Basic information
To verify the parallel efficiency of the PSCWAGA, PAHPSO, PDP, and PDDDP methods, the maximum-annual-energy model for cascade reservoirs (see the Detailed Description) is used as the computational example, and different dispatch objects (the set of reservoirs participating in the optimization) are selected according to the optimization characteristics of each method. The dispatch object used for testing is the Hongshui River cascade reservoir system; its topology is shown in Figure 7. The test machine's CPU is an Intel Xeon E3-1245 @ 3.30 GHz (4 cores), tested with Hyper-Threading disabled.
(2) Parallel performance metrics
The speedup Sp and the efficiency Ep are two commonly used metrics of parallel algorithm performance, defined as:
Sp = T1 / Tp (1)
Ep = Sp / p (2)
where T1 is the algorithm's execution time in the serial environment, in seconds, and Tp is its execution time on p cores, in seconds.
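As a quick worked application of Eqs. (1) and (2), the snippet below reproduces the maximum speedup reported for the first PSCWAGA scheme: 582 s on 4 cores, with 1583 s saved relative to the serial run, so T1 = 2165 s. The class and method names are illustrative.

```java
import java.util.Locale;

// Applying Eqs. (1)-(2) to one set of reported figures from Table 5.
public class ParallelMetrics {
    static double speedup(double t1, double tp) { return t1 / tp; }      // Eq. (1)
    static double efficiency(double t1, double tp, int p) {              // Eq. (2)
        return speedup(t1, tp) / p;
    }

    public static void main(String[] args) {
        double t1 = 582.0 + 1583.0;   // serial time reconstructed from Table 5
        double tp = 582.0;            // best 4-core time of scheme 1
        System.out.printf(Locale.ROOT, "Sp = %.2f, Ep = %.2f%n",
                speedup(t1, tp), efficiency(t1, tp, 4));
        // prints: Sp = 3.72, Ep = 0.93
    }
}
```

An efficiency of 0.93 on 4 cores indicates the speedup is close to the ideal value of p, consistent with the summary at the end of this section.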
(3) Test schemes
① PSCWAGA method
The reservoirs participating in the optimization are six plants with seasonal or better regulation capability: Lubuge, Yunpeng, Tianshengqiao-I, Guangzhao, Longtan, and Yantan. Three schemes with different population sizes were set up for the simulation tests; the population size of each scheme is given in Table 1.
Table 1 Population sizes of the three PSCWAGA simulation schemes
② PAHPSO method
The reservoirs participating in the optimization are four plants with seasonal or better regulation capability: Tianshengqiao-I, Guangzhao, Longtan, and Yantan. Three schemes with different population sizes were set up for the simulation tests; the population size of each scheme is given in Table 2.
Table 2 Population sizes of the three PAHPSO simulation schemes
③ PDP method
The reservoirs participating in the optimization are two plants with seasonal or better regulation capability: Longtan and Yantan. Three schemes with different numbers of discrete points were set up for the simulation tests; the number of discrete points of each scheme is given in Table 3.
Table 3 Numbers of discrete points of the three PDP simulation schemes
④ PDDDP method
The reservoirs participating in the optimization are three plants with seasonal or better regulation capability: Tianshengqiao-I, Longtan, and Yantan. Three schemes with different scheduling periods (computation scales) were set up for the simulation tests; the scheduling period of each scheme is given in Table 4.
Table 4 Scheduling periods of the three PDDDP simulation schemes
(4) Parallel results of each scheme
① PSCWAGA method
The PSCWAGA test results are given in Table 5. The shortest run times of the three schemes all occur in the 4-core environment, at 582 s, 881 s, and 1173 s respectively, reductions of 1583 s, 2425 s, and 3355 s relative to the serial times, with maximum speedups of 3.72, 3.75, and 3.86. The multi-core CPU's acceleration capability is thus fully exploited, and the computational efficiency of the algorithm is improved markedly.
Table 5 Serial versus parallel results of the PSCWAGA simulation schemes in different multi-core environments
② PAHPSO method
The PAHPSO test results are given in Table 6. The shortest run times of the three schemes all occur in the 4-core environment, at 139.7 s, 263.9 s, and 382.7 s respectively, reductions of 363.4 s, 723.5 s, and 1075.9 s relative to the serial times, with maximum speedups of 3.60, 3.74, and 3.81. The multi-core CPU's acceleration capability is thus fully exploited, and the computational efficiency of the algorithm is improved markedly.
Table 6 Serial versus parallel results of the PAHPSO simulation schemes in different multi-core environments
③ PDP method
The PDP test results are given in Table 7. The shortest run times of the three schemes all occur in the 4-core environment, at 25.1 s, 91.5 s, and 187.5 s respectively, reductions of 64.1 s, 251.7 s, and 532.1 s relative to the serial times, with maximum speedups of 3.56, 3.75, and 3.84. The multi-core CPU's acceleration capability is thus fully exploited, and the computational efficiency of the algorithm is improved markedly.
Table 7 Serial versus parallel results of the PDP simulation schemes in different multi-core environments
④ PDDDP method
The PDDDP test results are given in Table 8. The shortest run times of the three schemes all occur in the 4-core environment, at 15.1 s, 362.3 s, and 928.4 s respectively, reductions of 6.7 s, 258.0 s, and 669.7 s relative to the serial times, with maximum speedups of 1.80, 3.48, and 3.59. The multi-core CPU's acceleration capability is thus fully exploited, and the computational efficiency of the algorithm is improved markedly.
Table 8 Serial versus parallel results of the PDDDP simulation schemes in different multi-core environments
(5) Summary
1) The test results for the PSCWAGA, PAHPSO, PDP, and PDDDP methods show that the Fork/Join multi-core parallel framework fully exploits the parallel performance of multi-core CPUs, cuts computation time substantially, and improves algorithmic efficiency markedly.
2) The larger a parallel method's computation scale, the more computation time is saved and the more pronounced the advantage of parallel computing; moreover, as the computation scale grows, the speedup and parallel efficiency rise steadily, with the speedup approaching the ideal value.
It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above, and that the invention can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in every respect as illustrative and not restrictive, the scope of the invention being defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalency of the claims are accordingly intended to be embraced within the invention.
Furthermore, although this specification is described in terms of embodiments, it should be understood that not every embodiment contains only a single independent technical solution; the specification is written this way merely for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions of the individual embodiments may be combined appropriately to form other implementations understandable to those skilled in the art.