CN114462772A - Semiconductor manufacturing scheduling method, system, and computer-readable storage medium

Info

Publication number: CN114462772A
Application number: CN202111609409.XA
Authority: CN
Inventors: 肖俊河; 李�杰; 刘斌; 郭宇翔; 傅慧初
Original assignee: Exxon Industries Guangdong Co ltd
Current assignee: Exxon Industries Guangdong Co ltd
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-05-10

Abstract

The invention discloses a semiconductor manufacturing scheduling method, a system and a computer readable storage medium, wherein the method comprises the following steps: determining a mathematical optimization model according to the data scale; generating a first scheduling sequence according to the mathematical optimization model, and performing iterative computation on the first scheduling sequence under a constraint condition through the mathematical optimization model to obtain a second scheduling sequence; determining an optimal strategy based on the operation result of each training action, determining a target total strategy according to the optimal strategy corresponding to each state parameter, and performing simulation scheduling according to the target total strategy to obtain a third scheduling sequence; and performing scheduling operation according to the third scheduling sequence. The invention realizes the rapid target optimization of semiconductor manufacturing scheduling, effectively and reasonably distributes production capacity, improves the resource utilization rate and reduces the production cost.

Description

Semiconductor manufacturing scheduling method, system, and computer-readable storage medium

Technical Field

The present invention relates to the technical field of semiconductor manufacturing, and in particular to a semiconductor manufacturing scheduling method, system, and computer-readable storage medium.

Background

Semiconductor manufacturing is one of the more complex industrial manufacturing chains and typically comprises four production stages: wafer fabrication, wafer sorting, packaging, and product testing. Wafer fabrication is itself complex: fabricating silicon wafers on the machines requires several hundred process steps. Small quantities of product are spread across hundreds or thousands of operations in the factory, which gives rise to complex capacity-allocation and job-scheduling problems, so scheduling is a central concern in semiconductor production. The scheduling problem generally refers to n workpieces processed in flow-shop fashion on m machines, where each workpiece takes a different time on each machine and each machine can process only one workpiece at a time. The goal is to determine the processing order of the workpieces on each machine and the start time of each operation so that the maximum completion time is minimized or some other indicator is optimized.

At present, one existing semiconductor production scheduling method searches for a solution with an intelligent optimization algorithm built on evolutionary principles, such as a genetic algorithm. However, this method must construct a population of sequences and iterate the optimization over that population, evaluating every individual separately. The larger the population, the greater the computation, so a single solve takes a large amount of time.

Summary of the Invention

The purpose of the present invention is to provide a semiconductor manufacturing scheduling method, system, and computer-readable storage medium that allocate production capacity effectively and reasonably, improve resource utilization, and reduce production costs.

To achieve this purpose, the present invention proposes a semiconductor manufacturing scheduling method comprising the following steps:

obtaining the data scale of the semiconductor manufacturing scheduling task, and determining a mathematical optimization model according to the data scale;

generating a first scheduling sequence according to the mathematical optimization model, and iteratively computing on the first scheduling sequence under constraint conditions through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraint conditions, and the constraint conditions including a maximum number of iterations;

determining all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traversing each of the state parameters, and determining, based on each of the initial strategies, all traversal initial strategies corresponding to the traversed state parameter;

running the training actions in each of the traversal initial strategies, determining an optimal strategy based on the result of running each training action, determining an overall target strategy from the optimal strategy corresponding to each state parameter, and performing simulated scheduling according to the overall target strategy to obtain a third scheduling sequence;

performing the scheduling operation according to the third scheduling sequence.

Further, in the above semiconductor manufacturing scheduling method, the data scale includes the number of products to be processed.

Further, in the above semiconductor manufacturing scheduling method, the step of iteratively computing on the first scheduling sequence under constraint conditions through the mathematical optimization model to obtain the second scheduling sequence specifically comprises:

when the data scale does not exceed a preset threshold, generating two random integers, the two random integers being distinct integers between 1 and N, where N is the length of the first scheduling sequence;

swapping the values at the positions of the first scheduling sequence indexed by the two random integers to obtain a first comparison sequence;

calculating and comparing the fitness value of the first scheduling sequence and the fitness value of the first comparison sequence; when the fitness value of the first comparison sequence is greater than that of the first scheduling sequence, repeating the value-swapping step on the first comparison sequence until the maximum number of iterations in the constraint conditions is reached, thereby obtaining the second scheduling sequence, which is the scheduling sequence with the largest fitness value under the constraint conditions;

when the data scale exceeds the preset threshold, performing roulette-wheel selection, crossover, and mutation on the first scheduling sequence under constraint conditions to obtain a plurality of preferred sequences, the constraint conditions including the maximum number of iterations, the number of individuals in the population, the crossover probability, and the mutation probability;

determining the second scheduling sequence from the plurality of preferred sequences, the second scheduling sequence being the preferred sequence with the largest fitness value among them.

Further, in the above semiconductor manufacturing scheduling method, the step of performing roulette-wheel selection on the first scheduling sequence under constraint conditions specifically comprises:

calculating the fitness value of each of the plurality of random sequences in the first scheduling sequence, and constructing a normalized interval from the ratio of each random sequence's fitness value to the sum of the fitness values of all random sequences, the normalized interval spanning [0, 1] and being divided into sub-intervals, one per fitness value;

generating a plurality of distinct random numbers between 0 and 1, the number of random numbers being equal to the number of random sequences;

determining the sub-interval of the normalized interval into which each random number falls, and selecting the random sequence corresponding to the fitness value of that sub-interval, thereby obtaining the selected random sequences.

Further, in the above semiconductor manufacturing scheduling method, the step of performing crossover on the first scheduling sequence under constraint conditions specifically comprises:

determining the random sequences to be crossed according to the crossover probability and the selected random sequences, and pairing the random sequences to be crossed two by two to obtain a plurality of sequence pairs, each pair comprising two random sequences;

for each sequence pair, generating a random integer between 1 and N, where N is the length of the first scheduling sequence, and swapping the values of the two random sequences in the pair from that position onward in the forward or backward direction, thereby obtaining the crossed random sequences.

Further, in the above semiconductor manufacturing scheduling method, the step of performing mutation on the first scheduling sequence under constraint conditions specifically comprises:

determining the random sequences to be mutated according to the mutation probability and the crossed random sequences;

generating two random integers between 1 and N, where N is the length of the first scheduling sequence, and randomly reordering the subsequence between the positions indexed by the two random integers in each random sequence to be mutated, thereby obtaining a plurality of mutated random sequences;

calculating a preferred sequence from the plurality of mutated random sequences, the preferred sequence being the mutated random sequence with the largest fitness value.
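The mutation step above can be illustrated with a minimal Python sketch. It assumes 0-based positions (the text counts from 1) and that the shuffled segment includes both drawn positions; the function names `scramble_segment` and `mutate` are illustrative and do not come from the patent.

```python
import random
from typing import List, Optional, Sequence


def scramble_segment(seq: Sequence[int], lo: int, hi: int,
                     rng: random.Random) -> List[int]:
    """Return a copy of seq with positions lo..hi (inclusive) randomly reordered."""
    mutated = list(seq)
    segment = mutated[lo:hi + 1]
    rng.shuffle(segment)              # reorder only the chosen subsequence
    mutated[lo:hi + 1] = segment
    return mutated


def mutate(seq: Sequence[int],
           rng: Optional[random.Random] = None) -> List[int]:
    """Draw two distinct positions and shuffle the subsequence between them."""
    rng = rng or random.Random()
    a, b = rng.sample(range(len(seq)), 2)   # two different 0-based positions
    return scramble_segment(seq, min(a, b), max(a, b), rng)
```

Because only the order inside the segment changes, the result is still a permutation of the same products, so no conflict repair is needed after mutation.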

Further, in the above semiconductor manufacturing scheduling method, the training actions include delivery-date order, customer level, the arrival time of products from the buffer, and the initial order of the products.

In addition, the present invention provides a semiconductor manufacturing scheduling system for implementing the above semiconductor manufacturing scheduling method, comprising:

a first determining unit, configured to determine a mathematical optimization model according to the data scale of the semiconductor manufacturing scheduling task;

a first computing unit, configured to generate a first scheduling sequence according to the mathematical optimization model and to iteratively compute on the first scheduling sequence under constraint conditions through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraint conditions, and the constraint conditions including a maximum number of iterations;

a second determining unit, configured to determine the target attribute priority of the products to be processed;

a second computing unit, configured to determine all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traverse each of the state parameters, and determine, based on each of the initial strategies, all traversal initial strategies corresponding to the traversed state parameter; and to run the training actions in each of the traversal initial strategies, determine an optimal strategy based on the result of running each training action, determine an overall target strategy from the optimal strategy corresponding to each state parameter, and perform simulated scheduling according to the overall target strategy to obtain a third scheduling sequence;

a scheduling unit, configured to perform the scheduling operation according to the third scheduling sequence.

In addition, the present invention provides a computer-readable storage medium on which a computer program is stored, the program implementing the above semiconductor manufacturing scheduling method when executed by a processor.

By combining the global search capability of the intelligent algorithm with the local optimization capability of the reinforcement learning algorithm, the present invention achieves fast objective optimization of semiconductor manufacturing scheduling, allocates production capacity effectively and reasonably, improves resource utilization, and reduces production costs.

Brief Description of the Drawings

FIG. 1 is a flowchart of the semiconductor manufacturing scheduling method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the calculation and estimation of the fitness value of a scheduling sequence in the present invention;

FIG. 3 is a schematic diagram of the first mathematical optimization model in the present invention;

FIG. 4 is a schematic diagram of the second mathematical optimization model in the present invention;

FIG. 5 is a schematic diagram of the scene of the semiconductor manufacturing scheduling method in the present invention;

FIG. 6 is a schematic diagram of the principle of reinforcement learning training in the present invention;

FIG. 7 is a schematic diagram of the strategy Q-value table in the reinforcement learning training of the present invention;

FIG. 8 is a schematic flowchart of deep regression model training in the reinforcement learning training of the present invention;

FIG. 9 is a schematic structural diagram of the semiconductor manufacturing scheduling system that implements the semiconductor manufacturing scheduling method according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of the first computing unit in FIG. 9.

Detailed Description

This embodiment takes a semiconductor manufacturing scheduling method and system as an example. The present invention is described in detail below with reference to specific embodiments and the accompanying drawings.

A semiconductor manufacturing scheduling method provided by an embodiment of the present invention comprises the following steps:

obtaining the data scale of the semiconductor manufacturing scheduling task, and determining a mathematical optimization model according to the data scale; generating a first scheduling sequence according to the mathematical optimization model, and iteratively computing on the first scheduling sequence under constraint conditions through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraint conditions, and the constraint conditions including a maximum number of iterations; determining all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traversing each of the state parameters, and determining, based on each of the initial strategies, all traversal initial strategies corresponding to the traversed state parameter; running the training actions in each of the traversal initial strategies, determining an optimal strategy based on the result of running each training action, determining an overall target strategy from the optimal strategy corresponding to each state parameter, and performing simulated scheduling according to the overall target strategy to obtain a third scheduling sequence; and performing the scheduling operation according to the third scheduling sequence. By combining the global search capability of the intelligent algorithm with the local optimization capability of the reinforcement learning algorithm, the present invention achieves fast objective optimization of semiconductor manufacturing scheduling, allocates production capacity effectively and reasonably, improves resource utilization, and reduces production costs.

Referring to FIG. 1 to FIG. 6, the semiconductor manufacturing scheduling method provided by an embodiment of the present invention specifically comprises the following steps:

Step S11: initialize the simulation of the semiconductor manufacturing scene;

In a specific implementation, when scheduling semiconductor manufacturing, for example the complete process from ingot to silicon wafer, the real environment can be abstracted into one entrance that releases products, one exit that counts finished products, two buffers that store products, and the individual processing machines. The entrance is the order-generation module, the exit is where finished products converge, and idle machines in different plant areas take suitable products from the same buffer. It will be understood that the components and their quantities may differ from scene to scene.

Step S12: obtain the data scale of the semiconductor manufacturing scheduling task, and determine a mathematical optimization model according to the data scale;

Referring to FIG. 2, in a specific implementation the present invention reaches an optimal semiconductor scheduling strategy through two levels of optimization. The first level uses an intelligent algorithm to compute the fitness value of each candidate scheduling sequence in an attempt to find a better sequence. The computational load of the intelligent algorithm depends on the data scale of the scheduling task, so the data scale must first be obtained and the mathematical optimization model determined from it. In this embodiment, the data scale is the number of products to be processed; it may of course be other data, such as the total product processing time. In semiconductor production, the number of products in a batch is determined in advance from the production plan, mainly the short-term or medium-term plan. For example, production is planned in batches according to customer demand, and before a batch of 500 products is scheduled for production, the quantity 500 is already known. The amount of computation needed to evaluate the scheduling fitness value F1 differs with the number of products, and so does the mathematical optimization model applied. Specifically, a threshold can be set from experience or from big data: if the data scale does not exceed the threshold, the first mathematical optimization model is applied; if the data scale exceeds the threshold, the second mathematical optimization model is applied.

That is, the step of determining the mathematical optimization model according to the data scale specifically comprises:

when the data scale does not exceed the preset threshold, applying the first mathematical optimization model; when the data scale exceeds the preset threshold, applying the second mathematical optimization model.

In this embodiment, the threshold is 500, the first mathematical optimization model is a hill-climbing algorithm model, and the second mathematical optimization model is a genetic algorithm model. That is, when the number of products to be processed does not exceed 500, the hill-climbing algorithm model is applied; when the number of products to be processed exceeds 500, the genetic algorithm model is applied. In this way, a mathematical optimization model suited to the data scale and computational load can be selected, so that optimization and production scheduling run quickly and effectively.
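The threshold rule of this embodiment is simple enough to state as code. The sketch below is illustrative only: the names `Optimizer` and `choose_optimizer` are assumptions, and the default of 500 is the threshold given for this embodiment.

```python
from enum import Enum


class Optimizer(Enum):
    HILL_CLIMBING = "hill_climbing"   # first mathematical optimization model
    GENETIC = "genetic"               # second mathematical optimization model


def choose_optimizer(num_products: int, threshold: int = 500) -> Optimizer:
    """Pick the model from the data scale (number of products to be processed)."""
    if num_products < 0:
        raise ValueError("num_products must be non-negative")
    # "Does not exceed" the threshold means <=, so exactly 500 uses hill climbing.
    if num_products <= threshold:
        return Optimizer.HILL_CLIMBING
    return Optimizer.GENETIC
```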

Step S13: generate a first scheduling sequence according to the mathematical optimization model;

In a specific implementation, when the data scale does not exceed the preset threshold, the first mathematical optimization model is applied. In this embodiment, the first mathematical optimization model is the hill-climbing algorithm model and the data scale is the number of products to be processed. The first scheduling sequence is generated from the number of products to be processed, and its length equals that number. Specifically, when the hill-climbing algorithm model is applied, a first scheduling sequence P₀ of length N is generated from the number N of products to be processed, and P₀ is a random sequence.

When the data scale exceeds the preset threshold, the second mathematical optimization model is applied. In this embodiment, the second mathematical optimization model is the genetic algorithm model and the data scale is the number of products to be processed. The first scheduling sequence is generated from the number of products to be processed and the number of individuals in the population: it comprises a plurality of random sequences, the number of random sequences equals the number of individuals in the population, and the length of each random sequence equals the number of products to be processed. Specifically, when the genetic algorithm model is applied, m random sequences L₁₁, L₁₂, …, L_1m of length N are generated from the number N of products to be processed and the number m of individuals in the population; set i = 1.
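A minimal sketch of generating the first scheduling sequence follows. It assumes that products are labelled 1 to N and that a "random sequence" is a uniformly shuffled permutation of those labels; the helper names are hypothetical.

```python
import random
from typing import List, Optional


def random_sequence(num_products: int, rng: random.Random) -> List[int]:
    """One random ordering of the product labels 1..N (P0 for hill climbing)."""
    seq = list(range(1, num_products + 1))
    rng.shuffle(seq)
    return seq


def initial_population(num_products: int, population_size: int,
                       rng: Optional[random.Random] = None) -> List[List[int]]:
    """m random orderings of length N (L_11 .. L_1m for the genetic algorithm)."""
    rng = rng or random.Random()
    return [random_sequence(num_products, rng) for _ in range(population_size)]
```

For the hill-climbing model a single call to `random_sequence` is enough; the genetic algorithm model starts from `initial_population(N, m)`.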

Step S14: iteratively compute on the first scheduling sequence under constraint conditions through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraint conditions, and the constraint conditions including a maximum number of iterations;

In a specific implementation, referring to FIG. 3, when the data scale does not exceed the preset threshold, the first mathematical optimization model is the hill-climbing algorithm model. In this case the constraint condition is the maximum number of iterations n, which is estimated from experience and the target accuracy. Suppose the generated first scheduling sequence P₀ of length N is 2-3-1-7-6-4-8-5. Two random integers between 1 and N are generated, and the values at the positions of P₀ indexed by the two random integers are swapped; the result is denoted P₁. For example, if the two random integers are 3 and 4, the values at positions 3 and 4 of P₀ are swapped, giving the first comparison sequence P₁ = 2-3-7-1-6-4-8-5. Next, the fitness value F₀ of the first scheduling sequence P₀ and the fitness value F₁ of the first comparison sequence P₁ are calculated and compared.

If F₀ ≥ F₁, the first comparison sequence P₁ is no better than the first scheduling sequence P₀. In this case the step of swapping the values at two random integer positions must be performed again, until a first comparison sequence P₁ better than the first scheduling sequence P₀ is found.

If F₀ < F₁, the first comparison sequence P₁ is better. In this case two random integers between 1 and N are generated again, and the values at the positions of P₁ indexed by the two random integers are swapped; the result is denoted P₂. For example, if the two random integers are 6 and 7, the values at positions 6 and 7 of the first comparison sequence P₁ are swapped, giving the second comparison sequence P₂ = 2-3-7-1-6-8-4-5. Next, the fitness value F₁ of the first comparison sequence P₁ and the fitness value F₂ of the second comparison sequence P₂ are calculated and compared, and so on: two random integers between 1 and N are generated, and the values at the corresponding positions of P_i are swapped to give P_(i+1), until the maximum number of iterations n in the constraint conditions is reached (i > n). The iteration then stops, yielding the second scheduling sequence P_n, which is the scheduling sequence with the largest fitness value within the maximum number of iterations n.
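The swap-and-compare loop can be sketched in Python as follows. Two points are assumptions rather than statements from the patent: every swap attempt, accepted or rejected, counts toward the maximum number of iterations, and the fitness function is supplied by the caller. `toy_fitness` is a hypothetical stand-in used only to make the example runnable; it is not the patent's fitness function.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def hill_climb(initial: Sequence[int],
               fitness: Callable[[Sequence[int]], float],
               max_iterations: int,
               rng: Optional[random.Random] = None) -> Tuple[List[int], float]:
    """Swap two random positions; keep the swap only if fitness strictly improves."""
    rng = rng or random.Random()
    best = list(initial)
    best_fit = fitness(best)
    if len(best) < 2:
        return best, best_fit            # nothing to swap
    for _ in range(max_iterations):      # assumption: rejected swaps also count
        i, j = rng.sample(range(len(best)), 2)   # two distinct 0-based positions
        candidate = best[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cand_fit = fitness(candidate)
        if cand_fit > best_fit:          # F_i < F_(i+1): accept and continue from it
            best, best_fit = candidate, cand_fit
    return best, best_fit


def toy_fitness(seq: Sequence[int]) -> float:
    """Hypothetical stand-in: larger when product k sits closer to position k."""
    return -float(sum(abs(v - (pos + 1)) for pos, v in enumerate(seq)))
```

Calling `hill_climb([2, 3, 1, 7, 6, 4, 8, 5], toy_fitness, 200)` returns a reordering of the same eight products whose fitness is at least that of the starting sequence.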

For a first scheduling sequence L = [l₀, l₁, l₂, ..., l_(N-1)], the fitness function F[L] is:

where P(l_i) is the total processing time of the product batch, D(l_i) is the delivery date of the product batch, Begin is the initial time, N is the length of the first scheduling sequence, and l₀, l₁, l₂, ..., l_(N-1) are the products in the first scheduling sequence.

All other conditions being equal, the larger the fitness value of a product batch sequence, the more efficient and better optimized its schedule.

That is, step S14 specifically comprises:

Step S14 further comprises:

when the fitness value of the first comparison sequence is not greater than the fitness value of the first scheduling sequence, repeating the above step of generating a first comparison sequence until the fitness value of the resulting first comparison sequence is greater than the fitness value of the first scheduling sequence.

Referring to FIG. 4, when the data scale exceeds the preset threshold, the second mathematical optimization model is a genetic algorithm model. In this case the constraints are the maximum number of iterations n, the number of individuals in the population m, the crossover probability pc, and the mutation probability pm; their values are estimated from experience and the target accuracy. After m random sequences of length N have been generated, namely the first scheduling sequences L₁₁, L₁₂, …, L_1m, the individuals of the population are screened with a roulette-wheel model as follows. The fitness values F₁₁, F₁₂, …, F_1m of the m random sequences are calculated and used to construct a normalized roulette wheel: first the sum $\sum_{j=1}^{m} F_{1j}$ of the m fitness values is calculated, and then the ratio of each fitness value to that sum, $F_{1j} / \sum_{k=1}^{m} F_{1k}$. These ratios divide the interval [0, 1] into m sub-intervals. Next, m random numbers between 0 and 1 are generated, and the individuals whose sub-intervals contain those random numbers are selected to form the screened population. For example, if a random number is 0.05 and falls within the sub-interval of F₁₁, the random sequence L₁₁ corresponding to F₁₁ is selected. Proceeding in this way, m random sequences are selected; the same sequence may be selected more than once (when several random numbers fall within the same sub-interval).
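The roulette-wheel screening described above can be sketched as follows. This is an illustrative sketch that assumes all fitness values are positive, so that the shares form a valid partition of [0, 1]; the function name `roulette_select` is hypothetical.

```python
import bisect
import itertools
import random


def roulette_select(population, fitness_values, rng=None):
    """Draw len(population) individuals with probability proportional to fitness."""
    rng = rng or random.Random()
    total = sum(fitness_values)
    # Cumulative shares: the right edges of the m sub-intervals of [0, 1].
    cumulative = list(itertools.accumulate(f / total for f in fitness_values))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    selected = []
    for _ in range(len(population)):
        r = rng.random()  # uniform in [0, 1)
        # Index of the sub-interval that contains r.
        selected.append(population[bisect.bisect_right(cumulative, r)])
    return selected


if __name__ == "__main__":
    print(roulette_select(["L11", "L12", "L13"], [1.0, 3.0, 6.0], random.Random(1)))
```

Because each draw is independent, the same individual can appear several times in the screened population, as the text notes.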

Next, a crossover operation is performed on the m random sequences of the screened population. Individuals to be crossed are selected according to the crossover probability pc and paired off. For example, if the screened population has 100 random sequences and pc is 0.8, then 0.8 × 100 = 80 random sequences are selected and paired into 40 sequence pairs. For each pair, a random integer between 1 and N is generated, and the values of the two sequences are exchanged starting from that position; where the exchange leaves a duplicated (conflicting) value in a sequence, a further exchange is made at the position of the conflict. This yields the m random sequences of the post-crossover population. For example, suppose a pair consists of P₁ = 2, 3, 1, 7, 5, 6, 4, 8 and P₂ = 3, 2, 4, 8, 7, 6, 1, 5, and the random integer generated is 3. The values of P₁ and P₂ are exchanged in turn from position 3 leftward (or, alternatively, rightward). After the exchange, the value 4 at position 3 of P₁ duplicates the value 4 at position 7 of P₁, so the values at position 7 of P₁ and P₂ (4 and 1) are also exchanged, which prevents a value from appearing twice in the same sequence.
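The crossover with conflict repair can be sketched as follows. The sketch assumes the leftward variant (positions 1 to the cut point are exchanged) and one possible repair rule: duplicated tail values are exchanged pairwise between the two children. The function name `prefix_crossover` and this general pairing rule are assumptions; the text only shows the repair for a single conflict.

```python
def prefix_crossover(parent_a, parent_b, cut):
    """Exchange the first `cut` values of two permutations, then repair duplicates."""
    child_a = parent_b[:cut] + parent_a[cut:]
    child_b = parent_a[:cut] + parent_b[cut:]
    head_a, head_b = set(parent_a[:cut]), set(parent_b[:cut])
    n = len(parent_a)
    # Tail positions whose value also appears in the head that was swapped in.
    conflicts_a = [i for i in range(cut, n) if child_a[i] in head_b]
    conflicts_b = [i for i in range(cut, n) if child_b[i] in head_a]
    # Both lists have the same length, so the conflicts can be exchanged pairwise.
    for i, j in zip(conflicts_a, conflicts_b):
        child_a[i], child_b[j] = child_b[j], child_a[i]
    return child_a, child_b


if __name__ == "__main__":
    p1 = [2, 3, 1, 7, 5, 6, 4, 8]
    p2 = [3, 2, 4, 8, 7, 6, 1, 5]
    print(prefix_crossover(p1, p2, 3))
```

With the example pair and cut point 3, the only conflict is at position 7 in both children, so the repair reduces to the single exchange of 4 and 1 described in the text.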

Finally, a mutation operation is performed on the m random sequences of the post-crossover population. Individuals to be mutated are selected according to the mutation probability pm; for each selected sequence, two random integers between 1 and N are generated and the subsequence between the two corresponding positions is randomly reordered, giving the m random sequences L_i1, L_i2, …, L_im of the newest population. For example, if the post-crossover population has 100 random sequences and pm is 0.05, then 0.05 × 100 = 5 random sequences are selected for mutation. For each of these 5 sequences, two random integers between 1 and N are generated and the values of the subsequence between the corresponding positions are randomly reordered. If a sequence is 2, 3, 1, 7, 5, 6, 4, 8 and the two random integers are 3 and 5, the subsequence at positions 3 to 5 (1, 7, 5) is randomly reordered, giving for example the mutated sequence 2, 3, 5, 1, 7, 6, 4, 8. In this way, roulette screening, crossover, and mutation applied to the m random sequences L₁₁, L₁₂, …, L_1m yield m sequences L'₁₁, L'₁₂, …, L'_1m. Their fitness values are calculated and the preferred sequence with the largest fitness value, F₁, is obtained, which completes one iteration of the genetic algorithm model.
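The mutation step can be sketched as follows. This is a minimal illustration; the function name `shuffle_segment` is hypothetical, and positions are 1-based and inclusive to match the example in the text.

```python
import random


def shuffle_segment(sequence, first, last, rng=None):
    """Randomly reorder the values at positions first..last (1-based, inclusive)."""
    rng = rng or random.Random()
    lo, hi = sorted((first, last))
    segment = sequence[lo - 1:hi]
    rng.shuffle(segment)  # shuffles the copied slice in place
    return sequence[:lo - 1] + segment + sequence[hi:]


if __name__ == "__main__":
    print(shuffle_segment([2, 3, 1, 7, 5, 6, 4, 8], 3, 5, random.Random(0)))
```

Only the chosen subsequence changes, so the result is still a permutation of the same products.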

Next, m new random sequences are generated, and roulette screening, crossover, and mutation are applied to them to obtain another m sequences, whose fitness values are calculated to obtain the preferred sequence with the largest fitness value, F₂. Iterations of the genetic algorithm model continue in this way until the number of iterations reaches the maximum n, at which point iteration stops and the fitness values F₁, F₂, …, F_n of the n preferred sequences are available. The sequence with the largest fitness value among them is taken as the optimal sequence, namely the second scheduling sequence. The second scheduling sequence is the scheduling sequence with the largest fitness value under the constraints, which include the maximum number of iterations, the number of individuals in the population, the crossover probability, and the mutation probability.

Step S14 specifically includes:

When the data scale exceeds the preset threshold, performing roulette screening, a crossover operation, and a mutation operation on the first scheduling sequence under the constraints to obtain a plurality of preferred sequences, the constraints including the maximum number of iterations, the number of individuals in the population, the crossover probability, and the mutation probability;

The step of performing roulette screening on the first scheduling sequence under the constraints specifically includes:

The step of performing the crossover operation on the first scheduling sequence under the constraints specifically includes:

The step of performing the mutation operation on the first scheduling sequence under the constraints specifically includes:

The step of performing roulette screening, the crossover operation, and the mutation operation on the first scheduling sequence under the constraints to obtain a plurality of preferred sequences further includes:

Repeating the steps of generating the first scheduling sequence, roulette screening, the crossover operation, and the mutation operation to obtain a plurality of preferred sequences, until the maximum number of iterations in the constraints is reached.

Step S15: determine all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traverse the state parameters, and determine, based on the initial strategies, all traversal initial strategies corresponding to the traversed state parameter;

In a specific implementation, referring to FIG. 5 to FIG. 8, a reinforcement-learning-based scheduling system is proposed with regard to running speed, production scale, and scheduling mode. It uses the decision made in any given state as the basis for scheduling, so that a rapid production schedule capable of real-time dispatch is formed while feasibility and optimization quality are maintained. In this embodiment, a simulated production scheduling model (the preset simulated production scheduling model) is established in advance, and the environmental parameters of the environment in which the model resides are collected. The collected environmental parameters include equipment information, product information, process flow, processing time, and the like, and all state parameters, all action parameters, and the standard reward are defined from them.

In this embodiment, to determine the target overall strategy produced by reinforcement learning, the state parameters are first traversed, and for the state parameter traversed at the current moment all corresponding traversal initial strategies are identified among the initial strategies set in advance. Every traversal initial strategy contains the traversed state parameter, and the training actions of the traversal initial strategies differ from one another.

It should be noted that, in this embodiment, products move through the buffers and machines during scheduling, and scheduling ends when a product has completed its process flow. For a machine in the scheduling process, when the buffer corresponding to the machine holds products, one of them should be selected for processing. A buffer is a place where products are held temporarily; it sits in front of a machine or a group of machines of the same type, and when a machine finishes processing it takes its next product from the buffer. For example, as shown in FIG. 4, an available idle machine selects a product from a buffer for processing, and each machine or group of machines corresponds to one or more buffers, such as buffer 1, buffer 2, buffer 3, and so on. The buffer selection rules may be as shown in Table 1:

Table 1

The machine type selection rules may be as shown in Table 2:

Table 2

In this embodiment, the number of action parameters may be the number of machine type selection actions multiplied by the number of buffer selection actions. As shown in Tables 1 and 2, there are 3 actions for buffer selection and 3 actions for machine type selection, so the number of action parameters may be 9.
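The count of 9 action parameters is the size of the Cartesian product of the two rule sets. A small sketch follows; the rule labels are hypothetical placeholders, because the contents of Tables 1 and 2 are not reproduced in this text.

```python
from itertools import product

# Hypothetical labels standing in for the three rows of Table 1 and Table 2.
BUFFER_RULES = ["buffer_rule_1", "buffer_rule_2", "buffer_rule_3"]
MACHINE_RULES = ["machine_rule_1", "machine_rule_2", "machine_rule_3"]

# Each action parameter pairs one buffer rule with one machine-type rule.
ACTIONS = list(product(BUFFER_RULES, MACHINE_RULES))

if __name__ == "__main__":
    print(len(ACTIONS))
```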

In this embodiment, the action parameters define a one-dimensional action in which the machine selects a lot, and the lots in the buffer are sorted by one of the following: 1. due-date order; 2. customer level; 3. arrival time of the product at the buffer; 4. initial order of the products. The reward is based on the negative of the stage duration (in seconds); when lots are overdue, a positive penalty (the number of overdue lots × 100) is subtracted.
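The four sort rules and the reward can be sketched as follows. The `Lot` fields, the ascending sort direction for every rule, and the function names are assumptions made for illustration; only the four criteria, the negative stage duration, and the penalty of 100 per overdue lot come from the text.

```python
from dataclasses import dataclass


@dataclass
class Lot:
    name: str
    due: float           # due date, e.g. seconds from schedule start (assumed unit)
    customer_level: int  # assumed: smaller value means higher-ranked customer
    arrival: float       # arrival time at the buffer
    initial_index: int   # position in the initial product order


# Action number -> sort key, following the numbering 1 to 4 in the text.
SORT_KEYS = {
    1: lambda lot: lot.due,
    2: lambda lot: lot.customer_level,
    3: lambda lot: lot.arrival,
    4: lambda lot: lot.initial_index,
}


def sort_buffer(lots, action):
    """Order the lots in a buffer according to the chosen action."""
    return sorted(lots, key=SORT_KEYS[action])


def stage_reward(stage_seconds, overdue_lots, penalty_per_lot=100):
    """Negative stage duration, minus 100 for every overdue lot."""
    return -stage_seconds - penalty_per_lot * overdue_lots
```

For instance, a stage that takes 3600 seconds with two overdue lots gives a reward of -3600 - 200 = -3800.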

Step S16: run the training actions in the traversal initial strategies, determine an optimal strategy based on the results of running the training actions, determine the target overall strategy from the optimal strategies corresponding to the state parameters, and perform simulated scheduling according to the target overall strategy to obtain a third scheduling sequence;

In a specific implementation, referring to FIG. 5 to FIG. 8, after all traversal initial strategies have been determined, the simulated production scheduling model executes the training actions of the traversal initial strategies in turn, and while each training action runs, the state of the model is kept consistent with the traversed state parameter. From the reward values obtained after the model has trained on each training action, the training action with the best reward is identified and taken as the optimal training action, and the initial strategy corresponding to it is taken as the optimal strategy, which comprises the traversed state parameter and its optimal training action. It is then determined whether optimal strategies have been obtained for all state parameters. If so, reinforcement learning training is deemed complete, the optimal strategies for all state parameters are taken as the target overall strategy, and simulated scheduling is performed according to the target overall strategy to obtain the third scheduling sequence.

In this embodiment, training the optimal strategy requires a Q-value table that stores each state S together with all actions A that may be taken, namely Q(S, A). For example, as shown in FIG. 8, the Q-value table includes actions A1, A2, ..., An; states S1, S2, ..., Sn; and entries such as qn1, ..., qnm, where for instance q11 = Q(S1, A1). If qn2 is the largest entry for state Sn, the best action for state Sn is A2, and the optimal strategy for state Sn then comprises state Sn and action A2.
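Reading the best action out of a Q-value table can be sketched as follows. The nested-dictionary layout and the numeric entries are illustrative assumptions; FIG. 8 is not reproduced here.

```python
def best_action(q_table, state):
    """Return the action with the largest Q value in the row for `state`."""
    row = q_table[state]
    return max(row, key=row.get)


# Hypothetical table: one row per state, one entry per action.
Q_TABLE = {
    "S1": {"A1": 0.7, "A2": 0.1, "A3": 0.2},
    "Sn": {"A1": 0.3, "A2": 0.9, "A3": 0.4},
}

if __name__ == "__main__":
    print(best_action(Q_TABLE, "Sn"))
```

Here the row for Sn has its largest entry under A2, so the optimal strategy for Sn is the pair (Sn, A2).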

In this embodiment, each state parameter is traversed and the training actions of all traversal initial strategies corresponding to the traversed state parameter are run to determine the optimal strategy, and the target overall strategy is then determined from the optimal strategies of the individual state parameters, which ensures that the target overall strategy obtained is effective.

When it is detected that the simulated production scheduling model has finished executing the target training action, a corresponding reward is produced. In this embodiment, every action executed by the model produces a corresponding reward. Once the reward for the target training action has been obtained, it is used to determine whether the scheduling learning process for the traversed state parameter is complete. If it is not, a new action parameter is selected from all action parameters as the new training action and executed, and this continues until the scheduling learning process for the traversed state parameter is determined to be complete. After the target training action has run, the state of the model changes from the unscheduled state parameter to the traversed state parameter. If the scheduling learning process for the traversed state parameter is complete, it is then determined whether the scheduling learning processes for all state parameters are complete. If not, the scheduling learning processes for the remaining state parameters continue to be executed and reinforcement learning training is deemed incomplete. If the scheduling learning processes for all state parameters are complete, reinforcement learning training is deemed complete. Whether the scheduling learning process for the traversed state parameter is complete may be determined by finding the best action parameter for that state parameter: the reward for each action parameter is determined, the obtained rewards are compared with the preset standard reward, the reward with the best effect, that is, the largest reward value, is selected, and the action parameter corresponding to it is taken as the best action parameter. Once the best action parameter for the traversed state parameter has been determined, the scheduling learning process for that state parameter is complete. The optimal action parameter for each state parameter in reinforcement learning training can thus be determined; each state parameter together with its optimal action parameter forms one optimal strategy, and the optimal strategies for all state parameters form the target overall strategy. The reward may include one or more of elapsed time, due date, equipment utilization, machine changeover time, recipe changeover, and the like.
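The traversal described above, in which every action is tried from each state and the one with the largest reward is kept, can be sketched as follows. The `simulate` callback stands in for the simulated production scheduling model and `toy_simulate` is a hypothetical reward used only to make the sketch runnable; neither is taken from the patent.

```python
def learn_overall_strategy(states, actions, simulate):
    """For each state, run every action and keep the one with the largest reward."""
    q_table = {}
    strategy = {}
    for state in states:
        # The state is held fixed while each candidate action is evaluated.
        q_table[state] = {action: simulate(state, action) for action in actions}
        strategy[state] = max(q_table[state], key=q_table[state].get)
    return strategy, q_table


def toy_simulate(state, action):
    # Hypothetical reward: the action whose index equals the state scores best.
    return -abs(state - action)


if __name__ == "__main__":
    print(learn_overall_strategy([0, 1, 2], [0, 1, 2], toy_simulate)[0])
```

The returned dictionary plays the role of the target overall strategy: one optimal action per state.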

Once the target overall strategy has been obtained, the actual scheduling operation can be performed according to it, and the scheduling result is output when the operation completes. For both reinforcement learning training and the actual scheduling operation, a simulated production scheduling model is built and environmental parameters are collected to determine the state parameters, action parameters, and standard reward, and initial strategies are constructed from the state parameters and action parameters. The learning process of reinforcement learning training is then run on the model: the state of the model is initialized, which puts the model into the unscheduled state. The state parameters are traversed, the initial strategy corresponding to the traversed state parameter is determined, the action corresponding to the state (the traversed state parameter) is obtained from the initial strategy and executed, and it is determined whether scheduling (that is, scheduling learning) is complete. If not, a new action is obtained and executed while the currently traversed state is kept unchanged. If so, it is determined whether to end training (that is, reinforcement learning training). If training is not to end, the scheduling operation continues for the other state parameters, that is, the strategy is updated. If training is to end, the target overall strategy is output and the reinforcement learning training process ends. The scheduling operation for actual production then begins: all data is first obtained, the action corresponding to the state is obtained from the target overall strategy and executed, and after the action completes it is determined whether scheduling is complete. If not, a new action is obtained and executed. If so, the scheduling result is output and the process ends.

For example, take a real-time scheduling project for semiconductor processing. If only the processing in the lithography area needs to be determined, the real environment can be abstracted as one entrance, one exit, one buffer, and the processing machines, and the state (the state parameter) can be designed as: the number of products waiting at each machine + the number of products being processed at each machine + the number of products in the buffer + the number of products in transit from the buffer to the machines + the number of products at the entrance + the number of finished products at the exit. The state is expressed as a vector or array, each element of which is a count of one specified kind, for example the number of products in buffer 1, the number of products at machine 1, ..., the number of products in buffer n, the number of products at machine n, the number of products at the entrance, and the number of finished products. The action (the action parameter) can be designed as a compound action of "buffer lot selection" + "machine selection". Lot selection may be by highest priority, first in first out, or fewest eligible machines. Machine selection may be by shortest processing time or longest machine idle time. The reward may be based on the negative of the stage duration, with a positive number added as a bonus when a machine processes the same recipe consecutively. The best action for a given state may be determined with a deep regression model; that is, the step of running the training actions in the traversal initial strategies and determining the optimal strategy based on the results of running the training actions includes:

Constructing a multilayer perceptron:

where n is the number of layers, each layer has a specified number of neurons, and D(S, A) is the function that takes state S and action A as input and outputs the Q value;

From the multilayer perceptron the optimal strategy is obtained, namely:

where D(S, a) is the function of state S and the best action a corresponding to state S.
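One way to read the multilayer perceptron D(S, A) and the resulting strategy is as a greedy choice: evaluate D for every candidate action and pick the action with the largest output. The sketch below is an assumption-laden illustration, not the patent's formulas, which are not reproduced in this text. The layer sizes, ReLU activation, one-hot action encoding, and random untrained weights are all hypothetical.

```python
import random


def init_mlp(layer_sizes, rng):
    """Create one (weights, biases) pair per layer transition, randomly initialised."""
    return [
        ([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(mlp, inputs):
    """Forward pass: ReLU on hidden layers, linear scalar output."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(mlp):
        outputs = [sum(w * x for w, x in zip(row, activations)) + b
                   for row, b in zip(weights, biases)]
        if index < len(mlp) - 1:
            outputs = [max(0.0, value) for value in outputs]
        activations = outputs
    return activations[0]


def q_value(mlp, state, action_index, n_actions):
    """D(S, A): the state vector concatenated with a one-hot action encoding."""
    one_hot = [1.0 if i == action_index else 0.0 for i in range(n_actions)]
    return forward(mlp, list(state) + one_hot)


def greedy_action(mlp, state, n_actions):
    """Pick the action index with the largest predicted Q value."""
    return max(range(n_actions), key=lambda a: q_value(mlp, state, a, n_actions))


if __name__ == "__main__":
    rng = random.Random(0)
    net = init_mlp([4 + 3, 8, 1], rng)  # 4 state counts + 3 one-hot action inputs
    print(greedy_action(net, [2, 1, 0, 5], 3))
```

With untrained weights the chosen action is arbitrary; the sketch only shows how the selection is made once D has been trained.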

It should also be noted that, in the reinforcement learning process of this embodiment, as shown in FIG. 7, the reward of each stage changes dynamically, and the aim is to maximize the final sum of rewards through different combinations, thereby further improving other indicators. Because the scheduling flow has multiple stages, the states, actions, and rewards corresponding to the different stages can be determined, and the strategy is derived and trained from those states, actions, and rewards. The reward may be the stage duration (base) + consecutive processing of the same recipe (bonus), and the action may be a compound action of "buffer lot selection" + "machine selection". Lot selection may be by highest priority, first in first out, or fewest eligible machines. Machine selection may be by shortest processing time or longest machine idle time. The strategy may be obtained by constructing a multilayer perceptron and performing deep reinforcement learning with the PPO algorithm.
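The per-stage reward and its sum over stages can be sketched as follows. The bonus value of 50 and the function names are hypothetical; the text only states that a positive number is added when the same recipe is processed consecutively. The PPO training itself is not shown.

```python
def stage_reward(stage_seconds, same_recipe_as_previous, recipe_bonus=50.0):
    """Negative stage duration, plus a bonus for consecutive same-recipe processing."""
    reward = -stage_seconds
    if same_recipe_as_previous:
        reward += recipe_bonus  # hypothetical bonus size
    return reward


def total_return(stages):
    """Sum of the stage rewards; `stages` holds (seconds, same_recipe) pairs."""
    return sum(stage_reward(seconds, same) for seconds, same in stages)


if __name__ == "__main__":
    print(total_return([(120, False), (90, True), (60, True)]))
```

For the three stages in the usage line, the return is -120 + (-90 + 50) + (-60 + 50) = -170.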

Step S17: perform the scheduling operation according to the third scheduling sequence.

In a specific implementation, once the optimally ordered third scheduling sequence has been obtained, the scheduling operation is performed according to it. Rapidly optimized scheduling of semiconductor manufacturing makes it possible to dispatch manpower, equipment, and other resources sensibly, helping the factory allocate capacity reasonably, improve resource utilization, shorten production time, balance production lines, reduce costs, and so on.

In addition, referring to FIG. 9, the present invention further provides a semiconductor manufacturing scheduling system that implements the above semiconductor manufacturing scheduling method, the system comprising:

a first determining unit 10, configured to determine a mathematical optimization model according to the data scale of semiconductor manufacturing scheduling;

a first computing unit 20, configured to generate a first scheduling sequence according to the mathematical optimization model and to iteratively compute on the first scheduling sequence under constraints through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraints, and the constraints including a maximum number of iterations;

a second determining unit 30, configured to determine the target attribute priority of the products to be processed;

a second computing unit 40, configured to determine all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traverse the state parameters, and determine, based on the initial strategies, all traversal initial strategies corresponding to the traversed state parameter; and to run the training actions in the traversal initial strategies, determine an optimal strategy based on the results of running the training actions, determine the target overall strategy from the optimal strategies corresponding to the state parameters, and perform simulated scheduling according to the target overall strategy to obtain a third scheduling sequence;

a scheduling unit 50, configured to perform the scheduling operation according to the third scheduling sequence.

Referring to FIG. 10, the first computing unit 20 further comprises:

a generating subunit 201, configured to generate two random integers when the data scale does not exceed the preset threshold, the two random integers being different integers between 1 and N, where N is the length of the first scheduling sequence;

a swapping subunit 202, configured to swap the values of the first scheduling sequence at the positions given by the two random integers to obtain a first comparison sequence;

an iterating subunit 203, configured to calculate and compare the fitness value of the first scheduling sequence and the fitness value of the first comparison sequence, and, when the fitness value of the first comparison sequence is greater than that of the first scheduling sequence, to repeat the value-swapping step on the first comparison sequence until the maximum number of iterations in the constraints is reached, obtaining a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraints.

另，本发明还提供一种计算机可读存储介质，其上存储有计算机程序，所述程序被处理器执行实现如上所述的半导体制造排产方法。In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the above-mentioned semiconductor manufacturing scheduling method.

相比于现有技术，本发明具有以下有益效果：Compared with the prior art, the present invention has the following beneficial effects:

1. By replacing the traditional manual production scheduling method, the invention copes easily with complex high-mix production environments. Its running time is short, and the cost of handling unexpected events such as rush-order insertion and faults is low, which makes production more flexible and efficient.

2. A production plan based on real production capacity synchronizes supplier deliveries with factory scheduling, thereby reducing the inventory and transportation costs incurred by ordering raw materials in advance.

3. By combining the global search capability of the intelligent algorithm with the local optimization techniques of the reinforcement learning algorithm, production capacity is allocated effectively and reasonably, resource utilization is improved, and production cost is reduced.

In summary, by combining the global search capability of the intelligent algorithm with the local optimization techniques of the reinforcement learning algorithm, the present invention achieves rapid target optimization of semiconductor manufacturing scheduling, allocates production capacity effectively and reasonably, improves resource utilization, and reduces production cost.

The description and application of the present invention herein are illustrative and are not intended to limit the scope of the invention to the above embodiments. Variations and modifications of the embodiments disclosed herein are possible, and alternatives to and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It should be apparent to those skilled in the art that the present invention may be implemented in other forms, structures, arrangements, and proportions, and with other components, materials, and parts, without departing from its spirit or essential characteristics. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims

1. A production scheduling method for semiconductor manufacturing, characterized in that the method comprises the following steps:

obtaining the data scale of semiconductor manufacturing scheduling, and determining a mathematical optimization model according to the data scale;

generating a first scheduling sequence according to the mathematical optimization model, and iteratively calculating the first scheduling sequence under constraints through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraints, and the constraints including a maximum number of iterations;

determining all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, traversing each of the state parameters, and determining, based on each initial strategy, all traversal initial strategies corresponding to the traversed state parameters;

running the training actions in each of the traversal initial strategies, determining an optimal strategy based on the running result of each training action, determining an overall target strategy according to the optimal strategy corresponding to each of the state parameters, and performing simulated production scheduling according to the overall target strategy to obtain a third scheduling sequence; and

performing the scheduling operation according to the third scheduling sequence.

2. The semiconductor manufacturing scheduling method according to claim 1, wherein the data scale includes the number of products to be processed.

3. The semiconductor manufacturing scheduling method according to claim 2, wherein the step of iteratively calculating the first scheduling sequence under constraints through the mathematical optimization model to obtain the second scheduling sequence specifically comprises:

when the data scale does not exceed a preset threshold, generating two random integers, the two random integers being different integers between 1 and N, where N is the length of the first scheduling sequence;

swapping the values at the positions of the first scheduling sequence indexed by the two random integers, to obtain a first comparison sequence;

calculating and comparing the fitness value of the first scheduling sequence with the fitness value of the first comparison sequence; and, when the fitness value of the first comparison sequence is greater than the fitness value of the first scheduling sequence, repeating the value-swapping step on the first comparison sequence until the maximum number of iterations in the constraints is reached, to obtain the second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraints.

4. The semiconductor manufacturing scheduling method according to claim 2, wherein the step of iteratively calculating the first scheduling sequence under constraints through the mathematical optimization model to obtain the second scheduling sequence specifically comprises:

when the data scale exceeds a preset threshold, performing roulette screening, a crossover operation, and a mutation operation on the first scheduling sequence under constraints, the constraints including the maximum number of iterations, the number of individuals in the population, the crossover probability, and the mutation probability;

determining the second scheduling sequence from the plurality of preferred sequences, the second scheduling sequence being the preferred sequence with the largest fitness value among the plurality of preferred sequences.

5. The semiconductor manufacturing scheduling method according to claim 4, wherein the step of performing roulette screening on the first scheduling sequence under constraints specifically comprises:

calculating the fitness value of each of a plurality of random sequences in the first scheduling sequence, and constructing a normalized interval according to the ratio of the fitness value of each random sequence to the sum of the fitness values of all the random sequences, the normalized interval including a plurality of sub-intervals, one for each fitness value, and having a value range of [0, 1];

generating a plurality of different random numbers between 0 and 1, the number of random numbers being equal to the number of random sequences;

determining, within the normalized interval, the sub-interval into which each random number falls, and selecting the random sequence corresponding to the fitness value of that sub-interval, to obtain a plurality of screened random sequences.
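For illustration only, the roulette screening recited in claim 5 can be sketched in Python as follows. The function name `roulette_select` and its parameters are assumptions made for this sketch. Fitness values are assumed to be non-negative with a positive sum, and the random numbers are drawn independently, so their distinctness is not explicitly enforced.

```python
import bisect
import itertools
import random

def roulette_select(population, fitness_values, rng=None):
    """Fitness-proportional (roulette-wheel) selection, illustrative sketch."""
    rng = rng or random.Random()
    if any(f < 0 for f in fitness_values):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness_values)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    # Upper bounds of the sub-intervals that partition [0, 1].
    bounds = list(itertools.accumulate(f / total for f in fitness_values))
    bounds[-1] = 1.0  # guard against floating-point drift
    selected = []
    for _ in population:  # as many random numbers as there are sequences
        r = rng.random()  # uniform in [0, 1)
        index = bisect.bisect_right(bounds, r)  # sub-interval containing r
        selected.append(population[index])
    return selected
```

A sequence with a larger fitness value owns a wider sub-interval and is therefore more likely to be selected, possibly more than once; a sequence with zero fitness owns an empty sub-interval and is never selected.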

6. The semiconductor manufacturing scheduling method according to claim 5, wherein the step of performing a crossover operation on the first scheduling sequence under constraints specifically comprises:

determining the random sequences to be crossed according to the crossover probability and the plurality of screened random sequences, and pairing the random sequences to be crossed two by two to obtain a plurality of sequence groups, each sequence group including two random sequences;

generating a random integer between 1 and N, where N is the length of the first scheduling sequence, and exchanging the values of the two random sequences in each sequence group forward or backward from the position of the random integer, to obtain a plurality of crossed random sequences.
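For illustration only, the crossover operation recited in claim 6 can be sketched in Python as a single-point exchange. The function name `crossover` and its parameters are assumptions made for this sketch, as is the choice to pass an unpaired leftover sequence through unchanged. The sketch follows the claim wording literally: exchanging segments between two orderings of the same products can leave repeated values in a child, and no repair step is included because none is recited here.

```python
import random

def crossover(population, crossover_prob, rng=None):
    """Single-point exchange between paired sequences (illustrative sketch)."""
    rng = rng or random.Random()
    result = [list(seq) for seq in population]
    # Each sequence is chosen for crossover with probability crossover_prob.
    chosen = [i for i in range(len(result)) if rng.random() < crossover_prob]
    if len(chosen) % 2:
        chosen.pop()  # assumption: an unpaired sequence is left unchanged
    for a, b in zip(chosen[0::2], chosen[1::2]):
        n = len(result[a])
        point = rng.randint(1, n)  # 1..N inclusive, as in the text
        cut = point - 1            # 0-based index of that position
        # Exchange the values from the cut position to the end. Exchanging
        # the values before the cut instead yields the same two children
        # with their roles swapped.
        result[a][cut:], result[b][cut:] = result[b][cut:], result[a][cut:]
    return result
```

The input population is copied before any exchange, so the screened sequences remain available for comparison after the call.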

7. The semiconductor manufacturing scheduling method according to claim 6, wherein the step of performing a mutation operation on the first scheduling sequence under constraints specifically comprises:

determining the random sequences to be mutated according to the mutation probability and the plurality of crossed random sequences;

generating two random integers between 1 and N, where N is the length of the first scheduling sequence, and randomly reordering the subsequence between the positions of the two random integers in each random sequence to be mutated, to obtain a plurality of mutated random sequences;

calculating a preferred sequence from the plurality of mutated random sequences, the preferred sequence being the sequence with the largest fitness value among the plurality of mutated random sequences.
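For illustration only, the mutation operation and the choice of the preferred sequence recited in claim 7 can be sketched in Python as follows. The function names `mutate` and `preferred_sequence` and their parameters are assumptions made for this sketch; the fitness function is supplied by the caller.

```python
import random

def mutate(population, mutation_prob, rng=None):
    """Shuffle a random sub-range of each chosen sequence (illustrative sketch)."""
    rng = rng or random.Random()
    result = []
    for seq in population:
        seq = list(seq)
        # Each sequence is chosen for mutation with probability mutation_prob.
        if rng.random() < mutation_prob:
            n = len(seq)
            # Two random positions in 1..N, ordered so that low <= high.
            low, high = sorted((rng.randint(1, n), rng.randint(1, n)))
            segment = seq[low - 1:high]
            rng.shuffle(segment)  # random reordering of the subsequence
            seq[low - 1:high] = segment
        result.append(seq)
    return result

def preferred_sequence(population, fitness):
    """Return the sequence with the largest fitness value."""
    return max(population, key=fitness)
```

Because the mutation only reorders values inside one sequence, every mutated sequence contains exactly the same values as the sequence it was derived from.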

8. The semiconductor manufacturing scheduling method according to claim 1, wherein the training action includes a delivery sequence, a customer level, an arrival time of products from a buffer zone, and an initial sequence of products.

9. A semiconductor manufacturing scheduling system for realizing the semiconductor manufacturing scheduling method according to any one of claims 1-8, wherein the system comprises:

a first determining unit, configured to determine a mathematical optimization model according to the data scale of semiconductor manufacturing scheduling;

a first computing unit, configured to generate a first scheduling sequence according to the mathematical optimization model, and to iteratively calculate the first scheduling sequence under constraints through the mathematical optimization model to obtain a second scheduling sequence, the second scheduling sequence being the scheduling sequence with the largest fitness value under the constraints, and the constraints including a maximum number of iterations;

a second determining unit, configured to determine the priority of the target attribute of the product to be processed;

a second computing unit, configured to determine all state parameters of the semiconductor manufacturing environment and the training actions of the second scheduling sequence, to traverse each of the state parameters, and to determine, based on each initial strategy, all traversal initial strategies corresponding to the traversed state parameters; and to run the training actions in each of the traversal initial strategies, determine an optimal strategy based on the running result of each training action, determine an overall target strategy according to the optimal strategy corresponding to each of the state parameters, and perform simulated production scheduling according to the overall target strategy to obtain a third scheduling sequence; and

a scheduling unit, configured to perform the scheduling operation according to the third scheduling sequence.

10. A computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the semiconductor manufacturing scheduling method according to any one of claims 1-8.