CN111507523A

CN111507523A - An optimization method for cable production scheduling based on reinforcement learning

Info

Publication number: CN111507523A
Application number: CN202010299221.9A
Authority: CN
Inventors: 林剑; 宋洪波; 王周敬
Original assignee: Zhejiang University of Finance and Economics
Current assignee: Zhejiang University of Finance and Economics
Priority date: 2020-04-16
Filing date: 2020-04-16
Publication date: 2020-08-07
Anticipated expiration: 2040-04-16
Also published as: CN111507523B

Abstract

The invention discloses a cable production scheduling optimization method based on reinforcement learning. First, a cable production scheduling optimization model is established under multiple production lines and complex resource constraints, with the objective of minimizing the penalty cost of missed delivery deadlines. On this basis, within a hyper-heuristic framework, a reinforcement learning mechanism serves as the high-level heuristic (HLH) strategy, and simple heuristic rules designed around the characteristics of cable production scheduling form the set of low-level heuristics (LLH), yielding an optimized solution to the cable production scheduling problem. The method has low complexity and can effectively improve production and management efficiency in the traditional cable industry, which is of significance for improving quality and efficiency and for the transformation and upgrading of traditional industries.

Description

An optimization method for cable production scheduling based on reinforcement learning

Technical field

The invention relates to an optimization method, and in particular to a cable production scheduling optimization method based on reinforcement learning.

Background

With the continuous growth of industrial scale and social and economic development, cable products have been widely used in key industrial sectors such as construction, transportation, automobiles, communications, and energy. According to statistics, as early as 2012 the total output value of China's wire and cable industry exceeded one trillion yuan, making China the world's largest producer of wire and cable. At the same time, market competition in the industry is increasingly fierce, and enterprises need to lower production costs and raise production, management and service efficiency by reducing inventory, improving equipment utilization, and allocating human resources rationally. Scheduling optimization is the key link in improving these efficiencies: a reasonable production schedule not only shortens the manufacturing cycle but also improves staff productivity and equipment utilization and reduces energy and material losses, thereby saving energy, reducing emissions, lowering costs and improving economic returns. In particular, with the rise of agile manufacturing and the ongoing agile transformation of enterprises, emphasizing just-in-time production has become the core idea of production scheduling. This includes allocating resources flexibly and efficiently to meet production and customer-service needs.

Cable products come in many types and models, and their production processes are complex. As a result, both modeling and solving cable production scheduling problems are highly challenging. Cable manufacturers still largely rely on manual experience for production scheduling, and there is very little literature on cable production scheduling. Chinese patent application No. 201810526733.7, entitled "An Optimal Scheduling Method for Multi-Type Cable Processing", discloses an optimized scheduling method for producing multiple types of cable. However, that invention only considers the case in which all orders follow the same process route, which differs markedly from actual production in cable enterprises.

In addition, hyper-heuristics are a cross-domain problem-solving paradigm. In this paradigm, a high-level heuristic (HLH) strategy manages and manipulates a set of low-level heuristics (LLH) to dynamically generate the best heuristic for a given problem. This offers a new way to tackle complex and diverse problems. However, hyper-heuristic algorithms have high computational complexity, mainly because the HLH strategy itself spends a great deal of time searching for the best heuristic. Reducing the complexity of the HLH strategy therefore also has an important effect on the overall performance of the algorithm.

Summary of the invention

The technical problem to be solved by the present invention is to provide a cable production scheduling optimization method based on reinforcement learning. The method should be simple, practical and of low complexity, and should effectively improve production and management efficiency in the traditional cable industry.

The invention first establishes a cable production scheduling optimization model under multiple production lines and complex resource constraints, with the objective of minimizing the penalty cost of deadline delays. On this basis, within the hyper-heuristic framework, a reinforcement learning mechanism is used as the HLH strategy. Simple heuristic rules tailored to the characteristics of the cable production scheduling problem are designed to build the LLH set, thereby solving the cable production scheduling problem.

The present invention is achieved through the following technical solution:

1. A cable production scheduling optimization method based on reinforcement learning, comprising the following steps:

Step 1. Establish a constrained optimization model of the cable production scheduling problem;

Copper or aluminum rod is the raw material of cable production. It is turned into wire and cable through process steps such as wire drawing and annealing, bunching/stranding, extrusion, cabling, sheath extrusion, and armoring. The annealing step applies mainly to copper rod and increases the flexibility of the drawn wire. The equipment of each process needs matching molds to produce a particular cable model. Producing a different model on a given machine of a given process therefore requires a mold change, which takes time. A cable product is output after each of the drawing and annealing, bunching/stranding, extrusion, cabling, and sheath extrusion steps.

Suppose the cable production line has m machines and there are N orders {J_1, J_2, …, J_N} to produce. Each order J_i (i = 1, 2, …, N) corresponds to a set of n processes O_i = {O_i1, O_i2, …, O_in}, determined by the process requirements of its cable model. Each order contains exactly one cable product specification. The notation is as follows:

- M_g is the set of machines that perform process step g (g = 1, 2, …, 6).
- G_gh is the h-th production specification of step g.
- G_i^g is the production specification of order J_i at step g.
- G′_gh is the number of mold sets available for producing specification G_gh.
- S_ii′k is the mold-change time when machine M_k (k = 1, 2, …, m) switches from order J_i to another order J_i′ whose cable specification differs.
- B_ij and C_ij are the start and completion times of process O_ij (i = 1, 2, …, N; j = 1, 2, …, n).
- B′_ik and C″_ik are the start and completion times of order J_i on machine k.

The optimization goal is to minimize the deadline-delay penalty cost by properly assigning machines and sequencing the processes of the different jobs. The objective function of the cable production scheduling problem is:

where D_i is the delivery deadline of order J_i, C_i is the completion time of order J_i, and w_i is the urgency weight of order J_i;
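The text names the ingredients of objective (1) (D_i, C_i, w_i), but the formula itself is not reproduced here. A common reading of a "deadline delay penalty" is the total weighted tardiness F = Σ w_i · max(0, C_i − D_i). The sketch below assumes that form, so treat it as an illustration rather than the patent's exact formula.

```python
from typing import Sequence


def weighted_tardiness(completion: Sequence[float],
                       deadline: Sequence[float],
                       weight: Sequence[float]) -> float:
    """Assumed form of objective (1): sum over orders of w_i * max(0, C_i - D_i)."""
    if not (len(completion) == len(deadline) == len(weight)):
        raise ValueError("C, D and w must have one entry per order")
    # Only late orders contribute; early or on-time orders cost nothing.
    return sum(w * max(0.0, c - d) for c, d, w in zip(completion, deadline, weight))


# Three hypothetical orders: the second and third finish after their deadlines.
print(weighted_tardiness([10, 14, 20], [12, 11, 15], [1.0, 2.0, 0.5]))
```

Under this assumption the example prints 8.5: order 2 is 3 units late with weight 2, and order 3 is 5 units late with weight 0.5.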

The constraints are as follows:

Here, the constraints mean the following:

- Constraint (2) requires that, within the same order J_i, a process can start only after the preceding process has finished.
- Constraint (3) requires that, on machine k, the immediately following process can start only after the preceding one has finished, including the mold-change time.
- Constraint (5) limits the number of molds available for a process in cable production.

The scheduling model built in this step simultaneously accounts for multi-model cable production, mold changes between models, and mold resource constraints. It is therefore closer to actual cable production in enterprises.
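Formulas (2) to (6) are not reproduced in this text. The sketch below is one hypothetical way to check a finished schedule against the three constraints the prose does describe:

- (2) process precedence within an order;
- (3) machine sequencing including the mold-change time;
- (5) a cap on simultaneous use of each specification's mold sets G′_gh.

The `Op` record, the `setup` callback and the `molds` dictionary are illustrative names, not from the patent. Constraints (4) and (6) are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Op:
    order: int      # i, index of order J_i
    index: int      # j, position of the process in the order's route
    machine: int    # k
    spec: str       # production specification, e.g. "G11"
    start: float    # B_ij
    end: float      # C_ij


def is_feasible(ops: List[Op], setup: Callable[[str, str], float],
                molds: Dict[str, int]) -> bool:
    # (2) within an order, each process starts after its predecessor ends
    by_order = defaultdict(list)
    for op in ops:
        by_order[op.order].append(op)
    for route in by_order.values():
        route.sort(key=lambda o: o.index)
        for prev, nxt in zip(route, route[1:]):
            if nxt.start < prev.end:
                return False
    # (3) on each machine, the next job waits for the previous one plus mold change
    by_machine = defaultdict(list)
    for op in ops:
        by_machine[op.machine].append(op)
    for seq in by_machine.values():
        seq.sort(key=lambda o: o.start)
        for prev, nxt in zip(seq, seq[1:]):
            change = 0.0 if prev.spec == nxt.spec else setup(prev.spec, nxt.spec)
            if nxt.start < prev.end + change:
                return False
    # (5) at any instant, operations using a spec <= mold sets of that spec
    events = defaultdict(list)
    for op in ops:
        events[op.spec] += [(op.start, 1), (op.end, -1)]
    for spec, evs in events.items():
        evs.sort()  # at equal times, ends (-1) are processed before starts (+1)
        in_use = 0
        for _, delta in evs:
            in_use += delta
            if in_use > molds[spec]:
                return False
    return True
```

Sweeping start and end events per specification keeps the mold check linear after sorting. It treats a mold as free again at the exact moment its operation ends.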

Step 2. Initialize the optimization algorithm and reinforcement learning parameters;

2.1. Initialize the algorithm parameters: the current iteration count t, the maximum number of iterations maxT, and the number of iterations per period T;

2.2. Initialize the reinforcement learning action set: construct a global search operator set Λ = {a_1, a_2, …, a_λ} and a neighborhood search operator set Γ = {a′_1, a′_2, …, a′_γ}, and take A = Λ ∪ Γ as the action set. The operators in Λ are based on crossover and the operators in Γ on exchange operations;

2.3. Generate the initial solution: randomly generate an initial solution composed of the processes of the N orders, i.e. X_t = Ruffled{O_1, O_2, …, O_N}, where Ruffled(·) randomly shuffles the sequence;
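The text fixes the encoding (a shuffled sequence of all orders' processes) and the operator families (crossover for Λ, exchange for Γ), but not the concrete operators. The code below is a sketch only. It assumes the common operation-based encoding, in which order i appears once per process and its k-th occurrence stands for process O_ik. It uses precedence-preserving order crossover (POX) and a two-position swap as stand-ins for members of Λ and Γ. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def initial_solution(n_ops: Sequence[int], rng: random.Random) -> List[int]:
    """Ruffled{O_1, ..., O_N}: order i appears n_ops[i] times, then shuffle."""
    seq = [i for i, n in enumerate(n_ops) for _ in range(n)]
    rng.shuffle(seq)
    return seq


def pox_crossover(x: List[int], y: List[int], rng: random.Random) -> List[int]:
    """Crossover-based move (assumed): keep a random subset of orders at their
    positions in x; fill the other positions in the order they appear in y."""
    orders = sorted(set(x))
    keep = set(rng.sample(orders, k=max(1, len(orders) // 2)))
    filler = iter(g for g in y if g not in keep)
    return [g if g in keep else next(filler) for g in x]


def swap(x: List[int], rng: random.Random) -> List[int]:
    """Exchange-based move (assumed): swap the genes at two random positions."""
    child = x[:]
    a, b = rng.sample(range(len(child)), 2)
    child[a], child[b] = child[b], child[a]
    return child
```

Both operators preserve the multiset of genes, so every child still decodes to a complete schedule. The method iterates a single solution, so one plausible crossover partner is a fresh random sequence, e.g. `pox_crossover(x, initial_solution(n_ops, rng), rng)`. The patent does not say which partner it uses.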

Step 3. Randomly select an initial state s_t and an action χ_t (χ_t ∈ A) for that state;

Step 4. Apply χ_t as the search operator to X_t and run it T consecutive times. In each run, generate a schedule using the earliest-completion-time rule, as follows:

4.1. Traverse all machines and check whether process O_ij can be processed on each. For every machine that can process it, compute the completion time of O_ij subject to constraints (2)-(6);

4.2. Assign O_ij to the machine with the smallest completion time;

4.3. Generate the production schedule of the orders on the machines and compute the objective function value F(·) with formula (1);

If the new solution is better, it replaces the current solution; after the T runs, compute the value λ with formula (7);
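Steps 4.1 and 4.2 describe a greedy earliest-completion-time decoder. The sketch below is a hypothetical illustration of that idea under the following simplifying assumptions:

- It uses the operation-based encoding from step 2.3.
- Each process is described by its stage g, specification and processing time.
- Operations are appended after the last job on a machine, with no gap insertion.
- The mold-count constraint (5) is left out.

Ties are broken at random, as claim 4 specifies. The decoder returns the completion time C_i of each order, from which formula (1) can be evaluated.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Route = List[Tuple[int, str, float]]  # per process: (stage g, spec, processing time)


def decode(seq: Sequence[int], routes: Sequence[Route],
           machines: Dict[int, List[int]],
           setup: Callable[[str, str], float],
           rng: random.Random) -> List[float]:
    """Greedy earliest-completion decoding; returns C_i for every order.
    Mold-count constraint (5) is not modelled in this sketch."""
    next_proc = [0] * len(routes)        # next process index j of each order
    order_ready = [0.0] * len(routes)    # completion time of the order's last process
    mach_free: Dict[int, float] = {}     # time at which machine k becomes idle
    mach_spec: Dict[int, str] = {}       # spec of the last job run on machine k
    for i in seq:
        stage, spec, p = routes[i][next_proc[i]]
        best, options = None, []
        for k in machines[stage]:        # 4.1: every machine eligible for this stage
            last = mach_spec.get(k)
            change = 0.0 if last is None or last == spec else setup(last, spec)
            start = max(order_ready[i], mach_free.get(k, 0.0) + change)
            end = start + p
            if best is None or end < best:
                best, options = end, [(k, end)]
            elif end == best:
                options.append((k, end))
        k, end = rng.choice(options)     # 4.2: smallest completion, random tie-break
        mach_free[k], mach_spec[k] = end, spec
        order_ready[i] = end
        next_proc[i] += 1
    return order_ready
```

The rule is greedy per operation. The quality of the schedule therefore comes entirely from the order of `seq`, which is what the reinforcement-learning layer perturbs.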

Step 5. Select the state s_t according to λ, i.e. λ ∈ {s | s = θ_1, θ_2, θ_3}, where θ_1 = [0.9, 1], θ_2 = [0.5, 0.9) and θ_3 = [0, 0.5) are the interval thresholds of the state space;
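Formula (7) is not reproduced, so the sketch below takes λ as an input and only shows the interval mapping of step 5. Numbering the states 1 to 3 for θ_1 to θ_3 is an assumption.

```python
def state_of(lam: float) -> int:
    """Map lambda onto the states of step 5:
    theta1 = [0.9, 1] -> 1, theta2 = [0.5, 0.9) -> 2, theta3 = [0, 0.5) -> 3."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is expected to lie in [0, 1]")
    if lam >= 0.9:
        return 1
    if lam >= 0.5:
        return 2
    return 3
```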

Step 6. Generate a random number r ∈ [0, 1] and determine the next action χ_t from the reinforcement probability ε given by formula (8). If r < ε, select the action with the highest Q value in state s_t; otherwise, select an action of state s_t at random;

In formula (8), maxT is the preset maximum number of iterations;
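Formula (8) for ε is not reproduced, so the hypothetical helper below takes ε as an argument. Note the convention in step 6: r < ε selects the greedy action, so ε here is the probability of exploitation. This is the reverse of the usual ε-greedy wording. Breaking ties among equal Q values at random is an added assumption.

```python
import random
from typing import Dict, List


def select_action(q: Dict[int, List[float]], state: int, eps: float,
                  rng: random.Random) -> int:
    """Step 6: exploit (highest Q) with probability eps, otherwise explore."""
    row = q[state]
    if rng.random() < eps:
        best = max(row)
        return rng.choice([a for a, v in enumerate(row) if v == best])
    return rng.randrange(len(row))
```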

Step 7. Evaluate the utility of the result of the current action χ_t to guide the search direction of the hyper-heuristic algorithm. The utility function r_t of executing action χ_t is defined as:

Update the Q values of all actions χ′_t in the action subset to which χ_t belongs according to the learning function in formula (10), and determine the next state through the state representation mechanism;

In formula (10), Q_t(s_t, χ_t) is the Q value of action χ_t in state s_t at iteration t, α is the learning rate, and γ is the discount factor, with γ = 0.8; α is adjusted adaptively as in formula (11);
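Formulas (9) to (11) are not reproduced. The code below is a sketch only. It assumes formula (10) has the standard Q-learning form Q(s, a) ← Q(s, a) + α[r_t + γ max_a′ Q(s′, a′) − Q(s, a)]. Following step 7, it applies this update to every action in the subset (Λ or Γ) that contains the executed action. The reward r_t and the adaptive α are passed in rather than computed, and the action indices are hypothetical.

```python
from typing import Dict, List, Sequence

GAMMA = 0.8  # discount factor stated in step 7


def update_group(q: Dict[int, List[float]], s: int, s_next: int,
                 group: Sequence[int], reward: float, alpha: float,
                 gamma: float = GAMMA) -> None:
    """Assumed standard Q-learning target, applied to every action index in
    the operator subset that contains the executed action."""
    target = reward + gamma * max(q[s_next])
    for a in group:
        q[s][a] += alpha * (target - q[s][a])
```

Updating the whole subset shares credit between operators of the same kind (all crossover moves or all exchange moves). This is one plausible reading of "all actions in the action set to which χ_t belongs".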

Step 8. If t ≤ maxT, return to Step 4; otherwise, output the best schedule and its Gantt chart.

The beneficial effects of the invention are as follows. Based on the actual production situation of cable enterprises, a cable production scheduling model under multiple production lines and complex resource constraints is established, with minimization of the deadline-delay penalty cost as the optimization goal. On this basis, a reinforcement-learning-based hyper-heuristic scheduling optimization method is proposed. Within the hyper-heuristic framework, an LLH set with both global and local search capabilities is designed. Under the reinforcement learning mechanism, this LLH set serves as the action set, from which the appropriate LLH is selected dynamically for single-solution iterative optimization. The method uses single-sequence encoding and single-solution iteration, so it is simple, practical and of low algorithmic complexity. It can effectively improve production and management efficiency in the traditional cable industry, and is of significance for improving quality and efficiency and for the transformation and upgrading of traditional industries.

Description of drawings

For ease of explanation, the present invention is described in detail through the following specific embodiment and the accompanying drawings.

Figure 1 is a schematic diagram of the cable production process.

Figure 2 is a flowchart of the reinforcement-learning-based hyper-heuristic scheduling optimization algorithm.

Figure 3 is a Gantt chart of a scheduling solution.

Detailed description

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. This is so that those skilled in the art can more easily understand the advantages and features of the invention, and so that its protection scope can be defined more clearly;

Figure 1 shows the production flow of a cable enterprise. Copper or aluminum rod is the raw material of cable production. It is turned into wire and cable through process steps such as wire drawing and annealing, bunching/stranding, extrusion, cabling, sheath extrusion, and armoring. The annealing step applies mainly to copper rod and increases the flexibility of the drawn wire. The equipment of each process needs matching molds to produce a particular cable model. Producing a different model on a given machine of a given process therefore requires a mold change, which takes time. A cable product is output after each of the drawing and annealing, bunching/stranding, extrusion, cabling, and sheath extrusion steps. In the cable industry, customer orders usually stipulate a delivery deadline, and late delivery increases breach-of-contract costs. For this reason, the embodiment takes minimization of the deadline-delay penalty cost as its example.

Step 1. Suppose the cable production line has m machines available for the above process steps and there are N orders {J_1, J_2, …, J_N} to produce. Each order J_i (i = 1, 2, …, N) corresponds to a set of n processes O_i = {O_i1, O_i2, …, O_in}, determined by the process requirements of its product model. Each order contains exactly one cable product specification. The notation is as follows:

- M_g is the set of machines that perform process step g (g = 1, 2, …, 6).
- G_gh is the h-th production specification of step g.
- G_i^g is the production specification of order J_i at step g.
- G′_gh is the number of mold sets available for producing specification G_gh.
- S_ii′k is the mold-change time when machine M_k (k = 1, 2, …, m) switches from order J_i to another order J_i′ whose cable specification differs.
- B_ij and C_ij are the start and completion times of process O_ij (i = 1, 2, …, N; j = 1, 2, …, n).
- B′_ik and C″_ik are the start and completion times of order J_i on machine k.

The optimization goal is to minimize the deadline-delay penalty cost by properly assigning machines and sequencing the processes of the different jobs.

Its objective function is:

where D_i is the delivery deadline of order J_i, C_i is the completion time of order J_i, and w_i is the urgency weight of order J_i.

The constraints are as follows:

Here, the constraints mean the following:

- Constraint (2) requires that, within the same order J_i, a process can start only after the preceding process has finished.
- Constraint (3) requires that, on machine k, the immediately following process can start only after the preceding one has finished, including the mold-change time.
- Constraint (5) limits the number of molds available for a process in cable production.

A concrete example of solving the cable production scheduling problem with the reinforcement-learning-based hyper-heuristic scheduling optimization algorithm follows:

Table 1 gives an instance of the cable production scheduling problem with 7 orders, 34 processes and 10 machines. Each order has a delivery deadline. Each process has a production specification, a mold-count limit, a processing time and a set of eligible machines. The mold-change times between specifications are given in Table 2.

Table 1 An instance of the cable production scheduling problem

Table 2 Mold-change times between specifications

      G11 G12 G21 G22 G31 G32 G41 G42 G51 G52 G61 G62
G11    0   3   -   -   -   -   -   -   -   -   -   -
G12    1   0   -   -   -   -   -   -   -   -   -   -
G21    -   -   0   4   -   -   -   -   -   -   -   -
G22    -   -   2   0   -   -   -   -   -   -   -   -
G31    -   -   -   -   0   1   -   -   -   -   -   -
G32    -   -   -   -   2   0   -   -   -   -   -   -
G41    -   -   -   -   -   -   0   3   -   -   -   -
G42    -   -   -   -   -   -   3   0   -   -   -   -
G51    -   -   -   -   -   -   -   -   0   1   -   -
G52    -   -   -   -   -   -   -   -   3   0   -   -
G61    -   -   -   -   -   -   -   -   -   -   0   3
G62    -   -   -   -   -   -   -   -   -   -   6   0
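For use in a decoder, Table 2 can be stored as a lookup keyed by the pair of specifications. The sketch below makes three choices:

- It assumes the row is the specification currently mounted and the column is the next one; the table does not say which.
- It treats a change to the same specification as free.
- It rejects pairs marked '-', which belong to different process steps and never follow each other on one machine.

```python
# Mold-change times from Table 2, assuming (row spec, column spec) = (current, next).
SETUP = {
    ("G11", "G12"): 3, ("G12", "G11"): 1,
    ("G21", "G22"): 4, ("G22", "G21"): 2,
    ("G31", "G32"): 1, ("G32", "G31"): 2,
    ("G41", "G42"): 3, ("G42", "G41"): 3,
    ("G51", "G52"): 1, ("G52", "G51"): 3,
    ("G61", "G62"): 3, ("G62", "G61"): 6,
}


def setup_time(prev_spec: str, next_spec: str) -> int:
    """Mold-change time between two specs of the same process step."""
    if prev_spec == next_spec:
        return 0  # diagonal of Table 2
    try:
        return SETUP[(prev_spec, next_spec)]
    except KeyError:
        raise ValueError(f"no mold change defined from {prev_spec} to {next_spec}") from None
```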

Therefore, N = 7 and m = 10. The reinforcement-learning-based hyper-heuristic scheduling optimization algorithm solves the cable production scheduling problem in the following steps:

Step 2. Initialize the optimization algorithm and reinforcement learning parameters.

2.1. Initialize the algorithm parameters: current iteration t = 1, maximum number of iterations maxT = 300, iterations per period T = 3; all entries of the Q table are initialized to 0;

2.2. Initialize the reinforcement learning action set: construct a global search operator set Λ = {a_1, a_2, …, a_λ} and a neighborhood search operator set Γ = {a′_1, a′_2, …, a′_γ}, and take A = Λ ∪ Γ as the action set. The operators in Λ are mainly based on crossover and those in Γ mainly on exchange operations;

2.3. Generate the initial solution: randomly generate an initial solution composed of the processes of the 7 orders, i.e. X_t = Ruffled{O_1, O_2, …, O_7}, where Ruffled(·) randomly shuffles the sequence.
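The parameters of this embodiment can be collected in a small configuration object. In the sketch below, maxT = 300, T = 3 and γ = 0.8 come from the text. The sizes of Λ and Γ are not given, so `n_global` and `n_local` are hypothetical placeholders. The Q table has one row per state θ_1 to θ_3 and starts at zero, as in step 2.1.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HHConfig:
    max_t: int = 300     # maxT in this embodiment
    period: int = 3      # T: consecutive runs of one action per iteration
    gamma: float = 0.8   # discount factor from step 7
    n_global: int = 2    # |Lambda|, hypothetical
    n_local: int = 2     # |Gamma|, hypothetical

    def new_q_table(self) -> Dict[int, List[float]]:
        """One row per state (theta1..theta3); every Q value starts at 0."""
        n_actions = self.n_global + self.n_local
        return {s: [0.0] * n_actions for s in (1, 2, 3)}
```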

Step 4. Apply χ_t as the search operator to X_t and run it T consecutive times. In each run, if the new solution is better, it replaces the current solution. After the T runs, compute the value λ with formula (7);

Step 5. Select the state s_t according to λ, i.e. λ ∈ {s | s = θ_1, θ_2, θ_3}, where θ_1 = [0.9, 1], θ_2 = [0.5, 0.9) and θ_3 = [0, 0.5) are the interval thresholds of the state space.

Step 6. Generate a random number r ∈ [0, 1] and determine the next action χ_t from the reinforcement probability ε given by formula (8). If r < ε, select the action with the highest Q value in state s_t; otherwise, select an action of state s_t at random.

In formula (8), maxT is the preset maximum number of iterations.

Step 7. Evaluate the utility of the result of the current action χ_t to guide the search direction of the hyper-heuristic algorithm. The present invention defines the utility function r_t of executing action χ_t as:

On this basis, update the Q values of all actions χ′_t in the action subset to which χ_t belongs according to the learning function in formula (10), and determine the next state through the state representation mechanism.

In formula (10), Q_t(s_t, χ_t) is the Q value of action χ_t in state s_t at iteration t, α is the learning rate, and γ is the discount factor, with γ = 0.8; α is adjusted adaptively as in formula (11).

Step 8. If t ≤ maxT, return to Step 4; otherwise, output the best scheduling solution X_best. The objective function value obtained in this embodiment is 39. The corresponding Gantt chart is shown in Figure 3, where the intervals marked A are mold-change times.

The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that can be conceived without creative work shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the scope defined by the claims.

Claims

1. A cable production scheduling optimization method based on reinforcement learning, characterized in that the method comprises the following steps:

Step 1. Establish a constrained optimization mathematical model for the cable production scheduling problem;

There are m machines in the cable production line and N orders {J_1, J_2, …, J_N} to be produced. According to the process requirements of its cable model, each order J_i (i = 1, 2, …, N) corresponds to a set of n processes O_i = {O_i1, O_i2, …, O_in}. Each order contains only one cable product specification. M_g is the set of machines for process step g (g = 1, 2, …, 6); G_gh is the h-th production specification of step g; G_i^g is the production specification of order J_i at step g; and G′_gh is the number of mold sets available for producing specification G_gh. When machine M_k (k = 1, 2, …, m) switches from order J_i to another order J_i′ whose cable specification differs, the mold-change time is S_ii′k. B_ij and C_ij are the start and completion times of process O_ij (i = 1, 2, …, N; j = 1, 2, …, n), and B′_ik and C″_ik are the start and completion times of order J_i on machine k. Taking minimization of the deadline-delay penalty cost as the optimization goal, the processing machines and timing of the processes of the different jobs are arranged. The objective function of the cable production scheduling problem is:

where D_i is the delivery deadline of order J_i, C_i is the completion time of order J_i, and w_i is the urgency weight of order J_i;

The constraints are as follows:

where constraint (2) requires that, within the same order J_i, a process can start only after the preceding process has finished; and constraint (3) requires that, on machine k, the immediately following process can start only after the preceding process has finished;

Step 2. Initialize the optimization algorithm and reinforcement learning parameters;

2.1. Initialize the algorithm parameters: the current iteration count t, the maximum number of iterations maxT, and the number of iterations per period T;

2.2. Generate the initial solution: randomly generate an initial solution composed of the processes of the N orders, i.e. X_t = Ruffled{O_1, O_2, …, O_N}, where Ruffled(·) randomly shuffles the sequence;

Step 3. Randomly select an initial state s_t and an action χ_t (χ_t ∈ A) for that state;

Step 4. Apply χ_t as the search operator to X_t and run it T consecutive times; in each run, generate a schedule using the earliest-completion-time rule;

if the new solution is better, it replaces the current solution; after the T runs, compute the value λ with formula (7);

Step 5. Select the state s_t according to λ, i.e. λ ∈ {s | s = θ_1, θ_2, θ_3}, where θ_1 = [0.9, 1], θ_2 = [0.5, 0.9) and θ_3 = [0, 0.5) are the interval thresholds of the state space;

Step 6. Generate a random number r ∈ [0, 1] and determine the next action χ_t from the reinforcement probability ε given by formula (8): if r < ε, select the action with the highest Q value in state s_t; otherwise, select an action of state s_t at random;

In formula (8), maxT is the preset maximum number of iterations;

Step 7. Evaluate the utility of the result of the current action χ_t to guide the search direction of the hyper-heuristic algorithm, the utility function r_t of executing action χ_t being defined as:

update the Q values of all actions χ′_t in the action subset to which χ_t belongs according to the learning function in formula (10), and determine the next state through the state representation mechanism;

In formula (10), Q_t(s_t, χ_t) is the Q value of action χ_t in state s_t at iteration t, α is the learning rate, and γ is the discount factor, with γ = 0.8; α is adjusted adaptively as in formula (11);

Step 8. If t ≤ maxT, return to Step 4; otherwise, output the best schedule and its Gantt chart.

2. The cable production scheduling optimization method according to claim 1, wherein a step is added after step 2.1 and before step 2.2, namely initializing the reinforcement learning action set by constructing a global search operator set Λ = {a_1, a_2, …, a_λ} and a neighborhood search operator set Γ = {a′_1, a′_2, …, a′_γ}, and taking A = Λ ∪ Γ as the action set, where the operators in Λ are based on crossover and the operators in Γ are based on exchange operations.

3. The cable production scheduling optimization method according to claim 1, wherein generating a schedule in step 4 specifically comprises the following steps:

4.1. Traverse all machines and check whether process O_ij can be processed on each; for every machine that can process it, compute the completion time of O_ij subject to constraints (2)-(6);

4.2. Assign O_ij to the machine with the smallest completion time;

4.3. Generate the production schedule of the orders on the machines and compute the objective function value F(·) with formula (1).

4. The cable production scheduling optimization method according to claim 3, characterized in that, in step 4.2, if several machines share the same minimum completion time, the assigned machine is chosen among them at random.

5. The cable production scheduling optimization method according to claim 1, characterized in that, in step 1, constraint (3) takes the mold-change time into account, and constraint (5) limits the number of molds available for a process in cable production.