
CN113792924A - A single job shop scheduling method based on Deep Q-network deep reinforcement learning - Google Patents


Info

Publication number: CN113792924A (application CN202111084479.8A); granted as CN113792924B
Authority: CN (China)
Prior art keywords: scheduling, deep, action, value function, reinforcement learning
Prior art date: 2021-09-16
Legal status: Granted
Application number: CN202111084479.8A
Other languages: Chinese (zh)
Other versions: CN113792924B (en)
Inventor
乔东平
段绿旗
王雅静
肖艳秋
文笑雨
李浩
罗国富
李立伟
孙春亚
张玉彦
王昊琪
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date: 2021-09-16
Filing date: 2021-09-16
Publication date: 2021-12-14
Application filed by Zhengzhou University of Light Industry
Priority to CN202111084479.8A
Publication of CN113792924A
Application granted
Publication of CN113792924B
Legal status: Active

Classifications

    • G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N20/00 — Machine learning
    • G06N3/045 — Neural network architectures: combinations of networks
    • G06N3/047 — Neural network architectures: probabilistic or stochastic networks
    • G06N3/08 — Neural network learning methods
    • G06Q50/04 — ICT specially adapted for manufacturing
    • Y02P90/30 — Computing systems specially adapted for manufacturing (greenhouse-gas mitigation enabling technologies)


Abstract

The invention belongs to the technical field of production planning and specifically relates to a single-piece job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning, comprising the following steps: (1) model the job shop scheduling environment with a disjunctive graph, convert the scheduling decision problem into a sequential decision problem, establish a Markov quintuple model, and solve the model with deep reinforcement learning; (2) extract the current state from the disjunctive-graph environment of job shop scheduling; (3) fit the action value function and the target value function with a convolutional neural network; (4) adopt 18 heuristic scheduling rules as the agent's actions for reinforcement learning; (5) design a reward function to evaluate the whole scheduling decision and update the weight parameters of the action value function with the DQN algorithm; (6) perform the state transition; (7) update the network parameters of the target value function. The invention can rapidly handle job shop scheduling problems and offers strong real-time performance and high flexibility.

Description

Single-piece job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning
Technical Field
The invention belongs to the technical field of production planning, is used for optimizing the production plan of a job shop, and particularly relates to a single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning.
Background
In the intelligent manufacturing environment, "sensing, interconnection and intelligence of everything" has become a new goal for the development of the manufacturing industry. A workshop generates a large amount of data in the actual production process, and this data contains a great deal of scheduling knowledge; traditional meta-heuristic algorithms cannot use it effectively, so the data is wasted. Under the modern manufacturing mode of rapidly developing intelligence and informatization, research on intelligent scheduling algorithms with self-perception, data analysis and self-decision capabilities is of great significance.
Traditional scheduling algorithms mainly comprise heuristic rule algorithms and meta-heuristic scheduling algorithms. Heuristic rules are approximate scheduling algorithms that have been widely used in practice because of advantages such as insensitivity to NP-hardness and good real-time performance; however, research has shown that the global performance of a single dispatching rule depends on the characteristics of the system, the operating condition parameters and the production goals, and a previously effective scheduling rule may no longer apply when the state of the manufacturing system changes. Meta-heuristic algorithms iteratively optimize through some evolution or movement mechanism to form a brand-new scheduling scheme; they have the advantages of high solution precision and the ability to obtain a better rescheduling scheme, but their search time is long, they cannot meet real-time requirements, and they are not suitable when dynamic events occur frequently.
In the production process, the job shop scheduling problem (JSP) of a single job shop must be handled, but existing job shop scheduling algorithms cannot rapidly handle dynamic events in the job shop, and their practicability is poor. How to overcome the above technical problems and disadvantages is therefore a problem to be solved.
Disclosure of Invention
The invention aims to provide a single-target, single-piece job shop real-time scheduling method based on Deep Q-network (DQN) deep reinforcement learning.
The purpose of the invention is to realize disjunctive-graph modeling of the single job shop scheduling problem, scheduling decisions based on a convolutional neural network, and scheduling decision optimization based on the DQN deep reinforcement learning algorithm, thereby optimizing the performance index of minimizing the maximum completion time (makespan) in job shop scheduling.
To achieve this aim, the technical scheme of the invention provides a single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning, comprising the following steps:
(1) modeling the job shop scheduling environment with a disjunctive graph, converting the scheduling decision problem into a sequential decision problem, establishing a Markov decision process (MDP) quintuple model ε = {S, A, S′, γ, R}, and, taking the minimized maximum completion time as the performance evaluation index of the model, solving the model with deep reinforcement learning;
(2) extracting the current state s ∈ S from the disjunctive-graph environment of job shop scheduling;
the state s is a three-channel matrix whose three channels are a processing time matrix, a machine matrix and an operation start time matrix;
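As an illustration of this three-channel state, the following sketch (Python; the helper and variable names are our own, and normalizing the machine channel is our assumption for scale uniformity, since the text specifies normalization only for the time channels) stacks the three n × m matrices into one tensor:

```python
import numpy as np

def build_state(proc_time, machine, start_time):
    """Stack the three n-by-m feature matrices described in step (2) into a
    3-channel state tensor (processing time, machine assignment, start time)."""
    def normalize(mat):
        # 0-1 normalization by the maximum entry; guard against all zeros.
        peak = mat.max()
        return mat / peak if peak > 0 else mat

    channels = [normalize(np.asarray(m, dtype=np.float32))
                for m in (proc_time, machine, start_time)]
    return np.stack(channels)               # shape: (3, n, m)

# Toy 2-job, 2-operation example (values are illustrative only).
proc  = np.array([[3., 2.], [4., 1.]])      # processing time of each operation
mach  = np.array([[1., 2.], [2., 1.]])      # machine index of each operation
start = np.zeros((2, 2))                    # start times, initialized to zero
print(build_state(proc, mach, start).shape) # (3, 2, 2)
```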
(3) constructing an action value function Q(s, a) and a target value function Q̂(s, a); fitting the action value function Q(s, a; θ) and the target value function Q̂(s, a; θ⁻) with a convolutional neural network, and initializing the two value functions Q(s, a; θ₀) and Q̂(s, a; θ₀⁻);
where θ is the set of internal weights w and bias values b of the action value function's convolutional neural network, and θ⁻ is the set of internal weights w⁻ and bias values b⁻ of the target value function's convolutional neural network;
(4) adopting 18 heuristic scheduling rules as the reinforcement learning agent's actions a, where a ∈ A;
establishing a task set for each machine according to the workpieces to be processed; inputting the global state into the convolutional neural network and converting the network's output into a probability for each heuristic rule with a softmax function; when selecting an action according to the probabilities P, in order to ensure that the scheduling strategy can converge while retaining the ability to jump out of local optima, uncertainty is added to the current decision through a designed action selection mechanism that combines greedy selection a = a_max(P) of the most probable rule with selection a_random(P) according to a uniform probability distribution;
the action selection mechanism comprises a manually set hyper-parameter β and a randomly generated number d ∈ (0,1); when d ≤ β, the heuristic scheduling rule with the maximum probability is selected, and when d > β, a heuristic scheduling rule is selected according to the uniform probability distribution; workpieces are then selected from the task set according to the chosen rule and arranged on the corresponding machine for processing; the action selection mechanism is:

a = a_max(P) = argmax_a P(a),  if d ≤ β
a = a_random(P), drawn uniformly from the 18 rules,  if d > β
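A minimal sketch of this selection mechanism (Python; `q_values` stands for the 18 network outputs, and the uniform branch corresponds to the d > β case above):

```python
import numpy as np

rng = np.random.default_rng()

def select_action(q_values, beta=0.95):
    """Hybrid selection from step (4): softmax the 18 network outputs into
    probabilities P, then pick argmax with probability beta and a uniformly
    random rule otherwise."""
    z = q_values - q_values.max()           # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()

    d = rng.random()                        # d ~ U(0, 1)
    if d <= beta:
        return int(np.argmax(p))            # exploit: most probable rule
    return int(rng.integers(len(p)))        # explore: uniform over the rules
```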
(5) designing a reward function R(t) to evaluate the whole scheduling decision, and updating the weight parameters θ of the action value function with the DQN algorithm to realize the updating of the scheduling decision;
In the reward function design, the single performance index of minimizing the maximum completion time is mainly considered. Because different scheduling problems yield different minimum maximum completion times, and in order to evaluate the immediate behavior uniformly, the optimization of the minimum maximum completion time is converted into the optimization of machine utilization through the following formulas:
C_max = max_{1≤k≤m} C_k

U_k(t) = ( Σ P_i,j ) / C_k(t),  summing over the operations already arranged on machine M_k
where M_k is the k-th machine, C_k is the completion time of machine M_k, and P_i,j is the processing time of the j-th operation of workpiece J_i;
defining the immediate reward as:
r(t) = U_k(t) − U_k(t−1)
the expected target value function is calculated as:
Y_t^DQN = r_t + γ·max_{a′} Q̂(s′, a′; θ⁻)
updating the parameter θ of the action value function by stochastic gradient descent minimizing the squared loss function; the loss function is calculated as:

L_DQN = (Y_t^DQN − Q(s, a; θ_t))²

the update formula of the parameter θ is:

θ_{t+1} = θ_t + α(Y_t^DQN − Q(s, a; θ_t))∇_θ Q(s, a; θ_t)
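The following PyTorch sketch illustrates step (5), together with the periodic θ⁻ ← θ copy used in step (7); the utilization helpers, batch layout and function names are our own illustration, not the patent's code, and the per-sample SGD update above is realized here as a batch-averaged squared loss:

```python
import torch
import torch.nn.functional as F

def machine_utilization(busy_time, completion_time):
    """U_k(t): share of machine k's elapsed schedule spent processing
    (busy_time = sum of P_i,j arranged on M_k; completion_time = C_k(t))."""
    return busy_time / completion_time if completion_time > 0 else 0.0

def immediate_reward(u_now, u_prev):
    """r(t) = U_k(t) - U_k(t-1): positive when the decision raised utilization."""
    return u_now - u_prev

def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.95):
    """One DQN step: build Y = r + gamma * max_a' Qhat(s', a'; theta-) and
    do gradient descent on the squared loss (Y - Q(s, a; theta))^2."""
    states, actions, rewards, next_states = batch
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                    # theta- is held fixed here
        y = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_sa, y)               # (Y - Q)^2 averaged over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def sync_target(policy_net, target_net):
    """Step (7): copy theta into theta- every C steps (hard update)."""
    target_net.load_state_dict(policy_net.state_dict())
```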
(6) performing the state transition P(s′ | s, a), and storing the state s, the action a, the reward r and the next state s′ in a memory pool;
from state s, the action a is executed, the workshop environment is updated, the next state s′ is extracted, and (s, a, r, s′) is stored in the memory pool in sequence;
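A compact memory-pool sketch (Python; the capacity and batch size follow the embodiment's N = 2000 and 128, but the class itself is an illustrative assumption):

```python
import random
from collections import deque

class MemoryPool:
    """Fixed-capacity experience replay from step (6)."""
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=128):
        return random.sample(self.buffer, batch_size)

    def full(self):
        return len(self.buffer) == self.buffer.maxlen
```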
(7) updating the network parameters of the target value function;
when the memory pool is full, the agent starts updating the network parameters of the target value function with the historical data: every C steps, batch_size historical records are randomly extracted from the memory pool, and the parameter θ⁻ of the target value function Q̂(s, a; θ⁻) is updated; the update formula is:

θ⁻_t = θ_t  (the current weights θ are copied to the target network)
further, in the step (1), the state S is obtained from the environment, the state is input into the agent, the action a currently executed by the agent is output, the global state S is changed, the reward R is obtained, the process is repeated until all the machines complete the processing tasks, and the environment of job shop scheduling is established according to the disjunctive graph.
Further, in the step (1), the disjunctive graph model is represented by a triple G = (N; C, E);
where N is the set of all operation nodes, including the virtual start node start and end node end;
C is the set of directed conjunctive arcs connecting adjacent operations of the same part, determined by its process route;
E is the set of disjunctive arcs between operations that can be processed on the same machine tool.
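For illustration, a minimal disjunctive-graph builder (Python with networkx; the node and attribute names are assumptions, and disjunctive arcs are returned as undirected pairs rather than graph edges):

```python
import networkx as nx

def disjunctive_graph(job_routes):
    """Sketch of G = (N; C, E): conjunctive arcs C follow each job's route;
    disjunctive arcs E pair operations that share a machine.
    job_routes[i] is a list of (machine, proc_time) for job i."""
    g = nx.DiGraph()
    g.add_node("start"); g.add_node("end")
    by_machine = {}
    for i, route in enumerate(job_routes):
        prev = "start"
        for j, (machine, p) in enumerate(route):
            node = (i, j)                          # operation O_ij
            g.add_node(node, machine=machine, time=p)
            g.add_edge(prev, node, kind="conjunctive")
            by_machine.setdefault(machine, []).append(node)
            prev = node
        g.add_edge(prev, "end", kind="conjunctive")
    disjunctive = [(u, v) for ops in by_machine.values()
                   for k, u in enumerate(ops) for v in ops[k + 1:]]
    return g, disjunctive
```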
Further, in the step (2), the processing time matrix is formed by the processing time of each operation, and each element in the matrix is normalized to 0-1 by the maximum processing time.
Further, in the step (2), the machine matrix is composed of the machine numbers of the operations.
Further, in the step (2), the operation start time matrix is formed by the start time of each operation; each element is normalized to 0-1 by the current maximum start time and initialized to zero.
Further, in the step (2), the processing time matrix, the machine matrix and the operation start time matrix are all of size n × m, where n is the number of workpieces and m is the number of operations.
Further, in the step (3), the convolutional neural network comprises two convolutional layers and two fully connected layers, and the input size is 3 × n × m;
the first convolutional layer contains 9 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the second convolutional layer contains 18 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the third layer is a fully connected layer with 18 × (n+4) × (m+4) input nodes and 18 × (n+4) output nodes;
the fourth layer is a fully connected layer with 18 × (n+4) input nodes and 18 output nodes.
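Under these dimensions, a PyTorch sketch of the network (the ReLU activations are our assumption; the text specifies only the layer sizes):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """CNN from step (3): two conv layers and two fully connected layers.
    With 3x3 kernels, stride 1 and padding 2, each conv grows the map by 2,
    so an n-by-m input flattens to 18*(n+4)*(m+4), matching the text."""
    def __init__(self, n, m):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 9, kernel_size=3, stride=1, padding=2),   # 9 kernels
            nn.ReLU(),
            nn.Conv2d(9, 18, kernel_size=3, stride=1, padding=2),  # 18 kernels
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(18 * (n + 4) * (m + 4), 18 * (n + 4)),
            nn.ReLU(),
            nn.Linear(18 * (n + 4), 18),    # one output per heuristic rule
        )

    def forward(self, x):                   # x: (batch, 3, n, m)
        return self.net(x)

q = QNetwork(n=6, m=6)                      # ft06 scale: 6 jobs, 6 operations
print(q(torch.zeros(1, 3, 6, 6)).shape)     # torch.Size([1, 18])
```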
The invention has the beneficial effects that:
1. The invention considers the performance index of minimizing the maximum completion time in job shop scheduling, uses the disjunctive graph to convert the scheduling decision problem into a sequential decision problem, establishes an MDP job shop scheduling model, and solves it with the DQN algorithm; good scheduling decisions can be obtained through training, and the trained agent can quickly solve new job shop scheduling problems of the same scale.
2. The invention solves the single job shop scheduling problem with a Deep Q-network (DQN) deep reinforcement learning algorithm that selects heuristic scheduling rules according to the input state features, which better matches the scheduling decision process of an actual order-response production and manufacturing system.
3. The invention uses a convolutional neural network model to fit the action value function and the target value function, inputs the feature data of the manufacturing system's processing state into the model and, combined with the reinforcement learning reward and punishment mechanism, selects the optimal heuristic rule as the strategy at each scheduling decision.
4. The Deep Q-network (DQN) deep reinforcement learning algorithm provided by the invention has the advantages of strong real-time performance and high flexibility.
Drawings
FIG. 1 is a flowchart of the Deep Q-network (DQN) deep reinforcement learning method for single-target, single-piece job shop real-time scheduling according to the present invention.
FIG. 2 is the disjunctive graph of ft06.
FIG. 3 is a scheduling state transition diagram of the present invention.
FIG. 4 is a graph of the training data for ft06.
FIG. 5 is a Gantt chart for ft06.
Detailed Description
The invention will be described in more detail below by means of specific embodiments with reference to the attached drawings.
The invention relates to a Deep Q-network (DQN) deep reinforcement learning job shop scheduling method, whose flow is shown in FIG. 1; the method comprises the following steps:
(1) the ft06 benchmark instance is taken as the concrete embodiment, with the minimized maximum completion time as the performance evaluation index; the information of the ft06 instance is shown in Table 1, and a disjunctive graph is adopted to establish the scheduling environment of the instance, as shown in FIG. 2;
Table 1: the ft06 benchmark instance
(The instance data of ft06 — 6 workpieces × 6 machines — appears as an image in the original document.)
(2) extracting state features from the disjunctive graph as the input of the convolutional neural network;
Principles of state design: from the perspective of the task objective, the selection of state features must relate to the scheduling objective, otherwise feature redundancy results; the state features must describe the main characteristics and changes of the scheduling environment, including global and local information; and, facing different scheduling problems, the state features must have a certain compatibility. The state features are numerical representations of state attributes, which are easy to calculate and normalize, ensuring uniform scale. Extracting the state from the job shop environment yields three state matrices: a processing time matrix, a machine matrix and an operation start time matrix;
the processing time matrix is formed by the processing time of each operation and normalized to 0-1; when an operation has been assigned to a machine, its value is set to zero;
the machine matrix is composed of the machine numbers of the operations, likewise set to zero when the operation has been assigned to a machine;
the operation start time matrix is formed by the start time of each operation, normalized to 0-1 and initialized to zero;
the processing time matrix, the machine matrix and the operation start time matrix are all of size n × m, where n is the number of workpieces and m is the number of operations; FIG. 3 shows ft06 going from the initial state s0 to the state s3;
(3) constructing an action value function Q(s, a) and a target value function Q̂(s, a); fitting the action value function Q(s, a; θ) and the target value function Q̂(s, a; θ⁻) with a convolutional neural network, and initializing the two value functions Q(s, a; θ₀) and Q̂(s, a; θ₀⁻);
where θ is the set of internal weights w and bias values b of the action value function's convolutional neural network, and θ⁻ is the set of internal weights w⁻ and bias values b⁻ of the target value function's convolutional neural network;
the convolutional neural network comprises two convolutional layers and two fully connected layers, and the input size is 3 × n × m;
the first convolutional layer contains 9 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the second convolutional layer contains 18 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the third layer is a fully connected layer with 18 × (n+4) × (m+4) input nodes and 18 × (n+4) output nodes;
the fourth layer is a fully connected layer with 18 × (n+4) input nodes and 18 output nodes, one per action;
(4) adopting 18 heuristic scheduling rules as the reinforcement learning agent's actions a, where a ∈ A;
establishing a task set for each machine according to the workpieces to be processed; inputting the global state into the convolutional neural network and converting the network's output into a probability for each heuristic rule with a softmax function; when selecting an action according to the probabilities P, in order to ensure that the scheduling strategy can converge while retaining the ability to jump out of local optima, uncertainty is added to the current decision through a designed action selection mechanism that combines greedy selection a = a_max(P) of the most probable rule with selection a_random(P) according to a uniform probability distribution;
the action selection mechanism comprises a manually set hyper-parameter β and a randomly generated number d ∈ (0,1); when d ≤ β, the heuristic scheduling rule with the maximum probability is selected, and when d > β, a heuristic scheduling rule is selected according to the uniform probability distribution; workpieces are then selected from the task set according to the chosen rule and arranged on the corresponding machine for processing; the action selection mechanism is:

a = a_max(P) = argmax_a P(a),  if d ≤ β
a = a_random(P), drawn uniformly from the 18 rules,  if d > β

where β = 0.95;
the 18 heuristic rules are shown in table 2:
table 218 heuristic scheduling rules
No. Rule Description of the invention
1 SPT Selecting the workpiece with the shortest processing time
2 LPT Selecting the workpiece with the longest processing time in the working procedure
3 SPT+SSO Selecting the workpiece with the shortest processing time in the current process and the subsequent process
4 LPT+LSO Selecting the workpiece with the longest processing time in the current process and the longest processing time in the subsequent process
5 SPT*TWK Selecting the workpiece with the smallest product of the working procedure machining time and the total machining time
6 LPT*TWK Selecting the workpiece with the largest product of the working procedure machining time and the total machining time
7 SPT/TWK Selecting the workpiece with the minimum ratio of the working procedure processing time to the total processing time
8 LPT/TWK Selecting the workpiece with the largest ratio of working procedure machining time to total machining time
9 SPT*TWKR Selecting the workpiece with the smallest product of the working procedure machining time and the residual machining time
10 LPT*TWKR Selecting the workpiece with the largest product of the working procedure machining time and the residual machining time
11 SPT/TWKR Selecting the workpiece with the smallest ratio of the working procedure machining time to the residual machining time
12 LPT/TWKR Selecting working procedure processing timeWorkpiece having maximum ratio to remaining machining time
13 SRM Selecting the workpiece with the shortest residual processing time except the current considered process
14 LRM Selecting the workpiece with the longest residual processing time except the current considered process
15 SRPT Selecting the workpiece with the shortest residual processing time
16 LRPT Selecting the workpiece with the longest residual processing time
17 SSO Selecting the workpiece with the shortest subsequent processing time
18 LSO Selecting the workpiece with the longest subsequent processing time
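For illustration, a few of these rules written as selection functions over a machine's task set (the task-dictionary keys are hypothetical; the agent's 18 actions would index into a table of such functions):

```python
def spt(tasks):
    """Rule 1 (SPT): workpiece whose current operation is shortest."""
    return min(tasks, key=lambda t: t["proc_time"])

def lpt(tasks):
    """Rule 2 (LPT): workpiece whose current operation is longest."""
    return max(tasks, key=lambda t: t["proc_time"])

def spt_twk(tasks):
    """Rule 5 (SPT*TWK): smallest product of operation time and total work."""
    return min(tasks, key=lambda t: t["proc_time"] * t["total_time"])

def lrpt(tasks):
    """Rule 16 (LRPT): longest remaining processing time."""
    return max(tasks, key=lambda t: t["remaining_time"])

RULES = [spt, lpt, spt_twk, lrpt]   # illustrative subset of the 18 rules
job = RULES[0]([{"proc_time": 3, "total_time": 10, "remaining_time": 7},
                {"proc_time": 2, "total_time": 12, "remaining_time": 9}])
```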
(5) designing a reward function R(t) to evaluate the whole scheduling decision, and updating the weight parameters θ of the action value function with the DQN algorithm to realize the updating of the scheduling decision;
In the reward function design, the single performance index of minimizing the maximum completion time is mainly considered. Because different scheduling problems yield different minimum maximum completion times, and in order to evaluate the immediate behavior uniformly, the optimization of the minimum maximum completion time is converted into the optimization of machine utilization through the following formulas:
C_max = max_{1≤k≤m} C_k

U_k(t) = ( Σ P_i,j ) / C_k(t),  summing over the operations already arranged on machine M_k
where M_k is the k-th machine, C_k is the completion time of machine M_k, and P_i,j is the processing time of the j-th operation of workpiece J_i;
defining the immediate reward as:
r(t) = U_k(t) − U_k(t−1)
the expected target value function is calculated as:
Y_t^DQN = r_t + γ·max_{a′} Q̂(s′, a′; θ⁻)

where the discount factor γ = 0.95;
updating the parameter θ of the action value function by stochastic gradient descent minimizing the squared loss function; the loss function is calculated as:

L_DQN = (Y_t^DQN − Q(s, a; θ_t))²

the update formula of the parameter θ is:

θ_{t+1} = θ_t + α(Y_t^DQN − Q(s, a; θ_t))∇_θ Q(s, a; θ_t)

where the learning rate α = 0.0001;
(6) performing the state transition P(s′ | s, a), and storing the state s, the action a, the reward r and the next state s′ in a memory pool of capacity N = 2000;
from state s, the action a is executed, the workshop environment is updated, the next state s′ is extracted, and (s, a, r, s′) is stored in the memory pool in sequence;
(7) updating the network parameters of the target value function;
when the memory pool is full, the agent starts updating the network parameters of the target value function with the historical data: every C = 200 steps, 128 historical records are randomly extracted from the memory pool, and the parameter θ⁻ of the target value function Q̂(s, a; θ⁻) is updated; the update formula is:

θ⁻_t = θ_t  (the current weights θ are copied to the target network)

In this embodiment, the number of training episodes is 3500.
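Putting the pieces together, an illustrative end-to-end loop reusing the sketches above (QNetwork, MemoryPool, select_action, dqn_update, sync_target); the StubEnv below merely stands in for the real disjunctive-graph simulation, which the embodiment describes but does not give as code:

```python
import random
import torch

class StubEnv:
    """Toy stand-in environment so the loop below runs end to end."""
    def __init__(self, n=6, m=6):
        self.n, self.m, self.steps = n, m, 0
    def reset(self):
        self.steps = 0
        return torch.zeros(3, self.n, self.m)
    def step(self, action):
        self.steps += 1
        done = self.steps >= self.n * self.m      # one decision per operation
        return torch.rand(3, self.n, self.m), random.random(), done

policy_net, target_net = QNetwork(6, 6), QNetwork(6, 6)
sync_target(policy_net, target_net)               # start with theta- = theta
optimizer = torch.optim.SGD(policy_net.parameters(), lr=1e-4)  # alpha = 0.0001
pool, env, global_step = MemoryPool(2000), StubEnv(), 0

for episode in range(3500):
    s, done = env.reset(), False
    while not done:
        with torch.no_grad():
            q = policy_net(s.unsqueeze(0)).squeeze(0).numpy()
        a = select_action(q, beta=0.95)
        s_next, r, done = env.step(a)
        pool.push(s, a, r, s_next)
        s, global_step = s_next, global_step + 1
        if pool.full() and global_step % 200 == 0:  # C = 200
            s_b, a_b, r_b, sn_b = zip(*pool.sample(128))
            batch = (torch.stack(s_b), torch.tensor(a_b),
                     torch.tensor(r_b, dtype=torch.float32), torch.stack(sn_b))
            dqn_update(policy_net, target_net, optimizer, batch, gamma=0.95)
            sync_target(policy_net, target_net)
```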
The foregoing is a more detailed description of the invention in conjunction with specific embodiments, and the implementation of the invention is not limited to these descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the concept of the invention, and all of them shall be considered to fall within the protection scope of the invention.

Claims (8)

1. A single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning, characterized by comprising the following steps:
(1) modeling the job shop scheduling environment with a disjunctive graph, converting the scheduling decision problem into a sequential decision problem, establishing a Markov quintuple model ε = {S, A, S′, γ, R}, and, taking the minimized maximum completion time as the performance evaluation index of the model, solving the model with deep reinforcement learning;
(2) extracting the current state s ∈ S from the disjunctive-graph environment of job shop scheduling;
the state s is a three-channel matrix whose three channels are a processing time matrix, a machine matrix and an operation start time matrix;
(3) constructing an action value function Q(s, a) and a target value function Q̂(s, a); fitting the action value function Q(s, a; θ) and the target value function Q̂(s, a; θ⁻) with a convolutional neural network, and initializing the two value functions Q(s, a; θ₀) and Q̂(s, a; θ₀⁻);
where θ is the set of internal weights w and bias values b of the action value function's convolutional neural network, and θ⁻ is the set of internal weights w⁻ and bias values b⁻ of the target value function's convolutional neural network;
(4) adopting 18 heuristic scheduling rules as the reinforcement learning agent's actions a, where a ∈ A;
establishing a task set for each machine according to the workpieces to be processed; inputting the global state into the convolutional neural network and converting the network's output into a probability for each heuristic rule with a softmax function; when selecting an action according to the probabilities P, in order to ensure that the scheduling strategy can converge while retaining the ability to jump out of local optima, uncertainty is added to the current decision through a designed action selection mechanism that combines greedy selection a = a_max(P) of the most probable rule with selection a_random(P) according to a uniform probability distribution;
the action selection mechanism comprises a manually set hyper-parameter β and a randomly generated number d ∈ (0,1); when d ≤ β, the heuristic scheduling rule with the maximum probability is selected, and when d > β, a heuristic scheduling rule is selected according to the uniform probability distribution; workpieces are then selected from the task set according to the chosen rule and arranged on the corresponding machine for processing; the action selection mechanism is:

a = a_max(P) = argmax_a P(a),  if d ≤ β
a = a_random(P), drawn uniformly from the 18 rules,  if d > β
(5) designing a reward function R(t) to evaluate the whole scheduling decision, and updating the weight parameters θ of the action value function with the DQN algorithm to realize the updating of the scheduling decision;
In the reward function design, the single performance index of minimizing the maximum completion time is mainly considered. Because different scheduling problems yield different minimum maximum completion times, and in order to evaluate the immediate behavior uniformly, the optimization of the minimum maximum completion time is converted into the optimization of machine utilization through the following formulas:
C_max = max_{1≤k≤m} C_k

U_k(t) = ( Σ P_i,j ) / C_k(t),  summing over the operations already arranged on machine M_k
where M_k is the k-th machine, C_k is the completion time of machine M_k, and P_i,j is the processing time of the j-th operation of workpiece J_i;
defining the immediate reward as:
r(t) = U_k(t) − U_k(t−1)
the expected target value function is calculated as:
Y_t^DQN = r_t + γ·max_{a′} Q̂(s′, a′; θ⁻)
updating the parameter θ of the action value function by stochastic gradient descent minimizing the squared loss function; the loss function is calculated as:

L_DQN = (Y_t^DQN − Q(s, a; θ_t))²

the update formula of the parameter θ is:

θ_{t+1} = θ_t + α(Y_t^DQN − Q(s, a; θ_t))∇_θ Q(s, a; θ_t)
(6) performing the state transition P(s′ | s, a), and storing the state s, the action a, the reward r and the next state s′ in a memory pool;
from state s, the action a is executed, the workshop environment is updated, the next state s′ is extracted, and (s, a, r, s′) is stored in the memory pool in sequence;
(7) updating the network parameters of the target value function;
when the memory pool is full, the agent starts updating the network parameters of the target value function with the historical data: every C steps, batch_size historical records are randomly extracted from the memory pool, and the parameter θ⁻ of the target value function Q̂(s, a; θ⁻) is updated; the update formula is:

θ⁻_t = θ_t  (the current weights θ are copied to the target network)
2. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (1), the state s is obtained from the environment and input into the agent; the agent outputs the currently executed action a, the global state changes and the reward r is obtained; this process is repeated until all machines have completed their processing tasks; and the job shop scheduling environment is established according to the disjunctive graph.
3. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (1), the disjunctive graph model is represented by a triple G = (N; C, E);
where N is the set of all operation nodes, including the virtual start node start and end node end;
C is the set of directed conjunctive arcs connecting adjacent operations of the same part, determined by its process route;
and E is the set of disjunctive arcs between operations that can be processed on the same machine tool.
4. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (2), the processing time matrix is formed by the processing time of each operation, and each element in the matrix is normalized to 0-1 by the maximum processing time.
5. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (2), the machine matrix is composed of the machine numbers of the operations.
6. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (2), the operation start time matrix is formed by the start time of each operation; each element is normalized to 0-1 by the current maximum start time and initialized to zero.
7. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of any one of claims 4 to 6, wherein in the step (2), the processing time matrix, the machine matrix and the operation start time matrix are all of size n × m, where n is the number of workpieces and m is the number of operations.
8. The single job shop scheduling method based on Deep Q-network (DQN) deep reinforcement learning of claim 1, wherein in the step (3), the convolutional neural network comprises two convolutional layers and two fully connected layers, and the input size is 3 × n × m;
the first convolutional layer contains 9 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the second convolutional layer contains 18 convolution kernels of size 3 × 3, with stride 1 and edge padding 2;
the third layer is a fully connected layer with 18 × (n+4) × (m+4) input nodes and 18 × (n+4) output nodes;
and the fourth layer is a fully connected layer with 18 × (n+4) input nodes and 18 output nodes.
CN202111084479.8A 2021-09-16 2021-09-16 Single-piece job shop scheduling method based on Deep Q-network Deep reinforcement learning Active CN113792924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084479.8A CN113792924B (en) 2021-09-16 2021-09-16 Single-piece job shop scheduling method based on Deep Q-network Deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111084479.8A CN113792924B (en) 2021-09-16 2021-09-16 Single-piece job shop scheduling method based on Deep Q-network Deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113792924A true CN113792924A (en) 2021-12-14
CN113792924B CN113792924B (en) 2024-10-22

Family

ID=79183615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111084479.8A Active CN113792924B (en) 2021-09-16 2021-09-16 Single-piece job shop scheduling method based on Deep Q-network Deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113792924B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580937A (en) * 2022-03-10 2022-06-03 暨南大学 Intelligent job scheduling system based on reinforcement learning and attention mechanism
CN114625089A (en) * 2022-03-15 2022-06-14 大连东软信息学院 A Job Shop Scheduling Method Based on Improved Near-End Policy Optimization Algorithm
CN114707881A (en) * 2022-04-18 2022-07-05 贵州大学 A job shop adaptive scheduling method based on deep reinforcement learning
CN114757512A (en) * 2022-04-02 2022-07-15 武汉理工大学 Job shop scheduling method based on graph neural network
CN114912826A (en) * 2022-05-30 2022-08-16 华中农业大学 A flexible job shop scheduling method based on multi-layer deep reinforcement learning
CN115016405A (en) * 2022-05-26 2022-09-06 天津大学 Process route multi-objective optimization method based on deep reinforcement learning
CN115034653A (en) * 2022-06-27 2022-09-09 暨南大学 End-to-end dynamic job shop scheduling model based on transformer
CN115293623A (en) * 2022-08-17 2022-11-04 海尔数字科技(青岛)有限公司 Training method and device for production scheduling model, electronic equipment and medium
CN116050803A (en) * 2023-02-27 2023-05-02 湘南学院 A dynamic scheduling method for automatic sorting of customized furniture panels
CN116307440A (en) * 2022-11-21 2023-06-23 暨南大学 A workshop scheduling method based on multi-objective weight learning based on reinforcement learning and its device and application
CN116610082A (en) * 2023-07-18 2023-08-18 安徽思高智能科技有限公司 RPA job workflow redundancy scheduling method and system based on deep reinforcement learning
CN116957172A (en) * 2023-09-21 2023-10-27 山东大学 Dynamic job shop scheduling optimization method and system based on deep reinforcement learning
KR102629022B1 (en) * 2023-07-21 2024-01-25 주식회사 마키나락스 Scheduling method based on reinforcement learning
CN117474295A (en) * 2023-12-26 2024-01-30 长春工业大学 Multi-AGV load balancing and task scheduling method based on lasting DQN algorithm
CN117764360A (en) * 2023-12-29 2024-03-26 中海油信息科技有限公司 Paint workshop intelligent scheduling method based on graphic neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053152A (en) * 2018-01-30 2018-05-18 陕西科技大学 The method that improved adaptive GA-IAGA based on polychromatic sets solves dynamic job shop scheduling
CN111985672A (en) * 2020-05-08 2020-11-24 东华大学 Single-piece job shop scheduling method for multi-Agent deep reinforcement learning
CN112734172A (en) * 2020-12-25 2021-04-30 南京理工大学 Hybrid flow shop scheduling method based on time sequence difference

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053152A (en) * 2018-01-30 2018-05-18 陕西科技大学 The method that improved adaptive GA-IAGA based on polychromatic sets solves dynamic job shop scheduling
CN111985672A (en) * 2020-05-08 2020-11-24 东华大学 Single-piece job shop scheduling method for multi-Agent deep reinforcement learning
CN112734172A (en) * 2020-12-25 2021-04-30 南京理工大学 Hybrid flow shop scheduling method based on time sequence difference

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580937B (en) * 2022-03-10 2023-04-28 暨南大学 Intelligent Job Scheduling System Based on Reinforcement Learning and Attention Mechanism
CN114580937A (en) * 2022-03-10 2022-06-03 暨南大学 Intelligent job scheduling system based on reinforcement learning and attention mechanism
CN114625089B (en) * 2022-03-15 2024-05-03 大连东软信息学院 Job shop scheduling method based on improved near-end strategy optimization algorithm
CN114625089A (en) * 2022-03-15 2022-06-14 大连东软信息学院 A Job Shop Scheduling Method Based on Improved Near-End Policy Optimization Algorithm
CN114757512A (en) * 2022-04-02 2022-07-15 武汉理工大学 Job shop scheduling method based on graph neural network
CN114707881A (en) * 2022-04-18 2022-07-05 贵州大学 A job shop adaptive scheduling method based on deep reinforcement learning
CN115016405A (en) * 2022-05-26 2022-09-06 天津大学 Process route multi-objective optimization method based on deep reinforcement learning
CN115016405B (en) * 2022-05-26 2024-09-24 天津大学 A multi-objective optimization method for process routes based on deep reinforcement learning
CN114912826A (en) * 2022-05-30 2022-08-16 华中农业大学 A flexible job shop scheduling method based on multi-layer deep reinforcement learning
CN114912826B (en) * 2022-05-30 2024-07-02 华中农业大学 Flexible job shop scheduling method based on multilayer deep reinforcement learning
CN115034653A (en) * 2022-06-27 2022-09-09 暨南大学 End-to-end dynamic job shop scheduling model based on transformer
CN115293623A (en) * 2022-08-17 2022-11-04 海尔数字科技(青岛)有限公司 Training method and device for production scheduling model, electronic equipment and medium
CN116307440A (en) * 2022-11-21 2023-06-23 暨南大学 A workshop scheduling method based on multi-objective weight learning based on reinforcement learning and its device and application
CN116307440B (en) * 2022-11-21 2023-11-17 暨南大学 Workshop scheduling method based on reinforcement learning and multi-objective weight learning, device and application thereof
CN116050803A (en) * 2023-02-27 2023-05-02 湘南学院 A dynamic scheduling method for automatic sorting of customized furniture panels
CN116610082B (en) * 2023-07-18 2023-10-31 安徽思高智能科技有限公司 RPA job workflow redundancy scheduling method and system based on deep reinforcement learning
CN116610082A (en) * 2023-07-18 2023-08-18 安徽思高智能科技有限公司 RPA job workflow redundancy scheduling method and system based on deep reinforcement learning
KR102629022B1 (en) * 2023-07-21 2024-01-25 주식회사 마키나락스 Scheduling method based on reinforcement learning
CN116957172B (en) * 2023-09-21 2024-01-16 山东大学 Dynamic job shop scheduling optimization method and system based on deep reinforcement learning
CN116957172A (en) * 2023-09-21 2023-10-27 山东大学 Dynamic job shop scheduling optimization method and system based on deep reinforcement learning
CN117474295A (en) * 2023-12-26 2024-01-30 长春工业大学 Multi-AGV load balancing and task scheduling method based on lasting DQN algorithm
CN117474295B (en) * 2023-12-26 2024-04-26 长春工业大学 Dueling DQN algorithm-based multi-AGV load balancing and task scheduling method
CN117764360A (en) * 2023-12-29 2024-03-26 中海油信息科技有限公司 Paint workshop intelligent scheduling method based on graphic neural network

Also Published As

Publication number Publication date
CN113792924B (en) 2024-10-22

Similar Documents

Publication Publication Date Title
CN113792924B (en) Single-piece job shop scheduling method based on Deep Q-network Deep reinforcement learning
CN111966050A (en) Optimization method for dual-resource mold job shop scheduling based on AMMAS-GA nesting algorithm
CN110533301B (en) Particle swarm scheduling method based on dynamic constraint matrix
CN104536412B (en) Photoetching procedure dynamic scheduling method based on index forecasting and solution similarity analysis
CN111985672B (en) Single-piece job shop scheduling method for multi-Agent deep reinforcement learning
CN111353646B (en) Steelmaking flexible scheduling optimization method, system, medium and equipment with switching time
CN117314078B (en) Deadlock-free Scheduling Method for Flexible Manufacturing System Based on Petri Net and Neural Network
CN117852825B (en) Deadlock-free scheduling method of flexible manufacturing system containing central resources based on deep learning
CN105975701A (en) Parallel scheduling disassembly path forming method based on mixing fuzzy model
CN113792494B (en) Multi-objective flexible job shop scheduling method based on migratory bird flock algorithm and cross fusion
CN112348323A (en) Multi-target energy supply and operation flexible scheduling method
CN115640898A (en) A Large-Scale Flexible Job Shop Scheduling Method Based on DDQN Algorithm
CN114004065A (en) Multi-objective optimization method of substation engineering based on intelligent algorithm and environmental constraints
CN115981262A (en) IMOEA-based hydraulic cylinder part workshop production scheduling method
CN118917567A (en) Multi-target distributed flexible job shop scheduling method based on hierarchical selection type deep reinforcement learning
CN114879618B (en) Knowledge-driven intelligent production scheduling method for workshops
CN116360362A (en) An Intelligent Scheduling Method for Workshop Considering Workpiece Arrangement
Li et al. An improved whale optimisation algorithm for distributed assembly flow shop with crane transportation
CN109214695B (en) High-end equipment research, development and manufacturing cooperative scheduling method and system based on improved EDA
CN118605425A (en) Distributed flexible job shop scheduling method and equipment based on deep reinforcement learning
CN117726119A (en) A graph bionic learning method to solve distributed hybrid flow workshop group scheduling
CN114839930B (en) An integrated scheduling system for distributed assembly blocked flow workshops
CN117273319A (en) A method to solve dynamic FJSP based on DIFFormer and deep reinforcement learning
CN110298468A (en) A kind of single dimension chain optimization matching method based on ant group algorithm
CN114237166A (en) Method for solving multi-rotating-speed energy-saving scheduling problem based on improved SPEA2 algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant