Abstract
Deep reinforcement learning (DRL) has been preliminarily applied to run-to-run (RtR) control. However, existing works have mainly addressed shift and drift disturbances in the chemical mechanical polishing (CMP) process and have not taken non-stationary time-series disturbances fully into account. Inspired by the powerful self-learning mechanism of DRL, this work designs a new distributional reinforcement learning controller, the quantile option architecture deep deterministic policy gradient (QUOTA-DDPG), which generates control policies without a precise numerical process model. Specifically, the recipe-adjustment procedure is formulated as a Markov decision process, and the state, action, and reward are designed accordingly. In QUOTA-DDPG, an option is first selected by the option policy, and the action is then determined by the corresponding intra-option policy at each time step. Moreover, target networks and an experience replay mechanism are employed to enhance stability and trainability. Simulations demonstrate that the presented approach outperforms existing methods in disturbance compensation and target tracking. The QUOTA-DDPG controller thus advances the development of smart semiconductor manufacturing.
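To make the decision loop above concrete, the following is a minimal PyTorch sketch of one QUOTA-DDPG-style control step: a distributional critic estimates quantiles of the return, an option policy picks which quantile group to maximize, and that option's intra-option actor emits the action. This is an illustration under stated assumptions, not the authors' implementation; the network sizes, the grouping of quantiles into options, and all names here (QuantileCritic, Actor, select_action) are hypothetical.

```python
import random
from collections import deque

import torch
import torch.nn as nn

N_QUANTILES = 20   # quantiles of the return distribution estimated by the critic
N_OPTIONS = 4      # each option greedily targets one contiguous group of quantiles
STATE_DIM, ACTION_DIM = 3, 1


class QuantileCritic(nn.Module):
    """Distributional critic: estimates N quantiles of Z(s, a)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_QUANTILES),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))   # shape: (batch, N_QUANTILES)


class Actor(nn.Module):
    """Deterministic intra-option policy; one actor per option."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)


critic = QuantileCritic()
critic_target = QuantileCritic()                 # target network for stable TD targets
critic_target.load_state_dict(critic.state_dict())
actors = [Actor() for _ in range(N_OPTIONS)]
replay = deque(maxlen=100_000)                   # experience replay buffer


def select_action(state, epsilon=0.1):
    """One decision step: choose an option, then act via its intra-option policy."""
    s = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        if random.random() < epsilon:            # exploratory option choice
            option = random.randrange(N_OPTIONS)
        else:
            # Value of option j = mean of its quantile group, evaluated at the
            # action proposed by its own actor.
            option_values = torch.stack([
                critic(s, actor(s)).view(N_OPTIONS, -1)[j].mean()
                for j, actor in enumerate(actors)
            ])
            option = int(option_values.argmax())
        action = actors[option](s).squeeze(0).numpy()
    return option, action


# Illustrative single decision step on a dummy state; critic/actor training with
# quantile regression and soft target updates is omitted for brevity.
option, action = select_action([0.0, 0.5, -0.2])
```

The target network and replay buffer are instantiated here only to mirror the stabilization mechanisms named in the abstract; the quantile-regression critic update and per-option actor updates would consume batches drawn from `replay`.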
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Nos. 62273002 and 61873113).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, Z., Pan, T. Distributional reinforcement learning for run-to-run control in semiconductor manufacturing processes. Neural Comput & Applic 35, 19337–19350 (2023). https://doi.org/10.1007/s00521-023-08760-1