Abstract
Handling missing values and high-dimensional features is a crucial requirement for data-driven fault diagnosis systems. However, most intelligent data-driven diagnostic systems cannot handle missing data, and high-dimensional feature sets further complicate fault diagnosis. This paper devises a missing data imputation unit and a dimensionality reduction unit for the pre-processing module of the diagnostic system, and proposes a novel pooling strategy for missing data imputation (PSMI). This strategy can simplify complex patterns of missingness and incrementally update the pool. The pre-processing module receives incomplete observations, PSMI estimates the missing values, and the dimensionality reduction unit then transforms the completed observations onto a lower-dimensional feature space. The transformed observations are fed as inputs to the fault classification module for decision making and diagnosis. The diagnostic scheme makes use of various state-of-the-art missing data imputation, dimensionality reduction and classification algorithms, which enables a comprehensive comparison and makes it possible to identify the best techniques for diagnosing faults in the Tennessee Eastman process. The obtained results show the effectiveness of the proposed pooling strategy and indicate that principal component analysis imputation and heteroscedastic discriminant analysis outperform the other imputation and dimensionality reduction techniques in this diagnostic application.
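The three-stage flow described in the abstract (impute, reduce dimensionality, classify) can be illustrated with a minimal sketch. It is not the authors' method: column-mean imputation stands in for PSMI and PCA imputation, a single principal axis found by power iteration stands in for heteroscedastic discriminant analysis, and a nearest-centroid rule stands in for the fault classifiers. All function names and the toy data are hypothetical.

```python
import math

def fit_column_means(rows):
    """Column means over observed entries (None marks a missing value)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means

def impute(row, means):
    """Stand-in imputation unit: fill each missing entry with its column mean."""
    return [means[j] if v is None else v for j, v in enumerate(row)]

def fit_first_axis(rows, iters=100):
    """Stand-in reduction unit: leading principal axis via power iteration.
    Assumes the data have nonzero variance and the start vector is not
    orthogonal to the leading axis."""
    n, d = len(rows), len(rows[0])
    center = [sum(r[j] for r in rows) / n for j in range(d)]
    dev = [[r[j] - center[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in dev) / (n - 1) for b in range(d)]
           for a in range(d)]
    axis = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        axis = [x / norm for x in w]
    return center, axis

def project(row, center, axis):
    """Map a completed observation onto the one-dimensional feature space."""
    return sum((row[j] - center[j]) * axis[j] for j in range(len(row)))

def fit(rows, labels):
    """Fit all three stages on incomplete training observations."""
    means = fit_column_means(rows)
    completed = [impute(r, means) for r in rows]
    center, axis = fit_first_axis(completed)
    scores = {}
    for r, y in zip(completed, labels):
        scores.setdefault(y, []).append(project(r, center, axis))
    centroids = {y: sum(s) / len(s) for y, s in scores.items()}
    return means, center, axis, centroids

def diagnose(row, model):
    """Stand-in classification unit: nearest class centroid after projection."""
    means, center, axis, centroids = model
    z = project(impute(row, means), center, axis)
    return min(centroids, key=lambda y: abs(z - centroids[y]))

# Made-up observations with three features; None marks a missing value.
train = [
    [1.0, 2.0, 0.5], [1.2, None, 0.4], [0.9, 2.1, 0.6],
    [3.0, 6.1, 0.5], [None, 5.9, 0.6], [3.2, 6.0, None],
]
labels = ["normal", "normal", "normal", "fault", "fault", "fault"]
model = fit(train, labels)
print(diagnose([1.1, 2.0, None], model))
print(diagnose([None, 6.2, 0.5], model))
```

Statistics for every stage are fitted on the training observations only and then reused for new incomplete observations, so a new observation never influences the imputation values or the projection it is evaluated with.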








Cite this article
Razavi-Far, R., Saif, M., Palade, V. et al. An integrated framework for diagnosing process faults with incomplete features. Knowl Inf Syst 64, 75–93 (2022). https://doi.org/10.1007/s10115-021-01625-w
DOI: https://doi.org/10.1007/s10115-021-01625-w