Abstract
Reducing the complexity of the network topology and making the learned joint probability distribution fit the data are two important but conflicting objectives in learning a Bayesian network classifier (BNC). By transforming a single high-order topology into a set of low-order ones, ensemble learning algorithms can cover more of the hypotheses implicit in the training data and help achieve a tradeoff between bias and variance. Resampling the training data can diversify the member classifiers of the ensemble, but the information lost in resampling may bias the estimates of the conditional probability distributions and thus introduce insignificant rather than significant dependency relationships into the network topology of the BNC. In this paper, we propose to learn from the training data as a whole and to apply a heuristic search strategy that flexibly identifies the significant conditional dependencies, from which the attribute order is determined implicitly. Random sampling is introduced to make each member of the ensemble “unstable” and to fully represent the conditional dependencies. Experimental evaluation on 40 UCI datasets reveals that the proposed algorithm, called random Bayesian forest (RBF), achieves remarkable classification performance compared to extended versions of state-of-the-art out-of-core BNCs (e.g., SKDB, WATAN, WAODE, SA2DE, SASA2DE and IWAODE).
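To make the idea of ensembling low-order BNC members over the whole training set more concrete, the following is a minimal, hypothetical sketch rather than the authors' RBF implementation: each member is a naive-Bayes-style estimator restricted to a random attribute subset (standing in for a randomly selected set of conditional dependencies), every member is trained on the full training set, and class posteriors are averaged at prediction time. The class names `MemberBNC` and `RandomBayesianForestSketch` and all parameter choices are illustrative assumptions, not part of the paper.

```python
# Hedged sketch: ensemble of low-order Bayesian members trained on the whole
# training set; diversity comes from random attribute subsets, not resampling.
import random
from collections import defaultdict

class MemberBNC:
    """Naive-Bayes-style member restricted to a random attribute subset."""
    def __init__(self, attribute_subset, alpha=1.0):
        self.attrs = attribute_subset          # attribute indices used by this member
        self.alpha = alpha                     # Laplace smoothing constant
        self.class_counts = defaultdict(float)
        self.cond_counts = {}                  # (class, attr, value) -> count
        self.attr_values = defaultdict(set)    # attr -> set of observed values

    def fit(self, X, y):
        for xi, yi in zip(X, y):
            self.class_counts[yi] += 1
            for a in self.attrs:
                key = (yi, a, xi[a])
                self.cond_counts[key] = self.cond_counts.get(key, 0.0) + 1.0
                self.attr_values[a].add(xi[a])
        return self

    def predict_proba(self, x):
        n = sum(self.class_counts.values())
        probs = {}
        for c, cc in self.class_counts.items():
            # Smoothed class prior times smoothed conditional likelihoods
            p = (cc + self.alpha) / (n + self.alpha * len(self.class_counts))
            for a in self.attrs:
                num = self.cond_counts.get((c, a, x[a]), 0.0) + self.alpha
                den = cc + self.alpha * max(len(self.attr_values[a]), 1)
                p *= num / den
            probs[c] = p
        total = sum(probs.values()) or 1.0
        return {c: p / total for c, p in probs.items()}


class RandomBayesianForestSketch:
    """Every member sees the whole training set; randomness comes only from
    the chosen attribute subsets, not from resampling the instances."""
    def __init__(self, n_members=10, subset_size=3, seed=0):
        self.n_members = n_members
        self.subset_size = subset_size
        self.seed = seed
        self.members = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_attrs = len(X[0])
        for _ in range(self.n_members):
            subset = rng.sample(range(n_attrs), min(self.subset_size, n_attrs))
            self.members.append(MemberBNC(subset).fit(X, y))
        return self

    def predict(self, x):
        votes = defaultdict(float)
        for m in self.members:
            for c, p in m.predict_proba(x).items():
                votes[c] += p                   # soft voting: sum of posteriors
        return max(votes, key=votes.get)


# Toy usage on a tiny discrete dataset
if __name__ == "__main__":
    X = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
    y = ["a", "a", "b", "b"]
    clf = RandomBayesianForestSketch(n_members=5, subset_size=2, seed=42).fit(X, y)
    print(clf.predict([0, 1, 1, 1]))
```

The design choice illustrated here is that member diversity comes from randomizing each member's dependency structure rather than from resampling the training instances, which mirrors the paper's motivation of avoiding the information loss that resampling can cause.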







Acknowledgements
This work is supported by the National Key Research and Development Program of China (No. 2019YFC1804804), the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A04), and the Scientific and Technological Developing Scheme of Jilin Province (No. 20200201281JC).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
About this article
Cite this article
Ren, Y., Wang, L., Li, X. et al. Stochastic optimization for bayesian network classifiers. Appl Intell 52, 15496–15516 (2022). https://doi.org/10.1007/s10489-022-03356-z