SoftwareX 17 (2022) 100919

Contents lists available at ScienceDirect

SoftwareX
journal homepage: www.elsevier.com/locate/softx

Original software publication

AMLBID: An auto-explained Automated Machine Learning tool for Big Industrial Data

Moncef Garouani a,b,c,∗, Adeel Ahmad a, Mourad Bouneffa a, Mohamed Hamlich b

a Univ. Littoral Cote d’Opale, UR 4491, LISIC, Laboratoire d’Informatique Signal et Image de la Cote d’Opale, F-62100 Calais, France
b CCPS Laboratory, ENSAM, University of Hassan II, Casablanca, Morocco
c Study and Research Center for Engineering and Management (CERIM), HESTIM, Casablanca, Morocco

Article info

Article history:
Received 15 June 2021
Received in revised form 22 November 2021
Accepted 22 November 2021

Keywords:
Machine learning
AutoML
Meta-learning
Decision-support systems
Explainable AI
Big industrial data

Abstract

The Machine Learning (ML) based solutions in manufacturing industrial contexts often require skilled resources. More practical non-expert software solutions are then desired to enhance the usability of ML algorithms. The algorithm selection and configuration is one of the most difficult tasks for users like manufacturing specialists. The identification of the most appropriate algorithm in an automatic manner is among the major research challenges to achieve optimal performance of ML tools. In this paper, we present an auto-explained Automated Machine Learning tool for Big Industrial Data (AMLBID) to better cope with the prominent challenges posed by the evolution of Big Industrial Data. It is a meta-learning based decision support system for the automated selection and tuning of implied hyperparameters for ML algorithms. Moreover, the framework is equipped with an explainer module that makes the outcomes transparent and interpretable for well-performing ML systems.

© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Code metadata

Current code version: v0.1
Permanent link to code/repository used for this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-21-00111
Code Ocean compute capsule: https://codeocean.com/capsule/9828764/tree
Legal Code License: MIT License
Code versioning system used: git
Software code languages, tools, and services used: Python
Compilation requirements, operating environments & dependencies: Python 3.x; Jupyter; dash; sklearn; see requirements file
If available, link to developer documentation/manual: https://github.com/LeMGarouani/AMLBID/blob/main/README.md
Support email for questions: mgarouani@gmail.com

∗ Corresponding author at: Univ. Littoral Cote d’Opale, UR 4491, LISIC, Laboratoire d’Informatique Signal et Image de la Cote d’Opale, F-62100 Calais, France.
E-mail address: moncef.garouani@etu.univ-littoral.fr (Moncef Garouani).

https://doi.org/10.1016/j.softx.2021.100919
2352-7110/© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Motivation and significance

Industrial Big Data refers to the large amount of diversified data that is generated continuously, in real time, by the network of industrial equipment [1]. The continuous digital transformation of the manufacturing industry has led to the widespread adoption of ML solutions [2]. Although ML sufficiently assists large-scale data analysis for decision-making purposes in many industrial areas, human intervention is often required. Domain experts have a better command of the application area. For example, they are able to provide characteristics of the application that can help to improve the performance of the algorithms. However, they are not necessarily ML experts. Moreover, the large number of available ML algorithms and hyperparameter configurations makes an exhaustive search infeasible. In this context, the expertise of data scientists is therefore highly desirable for identifying the most appropriate algorithm configurations [3,4].

The selection of an algorithm or a family of algorithms that are more likely to perform better on a given combination of
datasets and their evaluation measures is a critical task [3]. ML algorithms generally have two kinds of parameters:

• the ordinary parameters that the model learns and optimizes automatically based on its normal behavior during the learning phase,
• the hyperparameters (categorical and continuous), which are usually set manually before the beginning of the model training.

In the context of the manufacturing industry, a major challenge is the selection of a suitable ML algorithm and the tuning of its hyperparameters. Algorithm selection and configuration (tuning of hyperparameters) is a complex process because ML algorithms are used as a "black box". The performance is affected by the characteristics of the datasets and the configuration of the algorithms' hyperparameters [5]. Selecting and configuring appropriate algorithm(s) is an error-prone and time-consuming process, owing to the mistakes that commonly arise when multiple configurations are set up. This emphasizes the need to automate the process.

Automated Machine Learning (AutoML) [6] is a decision support system that partially or totally automates the ML pipeline. The major goal of this research field is to enable non-expert ML developers to effectively utilize "off-the-shelf" solutions, which would save time and effort for practitioners [7,8]. At its core, AutoML strives to meet performance criteria (e.g. accuracy, recall, F1 score) in order to solve the respective ML tasks such as classification, regression, or clustering. In doing so, AutoML optimizes a given performance criterion [9] to solve the particular task with respect to the dataset.

Multiple approaches have been proposed to tackle the above problem [9–13] owing to the immense potential of AutoML. In this regard, several tools are available in the research community such as Auto-sklearn [5], TPOT [12], and AutoWEKA [14]. There are also several commercial tools such as RapidMiner [15], H2O Driverless AI [16], DataRobot [17], and MATLAB ML toolbox [18]. We observe that many industrial actors are competing around the goal of automating machine learning [19]. They are mostly focused on various budget-limited tasks dealing with supervised learning. However, they typically come up with black-box solutions and lack effective explanations of the predicted performance factors. It is also worth noting that the cost of these solutions tends to be higher due to the involved computational complexity and the time required to generate recommendations [3].

Generally, in most of the existing AutoML systems, visibility is limited to the input and output parameters, while the inherent associations among them remain concealed. The confidence of users can instead be increased by making the automatic results of AutoML systems transparent. User confidence in AutoML systems is important because AutoML systems are conventionally used as Decision Support Systems (DSS). Therefore, the acceptability of, and the trust in, an AutoML support system are highly dependent on the transparency of the recommendation generation process [20].

In this paper, we present AMLBID, a transparent, interpretable and auto-explainable meta-learning based tool [21] that identifies the optimal or near-optimal ML configuration for a given problem. It also makes the rationale behind a recommendation traceable. The tool, as a decision support system, is able to simulate the role of the ML expert because it is based on a meta-learning approach.

2. Software description

It is often difficult to build an accurate ML-based predictive model for an industrial problem that is also easy for non-expert ML developers to interpret [6,22]. The key idea behind the transparent and auto-explainable AutoML vision is to separate the recommendations from the explanations by using two modules simultaneously, as shown in Fig. 1: the Recommender module (AMLBID) for the recommendations and the Explanatory module (AMLExplainer) for the explanations. The first module is used to provide the most appropriate ML configuration(s) for a given problem. It is aimed at maximizing the requested predictive metric (e.g. Accuracy, Recall, Precision). The second module is used to provide the rationale behind the recommended ML configuration(s) as well as auto-generated explanations to better understand the inner workings of the model in an interpretable manner through an interactive multi-view tool.

2.1. Software architecture

The workflow of the proposed self-explanatory AutoML system consists of two major components:

• the AutoML component, which presents the AutoML process at the different levels of abstraction, from the recommendation of ML configurations to their refinement,
• and the explanatory one, which allows users to inspect both the process of decision generation and the inner working of the recommended ML model.

In the following sections, we discuss these modules in brief detail.

2.1.1. Recommendation module

The AutoML tool for Big Industrial Data (AMLBID) is a meta-learning based system that automates algorithm selection and configuration. It uses a recommendation system that is bootstrapped with a knowledge base. The current knowledge base is derived from a large set of experiments conducted on 400 real-world manufacturing classification datasets collected from popular repositories such as OpenML (https://www.openml.org/), UCI (https://archive.ics.uci.edu/) and Kaggle (https://www.kaggle.com/). It accumulates more than 4 million evaluated ML configurations (pipelines). Each pipeline consists of a choice of an ML model and the configuration of its hyperparameters. The system is able to identify effective pipelines without performing expensive computational analysis. For this purpose, the system explores the interactions between the meta-features (characteristics) of the datasets and the pipeline topologies.

The recommendation phase is initiated when a new dataset is provided as input to the AutoML process. At this stage, the user selects a predictive metric (e.g. Precision, Accuracy, Recall) to be used for the analysis. AMLBID then automatically provides a set of ML algorithms and recommended configurations of their related hyperparameters, so as to achieve the best predictive performance on the selected metric.

2.1.2. Explainer module

AMLExplainer and AMLBID are implemented following a client–server architecture. The server coordinates the interactions between AMLExplainer and the AutoML recommendation tool. The client-side scripts manage the visual user interfaces including the visualization of data summaries on multiple levels of the recommended models. Meanwhile, AMLExplainer guides the


Fig. 1. Workflow of the white-box internal structures of AutoML.

Table 1
The configuration of ML algorithms and hyperparameters as tuned in the current experiments.

ML algorithm                          | Tuned hyperparameters
Logistic Regression (LR)              | C, Penalty, Fit_intercept, Dual
Stochastic Gradient Descent (SGD)     | Loss, Penalty, Alpha, Learning_rate, Fit_intercept, L1_ratio, Eta0, Power_t
Support Vector Classifier (SVM)       | Kernel, C, Gamma, Degree, Coef0
Decision Tree (DT)                    | Min_samples_leaf, Min_samples_split, Criterion, Max_features
Random Forest (RF) & Extra Trees (ET) | N_estimators, Min_samples_split, Max_features, Min_samples_leaf, Min_weight_fraction_leaf
AdaBoost (AB)                         | N_estimators, Learning_rate, Algorithm, Base_estimator_max_depth
Gradient Boosting (GB)                | N_estimators, Learning_rate, Criterion, Loss, Max_depth, Min_samples_leaf, Min_samples_split
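
As an illustration of what a configuration space like the one in Table 1 might look like in code, the following minimal sketch encodes three of the algorithms as a dictionary of Scikit-learn estimators and candidate hyperparameter values, then samples a few configurations (pipelines) from it. The hyperparameter names follow Table 1 in Scikit-learn spelling (for example min_samples_leaf); the candidate values, the SEARCH_SPACE name and the sample_pipelines helper are assumptions made for this example and are not taken from AMLBID.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterSampler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search space: hyperparameter names follow Table 1, but the
# candidate values are illustrative assumptions, not the ranges used by AMLBID.
SEARCH_SPACE = {
    LogisticRegression: {
        "C": [0.01, 0.1, 1.0, 10.0],
        "fit_intercept": [True, False],
    },
    DecisionTreeClassifier: {
        "criterion": ["gini", "entropy"],
        "max_features": [0.25, 0.5, 0.75, 1.0],
        "min_samples_leaf": [1, 5, 10],
        "min_samples_split": [2, 10, 20],
    },
    RandomForestClassifier: {
        "n_estimators": [50, 100, 200],
        "max_features": [0.25, 0.5, 0.75, 1.0],
        "min_samples_leaf": [1, 5, 10],
        "min_samples_split": [2, 10, 20],
    },
}


def sample_pipelines(n_per_algorithm=2, seed=0):
    """Draw a few random configurations per algorithm and instantiate them."""
    pipelines = []
    for estimator_cls, grid in SEARCH_SPACE.items():
        sampler = ParameterSampler(grid, n_iter=n_per_algorithm, random_state=seed)
        for params in sampler:
            pipelines.append(estimator_cls(**params))
    return pipelines


pipelines = sample_pipelines()
for model in pipelines:
    # Each entry is one (algorithm, hyperparameter configuration) pair.
    print(type(model).__name__, model.get_params(deep=False))
```

Each sampled object corresponds to one pipeline in the sense used in the text: a choice of ML model together with a configuration of its hyperparameters.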

end-users in improving predictive performance when the results are unsatisfying. Hence, it can increase the transparency, controllability, and reliability of the AutoML DSS.

2.2. Software functionalities

The current version of AMLBID is available on the PyPI package index in the form of a Python package (https://pypi.org/project/AMLBID/) to facilitate its distribution and use. It presents a meta-learning based framework whose major objective is to automate the process of algorithm selection and hyperparameter tuning in supervised ML, along with rational traceability. The available literature shows that the majority of state-of-the-art tools evaluate a set of pipelines by actually executing them on a given dataset prior to the recommendation. Such executions may require considerable computing time and consume scarce resources [22]. The proposed system (AMLBID) instead produces a list of potential top-ranked pipelines directly from its knowledge base in negligible computational time, which notably reduces the cost of resources and the dependence on their availability.

The present version of AMLBID supports eight different classification algorithms from the popular Python-based ML library Scikit-learn. Table 1 gives a detailed description of the supported algorithms and the tuned hyperparameters.

Broadly, AMLBID is an interactive tool that guides end-users in improving the utility and usability of the AutoML process, with the following salient features:

• It automatically (and accurately) selects the most appropriate ML algorithm(s) with the related hyperparameter configuration through the use of a collaborative knowledge base. The knowledge base is continuously improved by running more tasks, so AMLBID becomes smarter as it gains experience from the growing knowledge base.
• It provides assistance when AutoML returns unsatisfying results, in order to improve the predictive performance. This is achieved by assessing the importance of, and the correlation among, the algorithm hyperparameters.
• The framework is equipped with an explanation module, which allows the end-user to understand the diagnostic design of the returned ML models using various explanation techniques. In particular, the explanation artifact allows the end-user to:
  – investigate the reasoning behind the AutoML recommendation generation process,
  – and explore the predictions of a recommendation in a trustful manner, through linked visual summaries in the form of graphical, tabular, or textual information for a higher trust.
  Therefore, AMLBID enables end-users to ask a series of what-if questions while probing the opportunities to use predictive models. It can improve outcomes and reduce costs for various tasks, such as those that depend on the classical collaboration between domain experts and data scientists.

3. Illustrative examples

AMLBID broadly has two major modules, the AMLBID_Recommendation module and the AMLBID_Explainer module. The AMLBID_Recommendation module recommends and builds highly-tuned ML pipelines, whereas the AMLBID_Explainer module is used to interpret the inner working of the generated pipeline(s). In the following sections we discuss the functionality of these modules in further detail.
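
Before turning to the tool itself, the knowledge-base lookup described in Section 2.1.1 can be pictured with a small, self-contained sketch. It is not AMLBID's implementation: the four meta-features, the three knowledge-base entries (with made-up scores), the Euclidean nearest-neighbour ranking, and the names meta_features, KNOWLEDGE_BASE and recommend are all assumptions chosen only to show how a recommendation can be produced from stored experience without training any candidate model on the new dataset.

```python
import numpy as np
from sklearn.datasets import load_iris


def meta_features(X, y):
    """Compute a tiny, illustrative meta-feature vector for a dataset."""
    n_samples, n_features = X.shape
    _, class_counts = np.unique(y, return_counts=True)
    return np.array([
        np.log10(n_samples),                      # dataset size (log scale)
        np.log10(n_features),                     # dimensionality (log scale)
        float(len(class_counts)),                 # number of classes
        class_counts.min() / class_counts.max(),  # class balance in (0, 1]
    ])


# Hypothetical knowledge base: one entry per previously analysed dataset,
# holding its meta-features and the best configuration recorded for it.
# All values are invented for illustration.
KNOWLEDGE_BASE = [
    {"meta": np.array([2.2, 0.6, 3.0, 1.0]),
     "algorithm": "RandomForestClassifier",
     "params": {"n_estimators": 200, "min_samples_leaf": 1},
     "accuracy": 0.95},
    {"meta": np.array([4.0, 1.5, 2.0, 0.3]),
     "algorithm": "GradientBoostingClassifier",
     "params": {"n_estimators": 100, "learning_rate": 0.1},
     "accuracy": 0.91},
    {"meta": np.array([3.0, 2.0, 10.0, 0.8]),
     "algorithm": "SVC",
     "params": {"kernel": "rbf", "C": 10.0},
     "accuracy": 0.88},
]


def recommend(X, y, top_k=2):
    """Rank stored configurations by meta-feature distance to the new dataset."""
    query = meta_features(X, y)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: np.linalg.norm(entry["meta"] - query),
    )
    return ranked[:top_k]


X, y = load_iris(return_X_y=True)
recommendations = recommend(X, y)
for entry in recommendations:
    print(entry["algorithm"], entry["params"])
```

The lookup only computes a handful of dataset statistics and a distance ranking, which is why a meta-learning recommender can answer much faster than tools that execute every candidate pipeline. A real system would normally use many more meta-features and scale them before comparing datasets.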

3.1. The recommendation module

Listing 1 summarizes the interactions required to use AMLBID in order to recommend a pipeline. AMLBID then attributes a score to the chosen pipelines and exports the best pipeline to a dynamically stored .py file.

As shown on line 5 of Listing 1 (the load_data call), we define the path of the dataset to be loaded. The recommend function (line 8) initializes the meta-learning process to find the highest-scoring pipeline according to the desired performance criteria. Then, the recommended pipeline is trained on the train-set of the provided samples (line 10, the fit call). After the execution of this code, Recommended_pipeline.py (for instance, as given in Listing 2) is generated dynamically using the export function (line 15). It contains the corresponding Python code for the optimized pipeline.

    from AMLBID.recommender import AMLBID_Recommender
    from AMLBID.loader import *

    # Load dataset
    Data, X_train, Y_train, X_test, Y_test = load_data("Dataset.csv")

    # Generate the optimal configuration
    model = AMLBID_Recommender.recommend(Data, metric="Accuracy",
                                         mode="Recommender")
    model.fit(X_train, Y_train)

    print(model.score(X_test, Y_test))

    # Export the configuration's corresponding Python code
    model.export('Recommended_pipeline.py')

Listing 1: Illustrative code example of recommendation module.

    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Load the dataset and separate the features from the target column
    data = pd.read_csv("Dataset.csv")

    X = data.drop('class', axis=1)
    Y = data['class']

    # Hold out 30% of the samples for testing
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
        test_size=0.3, random_state=42)

    # Recommended algorithm with its tuned hyperparameters
    model = DecisionTreeClassifier(criterion='entropy',
                                   max_features=0.5672564,
                                   min_samples_leaf=5,
                                   min_samples_split=20)

    model.fit(X_train, Y_train)

    # Evaluate the pipeline on the held-out test set
    Y_pred = model.predict(X_test)
    score = model.score(X_test, Y_test)

    print(classification_report(Y_test, Y_pred))
    print('Pipeline test accuracy: %.3f' % score)

Listing 2: Generated Python file.

3.2. The explainer module

The AMLBID_Explainer module allows users to inspect the insights of the recommendation module and the decision generation process. Its use is illustrated in Listing 3. It provides explanations on several levels of abstraction, such as the importance of features and the contribution of features to individual predictions (with the help of the SHAP tool [23], which computes the Shapley values of the feature contributions to a prediction), "what-if" analysis, visualization of individual decision paths, the weight of hyperparameters, and correlations. A partial view of the AMLBID_Explainer component is shown in Fig. 2.

    from AMLBID.recommender import AMLBID_Recommender
    from AMLBID.explainer import AMLBID_Explainer
    from AMLBID.loader import *

    # Load dataset
    Data, X_train, Y_train, X_test, Y_test = load_data("Dataset.csv")

    # Generate the optimal configurations
    model, config = AMLBID_Recommender.recommend(Data,
                                                 metric="Accuracy",
                                                 mode="Recommender_Explainer")
    model.fit(X_train, Y_train)

    # Generate the interactive explanatory dash
    Explainer = AMLBID_Explainer.explain(model, config,
                                         X_test, Y_test)
    Explainer.dash()

Listing 3: Illustrative code example of recommendation_explainer module.

4. Impact

The ML modeling process generally operates as a highly iterative exploratory process. In reality, there is no one-size-fits-all model solution, i.e., there does not exist a single model or algorithm which can be used to achieve the highest accuracy for all datasets in a certain application domain. On that account, trying out a large number of ML algorithms with different hyperparameter configurations would not yield a practical solution; rather, it would be an inefficient, tedious, and time-consuming process. In this context, the benefit of AMLBID is twofold. First, it makes it possible for non-data-science specialists (practitioners and researchers) to build robust ML pipelines without the need for specialist assistance or intervention, and without even having to write a single line of code. Second, the white-box nature of the proposed AutoML tool makes it possible to interactively inspect the inner workings of the ML predictive models without having to depend on a data scientist to generate and interpret all the plots and tables.

The main objective of AMLBID is the design of a decision support system that supports non-expert practitioners and researchers. Prospectively, we intend its use in the domain of Industry 4.0 [24] to take maximum benefit of ML techniques to optimize automated manufacturing processes. In our previous works [25,26], we studied the effectiveness of the recommendation module for the selection and parameterization of ML algorithms for problems concerning the manufacturing industry. The evaluation results answer the basic research question of how some ML-oriented manufacturing works could be further improved simply through the use of a better ML algorithm configuration obtained with AMLBID. Since AMLBID is built upon the meta-learning concept, in the broader sense it is beneficial not only for manufacturing actors and researchers but also for many other areas. Furthermore, AMLBID is useful for academic purposes, helping academia to build ML predictive models and understand their behavior.

5. Conclusion

Machine learning based applications are increasingly in demand owing to their robustness for large-scale data analysis. They can also rapidly integrate "off-the-shelf" solutions in multiple areas. However, non-expert data analysts are more inclined to adopt, among diverse algorithms, the ML based solutions whose recommendations are convincing thanks to their rational traceability. We argue that the adoption of powerful ML based decision support systems can be further enhanced with the help of comprehensive explanations regarding the recommended pipelines and their insights. Thus,

Fig. 2. Functional dashboard of AMLBID_Explainer showing the decision path of a predicted instance.

making them more trustworthy than black-box solutions. In this work, we present AMLBID, a novel transparent and auto-explained AutoML support system. It also includes an interactive visualization module (AMLBID_Explainer) that allows ML experts and neophytes to easily inspect and analyze the automatic results of an AutoML decision support system.

We propose to use the general explanation methods of AutoML systems; the performance advantage of AMLBID comes from the remarkable time gain and the prevention of resource bottlenecks. Contemporary ML based solutions must train multiple ML models from scratch on each fresh dataset before generating the list of recommended pipelines. AMLBID addresses this challenge by promptly producing a list of potential top-ranked ML configurations from its collaborative knowledge base. It therefore competes favorably with state-of-the-art AutoML systems in fields that are less tolerant of delays, such as the manufacturing industry. In practice, the confidentiality of the analyzed datasets is respected because the knowledge base of AMLBID consists of the meta-features of datasets and not the actual data. At present, we are planning to expand AMLBID to support regression algorithms, deep learning and distributed ML libraries (e.g., Spark ML [27]) since we are dealing with Industrial Big Data.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors thank the Université du Littoral Côte d’Opale (ULCO), France, School of engineering’s and business’ sciences and technics (HESTIM), Morocco and CNRST Morocco for the partial financial support, and the CALCULCO computing platform, supported by SCoSI/ULCO (Service COmmun du Système d’Information de l’Université du Littoral Côte d’Opale) for the computational facilities.

References

[1] Millman N. Big data to unlock value from the industrial internet of things. ComputerWeekly.Com 2015. https://www.computerweekly.com/opinion/Big-data-to-unlock-value-from-the-Industrial-Internet-of-Things.
[2] Tao F, Qi Q, Liu A, Kusiak A. Data-driven smart manufacturing. J Manuf Syst 2018;48:157–69. http://dx.doi.org/10.1016/j.jmsy.2018.01.006, Special Issue on Smart Manufacturing.
[3] Cohen-Shapira N, Rokach L, Shapira B, Katz G, Vainshtein R. AutoGRD: model recommendation through graphical dataset representation. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York, NY, USA; 2019, p. 821–30. http://dx.doi.org/10.1145/3357384.3357896.
[4] Muñoz MA, Sun Y, Kirley M, Halgamuge SK. Algorithm selection for black-box continuous optimization problems: A survey on methods and challenges. Inform Sci 2015;317:224–45. http://dx.doi.org/10.1016/j.ins.2015.05.010.
[5] Feurer M, Klein A, Eggensperger K, Springenberg JT, Blum M, Hutter F. Efficient and robust automated machine learning. In: Proceedings of the 28th international conference on neural information processing systems, vol. 2. 2015, p. 2755–63.
[6] Reif M, Shafait F, Goldstein M, Breuel T, Dengel A. Automatic classifier selection for non-experts. Pattern Anal Appl 2014;17(1):83–96. http://dx.doi.org/10.1007/s10044-012-0280-z.
[7] Hutter F, Kotthoff L, Vanschoren J, editors. Automated machine learning: methods, systems, challenges. The springer series on challenges in machine learning, 2019, http://dx.doi.org/10.1007/978-3-030-05318-5.
[8] Waring J, Lindvall C, Umeton R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif Intell Med 2020;104:101822. http://dx.doi.org/10.1016/j.artmed.2020.101822.
[9] Drori I, Krishnamurthy Y, Rampin R, Lourenço R, One J, Cho K, Silva C, Freire J. AlphaD3M: Machine learning pipeline synthesis. In: AutoML workshop at ICML. 2018.
[10] Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Coello CAC, editor. Learning and intelligent optimization. Lecture notes in computer science, Berlin, Heidelberg; 2011, p. 507–23. http://dx.doi.org/10.1007/978-3-642-25566-3.
[11] Laadan D, Vainshtein R, Curiel Y, Katz G, Rokach L. RankML: a meta learning-based approach for pre-ranking machine learning pipelines. 2019, arXiv:1911.00108.
[12] Olson RS, Moore JH. TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter F, Kotthoff L, Vanschoren J, editors. Automated machine learning: methods, systems, challenges. The springer series on challenges in machine learning, 2019, p. 151–60. http://dx.doi.org/10.1007/978-3-030-05318-5.
[13] Li L, Jamieson KG, DeSalvo G, Rostamizadeh A, Talwalkar A. Efficient hyperparameter optimization and infinitely many armed bandits, vol. 16. 2016, CoRR, Abs/1603.06560.


[14] Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K. Auto-WEKA: automatic model selection and hyperparameter optimization in WEKA. In: Hutter F, Kotthoff L, Vanschoren J, editors. Automated machine learning: methods, systems, challenges. The springer series on challenges in machine learning, Cham; 2019, p. 81–95. http://dx.doi.org/10.1007/978-3-030-05318-5.
[15] RapidMiner, data science & machine learning platform. https://rapidminer.com.
[16] H2O.ai, AI cloud platform. https://www.h2o.ai/.
[17] DataRobot, AI cloud - the next generation of AI. https://www.datarobot.com/.
[18] Machine Learning avec MATLAB. https://fr.mathworks.com/solutions/machine-learning.html.
[19] Guyon I, Sun-Hosoya L, Boullé M, Escalante HJ, Escalera S, Liu Z, Jajetic D, Ray B, Saeed M, Sebag M, Statnikov A, Tu W-W, Viegas E. Analysis of the automl challenge series 2015–2018. In: Automated machine learning: methods, systems, challenges. 2019, p. 177–219. http://dx.doi.org/10.1007/978-3-030-05318-5.
[20] Samek W, Müller K-R. Towards explainable artificial intelligence. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. 2019, p. 5–22. http://dx.doi.org/10.1007/978-3-030-28954-6.
[21] Lemke C, Budka M, Gabrys B. Metalearning: A survey of trends and technologies. Artif Intell Rev 2015;44(1):117–30. http://dx.doi.org/10.1007/s10462-013-9406-y.
[22] Garouani M, Ahmad A, Bouneffa M, Hamlich M, Bourguin G, Lewandowski A. Towards big industrial data mining through explainable automated machine learning. 2021, http://dx.doi.org/10.21203/rs.3.rs-755783/v1.
[23] Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems. 2017, p. 4768–77, arXiv:1705.07874.
[24] Tao F, Qi Q, Liu A, Kusiak A. Data-driven smart manufacturing. J Manuf Syst 2018;48:157–69. http://dx.doi.org/10.1016/j.jmsy.2018.01.006, Special Issue on Smart Manufacturing.
[25] Garouani M, Ahmad A, Bouneffa M, Lewandowski A, Bourguin G, Hamlich M. Towards the automation of industrial data science: a meta-learning based approach. In: 23rd international conference on enterprise information systems. 2021, p. 709–16. http://dx.doi.org/10.5220/0010457107090716.
[26] Garouani M, Hamlich M, Ahmad A, Bouneffa M, Bourguin G, Lewandowski A. Towards an automatic assistance framework for the selection and configuration of machine-learning-based data analytics solutions in industry 4.0. In: The fifth international conference on big data and internet of things. [in press].
[27] Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X, Rosen J, Venkataraman S, Franklin MJ, Ghodsi A, Gonzalez J, Shenker S, Stoica I. Apache spark: A unified engine for big data processing. Commun ACM 2016;59(11):56–65. http://dx.doi.org/10.1145/2934664.

