Chapter 9
A Summary of Solubility Models
Across the last few chapters, a variety of models have been fit to the solubility
data set. How do the models compare for these data, and which one should
be selected as the final model? Figs. 9.1 and 9.2 show scatter plots of the
performance metrics calculated using cross-validation and the test set data.
With the exception of the poorly performing models, there is a fairly high
correlation between the results derived from resampling and the test set (0.9
for RMSE and 0.88 for R2). For the most part, the models tend to rank-order
similarly. The K-nearest neighbors model was the weakest performer, followed
by the two single-tree methods. While bagging these trees did help, it did not
make the models very competitive. Additionally, conditional random forest
models had mediocre results.
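As a sketch of how this agreement might be quantified, the snippet below correlates per-model RMSE estimates from cross-validation with their test-set counterparts. The values here are hypothetical placeholders, not the numbers behind Figs. 9.1 and 9.2:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-model RMSE values (log10 solubility units); these are
# illustrative placeholders, not the actual values plotted in Fig. 9.2.
cv_rmse = {"Cubist": 0.60, "Boosted Tree": 0.62, "SVMr": 0.63,
           "Random Forest": 0.68, "MARS": 0.70, "Linear Reg.": 0.74,
           "Bagged Tree": 0.82, "Tree": 0.94, "KNN": 1.04}
test_rmse = {"Cubist": 0.65, "Boosted Tree": 0.64, "SVMr": 0.62,
             "Random Forest": 0.70, "MARS": 0.72, "Linear Reg.": 0.75,
             "Bagged Tree": 0.86, "Tree": 0.91, "KNN": 1.06}

models = list(cv_rmse)
x = np.array([cv_rmse[m] for m in models])
y = np.array([test_rmse[m] for m in models])

r, _ = pearsonr(x, y)     # linear agreement between the two sets of estimates
rho, _ = spearmanr(x, y)  # agreement in how the models rank-order
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```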
There was a “pack” of models that showed better results, including model
trees, linear regression, penalized linear models, MARS, and neural networks.
These models are simpler but would not be considered interpretable
given the number of predictors involved in the linear models and the com-
plexity of the model trees and MARS. For the most part, they would be
easy to implement. Recall that this type of model might be used by a phar-
maceutical company to screen millions of potential compounds, so ease of
implementation should not be taken lightly.
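The ease-of-implementation point is concrete for the penalized linear models: once fit, the entire prediction equation is an intercept plus a dot product, which can be re-coded in almost any environment. A minimal Python sketch follows, using synthetic data with the dimensions of the solubility training set (951 compounds, 228 descriptors) and scikit-learn's ElasticNet as a stand-in for the penalized models discussed earlier:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
# Synthetic stand-in for the descriptor matrix; not the actual solubility data.
X, y = rng.normal(size=(951, 228)), rng.normal(size=951)

enet = ElasticNet(alpha=0.01, max_iter=5000).fit(X, y)

# Once fit, the deployed "model" is just these numbers; prediction is a single
# dot product that is easy to port to a database or screening pipeline.
intercept, coefs = enet.intercept_, enet.coef_

def predict(descriptors):
    return float(intercept + descriptors @ coefs)

assert np.isclose(predict(X[0]), enet.predict(X[:1])[0])
```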
The group of high-performance models includes support vector machines
(SVMs), boosted trees, random forests, and Cubist. Each is essentially a
black box with a highly complex prediction equation. The performance of
these models is head and shoulders above the rest, so there is probably some
value in finding computationally efficient implementations that can be used
to predict large numbers of new samples.
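One way to assess that value is simply to time each fitted model on a large batch of new samples. The sketch below does this with scikit-learn stand-ins for three of the model types (Cubist has no direct scikit-learn analog); the data are synthetic, so the timings illustrate the procedure rather than reproduce the solubility results:

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train = rng.normal(size=(951, 228))    # synthetic stand-in for the training set
y_train = rng.normal(size=951)
X_new = rng.normal(size=(100_000, 228))  # a large batch of new compounds to score

models = {
    "Random Forest": RandomForestRegressor(n_estimators=500, n_jobs=-1),
    "Boosted Tree": GradientBoostingRegressor(n_estimators=500),
    "SVMr": SVR(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    start = time.perf_counter()
    model.predict(X_new)
    print(f"{name}: {time.perf_counter() - start:.2f} s "
          f"for {len(X_new):,} predictions")
```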
Are there any real differences between these models? Using the resampling
results, a set of confidence intervals was constructed to characterize the
differences in RMSE between the models, using the techniques shown in
Sect. 4.8. Figure 9.3 shows the intervals.
[Fig. 9.1: A plot of the R2 of the solubility models estimated by 10-fold cross-validation and the test set. Panels: Cross-Validation and Test Set; x-axis: R2; models ordered from Cubist to KNN.]
[Fig. 9.2: A plot of the RMSE of the solubility models estimated by 10-fold cross-validation and the test set. Panels: Cross-Validation and Test Set; x-axis: RMSE; models ordered from KNN to Cubist.]
[Fig. 9.3: Confidence intervals for the differences in RMSE for the high-performance models. Pairwise comparisons: rf − SVMr, rf − gbm, rf − cubist, gbm − SVMr, gbm − cubist, cubist − SVMr; x-axis: difference in RMSE; confidence level 0.992 (multiplicity adjusted).]
There are very few statistically significant differences. Additionally, most
of the estimated mean differences are less than 0.05 log units, a magnitude
that is not scientifically meaningful. Given this, any of these models would
be a reasonable choice.
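For readers who want to reproduce this style of comparison, the sketch below computes paired-difference confidence intervals from per-fold RMSE values, with a Bonferroni adjustment across the six pairwise comparisons (1 − 0.05/6 ≈ 0.992, matching the level quoted in Fig. 9.3). The fold values here are hypothetical placeholders; with real resampling output, every model must be evaluated on the same folds so the differences are paired:

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Hypothetical per-fold RMSE from 10-fold CV; each model is evaluated on the
# same folds so that fold-wise differences are paired.
fold_rmse = {
    "rf":     np.array([0.69, 0.71, 0.66, 0.72, 0.70, 0.68, 0.73, 0.67, 0.70, 0.69]),
    "gbm":    np.array([0.64, 0.67, 0.61, 0.68, 0.66, 0.63, 0.69, 0.62, 0.66, 0.64]),
    "cubist": np.array([0.61, 0.64, 0.59, 0.65, 0.63, 0.60, 0.66, 0.60, 0.63, 0.61]),
    "SVMr":   np.array([0.63, 0.66, 0.60, 0.67, 0.64, 0.62, 0.68, 0.61, 0.65, 0.63]),
}

pairs = list(combinations(fold_rmse, 2))  # six pairwise comparisons
level = 1 - 0.05 / len(pairs)             # Bonferroni-adjusted: ~0.992

for a, b in pairs:
    d = fold_rmse[a] - fold_rmse[b]       # paired differences across folds
    n = len(d)
    half = stats.t.ppf((1 + level) / 2, n - 1) * d.std(ddof=1) / np.sqrt(n)
    print(f"{a} - {b}: {d.mean():+.3f} "
          f"({d.mean() - half:+.3f}, {d.mean() + half:+.3f})")
```

An interval that contains zero indicates no statistically significant difference between that pair of models at the adjusted confidence level.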