ml-regression plot, first version · plotly/plotly.r-docs@8cd6847
Author: Kalpit Desai

File changed: r/2021-07-08-ml-regression.Rmd (+184, -0)
<!-- #region -->
This page shows how to use Plotly charts for displaying various types of regression models, starting from simple models like Linear Regression and progressively moving towards models like Decision Tree and Polynomial Features. We highlight various capabilities of plotly, such as comparative analysis of the same model with different parameters, displaying LaTeX, and [surface plots](https://plotly.com/r/3d-surface-plots/) for 3D data.

We will use [tidymodels](https://tidymodels.tidymodels.org/) to split and preprocess our data and to train various regression models. Tidymodels is a popular Machine Learning (ML) library in R that is compatible with "tidyverse" concepts, and offers various tools for creating and training ML algorithms, feature engineering, data cleaning, and evaluating and testing models. It is the next-generation version of the popular [caret](http://topepo.github.io/caret/index.html) library in R.
<!-- #endregion -->
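Every model on this page follows the same parsnip pattern: declare a model specification, attach an engine, set the mode, then fit a formula. A minimal sketch of that pattern (using the built-in `mtcars` data purely for illustration; the sections below use the `tips` dataset instead):

```{r}
# Minimal parsnip workflow sketch: specification -> engine -> mode -> fit.
library(tidymodels)

spec <- linear_reg() %>%      # choose a model type
  set_engine("lm") %>%        # choose the underlying implementation
  set_mode("regression")      # regression (as opposed to classification)

fitted <- spec %>% fit(mpg ~ wt, data = mtcars)
predict(fitted, new_data = mtcars[1:3, ])  # predictions come back as a tibble with a .pred column
```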

## Basic linear regression plots

In this section, we show you how to apply a simple regression model for predicting the tips a server will receive based on various client attributes (such as sex, day of the week, and whether they are a smoker).

We will be using [Linear Regression][lr], a simple model that fits an intercept (the mean tip received by a server) and adds a slope for each feature we use, such as the value of the total bill.

[lr]: https://parsnip.tidymodels.org/reference/linear_reg.html

### Linear Regression with R

```{r}
library(plotly)
library(tidymodels) # for linear_reg(), set_engine(), set_mode(), fit()
library(reshape2)   # for the tips dataset
data(tips)

y <- tips$tip
X <- tips$total_bill

lm_model <- linear_reg() %>%
  set_engine('lm') %>%
  set_mode('regression') %>%
  fit(tip ~ total_bill, data = tips)

# Build a grid of total_bill values to draw the fitted line
x_range <- seq(min(X), max(X), length.out = 100)
x_range <- matrix(x_range, nrow = 100, ncol = 1)
x_range <- data.frame(x_range)
colnames(x_range) <- c('total_bill')

y_range <- lm_model %>% predict(x_range)

colnames(y_range) <- c('tip')
xy <- data.frame(x_range, y_range)

fig <- plot_ly(tips, x = ~total_bill, y = ~tip, type = 'scatter', alpha = 0.65, mode = 'markers', name = 'Tips')
fig <- fig %>% add_trace(data = xy, x = ~total_bill, y = ~tip, name = 'Regression Fit', mode = 'lines', alpha = 1)
fig
```
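Once fitted, the intercept and slope can be read directly off the underlying `lm` object that parsnip wraps. A short sketch, assuming `lm_model` from the chunk above:

```{r}
# Inspect the fitted coefficients: parsnip stores the engine fit in $fit.
coef(lm_model$fit)

# The same numbers as a tidy tibble (term, estimate, std.error, ...),
# via the broom package, which tidymodels loads.
tidy(lm_model)
```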
## Model generalization on unseen data

With `add_trace()`, you can easily color your plot based on a predefined data split. By coloring the training and the testing data points with different colors, you can easily see whether the model generalizes well to the test data or not.

```{r}
library(plotly)
library(tidymodels) # for initial_split(), training(), testing()
library(reshape2)   # for the tips dataset
data(tips)

y <- tips$tip
X <- tips$total_bill

set.seed(123)
tips_split <- initial_split(tips)
tips_training <- tips_split %>%
  training()
tips_test <- tips_split %>%
  testing()

lm_model <- linear_reg() %>%
  set_engine('lm') %>%
  set_mode('regression') %>%
  fit(tip ~ total_bill, data = tips_training)

x_range <- seq(min(X), max(X), length.out = 100)
x_range <- matrix(x_range, nrow = 100, ncol = 1)
x_range <- data.frame(x_range)
colnames(x_range) <- c('total_bill')

y_range <- lm_model %>%
  predict(x_range)

colnames(y_range) <- c('tip')
xy <- data.frame(x_range, y_range)

fig <- plot_ly(data = tips_training, x = ~total_bill, y = ~tip, type = 'scatter', name = 'train', mode = 'markers', alpha = 0.65) %>%
  add_trace(data = tips_test, x = ~total_bill, y = ~tip, type = 'scatter', name = 'test', mode = 'markers', alpha = 0.65) %>%
  add_trace(data = xy, x = ~total_bill, y = ~tip, name = 'prediction', mode = 'lines', alpha = 1)
fig
```
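Beyond eyeballing the plot, you can quantify generalization with the yardstick package (loaded by tidymodels). A minimal sketch, assuming `lm_model` and `tips_test` from the chunk above:

```{r}
# Score the held-out split numerically: rmse() and rsq() are from yardstick.
test_results <- tips_test %>%
  bind_cols(predict(lm_model, tips_test))

test_results %>% rmse(truth = tip, estimate = .pred)
test_results %>% rsq(truth = tip, estimate = .pred)
```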

## Comparing different kNN models parameters

In addition to linear regression, it's possible to fit the same data using [k-Nearest Neighbors][knn]. When this model predicts on a new sample, it takes either the weighted or the unweighted average of the neighbors. In order to see the difference between those two averaging options, we train a kNN model with each of those parameters, and we plot them in the same way as the previous graph.

Notice how we can combine scatter points with lines using Plotly. You can learn more about [multiple chart types](https://plotly.com/r/graphing-multiple-chart-types/).

[knn]: https://parsnip.tidymodels.org/reference/nearest_neighbor.html

```{r}
library(plotly)
library(tidymodels) # for nearest_neighbor()
library(kknn)       # engine for nearest_neighbor()
library(reshape2)   # for the tips dataset
data(tips)

y <- tips$tip
X <- tips$total_bill

knn_dist <- nearest_neighbor(neighbors = 10, weight_func = 'inv') %>%
  set_engine('kknn') %>%
  set_mode('regression') %>%
  fit(tip ~ total_bill, data = tips)
knn_uni <- nearest_neighbor(neighbors = 10, weight_func = 'rectangular') %>%
  set_engine('kknn') %>%
  set_mode('regression') %>%
  fit(tip ~ total_bill, data = tips)

x_range <- seq(min(X), max(X), length.out = 100)
x_range <- matrix(x_range, nrow = 100, ncol = 1)
x_range <- data.frame(x_range)
colnames(x_range) <- c('total_bill')

y_dist <- knn_dist %>%
  predict(x_range)
y_uni <- knn_uni %>%
  predict(x_range)

colnames(y_dist) <- c('dist')
colnames(y_uni) <- c('uni')
xy <- data.frame(x_range, y_dist, y_uni)

fig <- plot_ly(tips, type = 'scatter', mode = 'markers', colors = c("#FF7F50", "#6495ED")) %>%
  add_trace(data = tips, x = ~total_bill, y = ~tip, type = 'scatter', mode = 'markers', color = ~sex, alpha = 0.65) %>%
  add_trace(data = xy, x = ~total_bill, y = ~dist, name = 'Weights: Distance', mode = 'lines', alpha = 1) %>%
  add_trace(data = xy, x = ~total_bill, y = ~uni, name = 'Weights: Uniform', mode = 'lines', alpha = 1)
fig
```
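To put a number on the difference between the two weighting schemes, you can compare their errors with yardstick. A rough sketch, assuming `knn_dist`, `knn_uni`, and `tips` from the chunk above (note this is in-sample error, which flatters kNN; a fair comparison would score a held-out split as in the previous section):

```{r}
# In-sample RMSE for each weighting scheme, using yardstick's rmse().
preds <- tips %>%
  mutate(pred_dist = predict(knn_dist, tips)$.pred,
         pred_uni  = predict(knn_uni, tips)$.pred)

preds %>% rmse(truth = tip, estimate = pred_dist)
preds %>% rmse(truth = tip, estimate = pred_uni)
```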

## 3D regression surface with `mesh3d` and `add_surface`

Visualize the decision plane of your model whenever you have more than one variable in your input data. Here, we will use [`svm_rbf`](https://parsnip.tidymodels.org/reference/svm_rbf.html) with the [`kernlab`](https://cran.r-project.org/web/packages/kernlab/index.html) engine in `regression` mode. For generating the 2D mesh on the surface, we use the package [`pracma`](https://cran.r-project.org/web/packages/pracma/index.html).

```{r}
library(plotly)
library(tidymodels) # for svm_rbf() and select()
library(kernlab)    # engine for svm_rbf()
library(pracma)     # for meshgrid()
data(iris)

mesh_size <- .02
margin <- 0
X <- iris %>% select(Sepal.Width, Sepal.Length)
y <- iris %>% select(Petal.Width)

model <- svm_rbf(cost = 1.0) %>%
  set_engine("kernlab") %>%
  set_mode("regression") %>%
  fit(Petal.Width ~ Sepal.Width + Sepal.Length, data = iris)

# Build a 2D mesh spanning the data range (expanded by `margin` on each side)
x_min <- min(X$Sepal.Width) - margin
x_max <- max(X$Sepal.Width) + margin
y_min <- min(X$Sepal.Length) - margin
y_max <- max(X$Sepal.Length) + margin
xrange <- seq(x_min, x_max, mesh_size)
yrange <- seq(y_min, y_max, mesh_size)
xy <- meshgrid(x = xrange, y = yrange)
xx <- xy$X
yy <- xy$Y
dim_val <- dim(xx)
xx1 <- matrix(xx, length(xx), 1)
yy1 <- matrix(yy, length(yy), 1)
# predict() needs a data frame whose columns match the predictor names
final <- data.frame(Sepal.Width = xx1, Sepal.Length = yy1)
pred <- model %>%
  predict(final)

pred <- pred$.pred
pred <- matrix(pred, dim_val[1], dim_val[2])

fig <- plot_ly(iris, x = ~Sepal.Width, y = ~Sepal.Length, z = ~Petal.Width) %>%
  add_markers(size = 5) %>%
  add_surface(x = xrange, y = yrange, z = pred, alpha = 0.65, type = 'mesh3d', name = 'pred_surface')
fig
```