Simplified Machine Learning Crash Course
Before we dive into the different types of machine learning algorithms, let’s briefly talk about what
machine learning is.
1. Introduction to Machine Learning
Machine learning is a field of study that gives computers the ability to learn from data without
being explicitly programmed. In other words, instead of telling a computer exactly what to do, we
give it a bunch of examples and let it figure out the patterns on its own. This is achieved through
the use of algorithms that can learn from and make predictions on data.
There are three main types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. We'll go over each of these in more detail.
# Types of Machine Learning Algorithms
2. Supervised Learning
Supervised learning is a type of machine learning where the algorithm learns from labeled data. Labeled data is data that has already been categorized or classified. The goal of supervised learning is to use this labeled data to make predictions about new, unseen data.
There are two main types of supervised learning: regression and classification.
- Regression
Regression is a type of supervised learning where the goal is to predict a continuous value. For example, we might want to predict the price of a house based on its features such as the number of bedrooms, square footage, and location.
Let's start by importing some necessary libraries and loading in a dataset that we'll be using for this example.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load in the dataset
df = pd.read_csv('house_prices.csv')

# Split the data into training and testing sets
# (the 'location' column is assumed to be numeric or already encoded)
X_train, X_test, y_train, y_test = train_test_split(
    df[['bedrooms', 'sqft', 'location']], df['price'], test_size=0.2)
In this example, we're using the Linear Regression algorithm from the scikit-learn library. We're also splitting our data into training and testing sets using the train_test_split() function.
Now, let’s train our model on the training data and make predictions on the testing data.
# Train the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Make predictions on the testing data
predictions = regressor.predict(X_test)
# Print the predictions
print (predictions)
This will give us an array of predicted house prices for the testing data. We can then evaluate our model using various metrics, which we'll cover later.
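The evaluation section later in this guide focuses on classification metrics, so as a quick aside, here is a minimal sketch of two common regression metrics, mean squared error and the R² score, reusing y_test and predictions from above:
from sklearn.metrics import mean_squared_error, r2_score

# Compare predicted prices against the true prices
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print('MSE:', mse)
print('R^2:', r2)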
- Classification
Classification is a type of supervised learning where the goal is to predict a categorical value. For
example, we might want to predict whether an email is spam or not based on its content.
Let's use the Iris dataset to demonstrate classification. The Iris dataset is a famous dataset that contains measurements of different types of Iris flowers. Our goal will be to predict the species of the flower based on its measurements.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Load the iris dataset
iris = load_iris()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.2)
# Train the decision tree classifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)

# Make predictions on the testing data
predictions = classifier.predict(X_test)
# Print the predictions
print (predictions)
In this example, we're using the Decision Tree algorithm from the scikit-learn library. We're also splitting our data into training and testing sets using the train_test_split() function.
Now, let's move on to unsupervised learning.
3. Unsupervised Learning
Unsupervised learning is a type of machine learning where the algorithm learns from unlabeled data. Unlabeled data is data that has not been categorized or classified. The goal of unsupervised learning is to find patterns in the data without any prior knowledge.
There are two main types of unsupervised learning: clustering and dimensionality reduction.
- Clustering
Clustering is a type of unsupervised learning where the goal is to group similar data points together.
For example, we might want to group customers based on their purchasing habits.
Let's use the K-Means algorithm to demonstrate clustering. The K-Means algorithm is a popular
clustering algorithm that partitions the data into K clusters.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate some random data
X, y = make_blobs(n_samples=100, centers=4, cluster_std=1.0, random_state=42)

# Train the K-Means model (K=4 here, matching the generated blobs)
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)

# Visualize the clusters and their centers
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker='*', s=300, c='red')
plt.show()
In this example, we're using the K-Means algorithm from the scikit-learn library. We're generating some random data using the make_blobs() function and training our model on this data. We then visualize the clusters using a scatter plot.
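One question the example leaves open is how to choose K. A common heuristic (not covered above) is the elbow method: fit K-Means for several values of K, plot the inertia (within-cluster sum of squares, exposed by scikit-learn as the inertia_ attribute), and look for the point where the curve bends. A sketch, reusing X from above:
# Fit K-Means for a range of K values and record the inertia
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

# Plot inertia vs. K and look for the "elbow"
plt.plot(range(1, 10), inertias, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.show()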
- Dimensionality Reduction
Dimensionality reduction is a type of unsupervised learning where the goal is to reduce the number
of features in the data while retaining as much information as possible. For example, we might
want to reduce the number of features in an image to make it easier to process.
Let's use Principal Component Analysis (PCA) to demonstrate dimensionality reduction. PCA is a popular dimensionality reduction technique that projects the data onto a lower-dimensional space while preserving the variance of the data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the digits dataset
digits = load_digits()

# Perform PCA on the data, keeping two components
pca = PCA(n_components=2)
X_transformed = pca.fit_transform(digits.data)

# Visualize the transformed data
plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=digits.target,
            cmap='viridis')
plt.show()
In this example, we're using PCA from the scikit-learn library to reduce the dimensions of the digits dataset. We then visualize the transformed data using a scatter plot.
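We can also check how much of the original variance the two components retain, using the fitted PCA object's explained_variance_ratio_ attribute:
# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)
print('Total variance retained:', pca.explained_variance_ratio_.sum())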
Now, let's move on to reinforcement learning.
4. Reinforcement Learning
Reinforcement learning is a type of machine learning where the algorithm learns through trial and error. The algorithm receives feedback in the form of rewards or punishments for each action it takes. The goal of reinforcement learning is to learn the optimal sequence of actions to maximize the cumulative reward.
Let's use the Q-Learning algorithm to demonstrate reinforcement learning. Q-Learning is a popular reinforcement learning algorithm that uses a Q-Table to store the value of each action in each state.
import numpy as np

# Define the environment: a 4x4 grid of rewards
# (values reconstructed from the garbled source; -1 cells are punishments,
#  1 is the goal)
env = np.array([[ 0,  0,  0,  0],
                [ 0, -1,  0, -1],
                [ 0,  0,  0, -1],
                [-1,  0,  0,  1]])

# Define the Q-Table: one value per (row, column, action)
q_table = np.zeros((4, 4, 4))

# Define the hyperparameters (discount factor of 0.9 is an assumption;
# the original value was cut off)
learning_rate = 0.1
discount_factor = 0.9
epsilon = 0.1
num_episodes = 1000

# Define the training loop
for episode in range(num_episodes):
    state = (0, 0)
    done = False
    while not done:
        # Choose an action (epsilon-greedy): 0=left, 1=right, 2=up, 3=down
        if np.random.uniform() < epsilon:
            action = np.random.choice([0, 1, 2, 3])
        else:
            action = np.argmax(q_table[state[0], state[1], :])

        # Take the action, clipping to stay inside the grid
        next_state = (state[0] + [0, 0, -1, 1][action],
                      state[1] + [-1, 1, 0, 0][action])
        next_state = (min(max(next_state[0], 0), 3),
                      min(max(next_state[1], 0), 3))
        reward = env[next_state[0], next_state[1]]

        # Update the Q-Table
        q_table[state[0], state[1], action] = (
            (1 - learning_rate) * q_table[state[0], state[1], action]
            + learning_rate * (reward + discount_factor
                               * np.max(q_table[next_state[0], next_state[1], :])))
        state = next_state

        # End the episode when a reward or punishment cell is reached
        done = reward != 0

# Print the final Q-Table
print(q_table)
In this example, we're defining a simple 4x4 environment with rewards and punishments. We're using the Q-Learning algorithm to train our agent to navigate the environment and maximize its reward.
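Once training is done, the learned policy is simply the action with the highest Q-value in each state. A minimal sketch, reusing the q_table from above:
# Derive the greedy policy: the best action index for each grid cell
policy = np.argmax(q_table, axis=2)
print(policy)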
Now, let’s move on to deep learning.
5. Deep Learning
Deep learning is a type of machine learning that uses artificial neural networks to learn from data.
Neural networks are modeled after the structure of the human brain and are capable of learning
complex patterns in data.
There are several types of neural networks, but we'll focus on three main types: artificial neural networks, convolutional neural networks, and recurrent neural networks.
- Artificial Neural Networks
Artificial neural networks (ANNs) are the most basic type of neural network. ANNs are composed of input, hidden, and output layers of nodes. Each node in the hidden and output layers has weights associated with it, which are used to determine the output of the node.
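To make the weighted-sum idea concrete, here is a sketch of a single forward pass through one dense layer in plain NumPy (the weights and input values here are made up for illustration):
import numpy as np

def relu(z):
    return np.maximum(0, z)

# A layer with 4 inputs and 3 hidden nodes: illustrative random weights
x = np.array([5.1, 3.5, 1.4, 0.2])   # one input sample
W = np.random.randn(3, 4)            # weight matrix (hypothetical values)
b = np.zeros(3)                       # biases

# Each node's output is an activation applied to its weighted sum
hidden = relu(W @ x + b)
print(hidden)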
Let's use the Keras library to demonstrate how to create an ANN. We'll be using the Iris dataset again, but this time we'll be using all four features to predict the species of the flower.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Load the iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.2)

# One-hot encode the labels to match the categorical crossentropy loss
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Create the model
model = Sequential()
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, epochs=100, batch_size=10)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print('Accuracy:', accuracy)
In this example, we're using the Keras library to create an ANN with one hidden layer of 10 nodes and an output layer of 3 nodes. We're using the categorical crossentropy loss function and the Adam optimizer. We're also training our model for 100 epochs with a batch size of 10.
- Convolutional Neural Networks
Convolutional neural networks (CNNs) are a type of neural network that are specialized for image processing tasks. CNNs use convolutional layers to learn features from images and pooling layers to reduce the dimensionality of the feature maps.
Let’s use Keras to create a CNN for the CIFAR-10 dataset. The CIFAR-10 dataset is a popular
dataset for image classification tasks and contains 60,000 32x32 color images in 10 classes.
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load the CIFAR-10 dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Preprocess the data: scale pixel values to [0, 1] and one-hot encode labels
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Create the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                 input_shape=X_train[0].shape))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, epochs=10, batch_size=64)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print('Accuracy:', accuracy)
In this example, we're using Keras to create a CNN with three convolutional layers and one hidden dense layer. We're using the CIFAR-10 dataset and preprocessing the data by scaling the pixel values to be between 0 and 1. We're also using the categorical crossentropy loss function and the Adam optimizer. We're training our model for 10 epochs with a batch size of 64.
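As a quick usage sketch, reusing the trained model and X_test from above, we can ask for the class of a single test image (the index maps to one of the ten CIFAR-10 categories):
import numpy as np

# Predict the class of the first test image
probs = model.predict(X_test[:1])
print('Predicted class index:', np.argmax(probs, axis=-1)[0])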
- Recurrent Neural Networks
Recurrent neural networks (RNNs) are a type of neural network that are specialized for sequence processing tasks. RNNs use recurrent layers to maintain a state that is updated with each input, allowing the network to learn long-term dependencies in the sequence.
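The recurrence itself is simple: at each step, the new state is a function of the current input and the previous state. Here is a minimal NumPy sketch of a vanilla RNN cell (the dimensions, random weights, and tanh activation are illustrative assumptions; the LSTM used below adds gating on top of this idea):
import numpy as np

# Illustrative dimensions: 8-dimensional inputs, 16-dimensional state
W_x = np.random.randn(16, 8)    # input-to-state weights (hypothetical)
W_h = np.random.randn(16, 16)   # state-to-state weights (hypothetical)
b = np.zeros(16)

h = np.zeros(16)                        # initial state
for x_t in np.random.randn(5, 8):       # a sequence of 5 inputs
    # The state carries information forward from earlier steps
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h)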
Let’s use Keras to create an RNN for a language modeling task. We'll be using the Shakespeare
dataset, which contains a collection of Shakespeare's plays.
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential
import numpy as np

# Load the Shakespeare dataset
with open('shakespeare.txt', 'r') as f:
    text = f.read()

# Preprocess the data: tokenize the text into a sequence of word indices
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
sequences = tokenizer.texts_to_sequences([text])[0]
vocab_size = len(tokenizer.word_index) + 1

# Build (prefix, next word) training pairs
X = []
y = []
for i in range(1, len(sequences)):
    X.append(sequences[:i])
    y.append(sequences[i])

# Pad every prefix to the same length
# (maxlen was not defined in the original; longest prefix plus one is assumed)
maxlen = max(len(s) for s in X) + 1
X = pad_sequences(X, maxlen=maxlen - 1, padding='pre')
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)

# Create the model
model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=maxlen - 1))
model.add(LSTM(128))
model.add(Dense(vocab_size, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Fit the model
model.fit(X, y, epochs=100)

# Generate some text
seed_text = "ROMEO: "
next_words = 100
for _ in range(next_words):
    sequence = tokenizer.texts_to_sequences([seed_text])[0]
    sequence = pad_sequences([sequence], maxlen=maxlen - 1, padding='pre')
    prediction = np.argmax(model.predict(sequence), axis=-1)
    output_word = ''
    for word, index in tokenizer.word_index.items():
        if index == prediction:
            output_word = word
            break
    seed_text += " " + output_word
print(seed_text)
In this example, we're using Keras to create an RNN with one LSTM layer and one dense layer. We're using the Shakespeare dataset and preprocessing the data by tokenizing the text and padding the sequences to have the same length. We're also using the categorical crossentropy loss function and the Adam optimizer. We're training our model for 100 epochs and then using it to generate some text.
Now, let's move on to model evaluation.
6. Model Evaluation
Model evaluation is a critical part of the machine learning process. There are several metrics that we can use to evaluate the performance of our models, including the confusion matrix, accuracy, precision, recall, F1 score, and ROC curve.
- Confusion Matrix
A confusion matrix is a table that is used to evaluate the performance of a classification model. The confusion matrix shows the number of true positives, true negatives, false positives, and false negatives.
Let's use scikit-learn to create a confusion matrix for our classification model.
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Make predictions on the testing data
predictions = classifier.predict(X_test)

# Create the confusion matrix
cm = confusion_matrix(y_test, predictions)

# Visualize the confusion matrix
sns.heatmap(cm, annot=True, cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
In this example, we're using the confusion_matrix function from scikit-learn to create a confusion
matrix for our classification model. We're also using seaborn to visualize the confusion matrix.
- Accuracy, Precision, Recall, and F1 Score
Accuracy, precision, recall, and F1 score are common metrics used to evaluate classification models.
Accuracy is the number of correct predictions divided by the total number of predictions.
Precision is the number of true positives divided by the total number of predicted positives.
Recall is the number of true positives divided by the total number of actual positives.
The F1 score is the harmonic mean of precision and recall.
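To make the definitions concrete, here is a sketch computing all four metrics directly from raw confusion-matrix counts (the counts are made-up numbers for illustration):
# Hypothetical counts: true positives, true negatives, false positives, false negatives
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)           # correct / total
precision = tp / (tp + fp)                           # of predicted positives, how many were right
recall = tp / (tp + fn)                              # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
print(accuracy, precision, recall, f1)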
Let's use scikit-learn to calculate these metrics for our classification model.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Calculate the accuracy, precision, recall, and F1 score
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions, average='weighted')
recall = recall_score(y_test, predictions, average='weighted')
f1 = f1_score(y_test, predictions, average='weighted')

# Print the metrics
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1 Score:', f1)
In this example, we're using the accuracy_score, precision_score, recall_score, and f1_score functions from scikit-learn to calculate the metrics for our classification model.
- ROC Curve
The ROC curve is a graphical representation of the performance of a binary classification model. The ROC curve shows the true positive rate (TPR) versus the false positive rate (FPR) for different thresholds.
Let's use scikit-learn to create an ROC curve for our classification model.
from sklearn.metrics import roc_curve, roc_auc_score

# Calculate the predicted probabilities for the positive class
# (this assumes a binary classifier; a multiclass model like the iris
#  classifier above would need a one-vs-rest approach instead)
probabilities = classifier.predict_proba(X_test)[:, 1]

# Calculate the FPR and TPR for different thresholds
fpr, tpr, thresholds = roc_curve(y_test, probabilities)

# Calculate the AUC score
auc = roc_auc_score(y_test, probabilities)

# Visualize the ROC curve
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve (AUC = {:.2f})'.format(auc))
plt.show()
In this example, we're using the roc_curve and roc_auc_score functions from scikit-learn to create an ROC curve for our classification model. We're also using matplotlib to visualize the ROC curve.
- Cross-Validation
Cross-validation is a technique for evaluating the performance of a machine learning model by partitioning the data into multiple subsets, or folds, and training and testing the model on different combinations of the folds. The most common type of cross-validation is k-fold cross-validation, where the data is divided into k equally sized folds.
Let’s use scikit-learn to perform k-fold cross-validation on our classification model.
from sklearn.model_selection import cross_val_score

# Perform 5-fold cross-validation
# (X and y are the full feature matrix and labels, e.g. iris.data and iris.target)
scores = cross_val_score(classifier, X, y, cv=5)

# Print the scores
print('Accuracy Scores:', scores)
print('Mean Accuracy:', scores.mean())
In this example, we're using the cross_val_score function from scikit-learn to perform 5-fold cross-validation on our classification model. We're also calculating the mean accuracy score across all folds.
Cross-validation can also be used for hyperparameter tuning. Let's use scikit-learn to perform grid search cross-validation on our classification model.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Define the hyperparameters to search
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}

# Perform grid search cross-validation over an SVM classifier
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding score
print('Best Hyperparameters:', grid.best_params_)
print('Best Score:', grid.best_score_)
In this example, we're using the GridSearchCV class from scikit-learn to perform grid search cross-validation on our classification model. We're searching over a grid of hyperparameters and using 5-fold cross-validation to evaluate the performance of each combination of hyperparameters. We're also printing the best hyperparameters and the corresponding score.
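As a follow-up, GridSearchCV refits the best model on the full training set by default, and that model is available as grid.best_estimator_ for direct use:
# Use the best model found by the grid search
best_model = grid.best_estimator_
print(best_model.predict(X_test))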
# Conclusion
In this ultimate guide to machine learning algorithms in Python, we covered the basics of supervised learning, unsupervised learning, reinforcement learning, and deep learning. We also covered how to evaluate the performance of our models using metrics like the confusion matrix, accuracy, precision, recall, F1 score, and ROC curve.
I hope this guide was helpful and provided you with a good starting point for your machine learning journey. Remember, practice makes perfect, so keep experimenting and building your own models!