Aychew Chernet

Classification is a supervised machine learning method that predicts the label or category of new data based on patterns learned from training data consisting of inputs and labels. Clustering is an unsupervised learning method that groups unlabeled data points based on similarities without referring to predefined labels, with the goal of discovering hidden patterns in the data. Regression finds relationships between features and continuous outcomes to predict future trends or values from new data based on patterns from labeled training data.

Uploaded by

aychewchernet
Copyright
© All Rights Reserved

MADDA WALABU UNIVERSITY

INDIVIDUAL ASSIGNMENT

NAME : AYCHEW
What is Classification?
Classification is a supervised machine learning method in which the model tries to predict the correct label for a given input.
In classification, the model is first trained on the training data and then evaluated on test data before it is used to make predictions on new, unseen data.
For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam).

Example of classification with Python

In this example, we focus on logistic regression. Logistic regression is a method that statistically models a binary classification task. It predicts the probability p that the input features belong to a specific class. Mathematically, the logistic regression model is written as follows:

$$p = \frac{1}{1 + e^{-z}}$$
Here, z is the weighted linear combination of the input features and is calculated as follows:
$$z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$
An optimization algorithm, such as gradient descent, finds the values of the weights that maximize the likelihood of the observed data.
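As a rough illustration of the two formulas, the sketch below computes z and p for a single input. The weights, bias, and feature values are hypothetical numbers chosen only for the example, and the helper names sigmoid and predict_proba are not from the original text.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, weights, bias):
    """Return p = sigmoid(w_0 + w_1*x_1 + ... + w_n*x_n)."""
    z = bias + np.dot(weights, x)
    return sigmoid(z)

# Hypothetical weights (w_1, w_2), bias (w_0), and features (x_1, x_2).
weights = np.array([0.8, -0.4])
bias = 0.1
x = np.array([1.0, 2.0])

p = predict_proba(x, weights, bias)
print(f"z = {bias + np.dot(weights, x):.2f}, p = {p:.3f}")
```

A score of z = 0 maps to p = 0.5, so the sign of z decides which side of the 0.5 threshold the prediction falls on.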

Let’s see how this can be done using Python:

1  # Importing libraries and dataset
2  import numpy as np
3  from sklearn.datasets import load_iris
4  from sklearn.linear_model import LogisticRegression
5  from sklearn.model_selection import train_test_split
6  # from sklearn import metrics
7
8  # Load the Iris dataset
9  iris = load_iris()
10 X = iris.data
11 y = iris.target
12
13 # Splitting the data into training and testing sets
14 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
15
16 # Creating the logistic regression model
17 model = LogisticRegression()
18
19 # Training the model
20 model.fit(X_train, y_train)
Lines 4–5: We import LogisticRegression and train_test_split from the sklearn library.
Line 14: We split the features X and target y into training and test datasets. The training dataset trains the model, while the test dataset evaluates its performance.
Lines 17–20: We create a logistic regression model and train the classifier on the training data X_train and y_train.
Note that the Iris target has three classes, so scikit-learn's LogisticRegression treats it as a multiclass problem rather than the binary case described earlier.
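The listing stops after training, so the test split is never actually used. A minimal sketch of the evaluation step could look like the following. It repeats the same split so that it runs on its own, and max_iter=200 is an assumption added here only to give the solver more iterations than the default.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data and split as in the example above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter=200 is an assumption, not part of the original listing.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict labels for the held-out data and compare with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")

# Class probabilities for the first test sample (one value per class).
print(model.predict_proba(X_test[:1]))
```

Accuracy is only one possible metric. For imbalanced classes, precision and recall are usually more informative.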

What is Regression?
Regression is a method for understanding the relationship between
independent variables or features and a dependent variable or outcome.
Outcomes can then be predicted once the relationship between
independent and dependent variables has been estimated.

Regression is a field of study in statistics that forms a key part of forecast models in machine learning. It is used to predict continuous outcomes in predictive modelling, which makes it useful for forecasting and predicting outcomes from data. In its simplest form, machine learning regression involves fitting a line of best fit through the data points. The line is chosen to minimise the distance between each point and the line, typically measured as the sum of squared vertical distances (least squares).
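To make the idea of a best-fit line concrete, here is a small sketch that computes the least-squares slope and intercept for one feature directly with NumPy. The data is synthetic (generated from y = 2x + 1 plus noise), which is an assumption made purely for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Closed-form least-squares estimates for a single feature.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# The quantity being minimised: the sum of squared residuals.
residuals = y - (slope * x + intercept)
sse = np.sum(residuals ** 2)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, SSE = {sse:.3f}")
```

Any other line through the same points would give a larger sum of squared residuals, which is what "best fit" means in this setting.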

Alongside classification, regression is one of the main applications of the supervised type of machine learning (https://www.seldon.io/four-types-of-machine-learning-algorithms-explained/).

Regression analysis is used to understand the relationship between different independent variables and a dependent variable or outcome. Models that forecast or predict trends and outcomes are trained using regression techniques. These models learn the relationship between input and output data from labelled training data. They can then forecast future trends, predict outcomes from unseen input data, or be used to understand gaps in historic data.

As with all supervised machine learning, special care should be taken to ensure the labelled training data is representative of the overall population. If the training data is not representative, the predictive model will be fitted to data that does not reflect new and unseen data, which results in inaccurate predictions once the model is deployed. Because regression analysis depends on the relationships between features and outcomes, care should also be taken to select the right features.

Example of regression

# Python code to illustrate
# regression using a data set
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Load the CSV and select the columns
# (Housing.csv must contain 'price' and 'lotsize' columns)
df = pd.read_csv("Housing.csv")

Y = df['price']
X = df['lotsize']

# scikit-learn expects 2-D arrays of shape (n_samples, n_features)
X = X.values.reshape(len(X), 1)
Y = Y.values.reshape(len(Y), 1)

# Split the data into training/testing sets
# (the last 250 rows are held out for testing)
X_train = X[:-250]
X_test = X[-250:]

# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]

# Plot the test data
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(X_train, Y_train)

# Plot the fitted line over the test data
plt.plot(X_test, regr.predict(X_test), color='red', linewidth=3)
plt.show()
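The plot gives a visual impression of the fit, but a numeric score on the held-out rows is often useful as well. The sketch below is one way this might be done. Because Housing.csv is not included here, it uses a synthetic size/price data set with made-up values, and it holds out the last 250 rows to mirror the split used above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for a size/price data set (all values are made up).
rng = np.random.default_rng(42)
size = rng.uniform(1000, 10000, size=(500, 1))
price = 5.0 * size[:, 0] + 30000 + rng.normal(scale=5000, size=500)

# Hold out the last 250 rows for testing.
X_train, X_test = size[:-250], size[-250:]
y_train, y_test = price[:-250], price[-250:]

regr = LinearRegression().fit(X_train, y_train)
y_pred = regr.predict(X_test)

# Mean squared error (lower is better) and R^2 (closer to 1 is better).
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"slope = {regr.coef_[0]:.2f}, intercept = {regr.intercept_:.0f}")
print(f"test MSE = {mse:.0f}, test R^2 = {r2:.3f}")
```

Taking the last rows as the test set assumes the file is not sorted in any meaningful way. If it is, a shuffled split such as train_test_split is the safer choice.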

What is Clustering?
Introduction to Clustering: Clustering is a type of unsupervised learning method (https://www.geeksforgeeks.org/supervised-unsupervised-learning/). An unsupervised learning method is one in which we draw inferences from datasets consisting of input data without labeled responses. Generally, it is used to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.

Clustering is the task of dividing a population or set of data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In other words, it groups objects on the basis of the similarity and dissimilarity between them.
For example, data points that lie close together in a graph can be assigned to a single group. In the accompanying figure, three such clusters can be distinguished.
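As a quick illustration of grouping points into three clusters, the sketch below uses scikit-learn's KMeans on synthetic data from make_blobs. The data set, the choice of three clusters, and the parameter values are assumptions for the example and do not come from the original figure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three groups (made up for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Fit k-means with k=3 and get one cluster label per data point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
print("cluster centers:\n", kmeans.cluster_centers_)
```

No labels are passed to the algorithm. The true group membership returned by make_blobs is discarded, and the grouping is recovered from the distances between points alone.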

Clusters do not have to be spherical.
Example of Clustering

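import numpy as np

# Euclidean distance (assumed: the original listing does not define distance)
def distance(p1, p2):
    return np.sqrt(np.sum((p1 - p2) ** 2))

# Implementing the E-step: assign each point to its nearest cluster center.
# clusters is assumed to map 0..k-1 to dicts with 'center' and 'points'.
def assign_clusters(X, clusters):
    k = len(clusters)
    for idx in range(X.shape[0]):
        dist = []
        curr_x = X[idx]
        for i in range(k):
            dis = distance(curr_x, clusters[i]['center'])
            dist.append(dis)
        curr_cluster = np.argmin(dist)
        clusters[curr_cluster]['points'].append(curr_x)
    return clusters

# Implementing the M-step: move each center to the mean of its points
def update_clusters(X, clusters):
    k = len(clusters)
    for i in range(k):
        points = np.array(clusters[i]['points'])
        if points.shape[0] > 0:
            new_center = points.mean(axis=0)
            clusters[i]['center'] = new_center
            # Reset the point list for the next iteration
            clusters[i]['points'] = []
    return clusters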
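To show how the E-step and M-step fit together, a minimal self-contained sketch of the full k-means loop might look like this. The Euclidean distance, the init_clusters and predict helpers, the synthetic data, and the fixed number of iterations are all assumptions made for illustration.

```python
import numpy as np

def distance(p1, p2):
    """Euclidean distance (an assumed choice of metric)."""
    return np.sqrt(np.sum((p1 - p2) ** 2))

def init_clusters(X, k, rng):
    """Pick k distinct data points as initial centers (hypothetical helper)."""
    chosen = rng.choice(X.shape[0], size=k, replace=False)
    return {i: {"center": X[j].copy(), "points": []} for i, j in enumerate(chosen)}

def assign_clusters(X, clusters):
    """E-step: attach each point to the nearest center."""
    for x in X:
        dists = [distance(x, clusters[i]["center"]) for i in range(len(clusters))]
        clusters[int(np.argmin(dists))]["points"].append(x)
    return clusters

def update_clusters(clusters):
    """M-step: move each center to the mean of its points, then reset."""
    for c in clusters.values():
        if c["points"]:
            c["center"] = np.mean(c["points"], axis=0)
        c["points"] = []
    return clusters

def predict(X, clusters):
    """Return the index of the nearest center for every point."""
    return np.array([
        int(np.argmin([distance(x, clusters[i]["center"]) for i in range(len(clusters))]))
        for x in X
    ])

# Synthetic data: three well-separated 2-D blobs (made up for illustration).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

k = 3
clusters = init_clusters(X, k, rng)
for _ in range(10):  # fixed number of iterations, for simplicity
    clusters = assign_clusters(X, clusters)
    clusters = update_clusters(clusters)

labels = predict(X, clusters)
print("cluster sizes:", np.bincount(labels, minlength=k))
```

The result depends on the random initial centers, so k-means can settle on a poor grouping. Running it from several starting points and keeping the best result is a common remedy. A production version would also stop once the centers no longer move instead of using a fixed iteration count.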
