Part3 ML

This document provides an overview of machine learning and data science concepts along with how Python can be used for data analysis and machine learning. It discusses popular Python libraries for data analysis (pandas, NumPy, Matplotlib), machine learning (scikit-learn), and deep learning (TensorFlow, Keras). Specific machine learning algorithms covered include KNN, Naive Bayes classification, decision trees, time series analysis, and association rule mining. Code examples are provided to demonstrate how to implement these algorithms in Python.
Copyright © All Rights Reserved

MACHINE LEARNING

How to apply Data Science in your domain?


USE CASES
• To predict or automate some task in your domain
• Industry: predicting machine failure
• Healthcare: predicting the occurrence of a disease
• Banking: fraud detection
• Finance: sales prediction
• HR department: predicting salary based on a candidate's credentials
• Real estate: house price prediction
• Education: customized and dynamic learning experiences, career prediction, better evaluation of student performance, etc.
Roles of Data Analytics in Education
Industry
Why Python for Data Science?
1) Python is easy to use: its syntax is simple
• Fewer lines of code are needed to implement a task than in many other programming languages
2) Python supports many libraries and frameworks
• Open source
• Main libraries: NumPy, pandas, scikit-learn, Matplotlib, Seaborn
• Deep learning libraries for neural networks: TensorFlow, Keras, Theano, PyTorch
• Python is also used for creating web-based applications, web services and web scraping
3) Python has community and corporate support: answers through Google search, GitHub and other online repositories, and many online learning resources besides YouTube tutorials
• Many large companies, such as Google, Facebook, Amazon and Netflix, use Python in their products; commonly cited examples include voice assistants (Amazon Alexa, Google Assistant, Apple's Siri), Netflix's movie recommendation system and Facebook's friend recommendations
Creating File/Project
Writing Code
Variable Explorer and Output Console
File Explorer
Python Basics
• Variables
• Data types
• Operators
• Conditional statements
• Loops
• Functions
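The following minimal sketch shows each of these basics in turn. All names and values are purely illustrative.

```python
# Variables and data types
name = "Alice"         # str
age = 30               # int
height = 1.68          # float
is_student = False     # bool
scores = [85, 92, 78]  # list

# Operators
total = sum(scores)
average = total / len(scores)  # "/" is true division and always returns a float

# Conditional statements
if average >= 90:
    grade = "A"
elif average >= 80:
    grade = "B"
else:
    grade = "C"

# Loops
for index, score in enumerate(scores):
    print(f"Score {index + 1}: {score}")


# Functions
def describe(person, years):
    """Return a short description string."""
    return f"{person} is {years} years old"


print(describe(name, age))
print(f"Average: {average:.2f}, grade: {grade}")
```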
Exploratory Data Analysis (EDA) refers to the critical process of
performing initial investigations on data in order to discover
patterns, spot anomalies, test hypotheses and check
assumptions with the help of summary statistics and graphical
representations.
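Here is a minimal pandas sketch of those EDA steps, using small hypothetical house data. It computes summary statistics, checks for missing values and correlation, and flags anomalies. The 1.5 × IQR outlier rule is one common convention chosen for illustration, not a fixed part of EDA.

```python
import pandas as pd

# Hypothetical data: house sizes and prices, with one suspicious price
df = pd.DataFrame({
    "size_sqft": [850, 900, 1200, 1500, 1700, 2000, 2200],
    "price_k": [150, 160, 210, 260, 300, 340, 2000],
})

# Summary statistics: count, mean, std, min, quartiles, max
print(df.describe())

# Check assumptions: missing values and correlation between variables
print(df.isna().sum())
print(df.corr())

# Spot anomalies: values more than 1.5 * IQR outside the quartiles
q1, q3 = df["price_k"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["price_k"] < q1 - 1.5 * iqr) | (df["price_k"] > q3 + 1.5 * iqr)
outliers = df[mask]
print(outliers)
```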
Python Tools and Libraries for Data Science
• IDE: Spyder, PyCharm, Jupyter
• Data analysis: pandas, NumPy, Matplotlib, Seaborn, SciPy
• Visualization: Tableau, Power BI
• Machine learning & deep learning: scikit-learn (sklearn), TensorFlow, Keras, PyTorch
• Deployment: Flask, Django, AWS, Azure
Python Libraries for Data Analysis, Data
Modelling and Visualisation
NumPy
• NumPy provides array-oriented computing
• NumPy provides a fast built-in object, the ndarray, which is a multidimensional array of
homogeneous data (all elements have the same data type)
Python Implementation
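Here is a minimal sketch of typical ndarray operations. It is not necessarily the code from the original slide. It shows homogeneous types, vectorised arithmetic, aggregation along an axis, broadcasting and boolean indexing.

```python
import numpy as np

# A 2-D ndarray: 2 rows, 3 columns, all elements share one dtype
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape, a.ndim, a.dtype)

# Homogeneous data: mixing ints and floats upcasts everything to float64
mixed = np.array([1, 2.5, 3])

# Vectorised (element-wise) arithmetic without explicit Python loops
b = a * 10

# Aggregations along an axis
col_sums = a.sum(axis=0)    # [5 7 9]
row_means = a.mean(axis=1)  # [2. 5.]

# Broadcasting: the 1-D array is added to every row
c = a + np.array([100, 200, 300])

# Reshaping and boolean indexing
flat = a.reshape(-1)        # [1 2 3 4 5 6]
evens = flat[flat % 2 == 0] # [2 4 6]
```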
Pandas
• Pandas is a high-level data manipulation tool
• It is built on the NumPy package; its key data structure is the DataFrame
• DataFrames let you store and manipulate tabular
data, with observations in rows and variables in columns
Loading the data
Python Implementation
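Here is a minimal sketch of loading and inspecting data with pandas. The CSV is held in memory through `io.StringIO` so the example is self-contained; with a real file you would pass a path to `pd.read_csv` instead. The columns, values and the choice to fill gaps with the column mean are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally: df = pd.read_csv("your_file.csv")
csv_text = """name,age,city,salary
Asha,28,Pune,52000
Ben,35,Delhi,61000
Chen,,Mumbai,58000
Dev,41,Pune,
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows
print(df.shape)    # (rows, columns)
df.info()          # column dtypes and non-null counts

# Handle missing values: fill numeric gaps with the column mean
df["age"] = df["age"].fillna(df["age"].mean())
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Filter rows and aggregate by group
pune = df[df["city"] == "Pune"]
avg_salary_by_city = df.groupby("city")["salary"].mean()
print(avg_salary_by_city)
```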
Data Visualisation
Why data visualisation?
Python Implementation
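Here is a minimal Matplotlib sketch that draws the same hypothetical monthly sales as a line chart and as a bar chart. The non-interactive `Agg` backend is selected so the script also runs without a display. The output file name `sales.png` is only an example.

```python
import matplotlib

matplotlib.use("Agg")  # render to files; no GUI window needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 150, 145, 170]  # hypothetical figures

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: shows the trend over time
ax1.plot(months, sales, marker="o")
ax1.set_title("Monthly sales (line)")
ax1.set_xlabel("Month")
ax1.set_ylabel("Units sold")

# Bar chart: compares individual months
ax2.bar(months, sales, color="steelblue")
ax2.set_title("Monthly sales (bar)")
ax2.set_xlabel("Month")

fig.tight_layout()
fig.savefig("sales.png")  # example output file name
```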
Seaborn
• Used for data visualization; it is built on Matplotlib
• Seaborn makes it easy to create statistical graphics

Functionalities
• Allows comparison between multiple variables
• Supports multi-plot grids
• Univariate and bivariate visualizations
• Availability of different color palettes
Python Implementation
SciPy
• SciPy is an open-source library of scientific tools for Python. It depends on the NumPy library and gathers a
variety of high-level science and engineering modules into a single package. SciPy provides modules for
• file input/output
• statistics
• optimization
• numerical integration
• linear algebra
• Fourier transforms
• signal processing
• image processing
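Here is a minimal sketch that uses four of the SciPy areas listed above: numerical integration, optimization, linear algebra and statistics. The functions and measurements are made up for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Numerical integration: integral of x^2 from 0 to 1 (exact value 1/3)
area, abs_err = integrate.quad(lambda x: x ** 2, 0, 1)

# Optimization: minimum of (x - 2)^2 + 1, which lies at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)  # [2. 3.]

# Statistics: two-sample t-test on hypothetical measurements
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.0, 5.7, 6.1, 5.9]
t_stat, p_value = stats.ttest_ind(group_a, group_b)

print(area, res.x, solution, t_stat, p_value)
```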
Scikit-learn (sklearn)
• Scikit-learn is a machine learning library
• It is a simple and efficient tool for data analysis
• It features various regression, classification and clustering algorithms
• It also provides dimensionality reduction, model selection and preprocessing algorithms
• It is built on NumPy, SciPy and Matplotlib
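Every scikit-learn estimator follows the same basic workflow: create it, `fit` it on data, then `predict`. Here is a minimal sketch of that workflow using `LinearRegression` on hypothetical data that lies exactly on the line y = 2x + 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features must be 2-D: shape (n_samples, n_features)
X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])  # hypothetical targets, y = 2x + 1

model = LinearRegression()              # 1. choose an estimator
model.fit(X, y)                         # 2. learn its parameters from data
pred = model.predict(np.array([[5]]))   # 3. predict for new inputs

print(model.coef_, model.intercept_, pred)
```

The same three steps apply, almost unchanged, to the classification and clustering estimators listed below.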
MACHINE LEARNING ALGORITHMS
• Supervised
  • Regression: 1. Simple linear 2. Multiple linear 3. Logistic 4. Poisson 5. Negative binomial 6. Zero-inflated
  • Classification: 1. K-nearest neighbours 2. Naive Bayes 3. Decision tree
• Unsupervised
  • Clustering: 1. K-means 2. DBSCAN 3. Hierarchical
  • Association analysis
• Reinforcement
  • Systems with feedback
SUPERVISED MACHINE LEARNING
Python Implementation
The KNN (k-nearest neighbours) algorithm is based on feature similarity. Choosing the right value of
k is a process called parameter tuning, and it is important for better
accuracy.
How to choose k?
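One common way to choose k is to try several candidate values and keep the one with the best cross-validated accuracy. Odd values are used here because they avoid ties in two-class votes. Here is a minimal sketch on scikit-learn's bundled Iris dataset. The range of k and the 5 folds are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each odd k from 1 to 21
scores = {}
for k in range(1, 22, 2):
    # Scaling inside the pipeline keeps each fold's test data unseen
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```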
When do we use KNN?
How does the KNN algorithm work?
According to the Euclidean distance formula, the distance between two
points with coordinates (x, y) and (a, b) is given by:
$$d = \sqrt{(x - a)^2 + (y - b)^2}$$
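The KNN procedure built on the Euclidean distance can be sketched from scratch in plain Python. First compute the distance from the new point to every training point, then keep the k closest, then take a majority vote of their labels. The helper names `euclidean` and `knn_predict` and the training points are hypothetical.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between points p and q (any number of coordinates)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Distance from the query to every training point
    distances = [(euclidean(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # 2. Keep the k closest points
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # 3. Majority vote among the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (x, y) coordinates with class labels
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(euclidean((0, 0), (3, 4)))                 # 5.0
print(knn_predict(points, labels, (2, 2), k=3))  # A
print(knn_predict(points, labels, (6, 5), k=3))  # B
```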
Python Implementation
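Here is a minimal scikit-learn sketch of KNN classification. It is not necessarily the code from the original slide. It splits the bundled Iris dataset into training and test sets, scales the features (KNN is distance-based, so features on larger scales would otherwise dominate), fits `KNeighborsClassifier` with the Euclidean metric, and evaluates the predictions. The values k = 5, a 25% test split and `random_state=42` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Fit the scaler on training data only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train_s, y_train)
y_pred = knn.predict(X_test_s)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```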
Python Implementation
Python Implementation
Python Implementation
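The classification algorithms listed earlier also include Naive Bayes and decision trees. Here is a minimal scikit-learn sketch that trains both on the bundled Iris dataset and prints the learned tree as text rules. It uses `GaussianNB`, one of several Naive Bayes variants, and limits the tree to `max_depth=3`. Both are choices made for illustration, not necessarily the settings used in these slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

# Gaussian Naive Bayes: assumes features are independent given the class
# and normally distributed within each class
nb = GaussianNB().fit(X_train, y_train)
nb_acc = nb.score(X_test, y_test)

# Decision tree: splits on feature thresholds chosen by Gini impurity (the default)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)

print(f"Naive Bayes accuracy:   {nb_acc:.3f}")
print(f"Decision tree accuracy: {tree_acc:.3f}")
print(export_text(tree, feature_names=iris.feature_names))
```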
Forecasting: Time Series Analysis
Python Implementation
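Here is a minimal pandas sketch of two simple forecasting baselines: a 3-month moving average and simple exponential smoothing. It is not necessarily the method used in these slides. The monthly sales figures and the smoothing factor `alpha = 0.5` are hypothetical.

```python
import pandas as pd

# Hypothetical monthly sales indexed by month-start dates
index = pd.date_range("2023-01-01", periods=12, freq="MS")
sales = pd.Series([100, 104, 108, 115, 117, 122, 126, 131, 133, 140, 144, 150],
                  index=index, name="sales")

# Smooth short-term noise with a 3-month rolling mean (first 2 values are NaN)
rolling = sales.rolling(window=3).mean()

# Moving-average forecast: next month = mean of the last 3 months
forecast_next = sales.iloc[-3:].mean()

# Simple exponential smoothing: level = alpha * y_t + (1 - alpha) * previous level
alpha = 0.5
level = sales.iloc[0]
for value in sales.iloc[1:]:
    level = alpha * value + (1 - alpha) * level
ses_forecast = level

print(rolling.tail(3))
print(f"Moving-average forecast: {forecast_next:.1f}")
print(f"Exponential smoothing forecast: {ses_forecast:.1f}")
```

Both baselines lag behind a steady upward trend. Models that include a trend term, such as Holt's method or ARIMA, are usually compared against them.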
Association Rule Mining
• Support represents the popularity of a product: the fraction of all transactions that contain it
• Confidence can be interpreted as the likelihood that product B is purchased when product A is purchased
• Confidence is calculated as the number of transactions that include both A and B divided by the number of transactions that include product A
• The lift value is a measure of the importance of a rule
• For an association rule X ==> Y, if the lift is equal to 1, it means that X and Y are independent. If the lift is higher than 1, it means that X and Y are positively correlated. If the lift is lower than 1, it means that X and Y are negatively correlated
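In formulas, using N for the total number of transactions and consistent with the definitions above, the three measures are:

$$\text{support}(A) = \frac{\#\{\text{transactions containing } A\}}{N}$$

$$\text{confidence}(A \Rightarrow B) = \frac{\#\{\text{transactions containing both } A \text{ and } B\}}{\#\{\text{transactions containing } A\}}$$

$$\text{lift}(A \Rightarrow B) = \frac{\text{confidence}(A \Rightarrow B)}{\text{support}(B)}$$

As a worked example with made-up counts, take N = 100, with 20 transactions containing A, 25 containing B and 10 containing both. Then support(A) = 0.20, support(B) = 0.25, confidence(A ⇒ B) = 10/20 = 0.5 and lift = 0.5/0.25 = 2. Because the lift is greater than 1, buying A makes buying B more likely.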
Python Implementation
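Here is a minimal pure-Python sketch that computes support, confidence and lift for every two-item rule in a handful of hypothetical market-basket transactions. The helpers `support` and `rule_metrics` are illustrative names. For larger datasets, libraries such as mlxtend provide `apriori` and `association_rules` functions, which this sketch does not use.

```python
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
n = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / n


def rule_metrics(a, b):
    """Return (support, confidence, lift) of the rule {a} ==> {b}."""
    s_ab = support({a, b})
    confidence = s_ab / support({a})
    lift = confidence / support({b})
    return s_ab, confidence, lift


items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    for x, y in ((a, b), (b, a)):  # evaluate the rule in both directions
        sup, conf, lft = rule_metrics(x, y)
        if sup > 0:
            print(f"{x} ==> {y}: support={sup:.2f}, confidence={conf:.2f}, lift={lft:.2f}")
```

In this data, butter ==> bread has confidence 1.0 and lift 1.25, so the two are positively associated. Bread ==> milk has lift below 1, even though its confidence is 0.75.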
