UNIT-1
INTRODUCTION TO AI AND ML
• CONTENT
Introduction to artificial intelligence and machine learning.
Applications of machine learning.
Types of machine learning.
Python programming basics for ML: NumPy, Pandas, and Matplotlib.
Data cleaning and handling missing values.
INTRODUCTION TO ARTIFICIAL
INTELLIGENCE AND MACHINE LEARNING.
What is Artificial Intelligence?
Artificial Intelligence, or AI, is the result of our efforts to automate tasks
normally performed by humans, such as recognizing patterns in images,
classifying documents, or playing chess against a human opponent.
What is Machine Learning?
Machine Learning, or ML, focuses on creating systems or models that learn
from data and improve their performance on specific tasks without being
explicitly programmed. They learn from past experiences or examples and use
what they have learned to make decisions on new data. This differs from
traditional programming, where human programmers write rules in code
that transform the input data into the desired results.
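The contrast between a hand-written rule and a learned one can be sketched as follows. This is a minimal illustration, not a reference implementation: the temperature data and the function names are assumptions made for the example, and the "learning" step is an ordinary least-squares line fit.

```python
# Traditional programming: a human writes the rule.
def celsius_to_fahrenheit_rule(c):
    return c * 9 / 5 + 32


# Machine learning: the rule is estimated from (input, output) examples.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Past examples (hypothetical measurements).
celsius = [-10.0, 0.0, 10.0, 25.0, 40.0]
fahrenheit = [14.0, 32.0, 50.0, 77.0, 104.0]

slope, intercept = fit_line(celsius, fahrenheit)


def celsius_to_fahrenheit_learned(c):
    return slope * c + intercept


print(celsius_to_fahrenheit_rule(100.0))
print(round(celsius_to_fahrenheit_learned(100.0), 2))
```

In the first function the programmer supplies the constants 9/5 and 32. In the second, the same constants are recovered from the examples alone.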
REGRESSION VS. CLASSIFICATION
TYPES OF ML CLASSIFICATION
ALGORITHMS:
Logistic Regression
K-Nearest Neighbours
Support Vector Machines
Kernel SVM
Naïve Bayes
Decision Tree Classification
Random Forest Classification
TYPES OF ML REGRESSION
ALGORITHMS:
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Support Vector Regression
Decision Tree Regression
Random Forest Regression
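To make the regression vs. classification distinction concrete, here is a small sketch in plain Python. It uses k-nearest neighbours for both tasks purely for illustration (the list above names it only as a classifier); the data, function names, and choice of k = 3 are assumptions. Classification returns a label, regression returns a number.

```python
from collections import Counter


def nearest_neighbours(train, x, k):
    """Return the k training rows whose feature value is closest to x."""
    return sorted(train, key=lambda row: abs(row[0] - x))[:k]


def knn_classify(train, x, k=3):
    """Classification: predict the most common label among the neighbours."""
    labels = [label for _, label in nearest_neighbours(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression: predict the average target value of the neighbours."""
    values = [value for _, value in nearest_neighbours(train, x, k)]
    return sum(values) / len(values)


# Hypothetical data: hours studied -> pass/fail, and hours studied -> exam score.
hours_to_result = [(1, "fail"), (2, "fail"), (3, "fail"),
                   (6, "pass"), (7, "pass"), (8, "pass")]
hours_to_score = [(1, 35.0), (2, 40.0), (3, 50.0),
                  (6, 70.0), (7, 80.0), (8, 90.0)]

predicted_label = knn_classify(hours_to_result, 6.5)  # a category
predicted_score = knn_regress(hours_to_score, 6.5)    # a continuous value

print(predicted_label, predicted_score)
```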
CLUSTERING
Unsupervised clustering algorithms are
classified into four different types:
1. Exclusive clustering
2. Overlapping clustering
3. Hierarchical clustering
4. Probabilistic clustering
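As an illustration of the first type, exclusive clustering assigns every point to exactly one cluster. The sketch below uses a very small k-means on one-dimensional data. k-means is chosen here only as a common example of exclusive clustering; the points, the starting centroids, and the function name are assumptions.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny k-means for 1-D data with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point goes to exactly one (the nearest) centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

print(centroids)
print(clusters)
```

Overlapping and probabilistic methods differ in the assignment step: a point may belong to several clusters, or receive a membership probability for each cluster, instead of a single hard assignment.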
DIMENSIONALITY REDUCTION
ASSOCIATION RULE
How Association Rule Works
Apriori: One of the earliest algorithms used for building association rules. We will use
Apriori to build all the rules in this blog.
Itemsets: An itemset is a collection of items; an n-itemset is a set of n items. Put simply, it is
the set of items purchased by a customer.
Support: The percentage of all transactions in which X and Y occur together.
Support(X => Y) = (Frequency of X and Y) / (Total # of records)
Confidence: A measure of the certainty associated with each discovered rule. It is the
percentage of transactions containing both X and Y out of all transactions that contain X.
Confidence(X => Y) = (Frequency of X and Y) / (Frequency of X)
Lift: A measure of how strongly X and Y are related, as opposed to occurring together by
coincidence. It measures how many times more often X and Y occur together than would be
expected if they were statistically independent of each other. A lift above 1 means X and Y
occur together more often than independence would predict. This measure will be our main
focus when evaluating the algorithm results.
Lift(X => Y) = Confidence(X => Y) / Support(Y)
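The three measures can be worked through on a toy example. The sketch below is plain Python; the five transactions and the function names are assumptions made for illustration, not data from the source.

```python
# Each transaction is the set of items one customer bought (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of X => Y relative to how common Y is overall."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"bread"}, {"milk"}
print("support:   ", round(support(x | y, transactions), 4))
print("confidence:", round(confidence(x, y, transactions), 4))
print("lift:      ", round(lift(x, y, transactions), 4))
```

Here bread and milk appear together in 3 of 5 transactions, and bread appears in 4 of 5, so the confidence of bread => milk is 3/4. Milk itself appears in 4 of 5 transactions, so the lift is slightly below 1: buying bread does not make milk more likely in this toy data.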
Minlen: the minimum number of items in the rule
Maxlen: the maximum number of items in the rule
Target: indicates the type of association mined
Frequent Itemsets Generation: Find the most frequent itemsets in the data,
based on a predetermined minimum support and the minimum and maximum
number of items per itemset.
Rule Generation: This step generates all the rules from the frequent
itemsets. We can control the number of rules generated by setting thresholds
on support, confidence or lift.
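The two steps above can be sketched in plain Python. This is a simplified Apriori-style search written for illustration, not the implementation of any particular library; the transactions, thresholds, and function names are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len):
    """Level-wise search: size-k candidates come from frequent size-(k-1) sets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates and size <= max_len:
        # Keep only the candidates that meet the minimum support.
        level = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join frequent itemsets to form candidates that are one item larger.
        size += 1
        candidates = list({a | b for a, b in combinations(list(level), 2)
                           if len(a | b) == size})
    return frequent


def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into LHS => RHS; keep the confident rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                rhs = itemset - lhs
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    lift = conf / frequent[rhs]
                    rules.append((set(lhs), set(rhs), sup, conf, lift))
    return rules


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

frequent = frequent_itemsets(transactions, min_support=0.4, max_len=3)
rules = generate_rules(frequent, min_confidence=0.7)

for lhs, rhs, sup, conf, lift in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(sup, 2), round(conf, 2), round(lift, 2))
```

Raising `min_support` or `min_confidence` shrinks the rule list, which is the control described above. A lift threshold could be added in the same place as the confidence check.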
LHS => RHS: The left-hand side (LHS) and right-hand side (RHS) of a rule are used
to understand how often item A and item B occur together. If we are trying to
understand how often people go to store B after going to store A, store A would
be the LHS and store B would be the RHS. Similarly, if we are trying to understand
which stores people usually go to before going to store A, store A would be on
the RHS and the other stores would be on the LHS.
REINFORCEMENT LEARNING
DATA CLEANING, HANDLING MISSING
VALUES, HANDLING CATEGORICAL DATA
DATA PROCESSING
• Getting the dataset
• Importing libraries
• Importing datasets
• Finding Missing Data
• Encoding Categorical Data
1. Getting the dataset
2. Importing Libraries
NumPy: import numpy as np
Matplotlib: import matplotlib.pyplot as plt
Pandas: import pandas as pd
3. Importing datasets
1. Save your Python file in the directory that contains the
dataset.
2. Go to the File explorer option in the Spyder IDE and select the
required directory.
3. Press F5 or click the Run option to execute the file.
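Once the working directory is set, loading the data might look like the sketch below. To keep the example self-contained it reads the CSV from an in-memory string; the column names and values are made up, and with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for a CSV file saved next to the script (hypothetical contents).
# With a real file: dataset = pd.read_csv("your_dataset.csv")
csv_text = """Country,Age,Salary,Purchased
India,38,68000,No
France,43,45000,Yes
Germany,30,,No
India,,52000,Yes
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Feature matrix: every column except the last. Target vector: the last column.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

print(dataset.shape)
print(dataset.isnull().sum())  # number of missing values per column
```

Empty fields in the CSV are read as NaN, which is what the missing-data step works with.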
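4. Handling/Finding Missing Data
1. By deleting the particular row: remove every row that contains a missing
value. This is simple but discards data.
2. By calculating the mean: replace each missing value with the mean of its
column.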
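Both options can be sketched with pandas. The small DataFrame is hypothetical, and mean imputation is shown for numeric columns only.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value per column.
df = pd.DataFrame({
    "Age": [38.0, 43.0, 30.0, np.nan],
    "Salary": [68000.0, 45000.0, np.nan, 52000.0],
})

# Finding missing data: count NaN entries in each column.
missing_per_column = df.isnull().sum()

# Option 1: delete every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: replace each missing value with the mean of its column.
filled = df.fillna(df.mean())

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows keeps only the two complete records here, which shows why deletion can be costly on small datasets.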
5. Encoding Categorical Data
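Most ML models expect numbers, so text categories are converted to numeric codes. The sketch below shows two common approaches using pandas only; the data and column names are assumptions, and other libraries offer their own encoders.

```python
import pandas as pd

# Hypothetical data with two categorical columns.
df = pd.DataFrame({
    "Country": ["India", "France", "Germany", "India"],
    "Purchased": ["No", "Yes", "No", "Yes"],
})

# Label encoding: map each category to an integer code
# (categories are ordered alphabetically, so "No" -> 0 and "Yes" -> 1).
df["Purchased_code"] = df["Purchased"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category, so no order is implied.
country_dummies = pd.get_dummies(df["Country"], prefix="Country", dtype=int)
encoded = pd.concat([df.drop(columns="Country"), country_dummies], axis=1)

print(encoded)
```

Label encoding suits a two-valued column such as Purchased. For a column such as Country, one-hot encoding avoids suggesting that one country is "greater" than another.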