FAZAIA BILQUIS COLLEGE OF EDUCATION FOR WOMEN
PAF NUR KHAN BASE
COMPUTER SCIENCE DEPARTMENT
Artificial Intelligence
LAB MANUAL
Class: CS (4th - A)
Table of Contents
LAB # 01
What is artificial intelligence?
What is Jupyter Notebook?
Use of “print” in Jupyter Notebook:
If statement:
For loop:
While loop:
Libraries in Jupyter Notebook:
LAB # 02
Data Cleaning and Data Processing
LAB # 03
Feature Transformation
Label Encoding
OneHot Encoding
Target Encoding
Min Max Scaling:
Standardization
LAB # 04
SVM and Random Forest (RF)
LAB # 05
K-Nearest Neighbours (KNN)
LAB # 06, 07, 08
Eid Holidays, Mid Exams
LAB # 09
K-Means Clustering
LAB # 10
DBSCAN Clustering (with dataset)
DBSCAN Clustering (Core Points, Border Points, and Noise)
LAB # 01
What is artificial intelligence?
Artificial Intelligence (AI) refers to the development of computer systems that can perform
tasks typically requiring human intelligence, such as decision-making, problem-solving, and
pattern recognition. A core component of AI is machine learning, which enables systems to
learn from data and improve their performance over time without being explicitly programmed.
In machine learning, algorithms identify patterns in large datasets and make predictions or
decisions based on that analysis.
What is Jupyter Notebook?
Jupyter Notebook is an open-source web-based application that allows users to create and share
documents containing live code, equations, visualizations, and narrative text. It supports many
programming languages, with Python being the most commonly used. Jupyter is especially
popular in data science, machine learning, and academic research because it enables interactive
coding and real-time feedback. Users can break their code into cells, making it easier to test,
debug, and explain their work step by step. With its blend of code and documentation, Jupyter
Notebook is a powerful tool for both learning and presenting computational projects.
Use of “print” in Jupyter Notebook:
CODE:
print("Lab 1")
name = (" Artifical Intelligence ")
print(name)
OUTPUT:
CODE:
def name():
    print(" Lab 1 ")
name()
OUTPUT:
If statement:
CODE:
i = 10
if i > 10:
    print("i is greater than 10")
else:
    print("i is less than or equal to 10")
OUTPUT:
CODE:
x=5
y = 10
z = "AI LAB"
print(x)
print(y)
print(z)
OUTPUT:
For loop:
CODE:
for i in range(1, 6):
    print(i)
OUTPUT:
While loop:
CODE:
i = 1
while i <= 5:
    print(i)
    i += 1
OUTPUT:
Libraries in Jupyter Notebook:
CODE:
import math
print(math.pi)
print(math.sqrt(16))
OUTPUT:
CODE:
import random
print(random.randint(1,10))
print(random.choice([1,2,3,4,5]))
OUTPUT:
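The later labs rely heavily on the numpy and pandas libraries, so it helps to confirm that they import correctly here as well. The short sketch below is an optional extra (not part of the original exercise); the array values and the names in the DataFrame are arbitrary examples.
import numpy as np
import pandas as pd
# Create a small array and DataFrame to confirm both libraries are available
arr = np.array([1, 2, 3, 4])
print(arr.mean())  # average of the array values
df = pd.DataFrame({"Name": ["Ali", "Sara"], "Marks": [85, 92]})
print(df)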
LAB # 02
Data Cleaning and Data Processing
CODE:
import numpy as np
import pandas as pd
df = pd.read_csv( r"E:\Desktop\dirty_cafe_sales.csv")
df.head()
df.info()
df.tail()
df.columns
df.isnull().sum()
# Change data type of specific columns to float
df['Quantity'] = pd.to_numeric(df['Quantity'], errors='coerce').astype(float)
df['Price Per Unit'] = pd.to_numeric(df['Price Per Unit'], errors='coerce').astype(float)
df['Total Spent'] = pd.to_numeric(df['Total Spent'], errors='coerce').astype(float)
df.head()
# Convert 'Total Spent' column to numeric, coerce invalid values to NaN
df['Total Spent'] = pd.to_numeric(df['Total Spent'], errors='coerce')
# Handle NaN values properly
df['Quantity'] = df['Quantity'].bfill()  # Backward fill for Quantity
df['Price Per Unit'] = df['Price Per Unit'].fillna(df['Price Per Unit'].mean())  # Fill Price Per Unit with mean
df['Total Spent'] = df['Total Spent'].fillna(df['Total Spent'].median())  # Fill Total Spent with median
df['Payment Method'] = df['Payment Method'].fillna(df['Payment Method'].mode()[0])  # Fill Payment Method with mode
df['Location'] = df['Location'].fillna(df['Location'].mode()[0])  # Fill Location with mode
df.isnull().sum()
# Display duplicate count before removal
duplicate_count = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicate_count}")
# Remove duplicates
df = df.drop_duplicates()
# Detect and remove outliers for specific numeric columns using IQR
specific_columns = ['Quantity', 'Price Per Unit', 'Total Spent']
for column in specific_columns:
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Check number of outliers before removal
    outliers_count = df[(df[column] < lower_bound) | (df[column] > upper_bound)].shape[0]
    print(f"Number of outliers in {column}: {outliers_count}")
    # Remove outliers
    df = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    # Check dataset shape after removal
    print(f"Dataset shape after removing outliers in {column}: {df.shape}")
# Display cleaned dataset info
print("\nCleaned dataset shape:", df.shape)
print("\nCleaned dataset info:")
df.info()
output_path = 'E:\\Desktop\\cafe_clean_dataset.csv'  # Use double backslashes (\\) in Windows paths
df.to_csv(output_path, index=False)
print("Cleaned dataset saved to", output_path)
LAB # 03
Feature Transformation
Label Encoding
CODE:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female']})
print(df)
# Initialize LabelEncoder
encoder = LabelEncoder()
# Transform the categorical column
df['Gender'] = encoder.fit_transform(df['Gender'])
print(df)
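LabelEncoder assigns integers to the categories in alphabetical order (here Female becomes 0 and Male becomes 1). A small optional check of that mapping, and of recovering the original labels, is shown below:
# Show which integer was assigned to each original category
print(dict(zip(encoder.classes_, encoder.transform(encoder.classes_))))
# inverse_transform converts the encoded values back to the original labels
print(encoder.inverse_transform(df['Gender']))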
OneHot Encoding
CODE:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'City': ['Paris', 'New York', 'London', 'Paris']})
# Initialize OneHotEncoder; sparse_output=False returns a dense array
onehot_encoder = OneHotEncoder(sparse_output=False)
encoded = onehot_encoder.fit_transform(df[['City']])
# Convert to DataFrame
df_encoded = pd.DataFrame(encoded, columns=onehot_encoder.get_feature_names_out(['City']))
# Merge with original DataFrame
df = pd.concat([df, df_encoded], axis=1)
print(df)
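For comparison, pandas offers get_dummies(), which produces the same one-hot columns in a single call. This is an optional alternative sketch; dtype=int is an assumed choice so the columns print as 0/1 rather than True/False.
df2 = pd.DataFrame({'City': ['Paris', 'New York', 'London', 'Paris']})
print(pd.get_dummies(df2, columns=['City'], dtype=int))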
Target Encoding
CODE:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'Education': ['Graduate', 'Undergrad', 'Graduate', 'PhD', 'Undergrad',
'PhD'],
'Loan_Approved': [1, 0, 1, 1, 0, 0] # Target variable
})
# Compute mean Loan_Approved for each category
target_mapping = df.groupby('Education')['Loan_Approved'].mean()
# Apply Target Encoding
df['Education_Encoded'] = df['Education'].map(target_mapping)
print(df)
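Plain target encoding can overfit when a category has only a few rows, because its encoded value is just the mean of a handful of targets. A common remedy is to blend the category mean with the global mean. The sketch below is an optional extension; the smoothing strength m = 2 is an arbitrary illustrative value.
global_mean = df['Loan_Approved'].mean()
stats = df.groupby('Education')['Loan_Approved'].agg(['mean', 'count'])
m = 2  # smoothing strength (illustrative value)
smoothed = (stats['count'] * stats['mean'] + m * global_mean) / (stats['count'] + m)
df['Education_Smoothed'] = df['Education'].map(smoothed)
print(df)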
Min Max Scaling:
CODE:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Sample DataFrame
df = pd.DataFrame({'Salary': [20000, 50000, 80000, 100000, 150000]})
# Initialize MinMaxScaler
scaler = MinMaxScaler()
# Apply Min-Max Scaling
df['Salary_Scaled'] = scaler.fit_transform(df[['Salary']])
print(df)
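Min-Max scaling applies the formula x_scaled = (x - min) / (max - min), so every value lands between 0 and 1. As an optional check, the same result can be computed by hand; for example, 50000 maps to (50000 - 20000) / (150000 - 20000) ≈ 0.23.
# Manual Min-Max calculation; should match the scaler output above
manual = (df['Salary'] - df['Salary'].min()) / (df['Salary'].max() - df['Salary'].min())
print(manual)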
Standardization
CODE:
from sklearn.preprocessing import StandardScaler
# Initialize StandardScaler
scaler = StandardScaler()
# Apply Standardization
df['Salary_Standardized'] = scaler.fit_transform(df[['Salary']])
print(df)
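Standardization applies z = (x - mean) / std, giving the column mean 0 and standard deviation 1. StandardScaler uses the population standard deviation (ddof = 0), so an optional manual check must do the same; for example, 20000 maps to (20000 - 80000) / 44272 ≈ -1.36.
# Manual standardization; ddof=0 matches StandardScaler's population std
manual = (df['Salary'] - df['Salary'].mean()) / df['Salary'].std(ddof=0)
print(manual)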
LAB # 04
SVM and Random Forest (RF)
CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Step 1: Manually Create Dataset
data = {
"Height (cm)": [170, 160, 175, 180, 165, 168, 172, 177, 158, 174, 182,
159, 169, 173, 166],
"Weight (kg)": [65, 55, 75, 80, 60, 63, 70, 78, 50, 72, 85, 52, 67,
73, 58],
"Category": [0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)
print("Dataset:")
print(df)
# Step 2: Prepare Data for Training
X = df[["Height (cm)", "Weight (kg)"]]
y = df["Category"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
print("\nTraining Data:")
print(X_train)
print("\nTesting Data:")
print(X_test)
# Step 3: Train SVM Model
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
y_train_pred_svm = svm_model.predict(X_train)
y_test_pred_svm = svm_model.predict(X_test)
print("\nSVM Model Performance:")
print("Training Accuracy: {:.2f}%".format(accuracy_score(y_train,
y_train_pred_svm) * 100))
print("Testing Accuracy: {:.2f}%".format(accuracy_score(y_test,
y_test_pred_svm) * 100))
print("Training Precision: {:.2f}%".format(precision_score(y_train,
y_train_pred_svm) * 100))
print("Testing Precision: {:.2f}%".format(precision_score(y_test,
y_test_pred_svm) * 100))
print("Training Recall: {:.2f}%".format(recall_score(y_train,
y_train_pred_svm) * 100))
print("Testing Recall: {:.2f}%".format(recall_score(y_test,
y_test_pred_svm) * 100))
print("Training F1-Score: {:.2f}%".format(f1_score(y_train,
y_train_pred_svm) * 100))
print("Testing F1-Score: {:.2f}%".format(f1_score(y_test, y_test_pred_svm)
* 100))
# Step 4: Train Random Forest Model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_train_pred_rf = rf_model.predict(X_train)
y_test_pred_rf = rf_model.predict(X_test)
print("\nRandom Forest Model Performance:")
print("Training Accuracy:", accuracy_score(y_train, y_train_pred_rf))
print("Testing Accuracy:", accuracy_score(y_test, y_test_pred_rf))
LAB # 05
K-Nearest Neighbours (KNN)
CODE:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Step 1: Manually Create Dataset
data = {
"Height (cm)": [170, 160, 175, 180, 165, 168, 172, 177, 158, 174],
"Weight (kg)": [65, 55, 75, 80, 60, 63, 70, 78, 50, 72],
"Category": [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
}
df = pd.DataFrame(data)
print(df)
# Step 2: Split Dataset into Train & Test
X = df[["Height (cm)", "Weight (kg)"]].values
y = df["Category"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Step 3: Feature Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 4: Implementing KNN Classifier
k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Step 5: Predict on the Test Set
y_pred = knn.predict(X_test)
# Step 6: Evaluation
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')
# Step 7: Predicting for a new individual
new_data = np.array([[169, 67]]) # Example: Height=169cm, Weight=67kg
new_data_scaled = scaler.transform(new_data)
prediction = knn.predict(new_data_scaled)
print(f'Predicted Category for Height=169cm and Weight=67kg: {prediction[0]}')
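KNN makes its prediction by computing the Euclidean distance from the new point to every training point and voting among the k = 3 closest ones. The optional sketch below reproduces that distance calculation with numpy on the scaled data:
# Euclidean distance from the new (scaled) point to each scaled training point
distances = np.sqrt(((X_train - new_data_scaled) ** 2).sum(axis=1))
print("Distances to training points:", np.round(distances, 2))
print("Indices of the 3 nearest neighbours:", np.argsort(distances)[:3])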
LAB # 06, 07, 08
Eid Holidays, Mid Exams
LAB # 09
K-Means Clustering
CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Create DataFrame from the given table
data = {
'X': [185, 170, 168, 179, 182, 188],
'Y': [72, 56, 60, 68, 72, 77]
}
df = pd.DataFrame(data)
# Initialize KMeans with 2 clusters and 2 iterations
kmeans = KMeans(n_clusters=2, max_iter=2, random_state=0, n_init='auto')
# Fit the model and predict clusters
df['Cluster'] = kmeans.fit_predict(df[['X', 'Y']])
# View results
print(df)
# Plot the clusters
plt.figure(figsize=(8, 6))
plt.scatter(df['X'], df['Y'], c=df['Cluster'], cmap='viridis', s=100)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
s=200, c='red', marker='X', label='Centroids')
plt.xlabel("X")
plt.ylabel("Y")
plt.title("K-Means Clustering (2 Clusters, 2 Iterations)")
plt.legend()
plt.grid(True)
plt.show()
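Choosing the number of clusters is usually guided by the elbow method: fit K-Means for several values of k and plot the inertia (within-cluster sum of squared distances), looking for the point where the curve stops dropping sharply. The sketch below is an optional extension of this lab using the same six points:
inertias = []
for k in range(1, 6):
    km = KMeans(n_clusters=k, random_state=0, n_init='auto').fit(df[['X', 'Y']])
    inertias.append(km.inertia_)
plt.plot(range(1, 6), inertias, marker='o')
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.grid(True)
plt.show()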
LAB # 10
DBSCAN Clustering (with dataset)
CODE:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# DBSCAN clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)  # Initialize DBSCAN with eps=0.5 and min_samples=5
df['Cluster'] = dbscan.fit_predict(df)  # Fit DBSCAN and add the cluster labels
# Visualizing the clustering results
plt.figure(figsize=(8,6))
sns.scatterplot(x=df.iloc[:, 0], y=df.iloc[:, 1], hue=df['Cluster'],
palette='viridis', style=df['Cluster'], markers=["o", "s", "D"])
plt.title('DBSCAN Clustering - Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.legend(title='Cluster')
plt.show()
# Display the cluster labels for each sample
print("Cluster labels for each sample:")
print(df['Cluster'].value_counts())
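Because eps is a raw distance threshold, DBSCAN is sensitive to the scale of the features. An optional variation is to standardize the iris features first; note that eps would normally be re-tuned after scaling, and 0.5 is kept here only for illustration.
from sklearn.preprocessing import StandardScaler
scaled = StandardScaler().fit_transform(df[iris.feature_names])
df['Cluster_scaled'] = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)
print(df['Cluster_scaled'].value_counts())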
DBSCAN Clustering (Core Points, Border Points, and Noise)
CODE:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
# Generate a synthetic dataset with 3 clusters
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.60,
random_state=0)
# DBSCAN clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)  # Initialize DBSCAN with eps=0.5 and min_samples=5
y_dbscan = dbscan.fit_predict(X) # Fit DBSCAN and get cluster labels
# Identifying core points, border points, and noise
core_samples_mask = np.zeros_like(y_dbscan, dtype=bool)
core_samples_mask[dbscan.core_sample_indices_] = True
# Visualizing the DBSCAN clustering results
plt.figure(figsize=(8,6))
# Core points
core_scatter = plt.scatter(X[core_samples_mask, 0], X[core_samples_mask, 1],
                           c=y_dbscan[core_samples_mask], cmap='viridis', s=50,
                           marker='o', label='Core Points')
# Border points
plt.scatter(X[~core_samples_mask & (y_dbscan != -1), 0],
X[~core_samples_mask & (y_dbscan != -1), 1],
c=y_dbscan[~core_samples_mask & (y_dbscan != -1)],
cmap='viridis', s=30, marker='^', label='Border Points')
# Noise points
plt.scatter(X[y_dbscan == -1, 0], X[y_dbscan == -1, 1],
c='red', s=30, marker='x', label='Noise')
plt.title('DBSCAN Clustering: Core Points, Border Points, and Noise')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.colorbar(core_scatter, label='Cluster Label')  # Colorbar tied to the core-point scatter
plt.show()
# Displaying the cluster labels
print(f"Number of clusters: {len(set(y_dbscan)) - (1 if -1 in y_dbscan
else 0)}")
print(f"Number of noise points (outliers): {list(y_dbscan).count(-1)}")
print(f"Number of core points: {np.sum(core_samples_mask)}")
print(f"Number of border points: {np.sum((~core_samples_mask) & (y_dbscan
!= -1))}")
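A common way to pick eps is the k-distance plot: sort every point's distance to its k-th nearest neighbour (with k tied to min_samples) and look for the "knee" where the curve bends upward sharply. The sketch below is an optional extension applied to the same synthetic data:
from sklearn.neighbors import NearestNeighbors
# Distances to the 5 nearest neighbours (the point itself counts as the first)
neighbors = NearestNeighbors(n_neighbors=5).fit(X)
distances, _ = neighbors.kneighbors(X)
k_distances = np.sort(distances[:, -1])  # distance to the farthest of the 5
plt.plot(k_distances)
plt.xlabel("Points sorted by k-distance")
plt.ylabel("Distance to 5th nearest neighbour")
plt.title("k-Distance Plot for Choosing eps")
plt.grid(True)
plt.show()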