
Name: Yash Pandit          Std: T.Y. B.Sc. Computer Science
Subject: Data Science - Practical          Roll No.: 51, Batch: A, Div: A

INDEX

Sr. No. | Practical Name | Date of Perform | Page No. | Sign.

1. Introduction to Excel (14/12/2024, pages 4-6)
 Perform conditional formatting on a dataset using various criteria.
 Create a pivot table to analyze and summarize data.
 Use VLOOKUP function to retrieve information from a different worksheet or table.
 Perform what-if analysis using Goal Seek to determine input values for desired output.

2. Data Frames and Basic Data Pre-processing (04/01/2025, pages 7-9)
 Read data from CSV and JSON files into a data frame.
 Perform basic data pre-processing tasks such as handling missing values and outliers.
 Manipulate and transform data using functions like filtering, sorting, and grouping.

3. Feature Scaling and Dummification (04/01/2025, pages 10-12)
 Apply feature-scaling techniques like standardization and normalization to numerical features.
 Perform feature dummification to convert categorical variables into numerical representations.

4. Hypothesis Testing (11/01/2025, pages 13-15)
 Formulate null and alternative hypotheses for a given problem.
 Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).
 Interpret the results and draw conclusions based on the test outcomes.

5. ANOVA (Analysis of Variance) (11/01/2025, pages 16-17)
 Perform one-way ANOVA to compare means across multiple groups.
 Conduct post-hoc tests to identify significant differences between group means.

6. Regression and Its Types (18/01/2025, pages 18-19)
 Implement simple linear regression using a dataset.
 Explore and interpret the regression model coefficients and goodness-of-fit measures.
 Extend the analysis to multiple linear regression and assess the impact of additional predictors.

7. Logistic Regression and Decision Tree (25/01/2025, pages 20-21)
 Build a logistic regression model to predict a binary outcome.
 Evaluate the model's performance using classification metrics (e.g., accuracy, precision, recall).
 Construct a decision tree model and interpret the decision rules for classification.

8. K-Means Clustering (08/02/2025, pages 22-23)
 Apply the K-Means algorithm to group similar data points into clusters.
 Determine the optimal number of clusters using the elbow method or silhouette analysis.
 Visualize the clustering results and analyse the cluster characteristics.

9. Principal Component Analysis (PCA) (08/02/2025, pages 24-26)
 Perform PCA on a dataset to reduce dimensionality.
 Evaluate the explained variance and select the appropriate number of principal components.
 Visualize the data in the reduced-dimensional space.

10. Data Visualization and Storytelling (15/02/2025, pages 27-29)
 Create meaningful visualizations using data visualization tools.
 Combine multiple visualizations to tell a compelling data story.
 Present the findings and insights in a clear and concise manner.
Practical 1
Aim: Introduction to Excel
 Perform conditional formatting on a dataset using various criteria.
 Create a pivot table to analyze and summarize data.
 Use VLOOKUP function to retrieve information from a different worksheet or
table.
 Perform what-if analysis using Goal Seek to determine input values for desired
output.
A. Perform conditional formatting on a dataset using various criteria.
Step 1: Go to Home > Conditional Formatting > Highlight Cells Rules > Greater Than.

Step 2: Enter the threshold value for the Greater Than rule, for example 2000.

Step 3: In Conditional Formatting, go to Data Bars > Solid Fill.

B. Create a pivot table to analyse and summarize data.
Step 1: Select the entire table and go to Insert > PivotChart > PivotChart.

Step 2: Select "New Worksheet" in the Create PivotChart window.

Step 3: Select and drag the required attributes into the field boxes below.

C. Use VLOOKUP function to retrieve information from a different worksheet or table.

Step 1: Click on an empty cell and enter the following formula:
=VLOOKUP(B3, B3:D3, 1, TRUE)
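
Note: for retrieving a value from a different worksheet, a more typical pattern is an exact-match lookup. A minimal sketch, assuming a hypothetical "Products" sheet with IDs in column A and prices in column C:
=VLOOKUP(A2, Products!A:C, 3, FALSE)
Here A2 is the ID to look up, 3 is the column within the lookup range to return, and FALSE forces an exact match.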

D. Perform what-if analysis using Goal Seek to determine input values for desired output.
Steps:
Step 1: In the Data tab, go to What-If Analysis > Goal Seek.

Step 2: Fill in the Set cell, To value, and By changing cell fields accordingly and click OK.

Practical 2
Aim: Data Frames and Basic Data Pre-processing
 Read data from CSV and JSON files into a data frame.
 Perform basic data pre-processing tasks such as handling missing values and
outliers.
 Manipulate and transform data using functions like filtering, sorting, and
grouping.

Program 1: Read data from CSV and JSON files into a data frame.

import pandas as pd

# Read a CSV file and a JSON file into pandas data frames (raw strings avoid backslash escapes)
df = pd.read_csv(r'D:\DATA SCIENCE\student_marks.csv')
data = pd.read_json(r'D:\DATA SCIENCE\IRIS.json')

print("CSV Dataset")
print(df)
print("JSON Dataset")
print(data)

Output:

Program 2: Perform basic data pre-processing tasks such as handling missing values and outliers.

import pandas as pd

# Load the Titanic dataset, which contains missing values
df = pd.read_csv(r'D:\DATA SCIENCE\titanic.csv')
print(df.head(10))
data = pd.read_json(r'D:\DATA SCIENCE\IRIS.json')

# Fill missing values with 0
print("Dataset after filling NA values with 0:")
df.fillna(value=0, inplace=True)
print(df.head(10))

# Drop any rows that still contain NA values
print("Dataset after dropping remaining NA Values:")
df.dropna(inplace=True)
print(df.head(10))

Output:
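
Note: the aim also mentions outliers, which Program 2 does not handle. A minimal sketch of IQR-based outlier removal, assuming the same Titanic data frame and a numeric 'Fare' column (present in the standard Titanic dataset):

import pandas as pd

df = pd.read_csv(r'D:\DATA SCIENCE\titanic.csv')  # same file as above (assumed path)

# Compute the interquartile range (IQR) of 'Fare'
q1, q3 = df['Fare'].quantile([0.25, 0.75])
iqr = q3 - q1

# Keep only rows whose 'Fare' lies within 1.5 * IQR of the quartiles
mask = df['Fare'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(f"Rows before: {len(df)}, after outlier removal: {mask.sum()}")
df_no_outliers = df[mask]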

Program 3: Manipulate and transform data using functions like filtering, sorting, and grouping.

import pandas as pd

iris = pd.read_csv('iris.csv')

# Filtering: keep only the setosa samples
# (use 'Iris-setosa' if your copy of the dataset labels the species that way)
setosa = iris[iris['Species'] == 'setosa']
print("Setosa samples: ")
print(setosa.head())

# Sorting: order rows by sepal length, largest first
sorted_iris = iris.sort_values(by='SepalLengthCm', ascending=False)
print('\nSorted iris dataset: ')
print(sorted_iris.head())

# Grouping: mean of the numeric measurements for each species
grouped_species = iris.groupby('Species').mean(numeric_only=True)
print('\nMean measurements for each species:')
print(grouped_species)

Output:

Practical 3
Aim: Feature Scaling and Dummification
 Apply feature-scaling techniques like standardization and normalization to
numerical features.
 Perform feature dummification to convert categorical variables into numerical
representations.

Program 1: Apply feature-scaling techniques like standardization and normalization to numerical features.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Reading Data
df = pd.read_csv('D:/Data Science/wine.csv', header=None, usecols=[0, 1, 2], skiprows=1)
# Renaming Columns
df.columns = ['Class', 'Alcohol', 'Malic Acid']
# Printing Original DataFrame
print("Original DataFrame:")
print(df)
# MinMax Scaling
scaling = MinMaxScaler()
scaled_value = scaling.fit_transform(df[['Alcohol', 'Malic Acid']])
df[['Alcohol', 'Malic Acid']] = scaled_value
# Printing DataFrame after MinMax Scaling
print("\nDataFrame after MinMax Scaling:")
print(df)
# Standard Scaling
scaling = StandardScaler()
scaled_standard_value = scaling.fit_transform(df[['Alcohol', 'Malic Acid']])
df[['Alcohol', 'Malic Acid']] = scaled_standard_value
# Printing DataFrame after Standard Scaling
print("\nDataFrame after Standard Scaling:")
print(df)

Output:

Program 2: Perform feature dummification to convert categorical variables into
numerical representations.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Reading Data (raw string avoids backslash-escape issues in the Windows path)
iris = pd.read_csv(r"D:\Data Science\computer.csv")

# Printing dataset columns
print("Columns in dataset: ")
print(iris.columns)

# Printing the first few rows
print("Head in dataset: ")
print(iris.head())

# Encoding Categorical Data
le = LabelEncoder()
if 'Species' in iris.columns:
    iris['code'] = le.fit_transform(iris['Species'])
    print("\nDataset after Label Encoding: ")
    print(iris)
else:
    print("The column 'Species' is not found in dataset")

Output:
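
Note: label encoding assigns one integer per category; dummification in the strict sense creates one 0/1 indicator column per category. A minimal sketch using pandas.get_dummies, assuming the same data frame and a categorical 'Species' column:

import pandas as pd

iris = pd.read_csv(r"D:\Data Science\computer.csv")  # same file as above (assumed)

# One indicator column per category; drop_first avoids a redundant column
dummies = pd.get_dummies(iris['Species'], prefix='Species', drop_first=True)
iris_dummified = pd.concat([iris.drop('Species', axis=1), dummies], axis=1)
print(iris_dummified.head())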

Practical 4
Aim: Hypothesis Testing
 Formulate null and alternative hypotheses for a given problem.
 Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).
 Interpret the results and draw conclusions based on the test outcomes.
Program: -

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy import stats

np.random.seed(42)

# Two-Sample t-test
# H0: the two samples have equal means; H1: the means differ
sample1 = np.random.normal(10, 2, 30)
sample2 = np.random.normal(12, 2, 30)

t_stat, p_value = stats.ttest_ind(sample1, sample2)
alpha = 0.05

print(f'T-statistic: {t_stat}, P-value: {p_value}, DF: {len(sample1) + len(sample2) - 2}')

# Plotting the distributions
plt.hist([sample1, sample2], alpha=0.5, label=['Sample 1', 'Sample 2'], color=['blue', 'orange'])
plt.axvline(np.mean(sample1), color='blue', linestyle='dashed', linewidth=2)
plt.axvline(np.mean(sample2), color='orange', linestyle='dashed', linewidth=2)
plt.title('Distributions of Sample 1 and Sample 2')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.legend()

if p_value < alpha:
    plt.fill_between(np.linspace(min(sample1.min(), sample2.min()), max(sample1.max(), sample2.max()), 1000),
                     0, 0.15, color='red', alpha=0.3, label='Critical Region')

plt.text(np.mean(sample2), 5, f'T-statistic: {t_stat:.2f}', ha='center', va='center', color='black',
         backgroundcolor='white')
plt.show()

# Conclusion for t-test
if p_value < alpha:
    print(f"Conclusion: Reject null hypothesis. Mean of Sample {'1' if np.mean(sample1) > np.mean(sample2) else '2'} is significantly higher.")
else:
    print("Conclusion: Fail to reject null hypothesis. No significant difference in means.")

# Chi-Square Test on 'mpg' dataset
# H0: horsepower category and model-year period are independent; H1: they are associated
df = sb.load_dataset('mpg')

# Bin the 'horsepower' column into categories
df['horsepower_new'] = pd.cut(df['horsepower'], bins=[0, 75, 150, 240], labels=['low', 'medium', 'high'])

# Bin the 'model year' column into categories
df['modelyear_new'] = pd.cut(df['model_year'], bins=[69, 72, 74, 84], labels=['t1', 't2', 't3'])

# Perform Chi-Square test on the contingency table of the two binned variables
chi2_stat, p_val_chi, dof, expected = stats.chi2_contingency(pd.crosstab(df['horsepower_new'], df['modelyear_new']))

print(f"Chi-square: {chi2_stat}, P-value: {p_val_chi}, DF: {dof}")

# Conclusion for Chi-Square Test
if p_val_chi < alpha:
    print("Conclusion: Reject null hypothesis. Significant association between horsepower and model year.")
else:
    print("Conclusion: Fail to reject null hypothesis. No significant association.")

Output: -

Practical 5
Aim: ANOVA (Analysis of Variance)
 Perform one-way ANOVA to compare means across multiple groups.
 Conduct post-hoc tests to identify significant differences between
group means.

Program:

import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

group1 = [23, 25, 29, 34, 30]
group2 = [19, 20, 22, 24, 25]
group3 = [15, 18, 20, 21, 17]
group4 = [28, 24, 26, 30, 29]

all_data = group1 + group2 + group3 + group4
group_labels = (['Group 1'] * len(group1) + ['Group 2'] * len(group2) +
                ['Group 3'] * len(group3) + ['Group 4'] * len(group4))

# One-way ANOVA: H0 is that all group means are equal
f_stats, p_value = stats.f_oneway(group1, group2, group3, group4)
print("One-way ANOVA Results: ")
print(f"F-statistics: {f_stats:.4f}")
print(f"P-value: {p_value:.4f}")

# Run the post-hoc comparison only if ANOVA finds a significant difference
if p_value < 0.05:
    print("\nTukey-Kramer post-hoc test:")
    tukey_results = pairwise_tukeyhsd(all_data, group_labels)
    print(tukey_results)
else:
    print("\nNo significant differences found in ANOVA; post-hoc test not needed.")

Output:
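
Note: one-way ANOVA assumes roughly equal group variances. A minimal sketch of checking this assumption with Levene's test (scipy.stats.levene), using the same four groups as above:

from scipy import stats

group1 = [23, 25, 29, 34, 30]
group2 = [19, 20, 22, 24, 25]
group3 = [15, 18, 20, 21, 17]
group4 = [28, 24, 26, 30, 29]

# Levene's test: H0 is that all groups have equal variances
stat, p = stats.levene(group1, group2, group3, group4)
print(f"Levene statistic: {stat:.4f}, P-value: {p:.4f}")
if p < 0.05:
    print("Variances differ significantly; interpret the ANOVA with caution.")
else:
    print("No evidence of unequal variances; the ANOVA assumption holds.")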

Practical 6
Aim: -Regression and Its Types
 Implement simple linear regression using a dataset.
 Explore and interpret the regression model coefficients and goodness-
of-fit measures.
 Extend the analysis to multiple linear regression and assess the impact
of additional predictors.
Program: -

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load dataset
housing = fetch_california_housing()
housing_df = pd.DataFrame(housing.data, columns=housing.feature_names)
housing_df['PRICE'] = housing.target

print("First few rows of the dataset:")
print(housing_df.head())

# Simple linear regression: predict PRICE from average rooms only
print("\nSimple Linear Regression:")
X = housing_df[['AveRooms']]
y = housing_df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
print(f"Intercept: {model.intercept_:.4f}")
print(f"Coefficient: {model.coef_[0]:.4f}")

# Multiple linear regression: use all remaining features as predictors
print("\nMultiple Linear Regression:")
X = housing_df.drop('PRICE', axis=1)
y = housing_df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")
print(f"Intercept: {model.intercept_:.4f}")

print("Coefficients:")
for feature, coef in zip(housing_df.columns[:-1], model.coef_):
    print(f"{feature}: {coef:.4f}")

Output: -
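
Note: R-squared never decreases as predictors are added, so for the multiple regression it is also common to report adjusted R-squared as a goodness-of-fit measure. A minimal sketch, assuming r2 and X_test from the multiple-regression run above:

# Adjusted R-squared penalizes the number of predictors p relative to sample size n
n, p = X_test.shape
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"Adjusted R-squared: {adjusted_r2:.4f}")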

Practical 7
Aim: Logistic Regression and Decision Tree
 Build a logistic regression model to predict a binary outcome.
 Evaluate the model's performance using classification metrics (e.g.,
accuracy, precision, recall).
 Construct a decision tree model and interpret the decision rules for
classification.
Program:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report

# Load the Iris dataset and create a binary classification problem
iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])

# Keep only two classes for binary classification
binary_df = iris_df[iris_df['target'] != 2]
X = binary_df.drop('target', axis=1)
y = binary_df['target']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model and evaluate its performance
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
y_pred_logistic = logistic_model.predict(X_test)

print("Logistic Regression Metrics")
print("Accuracy:", accuracy_score(y_test, y_pred_logistic))
print("Precision:", precision_score(y_test, y_pred_logistic))
print("Recall:", recall_score(y_test, y_pred_logistic))
print("\nClassification Report")
print(classification_report(y_test, y_pred_logistic))

# Train a decision tree model and evaluate its performance
decision_tree_model = DecisionTreeClassifier()
decision_tree_model.fit(X_train, y_train)
y_pred_tree = decision_tree_model.predict(X_test)

print("\nDecision Tree Metrics")
print("Accuracy:", accuracy_score(y_test, y_pred_tree))
print("Precision:", precision_score(y_test, y_pred_tree))
print("Recall:", recall_score(y_test, y_pred_tree))
print("\nClassification Report")
print(classification_report(y_test, y_pred_tree))

Output:
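
Note: the aim also asks for interpreting the decision rules. A minimal sketch using sklearn.tree.export_text, assuming the fitted decision_tree_model and the feature data frame X from the program above:

from sklearn.tree import export_text

# Print the learned decision rules as nested if/else conditions on the features
rules = export_text(decision_tree_model, feature_names=list(X.columns))
print(rules)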

Practical 8
Aim: K-Means Clustering
 Apply the K-Means algorithm to group similar data points into
clusters.
 Determine the optimal number of clusters using elbow method or
silhouette analysis.
 Visualize the clustering results and analyse the cluster characteristics.
Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Load the dataset (raw string avoids backslash-escape errors in the Windows path)
data_path = r"C:\Users\Admin\Downloads\wholesaler.csv"
data = pd.read_csv(data_path)
print(data.head())

# Define categorical and continuous features
categorical_features = ['Channel', 'Region']
continuous_features = ['Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper', 'Delicassen']
print(data[continuous_features].describe())

# One-hot encoding for categorical features
for col in categorical_features:
    dummies = pd.get_dummies(data[col], prefix=col)
    data = pd.concat([data, dummies], axis=1)
    data.drop(col, axis=1, inplace=True)
print(data.head())

# Scale the data
scaler = MinMaxScaler()
data_transformed = scaler.fit_transform(data)

# Elbow method to determine the optimal number of clusters
sum_of_squared_distances = []
k_range = range(1, 15)
for k in k_range:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(data_transformed)
    sum_of_squared_distances.append(km.inertia_)

# Plot the Elbow graph
plt.figure()
plt.plot(k_range, sum_of_squared_distances, 'bo-')
plt.xlabel("Number of clusters (K)")
plt.ylabel("Sum of squared distances (Inertia)")
plt.title("Elbow Method for Optimal K")
plt.show()

Output:
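
Note: the aim also mentions silhouette analysis and visualizing the clusters, which the elbow-only program above does not cover. A minimal sketch, assuming data_transformed from above and a hypothetical choice of k = 5 read off the elbow plot:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

k = 5  # hypothetical value chosen from the elbow plot
km = KMeans(n_clusters=k, random_state=42)
labels = km.fit_predict(data_transformed)

# Silhouette score: closer to 1 means better-separated clusters
print(f"Silhouette score for k={k}: {silhouette_score(data_transformed, labels):.3f}")

# Quick 2-D view using the first two scaled features, coloured by cluster label
plt.scatter(data_transformed[:, 0], data_transformed[:, 1], c=labels, cmap='viridis', s=30)
plt.xlabel("Feature 1 (scaled)")
plt.ylabel("Feature 2 (scaled)")
plt.title(f"K-Means clusters (k={k})")
plt.show()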

Practical 9
Aim: Principal Component Analysis (PCA)
 Perform PCA on a dataset to reduce dimensionality.
 Evaluate the explained variance and select the appropriate number of
principal components.
 Visualize the data in the reduced-dimensional space.
Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset
iris = load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])

# Separate features and target
X = iris_df.drop("target", axis=1)
y = iris_df["target"]

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)
explained_variance_ratio = pca.explained_variance_ratio_

# Plot cumulative explained variance
plt.figure(figsize=(8, 6))
plt.plot(np.cumsum(explained_variance_ratio), marker='o', linestyle='-')
plt.title('Cumulative Explained Variance Ratio')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.grid(True)

# Find number of components explaining 95% variance
n_components = np.argmax(np.cumsum(explained_variance_ratio) >= 0.95) + 1
plt.axvline(x=n_components, color='r', linestyle='--')
plt.text(n_components, 0.9, '95% variance\nexplained', color='red', ha='right')
plt.show()

print(f"Number of principal components to explain 95% variance: {n_components}")

# Reduce dimensions using selected components
pca = PCA(n_components=n_components)
X_reduced = pca.fit_transform(X_scaled)

# Scatter plot of first two principal components
plt.figure(figsize=(10, 6))
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='viridis', s=50, alpha=0.7)
plt.title('Data in Reduced-dimensional Space (PC1 and PC2)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.colorbar(label="Target")
plt.show()
Output:
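
Note: to see which original features drive each principal component, the loadings can be inspected. A minimal sketch, assuming the fitted pca and the feature data frame X from the program above:

import pandas as pd

# Rows are principal components, columns are the original features
loadings = pd.DataFrame(pca.components_,
                        columns=X.columns,
                        index=[f"PC{i+1}" for i in range(pca.n_components_)])
print(loadings.round(3))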

Practical 10
Aim: Data Visualization and Storytelling
 Create meaningful visualizations using data visualization tools.
 Combine multiple visualizations to tell a compelling data story.
 Present the findings and insights in a clear and concise manner.
Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


def generate_data():
    np.random.seed(42)

    data = pd.DataFrame({
        'variable1': np.random.normal(0, 1, 1000),
        'variable2': np.random.normal(2, 2, 1000) + 0.5 * np.random.normal(0, 1, 1000),
        'variable3': np.random.normal(-1, 1.5, 1000),
        'category': pd.Series(np.random.choice(['A', 'B', 'C', 'D'], size=1000,
                                               p=[0.4, 0.3, 0.2, 0.1]), dtype='category')
    })

    return data


def create_visualizations(data):
    # Scatter plot
    plt.figure(figsize=(10, 6))
    plt.scatter(data['variable1'], data['variable2'], alpha=0.5, c='b')
    plt.title('Figure 1: Relationship between Variable 1 and Variable 2', fontsize=16)
    plt.xlabel('Variable 1', fontsize=14)
    plt.ylabel('Variable 2', fontsize=14)
    plt.grid(True)
    plt.show()

    # Count plot
    plt.figure(figsize=(10, 6))
    sns.countplot(x='category', data=data, palette='coolwarm')
    plt.title('Figure 2: Distribution of Categories', fontsize=16)
    plt.xlabel('Category', fontsize=14)
    plt.ylabel('Count', fontsize=14)
    plt.xticks(rotation=45)
    plt.show()

    # Correlation heatmap
    plt.figure(figsize=(10, 8))
    sns.heatmap(data[['variable1', 'variable2', 'variable3']].corr(), annot=True,
                cmap='coolwarm')
    plt.title('Figure 3: Correlation Heatmap', fontsize=16)
    plt.show()


def data_storytelling():
    print("\nData Storytelling\n")
    print("Title: Exploring the Relationship between Variable 1 and Variable 2")
    print("\nFigure 1: Scatter Plot of Variable 1 and Variable 2 shows a positive correlation.")
    print("Figure 2: Bar Chart of Categories shows Category A is the most common.")
    print("Figure 3: Correlation Heatmap shows a strong correlation between Variable 1 and Variable 2.")


def main():
    data = generate_data()
    create_visualizations(data)
    data_storytelling()


if __name__ == "__main__":
    main()

Output:

Scatter Plot:

Bar Chart:

Correlation Heatmap:
