Continuous Internal Evaluation (CIE) Laboratory Work
Academic Year: 2023-2024    Course Name: Introduction to Data Science
Name of Student: Yuv Sharma    Class: BCA 4th sem    Batch: 2022-23
Department: ICT    Roll No.: 03221302022
Columns: Exp. No.; Date; Title of Experiment; Performance & Understanding (10); Attendance & Participation (10); Timely Submission (10); Total (Out of 30); Date of Submission; Signature of Teacher
1 Create a pandas Series from a dictionary of values and an ndarray.
2 Create a Series and print all the elements that are above the 75th percentile.
3 Perform sorting on Series data and DataFrames.
4 Write a program to implement pivot() and pivot_table() on a DataFrame.
5 Write a program to find mean absolute deviation on a DataFrame.
6 Create a DataFrame based on E-Commerce data and generate mean, mode, median.
7 Create a DataFrame based on employee data and generate quartile and variance.
8 Write a program to create a DataFrame to store weight, age and name of three people. Print the DataFrame and its transpose.
9 Series objects Temp1, Temp2, Temp3, Temp4 store the temperatures of the days of week 1, week 2, week 3, week 4. Write a script to:
a. Print average temperature per week
b. Print average temperature of entire month
10 Predict the Weather with machine learning
Subject In-charge Student Signature
Practical-1
1. Create a pandas Series from a dictionary of values and an ndarray.
Code:
import pandas as pd
import numpy as np
# Example 1: Creating a Pandas Series from a dictionary
dictionary = {'A': 10, 'B': 20, 'C': 30}
series_from_dict = pd.Series(dictionary)
print("Series from dictionary:")
print(series_from_dict)
# Example 2: Creating a Pandas Series from an ndarray
arr = np.array([1, 3, 4, 7, 8, 8, 9])
series_from_ndarray = pd.Series(arr)
print("\nSeries from ndarray:")
print(series_from_ndarray)
Output:
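Both constructors also accept an `index` argument. The sketch below shows one way it behaves, assuming made-up labels `'a'` to `'g'` that are not part of the original exercise: an ndarray takes the labels positionally, while a dictionary is filtered and reordered by them.

```python
import numpy as np
import pandas as pd

# Hypothetical labels, one per element of the array (illustration only)
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
arr = np.array([1, 3, 4, 7, 8, 8, 9])
labelled = pd.Series(arr, index=labels)

# For a dictionary, index selects and orders the keys;
# a label with no matching key ('D') becomes NaN.
from_dict = pd.Series({'A': 10, 'B': 20, 'C': 30}, index=['C', 'A', 'D'])

print(labelled)
print(from_dict)
```

Because of the NaN entry, the second Series is stored as floating point rather than integer.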
Practical-2
2. Create a Series and print all the elements that are above the 75th percentile.
Code:
import pandas as pd
import numpy as np
# Create a Pandas Series
arr = np.array([42, 12, 72, 85, 56, 100])
ser = pd.Series(arr)
# Calculate the 75th percentile
quantile_value = ser.quantile(q=0.75)
print("75th Percentile is:", quantile_value)
print("Values that are greater than 75th percentile are:")
for val in ser:
    if val > quantile_value:
        print(val)
Output:
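The same selection can probably be written without an explicit loop. This is a minimal sketch using a boolean mask on the same six values; the names `threshold` and `above` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.array([42, 12, 72, 85, 56, 100]))

# 75th percentile with pandas' default linear interpolation
threshold = ser.quantile(q=0.75)

# Boolean mask: keeps only the elements strictly above the threshold
above = ser[ser > threshold]

print("75th Percentile is:", threshold)
print(above)
```

The mask keeps the original index labels, so the result also shows where each value sat in the Series.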
Practical-3
3. Perform sorting on Series data and DataFrames.
Code:
import pandas as pd
# Create a sample numeric series
s = pd.Series([100, 200, 54.67, 300.12, 400])
# Sort the series in ascending order
sorted_series = s.sort_values(ascending=True)
print(sorted_series)
Output:
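The task also covers DataFrames, which the listing above does not sort. The following is a minimal sketch, assuming a small made-up table of names and marks (hypothetical data, not from the original record), that sorts by column values and then back by index.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Meena', 'Kiran'],
    'Marks': [82, 91, 82, 67],
})

# Sort by Marks (highest first); ties are broken by Name in ascending order
by_marks = df.sort_values(by=['Marks', 'Name'], ascending=[False, True])
print(by_marks)

# Restore the original row order by sorting on the index labels
by_index = by_marks.sort_index()
print(by_index)
```

Both methods return a new DataFrame and leave `df` unchanged.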
Practical-4
4. Write a program to implement pivot() and pivot_table() on a DataFrame.
Code:
import pandas as pd
# Create a sample DataFrame
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange'],
    'Sales': [100, 200, 150, 300]
}
df = pd.DataFrame(data)
# Create a pivot table
pivot_table = df.pivot_table(
    values='Sales',    # Column to aggregate
    index='Date',      # Rows (index)
    columns='Fruit',   # Columns
    aggfunc='sum',     # Aggregation function (sum of sales)
    fill_value=0       # Fill missing values with 0
)
print(pivot_table)
Output:
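The listing above uses only pivot_table(), while the task also names pivot(). A minimal sketch of pivot() on the same sample data might look like this; the variable name `wide` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Orange'],
    'Sales': [100, 200, 150, 300],
})

# pivot() only reshapes; it does not aggregate.
# Each (Date, Fruit) pair must therefore occur at most once.
wide = df.pivot(index='Date', columns='Fruit', values='Sales')
print(wide)
```

Combinations that do not occur in the data are left as NaN. If a (Date, Fruit) pair appeared twice, pivot() would raise a ValueError, which is the case pivot_table() handles through its aggregation function.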
Practical-5
5. Write a program to find mean absolute deviation on a DataFrame.
Code:
import pandas as pd
# Create a sample DataFrame
data = {
'A': [10, 20, 30, 40],
'B': [15, 25, 35, 45]
}
df = pd.DataFrame(data)
# Calculate MAD for each column
mad_A = df['A'].mad()
mad_B = df['B'].mad()
print(f"MAD for column A: {mad_A:.2f}")
print(f"MAD for column B: {mad_B:.2f}")
# Calculate overall MAD for the entire DataFrame
overall_mad = df.mad().mean()
print(f"Overall MAD: {overall_mad:.2f}")
Output:
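For reference, the mean absolute deviation of a column with values x_1, ..., x_n and mean x-bar is usually defined as follows; the worked lines apply that definition to the two sample columns above.

```latex
\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert

% Column A: \bar{x} = 25, deviations 15, 5, 5, 15
\mathrm{MAD}_A = \frac{15 + 5 + 5 + 15}{4} = 10

% Column B: \bar{x} = 30, deviations 15, 5, 5, 15
\mathrm{MAD}_B = \frac{15 + 5 + 5 + 15}{4} = 10
```

Since both columns give 10, the mean of the per-column values is also 10.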
Practical-6
6. Create a DataFrame based on E-Commerce data and generate mean, mode, median.
Code:
import pandas as pd
# Sample e-commerce data
data = {
    'Order_ID': [101, 102, 103, 104, 105],
    'Product': ['Laptop', 'Phone', 'Tablet', 'Headphones', 'Camera'],
    'Price': [1000, 800, 500, 150, 600]
}
df = pd.DataFrame(data)
# Display the first 5 records
print("Sample e-commerce DataFrame:")
print(df.head())
# Calculate mean, mode, and median
mean_price = df['Price'].mean()
# Every price is unique here, so mode() returns all values in sorted order
# and iloc[0] picks the smallest one
mode_price = df['Price'].mode().iloc[0]
median_price = df['Price'].median()
print(f"\nMean Price: ${mean_price:.2f}")
print(f"Mode Price: ${mode_price:.2f}")
print(f"Median Price: ${median_price:.2f}")
Output:
Practical-7
7. Create a DataFrame based on employee data and generate quartile and variance.
Code:
import pandas as pd
# Sample employee data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [30, 25, 28, 32],
    'Salary': [60000, 70000, 55000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Sales']
}
df = pd.DataFrame(data)
# Calculate quartiles for the 'Salary' column
quartiles = df['Salary'].quantile([0.25, 0.5, 0.75])
print("Quartiles for Salary:")
print(quartiles)
salary_variance = df['Salary'].var()
print(f"Variance of Salary: {salary_variance:.2f}")
Output:
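One detail worth checking in the variance step is the divisor. By default, var() in pandas returns the sample variance (divides by n - 1). The sketch below compares it with the population variance on the same four salaries, using the `ddof` parameter.

```python
import pandas as pd

salary = pd.Series([60000, 70000, 55000, 80000])

# Default: sample variance, ddof=1 (divides by n - 1)
sample_var = salary.var()

# Population variance, ddof=0 (divides by n)
population_var = salary.var(ddof=0)

print(f"Sample variance:     {sample_var:.2f}")
print(f"Population variance: {population_var:.2f}")
```

Which of the two is appropriate depends on whether the four employees are treated as a sample or as the whole population.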
Practical-8
8. Write a program to create a DataFrame to store weight, age and name of three people.
Print the DataFrame and its transpose.
Code:
import pandas as pd
# Create a dictionary with sample data
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 25, 28],
    'Weight': [60.5, 70.2, 65.8]
}
# Create the DataFrame
df = pd.DataFrame(data)
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# Print the transpose (switch rows and columns)
print("\nTransposed DataFrame:")
print(df.transpose())
Output:
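As an optional variation, the names can be moved into the index before transposing, so that each person becomes a column header in the transposed view. This is a sketch of that idea, not part of the original task; `by_name` and `transposed` are names chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [30, 25, 28],
    'Weight': [60.5, 70.2, 65.8],
})

# Use the names as row labels, then swap rows and columns
by_name = df.set_index('Name')
transposed = by_name.T

print(transposed)
```

With only numeric columns left, the transposed frame stays numeric instead of falling back to the generic object dtype.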
Practical-9
9. Series objects Temp1, Temp2, Temp3, Temp4 store the temperatures of the days of week 1,
week 2, week 3, week 4. Write a script to: a. Print the average temperature per week. b. Print
the average temperature of the entire month.
Code:
# Sample temperature data (replace with actual values)
Temp1 = [25, 28, 30, 27, 26, 29, 24] # Week 1
Temp2 = [23, 22, 24, 25, 21, 20, 22] # Week 2
Temp3 = [28, 27, 26, 29, 30, 28, 27] # Week 3
Temp4 = [31, 32, 30, 33, 34, 31, 32] # Week 4
# Calculate average temperature per week
def average_temperature(temps):
    return sum(temps) / len(temps)
avg_week1 = average_temperature(Temp1)
avg_week2 = average_temperature(Temp2)
avg_week3 = average_temperature(Temp3)
avg_week4 = average_temperature(Temp4)
print(f"Average temperature for Week 1: {avg_week1:.2f}°C")
print(f"Average temperature for Week 2: {avg_week2:.2f}°C")
print(f"Average temperature for Week 3: {avg_week3:.2f}°C")
print(f"Average temperature for Week 4: {avg_week4:.2f}°C")
# Calculate average temperature for the entire month
all_temps = Temp1 + Temp2 + Temp3 + Temp4
avg_month = average_temperature(all_temps)
print(f"Average temperature for the entire month: {avg_month:.2f}°C")
Output:
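The task statement asks for Series objects, whereas the listing above stores the readings in plain Python lists. A minimal sketch of the same calculation with pandas Series, reusing the same sample values, could look like this; `weekly_avg` and `month_avg` are names chosen here.

```python
import pandas as pd

# Same sample temperature data, stored as Series objects
Temp1 = pd.Series([25, 28, 30, 27, 26, 29, 24])  # Week 1
Temp2 = pd.Series([23, 22, 24, 25, 21, 20, 22])  # Week 2
Temp3 = pd.Series([28, 27, 26, 29, 30, 28, 27])  # Week 3
Temp4 = pd.Series([31, 32, 30, 33, 34, 31, 32])  # Week 4

# a. Average temperature per week
weeks = {'Week 1': Temp1, 'Week 2': Temp2, 'Week 3': Temp3, 'Week 4': Temp4}
weekly_avg = pd.Series({name: temps.mean() for name, temps in weeks.items()})
print(weekly_avg.round(2))

# b. Average temperature of the entire month (all 28 readings together)
month_avg = pd.concat([Temp1, Temp2, Temp3, Temp4]).mean()
print(f"Average temperature for the entire month: {month_avg:.2f}°C")
```

Note that `Temp1 + Temp2` on Series adds the values element by element rather than joining them, which is why the monthly figure uses pd.concat().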
Practical-10
10. Predict the Weather with machine learning
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier
from sklearn.metrics import precision_score, recall_score, f1_score
# Load the dataset
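# NOTE: this file contains only 'Date' and 'Births' columns; the steps below
# expect a CSV that has 'Date' and 'Rain' columns.
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv'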
df = pd.read_csv(url)
# Preprocessing: Convert date to datetime, extract month and day
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
# Encode 'Rain' column (target variable)
le = LabelEncoder()
df['Rain'] = le.fit_transform(df['Rain'])
# Features and target
X = df[['Month', 'Day']]
y = df['Rain']
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize classifiers
xgb = XGBClassifier()
knn = KNeighborsClassifier()
ada = AdaBoostClassifier()
# Train classifiers
xgb.fit(X_train, y_train)
knn.fit(X_train, y_train)
ada.fit(X_train, y_train)
# Evaluate performance
y_pred_xgb = xgb.predict(X_test)
y_pred_knn = knn.predict(X_test)
y_pred_ada = ada.predict(X_test)
precision = precision_score(y_test, y_pred_xgb)
recall = recall_score(y_test, y_pred_xgb)
f1 = f1_score(y_test, y_pred_xgb)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
Output:
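To try the workflow without depending on an external file, the sketch below runs the same steps (date features, train/test split, fit, precision/recall/F1) on SYNTHETIC data. Everything about the data is an assumption made for illustration: the year 2023, the 0.7 and 0.2 rain probabilities and the June to September wet season are invented, and only a k-nearest-neighbours classifier from scikit-learn is used.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# SYNTHETIC data: one made-up year of daily records, not real observations
rng = np.random.default_rng(42)
df = pd.DataFrame({'Date': pd.date_range('2023-01-01', '2023-12-31', freq='D')})
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

# Assumed pattern: rain is more likely from June to September
rain_prob = np.where(df['Month'].between(6, 9), 0.7, 0.2)
df['Rain'] = (rng.random(len(df)) < rain_prob).astype(int)

# Features and target
X = df[['Month', 'Day']]
y = df['Rain']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train and evaluate a single classifier
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
```

The scores from this sketch say nothing about real weather; they only confirm that the pipeline runs end to end once a binary 'Rain' column is available.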