Assignment 1: Data Handling using Python
Roll numbers: 24071227 and 24071232
Student names: Samanpreet Singh and Hardeep Singh
Group: 7
Date of submission: 28-01-2025
Submitted to: Dr. Sukhjeet Kaur Ranade & Ms. Rama Rani
Program Title: Dataset of Phone Usage in India
Code
# Import the pandas library
import pandas as pd
# Load the original data (the raw string r"..." stops Python from reading the Windows backslash as an escape sequence)
df = pd.read_csv(r"E:\phone_usage_india_dirty.csv")
# Print a dashed line to separate each section of the output
print("-" * 40)
# Display the number of rows, columns and data types
datatypes=df.dtypes
print("Number of rows:",df.shape[0])
print("Number of columns:",df.shape[1])
print("-" * 40)
print("Data types:")
print(datatypes)
print("-" * 40)
# List the features stored as integers ('int').
# Note: this selects only integer columns, so the float64 measurement columns are not listed
continue_features = df.select_dtypes(include=['int']).columns
# Print the selected features
print("Continue_features:")
for feature in continue_features:
    print(feature)
print("-" * 40)
# Display the dataset size (Number of rows)
print("Dataset size (Number of rows):",df.shape[0])
print("-" * 40)
# Find the number of null values in each column
null_values = df.isnull().sum()
# Print the results
print("Null values in the dataset")
print(null_values)
print("-" * 40)
# Select the columns to summarise. Because the numeric dtypes ('float', 'int') are included
# alongside 'object' and 'category', this covers every column, not only the categorical ones
categorical_features = df.select_dtypes(include=['object', 'category', 'float', 'int'])
# Count the number of unique values in each selected column (missing values are not counted)
category_counts = categorical_features.nunique()
# Print the results as two aligned columns (same width for the header and the rows)
print(f"{'Feature':<30}{'Unique Categories'}")
for feature, count in category_counts.items():
    print(f"{feature:<30}{count}")
print("-" * 40)
# Print the range of rows chosen by the user (start is included, end is excluded)
def print_csv_range(file_path, start, end):
    try:
        # Load the CSV file into a DataFrame
        data = pd.read_csv(file_path)
        # Print the specified range of rows
        print(data.iloc[start:end])
    # In case of finding an error
    except FileNotFoundError:
        print(f"Error: The file at '{file_path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
# Get the range values from the user
start = int(input("enter the starting range value:"))
end = int(input("Enter the ending range value:"))
print_csv_range(r"E:\phone_usage_india_dirty.csv", start, end)
print("-" * 40)
Output
df=pd.read_csv("E:\phone_usage_india_dirty.csv")
----------------------------------------
Number of rows: 53058
Number of columns: 19
----------------------------------------
Data types:
User ID object
Age float64
Gender object
Location object
Phone Brand object
OS object
Screen Time (hrs/day) float64
Data Usage (GB/month) float64
Calls Duration (mins/day) float64
Number of Apps Installed float64
Social Media Time (hrs/day) float64
E-commerce Spend (INR/month) float64
Streaming Time (hrs/day) float64
Gaming Time (hrs/day) float64
Monthly Recharge Cost (INR) float64
Primary Use object
Timestamp object
Customer_Satisfaction int64
Customer_Lifetime_Value float64
dtype: object
----------------------------------------
Continue_features:
Customer_Satisfaction
----------------------------------------
Dataset size (Number of rows): 53058
----------------------------------------
Null values in the dataset
User ID 5452
Age 5048
Gender 5335
Location 5247
Phone Brand 5303
OS 5353
Screen Time (hrs/day) 4822
Data Usage (GB/month) 4997
Calls Duration (mins/day) 5238
Number of Apps Installed 5456
Social Media Time (hrs/day) 5292
E-commerce Spend (INR/month) 5018
Streaming Time (hrs/day) 5334
Gaming Time (hrs/day) 5245
Monthly Recharge Cost (INR) 5270
Primary Use 5289
Timestamp 0
Customer_Satisfaction 0
Customer_Lifetime_Value 5018
dtype: int64
----------------------------------------
Feature Unique Categories
User ID 17665
Age 47
Gender 3
Location 10
Phone Brand 10
OS 2
Screen Time (hrs/day) 112
Data Usage (GB/month) 492
Calls Duration (mins/day) 2945
Number of Apps Installed 191
Social Media Time (hrs/day) 56
E-commerce Spend (INR/month) 8195
Streaming Time (hrs/day) 76
Gaming Time (hrs/day) 51
Monthly Recharge Cost (INR) 1901
Primary Use 5
Timestamp 53058
Customer_Satisfaction 5
Customer_Lifetime_Value 48040
----------------------------------------
enter the starting range value:100
Enter the ending range value:2000
User ID Age Gender Location Phone Brand ... Monthly Recharge Cost (INR) Primary Use Timestamp Customer_Satisfaction Customer_Lifetime_Value
100 U00101 31.0 Female Pune Apple ... NaN Education 2023-01-05 04:00:00 2 168606.410928
101 U00102 39.0 Male Pune Apple ... NaN Social Media 2023-01-05 05:00:00 4 64562.179611
102 U00103 15.0 NaN Mumbai Realme ... 687.0 Work 2023-01-05 06:00:00 0 136436.260992
103 U00104 21.0 Male NaN Apple ... 1129.0 Entertainment 2023-01-05 07:00:00 0 86387.988358
104 U00105 56.0 NaN NaN Motorola ... 1037.0 Entertainment 2023-01-05 08:00:00 1 91850.388545
... ... ... ... ... ... ... ... ... ... ... ...
1995 U01996 28.0 Other Bangalore Google Pixel ... 547.0 Entertainment 2023-03-25 03:00:00 2 108289.398497
1996 U01997 57.0 Male NaN Motorola ... 900.0 Work 2023-03-25 04:00:00 3 89147.220506
1997 U01998 23.0 NaN Ahmedabad OnePlus ... 509.0 Entertainment 2023-03-25 05:00:00 2 374678.862645
1998 U01999 25.0 Male Pune NaN ... 138.0 Gaming 2023-03-25 06:00:00 1 129811.377124
1999 U02000 35.0 Female Kolkata Vivo ... NaN Gaming 2023-03-25 07:00:00 1 142723.216312
[1900 rows x 19 columns]
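The null-value counts printed above are easier to compare when expressed relative to the number of rows. A minimal sketch, assuming a small made-up DataFrame in place of the real CSV (the values and the names missing_count, missing_percent and summary are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame (made-up values) with a few missing entries
df = pd.DataFrame({
    "Age": [31.0, np.nan, 15.0, 21.0],
    "Gender": ["Female", "Male", None, None],
    "Timestamp": [
        "2023-01-05 04:00:00",
        "2023-01-05 05:00:00",
        "2023-01-05 06:00:00",
        "2023-01-05 07:00:00",
    ],
})

# Missing values per column, as a count and as a percentage of all rows
missing_count = df.isnull().sum()
missing_percent = (missing_count / len(df) * 100).round(2)

# Combine both into one table, most incomplete column first
summary = pd.DataFrame({"Missing": missing_count, "Percent": missing_percent})
print(summary.sort_values("Missing", ascending=False))
```

Applied to the full dataset, the same two lines turn each count in the "Null values in the dataset" output into a share of the 53058 rows.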