PYTHON PROGRAMMING: Data Handling
import pandas as pd
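# Create a dictionary (sample values for illustration)
data_dict = {'a': 10, 'b': 20, 'c': 30}
# Create a Series from the dictionary; the keys become the index labels
series_from_dict = pd.Series(data_dict)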
print(series_from_dict)
OUTPUT:
import numpy as np
import pandas as pd
OUTPUT:
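The two imports above suggest that the second half of this exercise builds a Series from a NumPy ndarray. The following is a minimal sketch under that assumption; the array values and index labels are illustrative and are not taken from the original listing.

```python
import numpy as np
import pandas as pd

# Illustrative values; the original array was not preserved
arr = np.array([10, 20, 30, 40])

# Without an index, pandas assigns the default labels 0, 1, 2, ...
series_from_array = pd.Series(arr)

# An explicit index must have the same length as the array
labelled = pd.Series(arr, index=['a', 'b', 'c', 'd'])

print(series_from_array)
print(labelled)
```

Values are then reachable by label, for example labelled['c'].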
Q2: Given a Series, print all the elements that are above the 75th percentile
import pandas as pd
# Example Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
percentile_75 = data.quantile(0.75)
# Filter the Series to include only elements above the 75th percentile
above_75th_percentile = data[data > percentile_75]
print(above_75th_percentile)
OUTPUT:
Q3: Create a DataFrame of quarterly sales where each row contains the item category, item name, and
expenditure. Group the rows by category and print the total expenditure per category.
Coding:
import pandas as pd
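# Sample quarterly sales data (illustrative values and column names)
data = {
    'Item Category': ['Electronics', 'Furniture', 'Electronics', 'Furniture'],
    'Item Name': ['Laptop', 'Chair', 'Phone', 'Table'],
    'Expenditure': [50000, 3000, 20000, 7000],
}
df = pd.DataFrame(data)
# Group by 'Item Category' and calculate the total expenditure per category
total_expenditure_per_category = df.groupby('Item Category')['Expenditure'].sum()
print(total_expenditure_per_category)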
Output:
Q4: Create a DataFrame for examination results and display the row labels, the column labels, the data
type of each column, and the dimensions.
Coding:
import pandas as pd
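# Sample examination results (illustrative values)
data = {
    'Name': ['Asha', 'Ravi', 'Meena'],
    'Math': [85, 72, 90],
    'Science': [88, 65, 93],
}
df = pd.DataFrame(data)
print("Row labels:")
print(df.index)
print("\nColumn labels:")
print(df.columns)
print("\nData types of each column:")
print(df.dtypes)
print("\nDimensions (rows, columns):")
print(df.shape)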
Output:
Q5: Filter out rows of a DataFrame based on different criteria, such as duplicate rows.
Let's start by creating an example DataFrame that contains some duplicate rows:
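import pandas as pd
# Sample data containing a duplicate row (illustrative values)
data = {'Name': ['Asha', 'Ravi', 'Asha', 'Meena'],
        'Math': [85, 72, 85, 90],
        'Science': [88, 65, 88, 93]}
df = pd.DataFrame(data)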
print("Original DataFrame:")
print(df)
To remove duplicate rows based on all columns or specific columns, you can use the
drop_duplicates() method.
df_no_duplicates_all = df.drop_duplicates()  # compares all columns
print(df_no_duplicates_all)
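Filtering Based on Conditions
# Combine conditions with & and wrap each one in parentheses
df_filtered_multiple = df[(df['Math'] > 80) & (df['Science'] > 85)]
print("\nDataFrame with rows where Math score > 80 and Science score > 85:")
print(df_filtered_multiple)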
Removing Rows with Missing Values
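# Assume some missing values for demonstration
df['Math'] = df['Math'].astype(float)  # a float column can hold NaN
df.loc[2, 'Math'] = None
print(df.dropna())  # rows with any missing value are removed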
Notes:
Summary of Filtering Methods
1. Removing Duplicate Rows:
   - df.drop_duplicates(): Removes duplicate rows based on all columns.
   - df.drop_duplicates(subset=['column_name']): Removes duplicates based on specific columns.
2. Filtering Based on Conditions:
   - df[df['column_name'] > value]: Filters rows based on a condition.
   - df[(df['column1'] > value1) & (df['column2'] < value2)]: Filters rows based on multiple conditions.
3. Removing Rows with Missing Values:
   - df.dropna(): Removes rows with any missing values.
   - df.dropna(subset=['column_name']): Removes rows with missing values in specific columns.
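The three groups of methods in the summary can be tried side by side. This is a minimal sketch on a small made-up DataFrame; the names, marks, and variable names are illustrative only.

```python
import pandas as pd

# Illustrative data: row 2 repeats row 0, and one Science score is missing
df = pd.DataFrame({
    'Name': ['Asha', 'Ravi', 'Asha', 'Meena'],
    'Math': [85.0, 72.0, 85.0, 90.0],
    'Science': [88.0, 65.0, 88.0, None],
})

# 1. Duplicates: compare whole rows, or only the 'Name' column
no_dupes = df.drop_duplicates()
unique_names = df.drop_duplicates(subset=['Name'])

# 2. Conditions: wrap each comparison in parentheses and join with &
high_math = df[df['Math'] > 80]
both = df[(df['Math'] > 80) & (df['Science'] < 90)]

# 3. Missing values: in any column, or only in 'Math'
complete = df.dropna()
math_present = df.dropna(subset=['Math'])

print(len(no_dupes), len(unique_names), len(high_math),
      len(both), len(complete), len(math_present))
```

Note that a comparison against a missing value is False, so the row with the missing Science score is excluded from the combined condition.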
Q6: Write Python code to read the ‘dept.csv’ file, which contains the columns d_id, d_name and city,
and display all records separated by commas.
Ans:
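import csv
with open('dept.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    print('Details of csv file:')
    for row in reader:
        print(','.join(row))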
Output is:
10,sales,kolkatta
20,marketing,delhi
Q7: Write a program to read the contents of “dept.csv” file using with open()
Ans:
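import csv
with open('dept.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    rows = []
    for rec in reader:  # each rec is a list of the fields in one line
        rows.append(rec)
print(rows)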
Output is:
[['d_id d_name city '], ['10 sales kolkatta'], ['11 marketing delhi']]
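Each row in the output above is a single string because csv.reader splits on commas by default, and this file evidently uses a different separator. The following is a minimal sketch, assuming the fields are tab-separated (the actual separator cannot be recovered from the printed output); an in-memory io.StringIO object stands in for dept.csv.

```python
import csv
import io

# Stand-in for dept.csv; the tab separator is an assumption
sample = io.StringIO("d_id\td_name\tcity\n10\tsales\tkolkatta\n11\tmarketing\tdelhi\n")

# delimiter tells csv.reader which character separates the fields
reader = csv.reader(sample, delimiter='\t')
rows = [rec for rec in reader]
print(rows)
```

With the matching delimiter, each record becomes a list of three separate fields instead of one long string.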
Q8: Write a program to count the number of records present in “dept.csv” file
Ans:
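import csv
f = open('dept.csv', 'r')
csv_reader = csv.reader(f)
columns = next(csv_reader)  # skip the header row
c = 0
for row in csv_reader:
    c = c + 1
print('No. of records are:', c)
f.close()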
Output is:
Use of next(): The built-in function next() returns the current row of the CSV reader and advances the
iterator to the next row. Here it reads the header row (the list of field names), so the loop that follows
counts only the data records.
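A minimal sketch of how next() advances a csv.reader one row at a time. An in-memory io.StringIO object stands in for dept.csv so the example runs without a file on disk; the variable names are illustrative.

```python
import csv
import io

# Stand-in for dept.csv so the sketch runs without a file on disk
sample = io.StringIO("d_id,d_name,city\n10,sales,kolkatta\n20,marketing,delhi\n")
csv_reader = csv.reader(sample)

header = next(csv_reader)        # consumes the header row
first_record = next(csv_reader)  # consumes the first data row

# Only the rows not yet consumed remain for the loop
remaining = sum(1 for _ in csv_reader)
print(header, first_record, remaining)
```

Calling next() again once the reader is exhausted raises StopIteration, which is why it is normally called only once, before the loop.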
Q9: Write a program to search for the record of a particular student in a CSV file on the basis of
a name entered by the user.
Ans:
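import csv
f = open('student.csv', 'r')
csv_reader = csv.reader(f)
name = input('Enter name to be searched: ')
for row in csv_reader:
    if row[1] == name:  # the name is stored in the second column
        print(row)
f.close()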
Output is:
Q10: Write a program to add (append) Employee records onto a csv file.
Ans:
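import csv
# The file name and the first two prompts are assumed; they are missing from the listing
with open('student.csv', 'a', newline='') as csvfile:
    mywriter = csv.writer(csvfile, delimiter=',')
    ans = 'y'
    while ans.lower() == 'y':
        rno = int(input('Enter roll no:'))
        name = input('Enter name:')
        clas = input('Enter class:')
        mywriter.writerow([rno, name, clas])
        print('## Data Saved ##')
        ans = input('Add More?:')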
Output is:
Enter class:XI
## Data Saved ##
Add More?:y
Enter class:XII
## Data Saved ##
Add More?:n
(b) Write all the records in one single go onto the csv.
Ans:
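import csv

# The file name 'student.csv', the roll-number prompt and the menu prompt are
# assumed; those lines are missing from the listing.
def crcsv1():
    # Write each record to the file as soon as it is entered
    with open('student.csv', 'a', newline='') as csvfile:
        fobj = csv.writer(csvfile)
        while True:
            rno = int(input('Rno:'))
            name = input('Name:')
            marks = float(input("Marks:"))
            Line = [rno, name, marks]
            fobj.writerow(Line)
            ch = input('More (Y/N):')
            if ch == 'N':
                break

def crcsv2():
    # Collect all records in a list, then write them in one single go
    with open('student.csv', 'a', newline='') as csvfile:
        fobj = csv.writer(csvfile)
        Lines = []
        while True:
            rno = int(input('Rno:'))
            name = input('Name:')
            marks = float(input("Marks:"))
            Lines.append([rno, name, marks])
            ch = input('More (Y/N):')
            if ch == 'N':
                break
        fobj.writerows(Lines)

def showall():
    # Display every record, with the fields separated by commas
    with open('student.csv', 'r', newline='') as csvfile:
        csvobj = csv.reader(csvfile)
        for line in csvobj:
            print(','.join(line))

while True:
    opt = input('1. Write one by one  2. Write in one go  3. Show all  Other: Exit\nChoice:')
    if opt == '1':
        crcsv1()
    elif opt == '2':
        crcsv2()
    elif opt == '3':
        showall()
    else:
        break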
Output is:
Name:ARCHANA SAKAT
Marks:453
More (Y/N):Y
Name:SUNIL
Marks:433
More (Y/N):N
1,ARCHANA SAKAT,453.0
2,SUNIL,433.0
Name:AAKASH
Marks:455
More (Y/N):N
>>>
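The difference between the two writing functions comes down to writerow() versus writerows(). The following is a minimal sketch that writes the two records from the sample output both ways; io.StringIO objects stand in for the file, and lineterminator='\n' is set only to keep the comparison simple.

```python
import csv
import io

records = [[1, 'ARCHANA SAKAT', 453.0], [2, 'SUNIL', 433.0]]

# writerow() writes exactly one record per call
one_by_one = io.StringIO()
writer = csv.writer(one_by_one, lineterminator='\n')
for rec in records:
    writer.writerow(rec)

# writerows() takes the whole list and writes every record in one call
in_one_go = io.StringIO()
csv.writer(in_one_go, lineterminator='\n').writerows(records)

print(in_one_go.getvalue())
```

Both approaches produce the same file contents. Passing a list of records to writerow() instead would write the whole list as a single malformed row, so writerows() is the call to use for a list of records.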
Q12: Given the school result data, analyse the performance of the students on different
parameters, e.g. subject-wise or class-wise.
Coding:
import pandas as pd
import matplotlib.pyplot as plt
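# Sample school result data (illustrative values)
data = {'Class': ['X', 'X', 'XI', 'XI'],
        'Math': [85, 72, 90, 64],
        'Science': [88, 65, 93, 70],
        'English': [78, 80, 85, 75]}
df = pd.DataFrame(data)
# Class-wise performance: average mark per subject within each class
class_performance = df.groupby('Class').agg({'Math': 'mean', 'Science': 'mean', 'English': 'mean'})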
print("\nClass-wise performance:")
print(class_performance)
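The question also asks for a subject-wise view. The following is a minimal sketch, assuming the same column names as the class-wise code ('Class', 'Math', 'Science', 'English'); the marks are illustrative, since the original data set is not shown.

```python
import pandas as pd

# Illustrative data; column names follow the class-wise example
df = pd.DataFrame({
    'Class': ['X', 'X', 'XI', 'XI'],
    'Math': [80, 70, 90, 60],
    'Science': [85, 65, 95, 75],
    'English': [75, 85, 80, 70],
})

subjects = ['Math', 'Science', 'English']

# Subject-wise performance: mean, highest and lowest mark per subject
subject_performance = df[subjects].agg(['mean', 'max', 'min'])
print("Subject-wise performance:")
print(subject_performance)

# Subject with the highest average mark
best_subject = df[subjects].mean().idxmax()
print("Best subject on average:", best_subject)
```

The result has one column per subject and one row per statistic, so a value such as the Math average can be read with subject_performance.loc['mean', 'Math'].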