Assignment - 10 - Pandas
November 8, 2024
Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
.. …
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
1. Write a Pandas program to get the data types of the fields in the given csv data (iris.csv).
[1]: ### code here
import pandas as pd
Id int64
SepalLengthCm float64
SepalWidthCm float64
PetalLengthCm float64
PetalWidthCm float64
Species object
dtype: object
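A minimal sketch of one way to answer question 1. Since iris.csv is not reproduced here, the block reads a hypothetical three-row stand-in that reuses the column names from the output above; with the real file, `pd.read_csv('iris.csv').dtypes` would be the equivalent call.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv (assumption: same header as the real file)
sample_csv = io.StringIO(
    "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n"
    "1,5.1,3.5,1.4,0.2,Iris-setosa\n"
    "51,7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "101,6.3,3.3,6.0,2.5,Iris-virginica\n"
)
df = pd.read_csv(sample_csv)

# dtypes is an attribute, not a method: it returns one dtype per column
column_types = df.dtypes
print(column_types)
```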
2. Write a Pandas program to find the sum, mean, max, min value of ‘SepalLengthCm’
column of dataframe
[2]: ### code here
import pandas as pd
3. Write a Pandas program to import the iris data into a Pandas dataframe, skipping the
first twenty rows.
[3]: ### code here
import pandas as pd
# Load the iris dataset from csv file, skipping the first 20 rows
df = pd.read_csv('iris.csv', skiprows=20)
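One caveat with `skiprows=20`: an integer counts physical lines from the top of the file, so the header line is skipped too and the 21st line is read as the header. If the intent is to keep the header and drop the first twenty data rows, passing a range is one option. The sketch below uses a hypothetical two-column CSV held in memory so the difference is visible without the real file.

```python
import io
import pandas as pd

# Hypothetical CSV: one header line followed by 30 data rows (Id 1..30)
csv_text = "Id,Value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 31))

# skiprows=20 drops lines 0..19, header included, so the line "20,200" becomes the header
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=20)

# skiprows=range(1, 21) keeps line 0 (the header) and drops the 20 data rows after it
with_header = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 21))

print(no_header.columns.tolist())
print(with_header.head())
```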
Species
140 Iris-virginica
141 Iris-virginica
142 Iris-virginica
143 Iris-virginica
144 Iris-virginica
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
5. Write a Pandas program to create a subtotal of “PetalLengthCm” against Species
from the dataframe.
[5]: ### code here
import pandas as pd
Species
Iris-setosa 73.2
Iris-versicolor 213.0
Iris-virginica 277.6
Name: PetalLengthCm, dtype: float64
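A subtotal like the one above can be produced with a group-by followed by a sum. This is a sketch on a hypothetical four-row frame, not the notebook's original cell; the column names are taken from the output above.

```python
import pandas as pd

# Hypothetical sample rows (assumption: values chosen only for illustration)
df = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    "PetalLengthCm": [1.4, 1.3, 4.7, 6.0],
})

# Group rows by species, then sum the petal lengths within each group
subtotal = df.groupby("Species")["PetalLengthCm"].sum()
print(subtotal)
```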
6. Write a Pandas program to find details where “PetalLengthCm” > 2.
[6]: ### code here
import pandas as pd
Species
50 Iris-versicolor
51 Iris-versicolor
52 Iris-versicolor
53 Iris-versicolor
54 Iris-versicolor
.. …
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
Empty DataFrame
Columns: [Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species]
Index: []
8. Write a Pandas program to sort the records by the SepalLengthCm column.
[8]: import pandas as pd
41 42 4.5 2.3 1.3 0.3
.. … … … … …
122 123 7.7 2.8 6.7 2.0
118 119 7.7 2.6 6.9 2.3
117 118 7.7 3.8 6.7 2.2
135 136 7.7 3.0 6.1 2.3
131 132 7.9 3.8 6.4 2.0
Species
13 Iris-setosa
42 Iris-setosa
38 Iris-setosa
8 Iris-setosa
41 Iris-setosa
.. …
122 Iris-virginica
118 Iris-virginica
117 Iris-virginica
135 Iris-virginica
131 Iris-virginica
9. Write a Pandas program to insert a column in the sixth position of dataframe and
fill it with NaN values.
[9]: import pandas as pd
import numpy as np
# Insert a new column at the sixth position (index 5) and fill it with NaN values
Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
… …
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
Total NaN
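For question 9, `DataFrame.insert` places a column at a given 0-based position. The sketch below assumes a two-row frame with the iris column names and a hypothetical column name `NewColumn`; the notebook's own column name is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the six iris columns
df = pd.DataFrame({
    "Id": [1, 2],
    "SepalLengthCm": [5.1, 4.9],
    "SepalWidthCm": [3.5, 3.0],
    "PetalLengthCm": [1.4, 1.4],
    "PetalWidthCm": [0.2, 0.2],
    "Species": ["Iris-setosa", "Iris-setosa"],
})

# loc=5 is the sixth position (0-based); insert() modifies df in place and returns None
df.insert(loc=5, column="NewColumn", value=np.nan)
print(df.columns.tolist())
```

Because `insert` works in place, assigning its return value back to `df` would replace the frame with `None`.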
pandas-assignment-day-2
November 8, 2024
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Write a Pandas program to create a dataframe from the given dictionary, using labels as the index.
[21]: # Create a DataFrame
df = pd.DataFrame(data, index=labels)
1. Write a Pandas program to select the rows where the score is missing, i.e. is NaN
[22]: # Create a DataFrame
df = pd.DataFrame(data)
# Display the rows with missing scores
print(missing_score_rows)
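The selection step that produces `missing_score_rows` is not shown above. One common way to write it is a boolean mask built with `isna()`; the four-row `data` dictionary here is a trimmed, hypothetical version of the exercise data.

```python
import numpy as np
import pandas as pd

# Trimmed sample of the exercise data (assumption: only two columns, four rows)
data = {
    "name": ["Anastasia", "Dima", "Katherine", "James"],
    "score": [12.5, 9.0, 16.5, np.nan],
}
df = pd.DataFrame(data)

# isna() marks NaN scores as True; indexing with the mask keeps only those rows
missing_score_rows = df[df["score"].isna()]
print(missing_score_rows)
```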
3. Write a Pandas program to select the ‘name’ and ‘score’ columns from the following
DataFrame.
[28]: # Create a DataFrame
df = pd.DataFrame(data)
name score
0 Anastasia 12.5
1 Dima 9.0
2 Katherine 16.5
3 James NaN
4 Emily 9.0
5 Michael 20.0
6 Matthew 14.5
7 Laura NaN
8 Kevin 8.0
9 Jonas 19.0
4. Write a Pandas program to calculate the mean score for each different student in
DataFrame.
[29]: # Create a DataFrame
df = pd.DataFrame(data)
name
Anastasia 12.5
Dima 9.0
Emily 9.0
James NaN
Jonas 19.0
Katherine 16.5
Kevin 8.0
Laura NaN
Matthew 14.5
Michael 20.0
Name: score, dtype: float64
5. Write a Pandas program to append a new row ‘k’ to data frame with any value for
each column. Now delete the row ‘c’ and return the original DataFrame.
[30]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_11716\372884500.py in ?()
3 df = pd.DataFrame(data)
4
5 # Append a new row 'k' with values for each column
6 new_row = {'name': 'Frank', 'score': 82, 'age': 26}
----> 7 df = df.append(new_row, ignore_index=True)
8
9 # Delete the row 'c' (Charlie) from the DataFrame
10 df = df.drop(index=2) # Assuming 'c' corresponds to the index of Charlie
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
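The AttributeError above occurs because `DataFrame.append` was removed in pandas 2.0. A sketch of the same task using `pd.concat`, assuming the frame is built with `index=labels` so that rows 'k' and 'c' can be addressed by label; the three-row data and the values in the new row are placeholders.

```python
import pandas as pd

labels = ["a", "b", "c"]
df = pd.DataFrame(
    {"name": ["Anastasia", "Dima", "Katherine"], "score": [12.5, 9.0, 16.5]},
    index=labels,
)

# Build the new row as a one-row DataFrame labelled 'k' (placeholder values)
new_row = pd.DataFrame([{"name": "Suresh", "score": 15.5}], index=["k"])

# pd.concat takes the place of the removed DataFrame.append
df_with_k = pd.concat([df, new_row])

# Drop by label; this relies on the label index, not on positional index 2
df_without_c = df_with_k.drop(index="c")
print(df_without_c)
```

Neither call modifies `df`, so the original DataFrame is still available afterwards.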
[43]: survived pclass sex age sibsp parch fare embarked class \
0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third
.. … … … … … … … … …
886 0 2 male 27.0 0 0 13.0000 S Second
887 1 1 female 19.0 0 0 30.0000 S First
888 0 3 female NaN 1 2 23.4500 S Third
889 1 1 male 26.0 0 0 30.0000 C First
890 0 3 male 32.0 0 0 7.7500 Q Third
.. … … … … … … …
886 man True NaN Southampton no True NaN
887 woman False B Southampton yes True NaN
888 woman False NaN Southampton no False NaN
889 man True C Cherbourg yes True NaN
890 man True NaN Queenstown no True NaN
1. Write a Pandas program to create a Pivot table and find the total fare amount
class wise, gender wise.
[31]: #### code here
import pandas as pd
# Create a pivot table to find total fare amount class-wise and gender-wise
pivot_table = pd.pivot_table(df, values='fare', index='pclass', columns='sex', aggfunc='sum', fill_value=0)
2. Write a Pandas program to create a Pivot table and find survival rate by gender.
[32]: ### code here
import pandas as pd
print(pivot_table)
survived
sex
female 0.742038
male 0.188908
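A survival-rate table shaped like the one above can be obtained by averaging the 0/1 `survived` column per gender. The sketch uses a hypothetical four-passenger frame with the Titanic column names, so its numbers differ from the real output.

```python
import pandas as pd

# Hypothetical four-passenger sample (assumption: same column names as titanic.csv)
df = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "sex": ["female", "male", "female", "male"],
    "fare": [71.0, 30.0, 8.0, 7.0],
    "survived": [1, 1, 1, 0],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate
survival_rate = pd.pivot_table(df, values="survived", index="sex", aggfunc="mean")
print(survival_rate)
```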
3. Write a Pandas program to find the total count of people who survived, pclass wise, class wise
and gender wise.
[33]: ### code here
import pandas as pd
# Create a pivot table to find the total count of people who survived, grouped by Pclass and Gender
pivot_table = pd.pivot_table(df,
values='survived',
index='pclass',
columns='sex',
aggfunc='sum',
fill_value=0) # Fill missing values with 0
# Create a pivot table to count how many women and men were in each cabin class
pivot_table = pd.pivot_table(df,
values='class',
index='pclass',
columns='sex',
aggfunc='count',
fill_value=0) # Fill missing values with 0
5. Write a Pandas program to count the number of missing values in each column.
[35]: #### code here
import pandas as pd
survived 0
pclass 0
sex 0
age 177
sibsp 0
parch 0
fare 0
embarked 2
class 0
who 0
adult_male 0
deck 688
embark_town 2
alive 0
alone 0
Unnamed: 15 891
dtype: int64
6. Write a Pandas program to replace null values with the value from the previous
row or the next row in a given DataFrame.
[36]: ### code here
import pandas as pd
# Replace NaN values with the value from the previous row (forward fill)
df_ffill = df.ffill()
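The question also allows filling from the next row. A small sketch contrasting forward fill with backward fill on a hypothetical single-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical column with gaps in the middle and at the end
df = pd.DataFrame({"age": [22.0, np.nan, 26.0, np.nan]})

# Forward fill copies the previous row's value down into each gap
ffilled = df.ffill()

# Backward fill copies the next row's value up; a trailing gap stays NaN
bfilled = df.bfill()

print(ffilled)
print(bfilled)
```

A leading NaN survives forward fill and a trailing NaN survives backward fill, since there is no neighbour to copy from.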
# Load the titanic dataset from csv file
df = pd.read_csv('titanic.csv')
df = pd.read_csv('titanic.csv')
# Display the result
print("\n'Embark_town' column after removing duplicates:\n", embark_town_unique)
# Filter passengers who survived, embarked from 'Southampton', and are children (age < 18)
261 1 3 male 3.00 4 2 31.3875 S Third
305 1 1 male 0.92 1 2 151.5500 S First
340 1 2 male 2.00 1 1 26.0000 S Second
348 1 3 male 3.00 1 1 15.9000 S Third
407 1 2 male 3.00 1 1 18.7500 S Second
435 1 1 female 14.00 1 2 120.0000 S First
445 1 1 male 4.00 0 2 81.8583 S First
446 1 2 female 13.00 0 1 19.5000 S Second
479 1 3 female 2.00 0 1 12.2875 S Third
489 1 3 male 9.00 1 1 15.9000 S Third
504 1 1 female 16.00 0 0 86.5000 S First
530 1 2 female 2.00 1 1 26.0000 S Second
535 1 2 female 7.00 0 2 26.2500 S Second
549 1 2 male 8.00 1 1 36.7500 S Second
618 1 2 female 4.00 2 1 39.0000 S Second
689 1 1 female 15.00 0 1 211.3375 S First
720 1 2 female 6.00 0 1 33.0000 S Second
750 1 2 female 4.00 1 1 23.0000 S Second
751 1 3 male 6.00 0 1 12.4750 S Third
755 1 2 male 0.67 1 1 14.5000 S Second
777 1 3 female 5.00 0 0 12.4750 S Third
781 1 1 female 17.00 1 0 57.0000 S First
788 1 3 male 1.00 1 2 20.5750 S Third
802 1 1 male 11.00 1 2 120.0000 S First
831 1 2 male 0.83 1 1 18.7500 S Second
853 1 1 female 16.00 0 1 39.4000 S First
869 1 3 male 4.00 1 1 11.1333 S Third
445 child False A Southampton yes False NaN
446 child False NaN Southampton yes False NaN
479 child False NaN Southampton yes False NaN
489 child False NaN Southampton yes False NaN
504 woman False B Southampton yes True NaN
530 child False NaN Southampton yes False NaN
535 child False NaN Southampton yes False NaN
549 child False NaN Southampton yes False NaN
618 child False F Southampton yes False NaN
689 child False B Southampton yes False NaN
720 child False NaN Southampton yes False NaN
750 child False NaN Southampton yes False NaN
751 child False E Southampton yes False NaN
755 child False NaN Southampton yes False NaN
777 child False NaN Southampton yes True NaN
781 woman False B Southampton yes False NaN
788 child False NaN Southampton yes False NaN
802 child False B Southampton yes False NaN
831 child False NaN Southampton yes False NaN
853 woman False D Southampton yes False NaN
869 child False NaN Southampton yes False NaN
11. Write a Pandas program to filter all records starting from the 2nd row, access every 5th row
from the dataframe.
[41]: ### code here
import pandas as pd
# Filter the DataFrame starting from the 2nd row (index 1) and access every 5th row
filtered_df = df.iloc[1::5]
.. … … … … … … … … …
866 1 2 female 27.0 1 0 13.8583 C Second
871 1 1 female 47.0 1 1 52.5542 S First
876 0 3 male 20.0 0 0 9.8458 S Third
881 0 3 male 33.0 0 0 7.8958 S Third
886 0 2 male 27.0 0 0 13.0000 S Second
885 0 3 female 39.0 0 5 29.1250 Q Third
889 1 1 male 26.0 0 0 30.0000 C First
890 0 3 male 32.0 0 0 7.7500 Q Third
pandas-assignment-day-3
November 8, 2024
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Write a Pandas program to create a dataframe from the given dictionary, using labels as the index.
[14]: ### code here
# Create a DataFrame
df = pd.DataFrame(data, index=labels)
1. Write a Pandas program to sort the DataFrame first by ‘name’ in descending order,
then by ‘score’ in ascending order
[63]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
# Sort the DataFrame first by 'name' in descending order, then by 'score' in ascending order
Sorted DataFrame:
name score attempts qualify
5 Michael 20.0 3 yes
6 Matthew 14.5 1 yes
7 Laura NaN 1 no
8 Kevin 8.0 2 no
2 Katherine 16.5 2 yes
9 Jonas 19.0 1 yes
3 James NaN 3 no
4 Emily 9.0 2 no
1 Dima 9.0 3 no
0 Anastasia 12.5 1 yes
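The sort call itself is not shown above. One way to express a two-key sort with mixed directions is `sort_values` with a list for `ascending`. The frame below is hypothetical and repeats one name on purpose so that the second key has a visible effect.

```python
import pandas as pd

# Hypothetical frame; 'Dima' appears twice so the 'score' tie-break is visible
df = pd.DataFrame({
    "name": ["Dima", "Emily", "Dima", "Anastasia"],
    "score": [9.0, 9.0, 7.5, 12.5],
})

# One direction per key: 'name' descending, then 'score' ascending within equal names
sorted_df = df.sort_values(by=["name", "score"], ascending=[False, True])
print(sorted_df)
```

In the exercise data every name is unique, so the second key never changes the order there.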
2. Write a Pandas program to replace the values ‘yes’ and ‘no’ in the ‘qualify’ column with
True and False.
[64]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
# Replace 'yes' with True and 'no' with False in the 'qualify' column
df['qualify'] = df['qualify'].replace({'yes': True, 'no': False})
Updated DataFrame:
name score attempts qualify
0 Anastasia 12.5 1 True
1 Dima 9.0 3 False
2 Katherine 16.5 2 True
3 James NaN 3 False
4 Emily 9.0 2 False
5 Michael 20.0 3 True
6 Matthew 14.5 1 True
7 Laura NaN 1 False
8 Kevin 8.0 2 False
9 Jonas 19.0 1 True
C:\Users\indra\AppData\Local\Temp\ipykernel_21872\2229088605.py:7:
FutureWarning: Downcasting behavior in `replace` is deprecated and will be
removed in a future version. To retain the old behavior, explicitly call
`result.infer_objects(copy=False)`. To opt-in to the future behavior, set
`pd.set_option('future.no_silent_downcasting', True)`
df['qualify'] = df['qualify'].replace({'yes': True, 'no': False})
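The FutureWarning above comes from the silent downcasting that `replace` performs. One alternative, sketched here on a hypothetical three-row column, is `Series.map`, which builds a new Series from the dictionary and does not go through that code path.

```python
import pandas as pd

# Hypothetical three-row sample of the 'qualify' column
df = pd.DataFrame({"qualify": ["yes", "no", "yes"]})

# map() looks up each value in the dictionary; values missing from it become NaN
df["qualify"] = df["qualify"].map({"yes": True, "no": False})
print(df)
```

Unlike `replace`, `map` does not leave unmatched values untouched, so it suits columns where every value is covered by the dictionary.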
3. Write a Pandas program to delete the ‘attempts’ column from the DataFrame
[65]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
# Create a DataFrame
df = pd.DataFrame(data)
print("\nUpdated DataFrame (after inserting 'grade' column):\n", df)
5. Write a Pandas program to change the name ‘James’ to ‘Maxwell’ in name column
of the DataFrame.
[67]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
[68]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
7. Write a Pandas program to convert the datatype of a given column (floats to ints).
[69]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
#Create a dataframe
df = pd.DataFrame(data, index=labels)
Updated DataFrame (after converting index to column):
index name score attempts qualify
0 a Anastasia 12.5 1 yes
1 b Dima 9.0 3 no
2 c Katherine 16.5 2 yes
3 d James NaN 3 no
4 e Emily 9.0 2 no
5 f Michael 20.0 3 yes
6 g Matthew 14.5 1 yes
7 h Laura NaN 1 no
8 i Kevin 8.0 2 no
9 j Jonas 19.0 1 yes
# Create a DataFrame
df = pd.DataFrame(data)
Shuffled DataFrame:
name score attempts qualify
0 Jonas 19.0 1 yes
1 Laura NaN 1 no
2 James NaN 3 no
3 Michael 20.0 3 yes
4 Katherine 16.5 2 yes
5 Matthew 14.5 1 yes
6 Kevin 8.0 2 no
7 Emily 9.0 2 no
8 Dima 9.0 3 no
9 Anastasia 12.5 1 yes
10. Write a Pandas program to write a DataFrame to CSV file using tab separator.
[72]: ### code here
# Create a DataFrame
df = pd.DataFrame(data)
filename = 'output_data.tsv' # Using .tsv as an extension for tab-separated values
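The write step is not shown above. A sketch of it using `to_csv` with `sep="\t"`; to stay self-contained it writes to an in-memory buffer rather than to `output_data.tsv`, and the two-row frame is hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"name": ["Anastasia", "Dima"], "score": [12.5, 9.0]})

# sep="\t" switches the delimiter from comma to tab; index=False omits the row labels
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Passing a file name such as `filename` in place of `buffer` writes the same text to disk.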
student_data1 = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
'name': ['Danniella Fenton', 'Ryder Storey', 'Bryce Jensen', 'Ed Bernal', 'Kwame Morin'],
'marks': [200, 210, 190, 222, 199]})
student_data2 = pd.DataFrame({
'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
'name': ['Scarlette Fisher', 'Carla Williamson', 'Dante Morse', 'Kaiser William', 'Madeeha Preston'],
'marks': [201, 200, 198, 219, 201]})
11. Write a Pandas program to join the two given dataframes along rows and assign
all data.
[37]: ### code here
Combined DataFrame:
student_id name marks
0 S1 Danniella Fenton 200
1 S2 Ryder Storey 210
2 S3 Bryce Jensen 190
3 S4 Ed Bernal 222
4 S5 Kwame Morin 199
5 S4 Scarlette Fisher 201
6 S5 Carla Williamson 200
7 S6 Dante Morse 198
8 S7 Kaiser William 219
9 S8 Madeeha Preston 201
12. Write a Pandas program to join the two given dataframes along columns and
assign all data.
[38]: ### code here
Combined DataFrame:
student_id name marks student_id name marks
0 S1 Danniella Fenton 200 S4 Scarlette Fisher 201
1 S2 Ryder Storey 210 S5 Carla Williamson 200
2 S3 Bryce Jensen 190 S6 Dante Morse 198
3 S4 Ed Bernal 222 S7 Kaiser William 219
4 S5 Kwame Morin 199 S8 Madeeha Preston 201
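Both combined outputs above can be reproduced with `pd.concat`, varying only the axis. In this sketch the `marks` values are read off the printed output, and `ignore_index=True` is an assumption made to match the 0 to 9 row labels shown for question 11.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    "student_id": ["S1", "S2", "S3", "S4", "S5"],
    "name": ["Danniella Fenton", "Ryder Storey", "Bryce Jensen", "Ed Bernal", "Kwame Morin"],
    "marks": [200, 210, 190, 222, 199],
})
student_data2 = pd.DataFrame({
    "student_id": ["S4", "S5", "S6", "S7", "S8"],
    "name": ["Scarlette Fisher", "Carla Williamson", "Dante Morse", "Kaiser William", "Madeeha Preston"],
    "marks": [201, 200, 198, 219, 201],
})

# Along rows (axis=0, the default): stack the frames and renumber the index 0..9
rows_combined = pd.concat([student_data1, student_data2], ignore_index=True)

# Along columns (axis=1): place the frames side by side, aligned on the row index
columns_combined = pd.concat([student_data1, student_data2], axis=1)

print(rows_combined)
print(columns_combined)
```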
13. Write a Pandas program to join the two dataframes using the common column of
both dataframes.
[39]: ### code here
Merged DataFrame:
student_id name_data1 marks_data1 name_data2 marks_data2
0 S4 Ed Bernal 222 Scarlette Fisher 201
1 S5 Kwame Morin 199 Carla Williamson 200
14. Write a Pandas program to join (left join) the two dataframes using keys from left dataframe
only.
[40]: #### code here
# Display the result of the left joined DataFrame
print("Left Joined DataFrame:\n", left_joined_data)
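The merge calls for questions 13 and 14 are not shown. A sketch using `pd.merge` on `student_id`: the suffixes are chosen to match the `_data1` / `_data2` column names in the merged output above, and the `marks` values are read off the printed output.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    "student_id": ["S1", "S2", "S3", "S4", "S5"],
    "name": ["Danniella Fenton", "Ryder Storey", "Bryce Jensen", "Ed Bernal", "Kwame Morin"],
    "marks": [200, 210, 190, 222, 199],
})
student_data2 = pd.DataFrame({
    "student_id": ["S4", "S5", "S6", "S7", "S8"],
    "name": ["Scarlette Fisher", "Carla Williamson", "Dante Morse", "Kaiser William", "Madeeha Preston"],
    "marks": [201, 200, 198, 219, 201],
})

# Inner join (the default): keep only student_ids present in both frames
merged_data = pd.merge(student_data1, student_data2, on="student_id",
                       suffixes=("_data1", "_data2"))

# Left join: keep every row of student_data1; unmatched right-hand columns become NaN
left_joined_data = pd.merge(student_data1, student_data2, on="student_id",
                            how="left", suffixes=("_data1", "_data2"))

print(merged_data)
print("Left Joined DataFrame:\n", left_joined_data)
```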
C:\Users\indra\AppData\Local\Temp\ipykernel_21872\428872793.py:4: DtypeWarning:
Columns (5,9) have mixed types. Specify dtype option on import or set
low_memory=False.
df = pd.read_csv('ufo_sighting_data.csv')
[76]: Date_time city state/province country \
0 10/10/1949 20:30 san marcos tx us
1 10/10/1949 21:00 lackland afb tx NaN
2 10/10/1955 17:00 chester (uk/england) NaN gb
3 10/10/1956 21:00 edna tx us
4 10/10/1960 20:00 kaneohe hi us
… … … … …
80327 9/9/2013 21:15 nashville tn us
80328 9/9/2013 22:00 boise id us
80329 9/9/2013 22:00 napa ca us
80330 9/9/2013 22:20 vienna va us
80331 9/9/2013 23:00 edmond ok us
description date_documented \
0 This event took place in early fall around 194… 4/27/2004
1 1949 Lackland AFB, TX. Lights racing acros… 12/16/2005
2 Green/Orange circular disc over Chester, En… 1/21/2008
3 My older brother and twin sister were leaving … 1/17/2004
4 AS a Marine 1st Lt. flying an FJ4B fighter/att… 1/22/2004
… … …
80327 Round from the distance/slowly changing colors… 9/30/2013
80328 Boise, ID, spherical, 20 min, 10 r… 9/30/2013
80329 Napa UFO, 9/30/2013
80330 Saw a five gold lit cicular craft moving fastl… 9/30/2013
80331 2 witnesses 2 miles apart, Red & White… 9/30/2013
latitude longitude
0 29.8830556 -97.941111
1 29.38421 -98.581082
2 53.2 -2.916667
3 28.9783333 -96.645833
4 21.4180556 -157.803611
… … …
80327 36.165833 -86.784444
80328 43.613611 -116.202500
80329 38.297222 -122.284444
80330 38.901111 -77.265556
80331 35.652778 -97.477778
import pandas as pd
# Calculate the number of days between current date and oldest date
number_of_days = (current_date.date() - oldest_date_as_date).days # Convert current_date to date
# Display results
print("Current date:", current_date.date())
print("Oldest date:", oldest_date_as_date)
print("Number of days between current date and oldest date:", number_of_days)
C:\Users\indra\AppData\Local\Temp\ipykernel_21872\560927568.py:4: DtypeWarning:
Columns (5,9) have mixed types. Specify dtype option on import or set
low_memory=False.
df = pd.read_csv('ufo_sighting_data.csv')
Current date: 2024-10-11
Oldest date: 1906-11-11
Number of days between current date and oldest date: 43069
# Calculate the number of days between current date and oldest date
number_of_days = (current_date - oldest_date).days # Both are Timestamps
# Display results
print("Current date:", current_date)
print("Oldest date:", oldest_date)
print("Number of days between current date and oldest date:", number_of_days)
C:\Users\indra\AppData\Local\Temp\ipykernel_21872\2957761521.py:4: DtypeWarning:
Columns (5,9) have mixed types. Specify dtype option on import or set
low_memory=False.
df = pd.read_csv('ufo_sighting_data.csv')
Current date: 2024-10-11 16:31:57.747711
Oldest date: 1906-11-11 00:00:00
Number of days between current date and oldest date: 43069
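The parsing step behind `oldest_date` is not shown. A sketch of it, assuming the `Date_time` strings follow the month/day/year layout seen in the table above; three hypothetical strings stand in for the column, and the reference date is fixed to the run date printed above so the result is reproducible.

```python
import pandas as pd

# Hypothetical stand-in for df['Date_time'], in the format shown in the table above
date_strings = pd.Series(["10/10/1949 20:30", "11/11/1906 00:00", "9/9/2013 21:15"])

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(date_strings, format="%m/%d/%Y %H:%M", errors="coerce")

oldest_date = parsed.min()
# Fixed reference date (assumption); the notebook used the current date at run time
reference_date = pd.Timestamp("2024-10-11")

# Subtracting two Timestamps yields a Timedelta, whose .days is the whole-day count
number_of_days = (reference_date - oldest_date).days
print(number_of_days)
```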
18. Write a Pandas program to get all the info of the dataframe between 1950-10-10 and 1960-10-10.
[75]: ### code here
import pandas as pd
C:\Users\indra\AppData\Local\Temp\ipykernel_21872\122481063.py:6: DtypeWarning:
Columns (5,9) have mixed types. Specify dtype option on import or set
low_memory=False.
df = pd.read_csv('ufo_sighting_data.csv')
Filtered DataFrame between 1950-10-10 and 1960-10-10:
Date_time city state/province country \
2 1955-10-10 17:00:00 chester (uk/england) NaN gb
3 1956-10-10 21:00:00 edna tx us
480 1952-10-01 03:30:00 fukuoka (japan) NaN NaN
481 1952-10-01 12:00:00 kansas city mo us
482 1954-10-01 19:00:00 flatwoods wv us
… … … … …
79262 1960-09-05 21:00:00 buffalo ny us
79668 1958-09-07 19:00:00 arthur nd us
80101 1952-09-09 20:00:00 philadelphia pa us
80102 1954-09-09 12:30:00 beaumont tx us
80103 1956-09-09 05:55:00 norfolk va us
482 circle 60 1 minute
… … … …
79262 oval 180.0 3 minutes
79668 unknown 900.0 5-15 min.
80101 circle 180.0 3 minutes
80102 disk 300.0 5 minutes
80103 cigar 90.0 1.5 minutes
description date_documented \
2 Green/Orange circular disc over Chester, En… 1/21/2008
3 My older brother and twin sister were leaving … 1/17/2004
480 UFO seen by multiple U. S. military personnel;… 12/7/2006
481 1952 daylight sighting of multiple discs in fo… 10/31/2008
482 I saw the craft go across the horizon. It app… 4/12/2013
… … …
79262 Precise movements of a "craft" appar… 9/15/2005
79668 Two lights of alternating color traveling and … 10/31/2003
80101 saucers in a line over Phila Pa. 12/12/2009
80102 Aprox. 30 Disk shaped UFOs fell out of clouds … 1/17/2004
80103 Unidentified Object Hovering over Interstate 2… 2/24/2007
latitude longitude
2 53.2 -2.916667
3 28.9783333 -96.645833
480 33.590355 130.401716
481 39.0997222 -94.578333
482 38.7230556 -80.650000
… … …
79262 42.886389 -78.878611
79668 47.104167 -97.217778
80101 39.952222 -75.164167
80102 30.085833 -94.101667
80103 36.846667 -76.285556
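The filter for question 18 is not shown. One way to write it, once `Date_time` has been converted to datetimes, is `Series.between`, which includes both endpoints by default. The four-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows with an already-parsed Date_time column
df = pd.DataFrame({
    "Date_time": pd.to_datetime([
        "1949-10-10 20:30", "1955-10-10 17:00", "1956-10-10 21:00", "2013-09-09 21:15",
    ]),
    "city": ["san marcos", "chester (uk/england)", "edna", "nashville"],
})

start = pd.Timestamp("1950-10-10")
end = pd.Timestamp("1960-10-10")

# between() returns a boolean mask that is True where start <= value <= end
filtered_df = df[df["Date_time"].between(start, end)]
print(filtered_df)
```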
pandas-assignment-day-4
November 8, 2024
y z
0 3.98 2.43
1 3.84 2.31
2 4.07 2.31
3 4.23 2.63
4 4.35 2.75
… … …
53935 5.76 3.50
53936 5.75 3.61
53937 5.68 3.56
53938 6.12 3.74
53939 5.87 3.64
1. Write a Pandas program to find the number of rows and columns and data type of each column
of diamonds Dataframe.
[4]: ### code here
import pandas as pd
import pandas as pd
# Display the summary of object columns
print("Summary of object columns:")
print(object_columns_summary)
import pandas as pd
import pandas as pd
df = pd.read_csv('diamonds.csv')
# Define the indices of the rows to be removed (e.g., removing rows 0, 1, and 2)
rows_to_remove = [0, 1, 2]
y z
3 4.23 2.63
4 4.35 2.75
5 3.96 2.48
6 3.98 2.47
7 4.11 2.53
5. Write a Pandas program to sort the ‘cut’ Series in ascending order (returns a Series) of diamonds
Dataframe.
[8]: ### code here
import pandas as pd
3850 Fair
51464 Fair
51466 Fair
10237 Fair
10760 Fair
…
7402 Very Good
43101 Very Good
16893 Very Good
16898 Very Good
21164 Very Good
Name: cut, Length: 53940, dtype: object
6. Write a Pandas program to sort the entire diamonds DataFrame by the ‘carat’ Series in ascending
and descending order.
[9]: ### code here
import pandas as pd
y z
31593 3.78 2.32
31597 3.77 2.33
31596 3.71 2.33
31595 3.84 2.30
31594 3.80 2.28
y z
27415 10.54 6.98
27630 10.16 6.72
27130 9.85 6.43
25999 9.94 6.24
25998 10.10 6.17
7. Write a Pandas program to filter the DataFrame rows to only show carat weight at least 0.3.
[10]: #### code here
import pandas as pd
# Filter the DataFrame for rows where the carat weight is at least 0.3
filtered_df = df[df['carat'] >= 0.3]
y z
4 4.35 2.75
10 4.28 2.73
13 4.37 2.71
15 4.42 2.68
16 4.34 2.68
… … …
53935 5.76 3.50
53936 5.75 3.61
53937 5.68 3.56
53938 6.12 3.74
53939 5.87 3.64
import pandas as pd
# Filter the DataFrame for diamonds where length, width, and depth are all greater than 5
# Note: The column names in the diamonds dataset for length, width, and depth are 'x', 'y', and 'z'
filtered_diamonds = df[(df['x'] > 5) & (df['y'] > 5) & (df['z'] > 5)]
y z
11778 7.28 5.12
13002 7.70 5.36
13118 7.95 5.23
13562 7.56 5.04
13757 8.02 5.36
… … …
27748 7.97 5.04
27749 8.47 5.16
48410 5.15 31.80
49189 31.80 5.12
49905 5.04 5.06
import pandas as pd
row_means = df.select_dtypes(include='number').mean(axis=1)
[13]: #### code here
import pandas as pd
11. Write a Pandas program to calculate count, minimum, maximum price for each
cut of diamonds DataFrame.
[14]: ### code here
import pandas as pd
Very Good 12082 336 18818
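A per-cut table of count, minimum and maximum price can be built by passing a list of aggregation names to `agg`. This is a sketch on a hypothetical five-row frame with the diamonds column names, so the figures are illustrative only.

```python
import pandas as pd

# Hypothetical five-row sample (assumption: 'cut' and 'price' as in diamonds.csv)
df = pd.DataFrame({
    "cut": ["Fair", "Fair", "Ideal", "Ideal", "Ideal"],
    "price": [337, 2757, 326, 340, 344],
})

# One output column per aggregation name, one row per cut
price_stats = df.groupby("cut")["price"].agg(["count", "min", "max"])
print(price_stats)
```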
12. Write a Pandas program to display and count the unique values in cut series of diamonds
DataFrame.
[15]: ### code here
import pandas as pd
import pandas as pd
print(missing_values_count)
import pandas as pd
# Group by 'cut' and calculate the sum of the 'xyz_product' for each cut
cut_xyz_product = df.groupby('cut')['xyz_product'].sum()
15. Write a Pandas program to read rows 0 through 2 (inclusive), columns ‘color’ and
‘price’ of diamonds DataFrame.
[18]: ## code here
import pandas as pd
import pandas as pd
17. Write a Pandas program to get randomly sample rows from diamonds DataFrame.
[20]: ### code here
import pandas as pd
# Load the diamonds dataset from CSV file
df = pd.read_csv('diamonds.csv')
y z
2714 4.46 2.74
14653 6.71 4.19
52760 5.54 3.37
48658 4.46 2.80
14812 6.84 4.25
18. Write a Pandas program to sample 75% of the diamonds DataFrame’s rows without
replacement and store the remaining 25% of the rows in another DataFrame.
[21]: ### code here
import pandas as pd
print(sampled_df.head())
y z
2714 4.46 2.74
14653 6.71 4.19
52760 5.54 3.37
48658 4.46 2.80
14812 6.84 4.25
y z
9 4.05 2.39
14 3.75 2.27
15 4.42 2.68
18 4.26 2.71
20 4.30 2.71
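The split for question 18 is not shown. A common pattern is to sample a fraction and then drop the sampled labels from the original frame. The eight-row frame and the `random_state` value are assumptions made for a reproducible sketch.

```python
import pandas as pd

# Hypothetical eight-row sample with a unique default index
df = pd.DataFrame({"carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26]})

# sample() draws without replacement by default; frac=0.75 of 8 rows gives 6 rows
sampled_df = df.sample(frac=0.75, random_state=42)

# Dropping the sampled index labels leaves the remaining 25%
remaining_df = df.drop(sampled_df.index)

print(sampled_df.head())
print(remaining_df.head())
```

This relies on the index labels being unique; with duplicate labels, `drop` would remove more rows than were sampled.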
19. Write a Pandas program to read the diamonds DataFrame and detect duplicate color.
[22]: #### code here
import pandas as pd
duplicates = df[df['color'].duplicated(keep=False)] # keep=False marks all duplicates as True
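The effect of the `keep` argument is easiest to see on a tiny column. The four colour values below are hypothetical.

```python
import pandas as pd

# Hypothetical four-row sample: 'E' appears twice, 'I' and 'J' once each
df = pd.DataFrame({"color": ["E", "E", "I", "J"]})

# keep=False flags every member of a duplicated group
all_flagged = df["color"].duplicated(keep=False)

# keep="first" (the default) leaves the first occurrence unflagged
later_only = df["color"].duplicated(keep="first")

print(all_flagged.tolist())
print(later_only.tolist())
```

In the full diamonds data every colour value occurs many times, so `keep=False` there selects nearly every row.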
import pandas as pd