Pandas Exercises

Learn pandas with exercises

Uploaded by

medimat27


Let's create a DataFrame

    import pandas as pd
    iris = pd.read_csv("iris-write-from-docker.csv")

Now let's look at the type of the iris object.

    print(type(iris))

    <class 'pandas.core.frame.DataFrame'>

What columns does the DataFrame consist of?

    iris.columns

    Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
           'class'],
          dtype='object')

Let's look at the first 10 rows of iris

    iris.head(10)

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa
    5           5.4          3.9           1.7          0.4  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa
    7           5.0          3.4           1.5          0.2  Iris-setosa
    8           4.4          2.9           1.4          0.2  Iris-setosa
    9           4.9          3.1           1.5          0.1  Iris-setosa

Let's look at the last 10 rows of iris

    iris.tail(10)

         sepal_length  sepal_width  petal_length  petal_width           class
    140           6.7          3.1           5.6          2.4  Iris-virginica
    141           6.9          3.1           5.1          2.3  Iris-virginica
    142           5.8          2.7           5.1          1.9  Iris-virginica
    143           6.8          3.2           5.9          2.3  Iris-virginica
    144           6.7          3.3           5.7          2.5  Iris-virginica
    145           6.7          3.0           5.2          2.3  Iris-virginica
    146           6.3          2.5           5.0          1.9  Iris-virginica
    147           6.5          3.0           5.2          2.0  Iris-virginica
    148           6.2          3.4           5.4          2.3  Iris-virginica
    149           5.9          3.0           5.1          1.8  Iris-virginica

Let's find out the size of the data

    iris.shape

    (150, 5)

Another way to get information about the data frame is the info() method. It reports the data type of each column, the number of rows in the data frame, and the number of non-null values in each column.

    iris.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 150 entries, 0 to 149
    Data columns (total 5 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   sepal_length  150 non-null    float64
     1   sepal_width   150 non-null    float64
     2   petal_length  150 non-null    float64
     3   petal_width   150 non-null    float64
     4   class         150 non-null    object
    dtypes: float64(4), object(1)
    memory usage: 6.0+ KB

We can use the copy() method to copy a dataframe into another dataframe.

    iris_new = iris.copy()
    iris_new.head(5)

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

NumPy functions can be applied to pandas dataframes as well. For example, we can apply the log() function from the NumPy package to find the natural logarithm of the values in a data frame consisting of numeric values.

    # let's first select the numeric columns of the iris dataframe.
    import numpy as np
    arr = iris.iloc[:, [0, 1, 2, 3]]
    arr_log = np.log(arr)
    arr_log

         sepal_length  sepal_width  petal_length  petal_width
    0        1.629241     1.252763      0.336472    -1.609438
    1        1.589235     1.098612      0.336472    -1.609438
    2        1.547563     1.163151      0.262364    -1.609438
    3        1.526056     1.131402      0.405465    -1.609438
    4        1.609438     1.280934      0.336472    -1.609438
    ..            ...          ...           ...          ...
    145      1.902108     1.098612      1.648659     0.832909
    146      1.840550     0.916291      1.609438     0.641854
    147      1.871802     1.098612      1.648659     0.693147
    148      1.824549     1.223775      1.686399     0.832909
    149      1.774952     1.098612      1.629241     0.587787

    150 rows x 4 columns

CHOOSING IN A DATAFRAME

    iris["sepal_length"]

    0      5.1
    1      4.9
    2      4.7
    3      4.6
    4      5.0
          ...
    145    6.7
    146    6.3
    147    6.5
    148    6.2
    149    5.9
    Name: sepal_length, Length: 150, dtype: float64

If we use single square brackets after the DataFrame name, the column is returned as one-dimensional data. You can also think of it as a list.
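To make the list analogy concrete, here is a minimal sketch on a small hypothetical frame (the name `demo` and its values are assumptions for illustration, not read from the iris file). It suggests that a single-bracket selection converts cleanly to a plain list, while also carrying row labels that a list does not have.

```python
import pandas as pd

# Hypothetical stand-in for a few iris-like rows (illustrative values only).
demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
})

col = demo["sepal_length"]     # single brackets: one-dimensional selection
as_list = col.tolist()         # plain Python list holding the same values
print(as_list)
print(col.ndim, len(col))      # number of dimensions and number of elements
print(list(col.index))         # unlike a list, the row labels travel with the data
```

The list analogy is useful for reading values, but the index is what lets pandas align this column with others in later operations.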
In fact, when we examine the type of the data selected this way, we see that it is a one-dimensional Series.

    type(iris["sepal_length"])

    pandas.core.series.Series

If we want the selected column to behave as a data frame, we need to write the column name in two square brackets.

    print(type(iris[["sepal_length"]]))
    iris[["sepal_length"]]

    <class 'pandas.core.frame.DataFrame'>

         sepal_length
    0             5.1
    1             4.9
    2             4.7
    3             4.6
    4             5.0
    ..            ...
    145           6.7
    146           6.3
    147           6.5
    148           6.2
    149           5.9

    150 rows x 1 columns

It is possible to select more than one column.

    iris[["sepal_length", "sepal_width"]]

         sepal_length  sepal_width
    0             5.1          3.5
    1             4.9          3.0
    2             4.7          3.2
    3             4.6          3.1
    4             5.0          3.6
    ..            ...          ...
    145           6.7          3.0
    146           6.3          2.5
    147           6.5          3.0
    148           6.2          3.4
    149           5.9          3.0

    150 rows x 2 columns

We've seen selecting columns, now let's see selecting rows.

    iris[2:5]

       sepal_length  sepal_width  petal_length  petal_width        class
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

Now let's see the information of the row labeled 5 (the sixth row). For this, we use the loc[] indexer and pass the desired row's label.

    # as a series
    iris.loc[5]

    sepal_length            5.4
    sepal_width             3.9
    petal_length            1.7
    petal_width             0.4
    class           Iris-setosa
    Name: 5, dtype: object

    iris.loc[[5]]

       sepal_length  sepal_width  petal_length  petal_width        class
    5           5.4          3.9           1.7          0.4  Iris-setosa

    # multiple row selection
    iris.loc[[5, 6]]

       sepal_length  sepal_width  petal_length  petal_width        class
    5           5.4          3.9           1.7          0.4  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa

    iris["petal_length"][5]

    1.7

    # different spelling
    iris.petal_length[5]

    1.7

    iris.loc[5, "class"]

    'Iris-setosa'

    iris.loc[[1, 2, 3, 4, 5], ["petal_length", "petal_width", "class"]]

       petal_length  petal_width        class
    1           1.4          0.2  Iris-setosa
    2           1.3          0.2  Iris-setosa
    3           1.5          0.2  Iris-setosa
    4           1.4          0.2  Iris-setosa
    5           1.7          0.4  Iris-setosa

    iris.loc[:, ["sepal_length", "class"]]

         sepal_length           class
    0             5.1     Iris-setosa
    1             4.9     Iris-setosa
    2             4.7     Iris-setosa
    3             4.6     Iris-setosa
    4             5.0     Iris-setosa
    ..            ...             ...
    145           6.7  Iris-virginica
    146           6.3  Iris-virginica
    147           6.5  Iris-virginica
    148           6.2  Iris-virginica
    149           5.9  Iris-virginica

    150 rows x 2 columns

The .iloc[] indexer selects by integer position instead of by row and column names.

    iris.iloc[1]

    sepal_length            4.9
    sepal_width             3.0
    petal_length            1.4
    petal_width             0.2
    class           Iris-setosa
    Name: 1, dtype: object

    iris.iloc[[1]]

       sepal_length  sepal_width  petal_length  petal_width        class
    1           4.9          3.0           1.4          0.2  Iris-setosa

    iris.iloc[6, 1]

    3.4

    iris.iloc[[1, 2, 3, 6]]

       sepal_length  sepal_width  petal_length  petal_width        class
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa

Multiple rows and columns can be selected just as with loc[], but this time using row and column positions.

    iris.iloc[[1, 2, 3], [2, 3]]

       petal_length  petal_width
    1           1.4          0.2
    2           1.3          0.2
    3           1.5          0.2

It is possible to use column names and index numbers together.

    iris["class"][0:5]

    0    Iris-setosa
    1    Iris-setosa
    2    Iris-setosa
    3    Iris-setosa
    4    Iris-setosa
    Name: class, dtype: object

    iris[["sepal_length", "sepal_width"]][0:5]

       sepal_length  sepal_width
    0           5.1          3.5
    1           4.9          3.0
    2           4.7          3.2
    3           4.6          3.1
    4           5.0          3.6

    iris.loc[5:10, "sepal_length":"petal_length"]

        sepal_length  sepal_width  petal_length
    5            5.4          3.9           1.7
    6            4.6          3.4           1.4
    7            5.0          3.4           1.5
    8            4.4          2.9           1.4
    9            4.9          3.1           1.5
    10           5.4          3.7           1.5

Note that a label slice with loc[] includes both endpoints, so the row labeled 10 appears in the result.

    iris["sepal_length"] = takes the column as a one-dimensional Series.
    iris[["sepal_length"]] = takes the column as a pandas dataframe.
    iris[2:5] = retrieves rows from line 3 (index number 2) to line 5 (index number 4).
    iris.loc[5] = takes the row labeled 5 as a one-dimensional Series.
    iris.loc[[5]] = takes the row labeled 5 as a pandas dataframe.
    iris.iloc[[5]] = retrieves the row at integer position 5 as a pandas dataframe.

PANDAS DATA ANALYSIS

Values such as the minimum, maximum, mean, standard deviation, median, and 25% quantile are available through the .describe() method of pandas dataframes.
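As a self-contained sketch of what .describe() reports, each row of the summary can be cross-checked against the corresponding individual method. The `scores` frame and its values below are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the iris file.

```python
import pandas as pd

# Hypothetical numeric frame (illustrative values only).
scores = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0],
})

summary = scores.describe()
print(summary)

# Each summary row matches a standalone reduction on the same column.
print(summary.loc["mean", "a"], scores["a"].mean())
print(summary.loc["50%", "a"], scores["a"].median())
print(summary.loc["25%", "b"], scores["b"].quantile(0.25))
print(summary.loc["std", "a"], scores["a"].std())
```

Reading the summary with .loc, as done here, is one way to pull a single statistic out of the table without recomputing it.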
    iris.describe()

           sepal_length  sepal_width  petal_length  petal_width
    count    150.000000   150.000000    150.000000   150.000000
    mean       5.843333     3.054000      3.758667     1.198667
    std        0.828066     0.433594      1.764420     0.763161
    min        4.300000     2.000000      1.000000     0.100000
    25%        5.100000     2.800000      1.600000     0.300000
    50%        5.800000     3.000000      4.350000     1.300000
    75%        6.400000     3.300000      5.100000     1.800000
    max        7.900000     4.400000      6.900000     2.500000

    iris["class"].describe()

    count             150
    unique              3
    top       Iris-setosa
    freq               50
    Name: class, dtype: object

We can use the .unique() method to see which categories appear in a column.

    iris["class"].unique()

    array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

Separate methods are available to calculate individual summary values such as the count, mean, and standard deviation. For example, .count() returns the number of non-null values in each column.

    iris.count()

    sepal_length    150
    sepal_width     150
    petal_length    150
    petal_width     150
    class           150
    dtype: int64

    data = ["petal_length", "petal_width"]
    iris[data].count()

    petal_length    150
    petal_width     150
    dtype: int64

    Mean: .mean()
    Standard deviation: .std()
    Median: .median()
    Percentile: .quantile(y), where y is the percentile expressed as a fraction between 0 and 1

In order to make row-based calculations, we need to pass the axis='columns' argument to the method. Because the class column holds text, we also pass numeric_only=True. Without it, older pandas versions dropped the text column and emitted this FutureWarning: "Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction." Recent versions raise the TypeError instead.

    iris.mean(axis='columns', numeric_only=True)

    0      2.550
    1      2.375
    2      2.350
    3      2.350
    4      2.550
           ...
    145    4.300
    146    3.925
    147    4.175
    148    4.325
    149    3.950
    Length: 150, dtype: float64

The same calculation can be written with axis=1.

    iris.mean(axis=1, numeric_only=True)

    0      2.550
    1      2.375
    2      2.350
    3      2.350
    4      2.550
           ...
    145    4.300
    146    3.925
    147    4.175
    148    4.325
    149    3.950
    Length: 150, dtype: float64

With axis='rows', the mean is computed down the rows, giving one value per column.

    iris.mean(axis='rows', numeric_only=True)

    sepal_length    5.843333
    sepal_width     3.054000
    petal_length    3.758667
    petal_width     1.198667
    dtype: float64

The way to make conditional selections in text columns is to apply the .str accessor.

    iris2 = pd.read_csv("iris-write-from-docker.csv")
    cond = iris2["class"].str.contains("Iris-setosa")
    setosa = iris2[cond]
    setosa.head()

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

CONDITIONAL CHOICES

    iris[iris.sepal_length > 7.5]

         sepal_length  sepal_width  petal_length  petal_width           class
    105           7.6          3.0           6.6          2.1  Iris-virginica
    117           7.7          3.8           6.7          2.2  Iris-virginica
    118           7.7          2.6           6.9          2.3  Iris-virginica
    122           7.7          2.8           6.7          2.0  Iris-virginica
    131           7.9          3.8           6.4          2.0  Iris-virginica
    135           7.7          3.0           6.1          2.3  Iris-virginica

    iris[(iris.sepal_length > 6.5) & (iris.petal_length < 4.5)]

        sepal_length  sepal_width  petal_length  petal_width            class
    65           6.7          3.1           4.4          1.4  Iris-versicolor
    75           6.6          3.0           4.4          1.4  Iris-versicolor

    iris[(iris.sepal_length > 7.5) | (iris.petal_length > 6.5)]

         sepal_length  sepal_width  petal_length  petal_width           class
    105           7.6          3.0           6.6          2.1  Iris-virginica
    117           7.7          3.8           6.7          2.2  Iris-virginica
    118           7.7          2.6           6.9          2.3  Iris-virginica
    122           7.7          2.8           6.7          2.0  Iris-virginica
    131           7.9          3.8           6.4          2.0  Iris-virginica
    135           7.7          3.0           6.1          2.3  Iris-virginica

It is also possible to pull data from one column by applying a condition to the data of another column. For example, let's look at the petal length values of the rows whose sepal length is greater than 7.5.
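The condition inside the brackets is itself a boolean Series with one value per row, which is what makes this kind of cross-column selection work. A minimal sketch under assumed data follows; the `flowers` frame and its values are hypothetical, not read from the iris file.

```python
import pandas as pd

# Hypothetical frame with iris-like column names (illustrative values only).
flowers = pd.DataFrame({
    "sepal_length": [7.6, 5.1, 7.9, 6.3],
    "petal_length": [6.6, 1.4, 6.4, 5.0],
})

# The comparison yields a boolean Series aligned with the rows.
mask = flowers["sepal_length"] > 7.5
print(mask.tolist())

# Apply the mask to a different column; .loc states rows and column explicitly.
selected = flowers.loc[mask, "petal_length"]
print(selected.tolist())

# Conditions combine with & (and) and | (or); each needs its own parentheses.
both = flowers[(flowers["sepal_length"] > 6.0) & (flowers["petal_length"] < 6.5)]
print(both.index.tolist())
```

Storing the mask in a variable first is a design choice that makes longer conditions easier to read and reuse.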
    iris.petal_length[iris.sepal_length > 7.5]

    105    6.6
    117    6.7
    118    6.9
    122    6.7
    131    6.4
    135    6.1
    Name: petal_length, dtype: float64

Finally, let's take a look at the reshaping and manipulation operations that can be performed on pandas dataframes. Let's create the following df dataframe to use in our examples. The values are random, so they will differ on each run.

    variable = np.repeat(['A', 'B', 'C', 'D'], [3, 3, 3, 3], axis=0)
    val = np.random.random(12)
    df_dict = {'variable': variable, 'value': val}
    df = pd.DataFrame(df_dict)
    df = df[['variable', 'value']]
    df

       variable     value
    0         A  0.658697
    1         A  0.986387
    2         A  0.980378
    3         B  0.848280
    4         B  0.600380
    5         B  0.131574
    6         C  0.188967
    7         C  0.202935
    8         C  0.431972
    9         D  0.960973
    10        D  0.839120
    11        D  0.903636

Now let's reshape this dataframe so that A, B, C, and D become its column names. There is a pivot() method for this.

    df2 = df.pivot(columns='variable', values='value')
    df2

    variable         A         B         C         D
    0         0.658697       NaN       NaN       NaN
    1         0.986387       NaN       NaN       NaN
    2         0.980378       NaN       NaN       NaN
    3              NaN  0.848280       NaN       NaN
    4              NaN  0.600380       NaN       NaN
    5              NaN  0.131574       NaN       NaN
    6              NaN       NaN  0.188967       NaN
    7              NaN       NaN  0.202935       NaN
    8              NaN       NaN  0.431972       NaN
    9              NaN       NaN       NaN  0.960973
    10             NaN       NaN       NaN  0.839120
    11             NaN       NaN       NaN  0.903636

There is a melt() method to turn the columns back into rows.

    df3 = df2.melt(value_vars=["A", "B", "C", "D"], value_name="value").dropna()
    df3

       variable     value
    0         A  0.658697
    1         A  0.986387
    2         A  0.980378
    15        B  0.848280
    16        B  0.600380
    17        B  0.131574
    30        C  0.188967
    31        C  0.202935
    32        C  0.431972
    45        D  0.960973
    46        D  0.839120
    47        D  0.903636

The merge() method is used to merge two dataframes using a key column.

    dict2 = {"X": ["Ali", "Baran", "Mehmet"], "Y1": [97, 85, 76]}
    dict3 = {"X": ["Ali", "Baran", "Umut"], "Y2": [75, 94, 96]}
    df4 = pd.DataFrame(dict2)
    df5 = pd.DataFrame(dict3)
    print(df4)
    print(df5)

            X  Y1
    0     Ali  97
    1   Baran  85
    2  Mehmet  76
           X  Y2
    0    Ali  75
    1  Baran  94
    2   Umut  96

    df6 = pd.merge(df4, df5, how='left', on="X")
    df6

            X  Y1    Y2
    0     Ali  97  75.0
    1   Baran  85  94.0
    2  Mehmet  76   NaN

    df7 = pd.merge(df4, df5, how='right', on="X")
    df7

           X    Y1  Y2
    0    Ali  97.0  75
    1  Baran  85.0  94
    2   Umut   NaN  96

    df8 = pd.merge(df4, df5, how='inner', on="X")
    df8

           X  Y1  Y2
    0    Ali  97  75
    1  Baran  85  94

    df9 = pd.merge(df4, df5, how='outer', on='X')
    df9

            X    Y1    Y2
    0     Ali  97.0  75.0
    1   Baran  85.0  94.0
    2  Mehmet  76.0   NaN
    3    Umut   NaN  96.0
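To see which side each row of a merge came from, merge() accepts an indicator parameter that adds a `_merge` column. The sketch below rebuilds the two small frames from the merge example so that it runs on its own; the variable names `left`, `right`, and `merged` are assumptions chosen for illustration.

```python
import pandas as pd

# Same data as the merge example, rebuilt so the sketch is self-contained.
left = pd.DataFrame({"X": ["Ali", "Baran", "Mehmet"], "Y1": [97, 85, 76]})
right = pd.DataFrame({"X": ["Ali", "Baran", "Umut"], "Y2": [75, 94, 96]})

# indicator=True adds a "_merge" column: "left_only", "right_only", or "both".
merged = pd.merge(left, right, how="outer", on="X", indicator=True)
print(merged)

# Keys that appear in only one of the two frames.
only_left = merged.loc[merged["_merge"] == "left_only", "X"].tolist()
only_right = merged.loc[merged["_merge"] == "right_only", "X"].tolist()
print(only_left, only_right)
```

This can be a quick way to understand why a left, right, or inner merge kept or dropped a particular key.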
