Pandas Exercises

Learn pandas with exercises

Uploaded by

medimat27
Copyright: © All Rights Reserved
Let's create a DataFrame.

    import pandas as pd
    iris = pd.read_csv("iris-write-from-docker.csv")

Now let's look at the type of the iris object.

    print(type(iris))

    <class 'pandas.core.frame.DataFrame'>

Which columns does the DataFrame consist of?

    iris.columns

    Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
           'class'],
          dtype='object')

Let's look at the first 10 rows of iris.

    iris.head(10)

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa
    5           5.4          3.9           1.7          0.4  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa
    7           5.0          3.4           1.5          0.2  Iris-setosa
    8           4.4          2.9           1.4          0.2  Iris-setosa
    9           4.9          3.1           1.5          0.1  Iris-setosa

Let's look at the last 10 rows of iris.

    iris.tail(10)

         sepal_length  sepal_width  petal_length  petal_width           class
    140           6.7          3.1           5.6          2.4  Iris-virginica
    141           6.9          3.1           5.1          2.3  Iris-virginica
    142           5.8          2.7           5.1          1.9  Iris-virginica
    143           6.8          3.2           5.9          2.3  Iris-virginica
    144           6.7          3.3           5.7          2.5  Iris-virginica
    145           6.7          3.0           5.2          2.3  Iris-virginica
    146           6.3          2.5           5.0          1.9  Iris-virginica
    147           6.5          3.0           5.2          2.0  Iris-virginica
    148           6.2          3.4           5.4          2.3  Iris-virginica
    149           5.9          3.0           5.1          1.8  Iris-virginica

Let's find out the size of the data.

    iris.shape

    (150, 5)

Another way to get information about the DataFrame is the info() method. It reports the data type of each column, the number of rows in the DataFrame, and the number of non-null values in each column.

    iris.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 150 entries, 0 to 149
    Data columns (total 5 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   sepal_length  150 non-null    float64
     1   sepal_width   150 non-null    float64
     2   petal_length  150 non-null    float64
     3   petal_width   150 non-null    float64
     4   class         150 non-null    object
    dtypes: float64(4), object(1)
    memory usage: 6.0+ KB

We can use the copy() method to copy a DataFrame into another DataFrame.

    iris_new = iris.copy()
    iris_new.head(5)

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

We can apply NumPy functions to pandas DataFrames as well. For example, we can apply NumPy's log() function to find the natural logarithm of the values in a DataFrame consisting of numeric values.

    # let's first select the numeric columns of the iris dataframe.
    import numpy as np
    arr = iris.iloc[:, [0, 1, 2, 3]]
    arr_log = np.log(arr)
    arr_log

         sepal_length  sepal_width  petal_length  petal_width
    0        1.629241     1.252763      0.336472    -1.609438
    1        1.589235     1.098612      0.336472    -1.609438
    2        1.547563     1.163151      0.262364    -1.609438
    3        1.526056     1.131402      0.405465    -1.609438
    4        1.609438     1.280934      0.336472    -1.609438
    ..            ...          ...           ...          ...
    145      1.902108     1.098612      1.648659     0.832909
    146      1.840550     0.916291      1.609438     0.641854
    147      1.871802     1.098612      1.648659     0.693147
    148      1.824549     1.223775      1.686399     0.832909
    149      1.774952     1.098612      1.629241     0.587787

    150 rows x 4 columns

SELECTING DATA IN A DATAFRAME

    iris["sepal_length"]

    0      5.1
    1      4.9
    2      4.7
    3      4.6
    4      5.0
          ...
    145    6.7
    146    6.3
    147    6.5
    148    6.2
    149    5.9
    Name: sepal_length, Length: 150, dtype: float64

If we use single square brackets after the DataFrame name, the column is returned as a one-dimensional object. You can also think of it as a list.
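The "think of it as a list" remark can be sketched on a tiny frame. This is a minimal illustration, assuming a hypothetical two-column DataFrame named `df` rather than the iris file; it shows that a single-bracket selection can be measured, iterated, and converted like a list, while still carrying the row labels.

```python
import pandas as pd

# Hypothetical stand-in for the iris data (not the original file).
df = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7],
                   "sepal_width": [3.5, 3.0, 3.2]})

col = df["sepal_length"]           # single brackets: one-dimensional selection

n_values = len(col)                # length works as it does for a list
doubled = [v * 2 for v in col]     # iterating yields the values
as_list = col.tolist()             # explicit conversion to a Python list
labels = list(col.index)           # unlike a list, each value keeps a row label

print(n_values, as_list, labels)
```

Unlike a plain list, the result keeps its index, which is what later lets `.loc[]` look values up by row label.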
As a matter of fact, when we examine the type of the data selected this way, we see that it is a one-dimensional Series.

    type(iris["sepal_length"])

    pandas.core.series.Series

If we want the selected column to behave as a DataFrame, we need to write the column name inside two pairs of square brackets.

    print(type(iris[["sepal_length"]]))
    iris[["sepal_length"]]

    <class 'pandas.core.frame.DataFrame'>

         sepal_length
    0             5.1
    1             4.9
    2             4.7
    3             4.6
    4             5.0
    ..            ...
    145           6.7
    146           6.3
    147           6.5
    148           6.2
    149           5.9

    150 rows x 1 columns

It is possible to select more than one column.

    iris[["sepal_length", "sepal_width"]]

         sepal_length  sepal_width
    0             5.1          3.5
    1             4.9          3.0
    2             4.7          3.2
    3             4.6          3.1
    4             5.0          3.6
    ..            ...          ...
    145           6.7          3.0
    146           6.3          2.5
    147           6.5          3.0
    148           6.2          3.4
    149           5.9          3.0

    150 rows x 2 columns

We've seen how to select columns; now let's see how to select rows.

    iris[2:5]

       sepal_length  sepal_width  petal_length  petal_width        class
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

Now let's look at the row labeled 5. For this we use the .loc[] indexer, passing the label of the desired row.

    # a series
    iris.loc[5]

    sepal_length            5.4
    sepal_width             3.9
    petal_length            1.7
    petal_width             0.4
    class           Iris-setosa
    Name: 5, dtype: object

    iris.loc[[5]]

       sepal_length  sepal_width  petal_length  petal_width        class
    5           5.4          3.9           1.7          0.4  Iris-setosa

    # multiple line selection
    iris.loc[[5, 6]]

       sepal_length  sepal_width  petal_length  petal_width        class
    5           5.4          3.9           1.7          0.4  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa

    iris["petal_length"][5]

    1.7

    # different spelling
    iris.petal_length[5]

    1.7

    iris.loc[5, "class"]

    'Iris-setosa'

    iris.loc[[1, 2, 3, 4, 5], ["petal_length", "petal_width", "class"]]

       petal_length  petal_width        class
    1           1.4          0.2  Iris-setosa
    2           1.3          0.2  Iris-setosa
    3           1.5          0.2  Iris-setosa
    4           1.4          0.2  Iris-setosa
    5           1.7          0.4  Iris-setosa

    iris.loc[:, ["sepal_length", "class"]]

         sepal_length           class
    0             5.1     Iris-setosa
    1             4.9     Iris-setosa
    2             4.7     Iris-setosa
    3             4.6     Iris-setosa
    4             5.0     Iris-setosa
    ..            ...             ...
    145           6.7  Iris-virginica
    146           6.3  Iris-virginica
    147           6.5  Iris-virginica
    148           6.2  Iris-virginica
    149           5.9  Iris-virginica

    150 rows x 2 columns

The .iloc[] indexer selects by integer position instead of by row and column names.

    iris.iloc[1]

    sepal_length            4.9
    sepal_width             3.0
    petal_length            1.4
    petal_width             0.2
    class           Iris-setosa
    Name: 1, dtype: object

    iris.iloc[[1]]

       sepal_length  sepal_width  petal_length  petal_width        class
    1           4.9          3.0           1.4          0.2  Iris-setosa

    iris.iloc[6, 1]

    3.4

    iris.iloc[[1, 2, 3, 6]]

       sepal_length  sepal_width  petal_length  petal_width        class
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    6           4.6          3.4           1.4          0.3  Iris-setosa

Multiple rows and columns are selected just as with .loc[], but this time using row and column positions.

    iris.iloc[[1, 2, 3], [2, 3]]

       petal_length  petal_width
    1           1.4          0.2
    2           1.3          0.2
    3           1.5          0.2

It is possible to use column names and index numbers together.

    iris["class"][0:5]

    0    Iris-setosa
    1    Iris-setosa
    2    Iris-setosa
    3    Iris-setosa
    4    Iris-setosa
    Name: class, dtype: object

    iris[["sepal_length", "sepal_width"]][0:5]

       sepal_length  sepal_width
    0           5.1          3.5
    1           4.9          3.0
    2           4.7          3.2
    3           4.6          3.1
    4           5.0          3.6

    iris.loc[5:10, "sepal_length":"petal_length"]

        sepal_length  sepal_width  petal_length
    5            5.4          3.9           1.7
    6            4.6          3.4           1.4
    7            5.0          3.4           1.5
    8            4.4          2.9           1.4
    9            4.9          3.1           1.5
    10           5.4          3.7           1.5

Summary:

    iris["sepal_length"]    = takes the column as a one-dimensional Series.
    iris[["sepal_length"]]  = takes the column as a pandas DataFrame.
    iris[2:5]               = retrieves rows from line 3 (index number 2) to line 5 (index number 4).
    iris.loc[5]             = takes the row labeled 5 as a one-dimensional Series.
    iris.loc[[5]]           = takes the row labeled 5 as a pandas DataFrame.
    iris.iloc[[5]]          = retrieves the row at index position 5 as a pandas DataFrame.

PANDAS DATA ANALYSIS

The .describe() method of a pandas DataFrame reports summary values such as the count, minimum, maximum, mean, standard deviation, median, and the 25% and 75% quartiles.
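As a small sketch of what .describe() reports, the example below runs it on a hypothetical four-row frame (the names `df`, `x`, and `label` are illustrative and not part of the iris data). It assumes the default behavior, where only numeric columns are summarized when the frame mixes numeric and text columns.

```python
import pandas as pd

# Hypothetical frame: one numeric column and one text column.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "label": ["a", "b", "a", "b"]})

summary = df.describe()            # summarizes the numeric column only

row_names = list(summary.index)    # the statistics, one per row
median_x = summary.loc["50%", "x"] # the 50% row is the median

print(summary)
```

Because the result is itself a DataFrame, individual statistics can be read back with `.loc[]`, as done for the median here.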
    iris.describe()

           sepal_length  sepal_width  petal_length  petal_width
    count    150.000000   150.000000    150.000000   150.000000
    mean       5.843333     3.054000      3.758667     1.198667
    std        0.828066     0.433594      1.764420     0.763161
    min        4.300000     2.000000      1.000000     0.100000
    25%        5.100000     2.800000      1.600000     0.300000
    50%        5.800000     3.000000      4.350000     1.300000
    75%        6.400000     3.300000      5.100000     1.800000
    max        7.900000     4.400000      6.900000     2.500000

    iris["class"].describe()

    count             150
    unique              3
    top       Iris-setosa
    freq               50
    Name: class, dtype: object

We can use the .unique() method to see which categories occur in a column.

    iris["class"].unique()

    array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

Separate methods are also available for individual summary values such as the count, mean, and standard deviation. For example, .count() returns the number of non-null values in each column.

    iris.count()

    sepal_length    150
    sepal_width     150
    petal_length    150
    petal_width     150
    class           150
    dtype: int64

    data = ["petal_length", "petal_width"]
    iris[data].count()

    petal_length    150
    petal_width     150
    dtype: int64

    Mean                  .mean()
    Standard deviation    .std()
    Median                .median()
    Percentile            .quantile(y)    y is the percentile as a fraction between 0 and 1

To make row-based calculations, we need to pass the axis='columns' argument to the method. The original notebook called iris.mean(axis='columns') without numeric_only and received this warning because of the text column "class": "FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction." In pandas 2.0 and later the call raises TypeError, so the calls below pass numeric_only=True.

    iris.mean(axis="columns", numeric_only=True)

    0      2.550
    1      2.375
    2      2.350
    3      2.350
    4      2.550
           ...
    145    4.300
    146    3.925
    147    4.175
    148    4.325
    149    3.950
    Length: 150, dtype: float64

    iris.mean(axis=1, numeric_only=True)

    0      2.550
    1      2.375
    2      2.350
    3      2.350
    4      2.550
           ...
    145    4.300
    146    3.925
    147    4.175
    148    4.325
    149    3.950
    Length: 150, dtype: float64

    iris.mean(axis="rows", numeric_only=True)

    sepal_length    5.843333
    sepal_width     3.054000
    petal_length    3.758667
    petal_width     1.198667
    dtype: float64

Conditional selection on text columns uses the .str accessor.

    iris2 = pd.read_csv("iris-write-from-docker.csv")
    cond = iris2["class"].str.contains("Iris-setosa")
    setosa = iris2[cond]
    iris2.head()

       sepal_length  sepal_width  petal_length  petal_width        class
    0           5.1          3.5           1.4          0.2  Iris-setosa
    1           4.9          3.0           1.4          0.2  Iris-setosa
    2           4.7          3.2           1.3          0.2  Iris-setosa
    3           4.6          3.1           1.5          0.2  Iris-setosa
    4           5.0          3.6           1.4          0.2  Iris-setosa

CONDITIONAL SELECTION

    iris[iris.sepal_length > 7.5]

         sepal_length  sepal_width  petal_length  petal_width           class
    105           7.6          3.0           6.6          2.1  Iris-virginica
    117           7.7          3.8           6.7          2.2  Iris-virginica
    118           7.7          2.6           6.9          2.3  Iris-virginica
    122           7.7          2.8           6.7          2.0  Iris-virginica
    131           7.9          3.8           6.4          2.0  Iris-virginica
    135           7.7          3.0           6.1          2.3  Iris-virginica

    iris[(iris.sepal_length > 6.5) & (iris.petal_length < 4.5)]

        sepal_length  sepal_width  petal_length  petal_width            class
    65           6.7          3.1           4.4          1.4  Iris-versicolor
    75           6.6          3.0           4.4          1.4  Iris-versicolor

    iris[(iris.sepal_length > 7.5) | (iris.petal_length > 6.5)]

         sepal_length  sepal_width  petal_length  petal_width           class
    105           7.6          3.0           6.6          2.1  Iris-virginica
    117           7.7          3.8           6.7          2.2  Iris-virginica
    118           7.7          2.6           6.9          2.3  Iris-virginica
    122           7.7          2.8           6.7          2.0  Iris-virginica
    131           7.9          3.8           6.4          2.0  Iris-virginica
    135           7.7          3.0           6.1          2.3  Iris-virginica

It is also possible to pull data from one column by applying a condition to the data of another column. For example, let's see the petal length values of the rows whose sepal length is greater than 7.5.
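The same idea can be written with `.loc[]`, which takes the row condition and the column name in one step. This is a minimal sketch on a hypothetical three-row frame (`df` and its values are illustrative, not read from the iris file).

```python
import pandas as pd

# Hypothetical stand-in rows; only two columns are needed for the sketch.
df = pd.DataFrame({"sepal_length": [7.6, 5.1, 7.9],
                   "petal_length": [6.6, 1.4, 6.4]})

mask = df["sepal_length"] > 7.5            # boolean Series, one entry per row
selected = df.loc[mask, "petal_length"]    # rows where mask is True, one column

print(selected)
```

The result keeps the original row labels, so the selected values can still be traced back to their rows.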
    iris.petal_length[iris.sepal_length > 7.5]

    105    6.6
    117    6.7
    118    6.9
    122    6.7
    131    6.4
    135    6.1
    Name: petal_length, dtype: float64

Finally, let's take a look at the reshaping and manipulation operations that can be performed on pandas DataFrames. Let's create the following df DataFrame to use in our examples.

    variable = np.repeat(['A', 'B', 'C', 'D'], [3, 3, 3, 3], axis=0)
    val = np.random.random(12)
    df_dict = {'variable': variable, 'value': val}
    df = pd.DataFrame(df_dict)
    df = df[['variable', 'value']]
    df

       variable     value
    0         A  0.658697
    1         A  0.986387
    2         A  0.980378
    3         B  0.848280
    4         B  0.600380
    5         B  0.131574
    6         C  0.188967
    7         C  0.202935
    8         C  0.431972
    9         D  0.960973
    10        D  0.839120
    11        D  0.903636

Now let's reshape this DataFrame so that the variable names A, B, C, D become its columns. There is a pivot() method for this.

    df2 = df.pivot(columns='variable', values='value')
    df2

    variable         A         B         C         D
    0         0.658697       NaN       NaN       NaN
    1         0.986387       NaN       NaN       NaN
    2         0.980378       NaN       NaN       NaN
    3              NaN  0.848280       NaN       NaN
    4              NaN  0.600380       NaN       NaN
    5              NaN  0.131574       NaN       NaN
    6              NaN       NaN  0.188967       NaN
    7              NaN       NaN  0.202935       NaN
    8              NaN       NaN  0.431972       NaN
    9              NaN       NaN       NaN  0.960973
    10             NaN       NaN       NaN  0.839120
    11             NaN       NaN       NaN  0.903636

The melt() method does the reverse and turns the columns back into rows.

    df3 = df2.melt(value_vars=["A", "B", "C", "D"], value_name="value").dropna()
    df3

       variable     value
    0         A  0.658697
    1         A  0.986387
    2         A  0.980378
    15        B  0.848280
    16        B  0.600380
    17        B  0.131574
    30        C  0.188967
    31        C  0.202935
    32        C  0.431972
    45        D  0.960973
    46        D  0.839120
    47        D  0.903636

The merge() method is used to merge two DataFrames using a key column.

    dict2 = {"X": ["Ali", "Baran", "Mehmet"], "Y1": [97, 85, 76]}
    dict3 = {"X": ["Ali", "Baran", "Umut"], "Y2": [75, 94, 96]}
    df4 = pd.DataFrame(dict2)
    df5 = pd.DataFrame(dict3)
    print(df4)
    print(df5)

            X  Y1
    0     Ali  97
    1   Baran  85
    2  Mehmet  76
           X  Y2
    0    Ali  75
    1  Baran  94
    2   Umut  96

    df6 = pd.merge(df4, df5, how='left', on='X')
    df6

            X  Y1    Y2
    0     Ali  97  75.0
    1   Baran  85  94.0
    2  Mehmet  76   NaN

    df7 = pd.merge(df4, df5, how='right', on='X')
    df7

           X    Y1  Y2
    0    Ali  97.0  75
    1  Baran  85.0  94
    2   Umut   NaN  96

    df8 = pd.merge(df4, df5, how='inner', on='X')
    df8

           X  Y1  Y2
    0    Ali  97  75
    1  Baran  85  94

    df9 = pd.merge(df4, df5, how='outer', on='X')
    df9

            X    Y1    Y2
    0     Ali  97.0  75.0
    1   Baran  85.0  94.0
    2  Mehmet  76.0   NaN
    3    Umut   NaN  96.0
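To see why the four merges above differ, it can help to mark where each key came from. The sketch below is an illustration rather than part of the original notebook: it rebuilds the two small frames under the hypothetical names `left` and `right` and uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column.

```python
import pandas as pd

# Same values as the merge example; the names left/right are illustrative.
left = pd.DataFrame({"X": ["Ali", "Baran", "Mehmet"], "Y1": [97, 85, 76]})
right = pd.DataFrame({"X": ["Ali", "Baran", "Umut"], "Y2": [75, 94, 96]})

# An outer merge keeps every key; _merge records which side(s) supplied it.
merged = pd.merge(left, right, how="outer", on="X", indicator=True)

# Map each key to its origin: "both", "left_only", or "right_only".
origin = dict(zip(merged["X"], merged["_merge"].astype(str)))

print(merged)
```

Keys marked "both" are the ones an inner merge keeps, a left merge adds the "left_only" keys, and a right merge adds the "right_only" keys.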
