Important libraries:
• NumPy – arrays and numerical computing
• Matplotlib – plotting and data visualization
• Pandas – dataframe manipulation
• String – string manipulation
• Wget – downloading files from the web
• Qgrid – dataframe visualization
• Zipfile – zip file manipulation
• Investpy – retrieving financial data from Investing.com
• Ipywidgets – interaction with graphs (interact)
F-string – a faster, less verbose way of building strings: variables go inside braces directly in the
text. Ex: k = "Allan" / f'{k} is a genius'
If the sentence itself needs quotes, use a different quote mark for the f-string than for the quotes
inside it. Ex: f"{k} told 'Fuck you' to the teacher".
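A minimal sketch of the f-string notes above. The variable names and values (year, score) are only illustrative and not part of the original notes:

```python
# Illustrative values only.
k = "Allan"
year = 2021
score = 0.875

# Variables are inserted where the braces are.
plain = f'{k} is a genius'

# Double quotes outside, single quotes inside, so the quotes do not clash.
quoted = f"{k} said 'hello' to the teacher"

# A format spec after the colon controls how the value is rendered.
formatted = f"{k} started in {year} and scored {score:.1%}"

print(plain)
print(quoted)
print(formatted)
```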
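Creating lists with a for loop – start with an empty list and add one item per iteration with append().
pd.concat() plays the same role when the pieces are dataframes instead of single values.
arq = []
for i in range(2011, 2021):
    arq.append(f'qualquer_nome_{i}') – stores qualquer_nome_{i}, where i is the year, in the list
df = pd.DataFrame(arq, columns=["arquivo"]) – creates a dataframe with the stored names in the column
"arquivo"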
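A runnable sketch of the loop idea, assuming the goal is a one-column dataframe of names. The pd.concat() part is an added illustration of building the same dataframe from small pieces, not something taken from the notes:

```python
import pandas as pd

# Build the list one item per loop iteration.
arq = []
for i in range(2011, 2021):
    arq.append(f"qualquer_nome_{i}")

# Turn the finished list into a dataframe with a single column.
df = pd.DataFrame(arq, columns=["arquivo"])

# Alternative: create one small dataframe per name and join them with concat.
parts = [pd.DataFrame({"arquivo": [name]}) for name in arq]
df_concat = pd.concat(parts, ignore_index=True)

print(df.head())
```

Appending to a list and building the dataframe once at the end is usually cheaper than concatenating inside the loop.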
qgrid.show_grid(dataframe) – opens the dataframe for visualization with grids and filters.
Sorting items
df.set_index(['column A', 'column B']) – uses the listed columns as the index of the dataframe (pass
them together in one list)
df.sort_values(by=['Column_A', 'Column_B']) – sorts the rows by Column_A, then by Column_B
df['Column_A'] = df['Column_A'].map("{:,}".format) – adds a comma as thousands separator to every value
of Column_A; the values become strings.
df.T – returns the transpose of df
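A small sketch of the sorting commands above on made-up data. The column values are invented for illustration:

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "Column_A": [2500, 1000, 2500],
    "Column_B": ["b", "c", "a"],
})

# Sort by Column_A first, then by Column_B to break ties.
sorted_df = df.sort_values(by=["Column_A", "Column_B"])

# Both columns go in ONE list to build a two-level index.
indexed = sorted_df.set_index(["Column_A", "Column_B"])

# Thousands separator: the numbers are converted to strings.
formatted = sorted_df.copy()
formatted["Column_A"] = formatted["Column_A"].map("{:,}".format)

# Transpose: rows become columns and columns become rows.
transposed = df.T

print(formatted)
```

Since the formatted column holds strings, it is safer to sort and compute before formatting.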
Filtering
Simple filtering
df = df[df['Column_A'] == 'filter'] – returns a dataframe with only the rows where 'Column_A' is
exactly the string 'filter'
df1 = df["Column_A"].str.contains("Filter") – returns a Boolean Series telling, row by row, whether
'Column_A' contains the string 'Filter' (case-sensitive)
Calling df[df1] applies the Boolean mask 'df1' to 'df' and keeps only the rows marked True.
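A short sketch of both filters on invented data, to show the difference between an exact match and a "contains" match:

```python
import pandas as pd

# Invented data for illustration.
df = pd.DataFrame({
    "Column_A": ["filter", "Filter me", "other", "no Filter"],
    "value": [1, 2, 3, 4],
})

# Exact match: only rows equal to 'filter'.
exact = df[df["Column_A"] == "filter"]

# Boolean mask: True where the text contains 'Filter' (case-sensitive).
mask = df["Column_A"].str.contains("Filter")

# Applying the mask keeps the rows marked True.
partial = df[mask]

print(mask)
print(partial)
```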
Data Analysis
df.agg({"column_A": ["min", "max", "mean", "median", "skew"]}) – returns the listed aggregate
statistics for column_A
df.describe() – returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
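A minimal sketch of agg() and describe() on invented numbers. The result of agg() here is a dataframe with one row per statistic:

```python
import pandas as pd

# Invented numbers for illustration.
df = pd.DataFrame({"column_A": [1, 2, 3, 4, 10]})

# One row per requested statistic, one column per aggregated column.
stats = df.agg({"column_A": ["min", "max", "mean", "median", "skew"]})

# describe() gives a fixed set of summary statistics.
summary = df.describe()

print(stats)
print(summary)
```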
Data Manipulation
df.shape – returns the dimensions of the dataframe as (rows, columns).
replace() – here the dataframe/Series method, not the string method. It replaces several values at once
through a dictionary. Ex:
titanic["Sex_short"] = titanic["Sex"].replace({"Male": "M", "Female": "F"}) – creates a column named
"Sex_short", copies the values from "Sex" and replaces them with the short forms for male and female.
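A runnable sketch of the replace() example. The small titanic dataframe below is a hypothetical stand-in for the real dataset, using the same spelling of the values as in the notes:

```python
import pandas as pd

# Hypothetical stand-in for the titanic dataframe.
titanic = pd.DataFrame({"Sex": ["Male", "Female", "Female", "Male"]})

# The dictionary maps each old value to its replacement.
titanic["Sex_short"] = titanic["Sex"].replace({"Male": "M", "Female": "F"})

print(titanic)
```

The match is case-sensitive, so the dictionary keys must be written exactly as the values appear in the column.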
pd.to_numeric(dataframe["column_A"]) – converts the data of column_A to a numeric type; it returns a
new Series, so the result must be assigned back to be kept
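A short sketch of to_numeric() on invented text data. The errors="coerce" option is an addition to the notes: it turns values that cannot be parsed into NaN instead of raising an error:

```python
import pandas as pd

# Invented data: numbers stored as text, plus one value that is not a number.
dataframe = pd.DataFrame({"column_A": ["1", "2.5", "n/a"]})

# Assign the result back; to_numeric does not change the column in place.
dataframe["column_A"] = pd.to_numeric(dataframe["column_A"], errors="coerce")

print(dataframe.dtypes)
print(dataframe)
```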
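resample() – regroups time-series data to a new frequency (e.g. daily to monthly); it needs a
datetime-like index and is followed by an aggregation such as mean() or sum()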
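A minimal sketch of resample(), assuming a dataframe with a daily datetime index. The dates and values are invented:

```python
import pandas as pd

# Six invented daily observations.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
ts = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=idx)

# Regroup into 2-day bins and sum the values inside each bin.
two_day = ts.resample("2D").sum()

print(two_day)
```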
query() – allows one to search the dataframe based on conditions. Ex: df.query('a > b')
The query expression must be entered inside quote marks. Column names with spaces must be wrapped in
backticks ` `. Ex: df.query('`Col Ex` == "Improving"'). String values inside the expression need quote
marks different from the outer ones, e.g. double quotes inside single quotes.
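A short sketch of query() on invented data. The last line uses @ to refer to a Python variable inside the expression, which is an addition to the notes:

```python
import pandas as pd

# Invented data for illustration.
df = pd.DataFrame({
    "a": [1, 5, 3],
    "b": [2, 4, 3],
    "Col Ex": ["Improving", "Stable", "Improving"],
})

# Compare two columns.
greater = df.query("a > b")

# Backticks around the column name with a space, double quotes for the string.
improving = df.query('`Col Ex` == "Improving"')

# @name reads a Python variable from the surrounding code.
threshold = 2
above = df.query("a > @threshold")

print(greater)
print(improving)
print(above)
```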