Mypnotes
These notes give a detailed explanation of each Pandas topic along with some basic code
examples.
1. Introduction to Pandas
What is Pandas?
Pandas is an open-source data manipulation and analysis library for Python. It provides
high-performance data structures and data analysis tools, making it easier to manipulate
large datasets, particularly in tabular form (like Excel spreadsheets, SQL tables, etc.).
Key Components:
o Series: A one-dimensional labeled array.
o DataFrame: A two-dimensional labeled data structure with columns of
potentially different types.
Series:
o A Series is similar to a list or an array, but each value carries a label (together called the index).
o Code Example:
    import pandas as pd
    s = pd.Series([10, 20, 30, 40])
    print(s)
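To make the role of the index concrete, here is a minimal sketch that gives the same values explicit labels and reads them back by label and by position. The labels 'a' to 'd' and the variable name `labeled` are illustrative choices, not part of the example above.

```python
import pandas as pd

# Same values as above, but with explicit labels instead of the default 0..3 index.
labeled = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(labeled['b'])          # label-based access
print(labeled.iloc[0])       # position-based access
print(list(labeled.index))   # the labels themselves
```

When no index is passed, Pandas falls back to integer labels starting at 0, which is what the shorter example prints.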
DataFrame:
o A DataFrame is like a table with rows and columns, where each column can have
different types of data (e.g., integer, float, string).
o Code Example:
    data = {'Name': ['John', 'Alice', 'Bob'],
            'Age': [23, 35, 30],
            'City': ['New York', 'Los Angeles', 'Chicago']}
    df = pd.DataFrame(data)
    print(df)
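As a small follow-up sketch, the same table can be inspected to see that each column keeps its own type and that a single column is itself a Series. The variable name `ages` is only illustrative.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [23, 35, 30],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # one dtype per column: Age is an integer type, the others hold text

ages = df['Age']    # selecting one column returns a Series
print(type(ages))
```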
Importing Data: Pandas can read from various file formats like CSV, Excel, JSON, and
SQL databases.
o CSV:
    df = pd.read_csv('file.csv')
o Excel:
    df = pd.read_excel('file.xlsx')   # needs an Excel engine such as openpyxl installed
o SQL:
    import sqlite3
    conn = sqlite3.connect('database.db')
    df = pd.read_sql_query('SELECT * FROM table_name', conn)   # table_name is a placeholder
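The file names above are placeholders, so those snippets only run if such files exist. The sketch below shows the same two readers working on in-memory data instead: a CSV string wrapped in `io.StringIO` and a throwaway SQLite database. The table name `people` and the sample rows are made up for illustration.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path or any file-like object.
csv_text = "Name,Age\nJohn,23\nAlice,35\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: build a temporary in-memory database, then query it.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (Name TEXT, Age INTEGER)')
conn.executemany('INSERT INTO people VALUES (?, ?)',
                 [('John', 23), ('Alice', 35)])
conn.commit()
df_sql = pd.read_sql_query('SELECT * FROM people', conn)
conn.close()

print(df_csv)
print(df_sql)
```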
Exporting Data: You can export data back into CSV, Excel, etc.
o CSV:
    df.to_csv('output.csv')   # also writes the index as the first column; pass index=False to omit it
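A minimal round-trip sketch, assuming you want the CSV without the row labels: the table is written to an in-memory buffer with `index=False` and read back. The buffer stands in for a real file, and the sample data is made up.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [23, 35]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back should give the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```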
4. Data Manipulation
5. Data Cleaning
Handling Missing Data: Pandas provides functions to detect and handle missing data.
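o Detect missing values:
    df.isnull()    # returns a DataFrame of True/False flags
o Fill missing values with a specific value:
    df.fillna(0)   # returns a new DataFrame; df itself is unchanged
o Drop rows with missing values:
    df.dropna()    # returns a new DataFrame; df itself is unchanged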
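The three calls above can be tried on a small table with one missing value. This is a sketch with made-up data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [23, float('nan'), 30]})

missing_per_column = df.isnull().sum()   # count of missing values in each column
filled = df.fillna({'Age': 0})           # fill only the Age column; df is unchanged
dropped = df.dropna()                    # keep only rows with no missing values

print(missing_per_column)
print(filled)
print(dropped)
```

Passing a dictionary to `fillna` lets each column get its own fill value, which is often safer than one value for the whole table.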
Removing Duplicates: You can remove duplicates based on one or more columns:
    df.drop_duplicates(subset=['Name'])
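A short sketch of how this behaves on made-up data: `duplicated()` flags the repeats, and the `keep` argument decides which copy survives (the default is the first).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'John'],
                   'City': ['New York', 'Chicago', 'Boston']})

flags = df.duplicated(subset=['Name'])             # True for repeats after the first
first_kept = df.drop_duplicates(subset=['Name'])   # default keep='first'
last_kept = df.drop_duplicates(subset=['Name'], keep='last')

print(flags.tolist())
print(first_kept)
print(last_kept)
```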
GroupBy: The groupby() function allows you to group data based on one or more
columns and then perform aggregation operations.
o Example:
    grouped = df.groupby('City')
    print(grouped['Age'].mean())   # Calculate mean age per city
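For a self-contained version of the same idea, the sketch below uses made-up data in which two rows share a city, so the effect of grouping is visible. The result of `mean()` is a Series indexed by the group key.

```python
import pandas as pd

df = pd.DataFrame({'City': ['Chicago', 'Chicago', 'New York'],
                   'Age': [20, 30, 40]})

grouped = df.groupby('City')
mean_age = grouped['Age'].mean()   # Series indexed by City
group_sizes = grouped.size()       # number of rows in each group

print(mean_age)
print(group_sizes)
```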
Aggregation: You can use functions like sum(), mean(), count(), etc., on grouped data.
    df.groupby('City').agg({'Age': 'mean', 'Name': 'count'})
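An alternative spelling of the same aggregation is named aggregation, which lets you choose the output column names. This is a sketch on made-up data; `mean_age` and `people` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [20, 30, 40],
                   'City': ['Chicago', 'Chicago', 'New York']})

# Each keyword is an output column: (input column, aggregation function).
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    people=('Name', 'count'),
)

print(summary)
```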
7. Merging and Joining Data
9. Visualization
Plotting: Pandas has built-in plotting methods that use Matplotlib under the hood, which
makes quick data visualizations easy.
o Example:
    df.plot(x='Date', y='Sales', kind='line')   # assumes df has 'Date' and 'Sales' columns
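The example above needs a DataFrame with 'Date' and 'Sales' columns. A minimal sketch with made-up sales figures is shown below; it assumes Matplotlib is installed and selects the non-interactive 'Agg' backend so that it also runs on a machine without a display.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; drop this line for on-screen plots

import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'Sales': [100, 120, 90],   # made-up numbers for illustration
})

# df.plot returns a Matplotlib Axes object that can be customized further.
ax = df.plot(x='Date', y='Sales', kind='line')
ax.set_ylabel('Sales')
```

In an interactive session you would typically follow this with `matplotlib.pyplot.show()`, or save the figure with `ax.figure.savefig(...)`.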