
Mypnotes

Uploaded by

Niranjan Patidar
Copyright
© All Rights Reserved

These notes give a detailed explanation of each Pandas topic, along with some basic code
examples.

1. Introduction to Pandas

• What is Pandas?
Pandas is an open-source data manipulation and analysis library for Python. It provides
high-performance data structures and data analysis tools, making it easier to manipulate
large datasets, particularly in tabular form (like Excel spreadsheets, SQL tables, etc.).
• Key Components:
o Series: A one-dimensional labeled array.
o DataFrame: A two-dimensional labeled data structure with columns of
potentially different types.

2. Basic Data Structures in Pandas

• Series:
o A Series is similar to a list or an array, but each value carries a label (its index).
o Code Example:
    import pandas as pd
    s = pd.Series([10, 20, 30, 40])  # default index labels are 0, 1, 2, 3
    print(s)
• DataFrame:
o A DataFrame is like a table with rows and columns, where each column can hold
a different type of data (e.g., integer, float, string).
o Code Example:
    import pandas as pd
    data = {'Name': ['John', 'Alice', 'Bob'],
            'Age': [23, 35, 30],
            'City': ['New York', 'Los Angeles', 'Chicago']}
    df = pd.DataFrame(data)  # dictionary keys become column names
    print(df)
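
As a rough illustration of how the two structures relate, the sketch below gives a Series explicit labels and pulls one column out of a DataFrame. The labels 'a' to 'd' are made up for this example; the DataFrame reuses the sample data from the notes.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0..n-1 index
# (the labels 'a'..'d' are illustrative, not from the notes)
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
second = s['b']              # label-based lookup

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [23, 35, 30],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

ages = df['Age']             # selecting one column returns a Series
print(type(ages).__name__)
print(df.dtypes)             # each column keeps its own dtype
```

In other words, a DataFrame can be read as a set of Series that share one index.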

3. Data Import and Export

• Importing Data: Pandas can read from various file formats like CSV, Excel, and JSON, as well
as from SQL databases.
o CSV:
    df = pd.read_csv('file.csv')
o Excel:
    df = pd.read_excel('file.xlsx')  # needs an Excel engine such as openpyxl installed
o SQL:
    import sqlite3
    conn = sqlite3.connect('database.db')
    df = pd.read_sql_query('SELECT * FROM table_name', conn)
    conn.close()  # close the connection once the data is loaded
• Exporting Data: You can export data back to CSV, Excel, and other formats.
o CSV:
    df.to_csv('output.csv')  # add index=False to leave out the row labels
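
To see export and import working together, here is a minimal round-trip sketch. It writes to an in-memory buffer (io.StringIO) rather than a real file, which is a choice made for this example so that it runs without touching the disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [23, 35]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels

buffer.seek(0)                   # rewind before reading the text back
restored = pd.read_csv(buffer)
print(restored)
```

Without index=False, the row labels are written as an extra unnamed column, which then shows up again when the file is read back.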

4. Data Manipulation

• Indexing and Selecting Data:
o Accessing rows by position or by label:
    df.iloc[0]  # First row (position-based selection)
    df.loc[0]   # Row with index label 0 (label-based selection)
• Filtering Data: You can filter rows with a boolean condition.
    df[df['Age'] > 30]  # Rows where Age is greater than 30
• Sorting Data:
o Sorting by a specific column:
    df.sort_values(by='Age', ascending=False)  # Highest Age first
• Adding/Removing Columns:
o Adding a column:
    df['Gender'] = ['Male', 'Female', 'Male']  # One value per row
o Removing a column:
    df = df.drop('Gender', axis=1)  # drop() returns a new DataFrame, so assign the result
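
The difference between iloc and loc only becomes visible when the index is not the default 0, 1, 2. The sketch below assumes a made-up index of 10, 20, 30 to show it, and also combines two filter conditions.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [23, 35, 30]},
                  index=[10, 20, 30])   # hypothetical non-default labels

first_by_position = df.iloc[0]   # first row, whatever its label is
row_by_label = df.loc[20]        # row whose index label is 20

# Combine conditions with & (and) or | (or); wrap each one in parentheses
filtered = df[(df['Age'] >= 30) & (df['Name'] != 'Bob')]

# drop() returns a new DataFrame; df itself is unchanged
without_age = df.drop(columns='Age')
print(filtered)
```

Here df.loc[0] would raise a KeyError, because no row carries the label 0.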

5. Data Cleaning

• Handling Missing Data: Pandas provides functions to detect and handle missing data. Each of
them returns a new object and leaves df unchanged.
o Detect missing values:
    df.isnull()  # True where a value is missing
o Fill missing values with a specific value:
    df.fillna(0)
o Drop rows with missing values:
    df.dropna()
• Removing Duplicates: You can remove duplicate rows based on one or more columns:
    df.drop_duplicates(subset=['Name'])  # Keeps the first row for each Name
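
The cleaning calls can be tried on a small frame with gaps. This is a sketch on invented data; filling Age with the column mean is just one possible strategy, not something the notes prescribe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Alice', 'Bob'],
                   'Age': [23, np.nan, 35, np.nan]})

missing_per_column = df.isnull().sum()             # number of NaN values per column
filled = df.fillna({'Age': df['Age'].mean()})      # fill Age with the column mean
complete_rows = df.dropna()                        # keep only rows without NaN
unique_names = df.drop_duplicates(subset=['Name']) # first row for each Name

print(missing_per_column)
print(filled)
```

Note that drop_duplicates keeps the first 'Alice' row even though its Age is missing, so the order of cleaning steps matters.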

6. Grouping and Aggregation

• GroupBy: The groupby() function allows you to group data based on one or more
columns and then perform aggregation operations.
o Example:
    grouped = df.groupby('City')
    print(grouped['Age'].mean())  # Calculate mean age per city
• Aggregation: You can use functions like sum(), mean(), count(), etc., on grouped data.
    df.groupby('City').agg({'Age': 'mean', 'Name': 'count'})
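
A variation on the agg() call is named aggregation, which lets you choose the output column names. The sketch below uses invented rows, and the names mean_age and people are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Carol'],
                   'Age': [23, 35, 30, 41],
                   'City': ['New York', 'Chicago', 'Chicago', 'New York']})

# Each keyword is an output column: (source column, aggregation function)
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    people=('Name', 'count'),
)
print(summary)
```

The result is indexed by City, with one row per group.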

7. Merging and Joining Data

• Merging DataFrames: Pandas provides a merge() function similar to SQL joins.
o Example:
    df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Alice', 'Bob']})
    df2 = pd.DataFrame({'ID': [1, 2, 3], 'Age': [23, 35, 30]})
    merged = pd.merge(df1, df2, on='ID')  # Inner join on ID by default
• Concatenating DataFrames: Concatenate two or more DataFrames either by rows or
columns.
    df = pd.concat([df1, df2], axis=0)  # Stack rows; columns missing from one frame become NaN
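
The how parameter of merge() selects the join type, much like INNER, LEFT, and OUTER joins in SQL. The sketch below changes the IDs in df2 so that the two frames only partly overlap; that data is made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['John', 'Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Age': [35, 30, 28]})

inner = pd.merge(df1, df2, on='ID')               # only IDs present in both frames
left = pd.merge(df1, df2, on='ID', how='left')    # every row of df1; Age is NaN for ID 1
outer = pd.merge(df1, df2, on='ID', how='outer')  # every ID from either frame

# Stacking rows works best when the frames share the same columns
more = pd.DataFrame({'ID': [5], 'Name': ['Eve']})
stacked = pd.concat([df1, more], axis=0, ignore_index=True)
print(left)
```

Passing ignore_index=True to concat() renumbers the rows instead of repeating the original labels.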

8. Time Series Data

• Datetime Objects: Converting text to datetime values is usually the first step.
    df['Date'] = pd.to_datetime(df['Date'])
• Resampling: For time series data, you can resample by date or time intervals. This requires a
datetime index.
    df.set_index('Date').resample('M').sum()  # Resample by month ('ME' in pandas 2.2 and later)
• Shifting: Shifting data is useful for comparing values over time, such as each month against
the previous one.
    df['Previous Month'] = df['Sales'].shift(1)
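
The three steps fit together as in the sketch below, which uses invented sales figures. It resamples with the 'MS' (month start) alias, an assumption made here because the month-end alias differs between pandas versions ('M' in older releases, 'ME' in newer ones).

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-10', '2024-01-25', '2024-02-05', '2024-03-15'],
    'Sales': [100, 150, 200, 50],
})
df['Date'] = pd.to_datetime(df['Date'])   # text -> datetime64

# resample() needs a datetime index, so move Date into the index first
monthly = df.set_index('Date')['Sales'].resample('MS').sum()

previous = monthly.shift(1)    # value from the month before (NaN for the first month)
change = monthly - previous    # month-over-month difference
print(pd.DataFrame({'Sales': monthly, 'Previous Month': previous, 'Change': change}))
```

The first row has no previous month, so its shifted value and its change are both NaN.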

9. Visualization

• Plotting: Pandas has built-in plotting methods for quick data visualizations. They use
Matplotlib, which must be installed.
o Example:
    df.plot(x='Date', y='Sales', kind='line')
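
A self-contained version of the plot call might look like the sketch below. The data, the title, and the choice of the non-interactive 'Agg' backend are assumptions made so the example runs without a display; the image is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use('Agg')   # non-interactive backend, so no window is needed
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']),
    'Sales': [250, 200, 50],
})

# df.plot() returns a Matplotlib Axes object that can be customized further
ax = df.plot(x='Date', y='Sales', kind='line', title='Monthly sales')
ax.set_ylabel('Sales')

image = io.BytesIO()
ax.figure.savefig(image, format='png')   # pass a file name here to save to disk instead
```

In a notebook or an interactive session, the backend line can be dropped and the chart appears directly.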

10. Performance Optimization

• Vectorization: Use vectorized operations to speed up calculations instead of looping
over rows.
    df['Age'] = df['Age'] + 1  # Vectorized addition
• Avoiding Common Pitfalls:
o Avoid using .apply() on DataFrames or Series when possible, as it is often
slower than vectorized operations.
o Do not rely on inplace=True to save memory. Most operations still build a new object
internally, so assigning the result (df = df.drop(...)) is usually just as efficient and easier to read.
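
To make the .apply() advice concrete, the sketch below computes the same column two ways on a small made-up frame. The column names Price and Quantity are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Price': [10.0, 20.0, 30.0], 'Quantity': [1, 2, 3]})

# Row by row: calls a Python function once for every row
slow = df.apply(lambda row: row['Price'] * row['Quantity'], axis=1)

# Vectorized: a single operation over whole columns
fast = df['Price'] * df['Quantity']

df['Total'] = fast
print(df)
```

Both approaches give the same values; on large frames the vectorized form is typically much faster because the loop runs in compiled code rather than in Python.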
