Pandas

Chapter 1: Introduction to Pandas

- What is Pandas? Overview of the pandas library.
- Installation: Installing pandas (pip install pandas).
- Data Structures: Introduction to pandas Series and DataFrame.
- Basic Operations: Creating, viewing, and manipulating DataFrames.
- Use in Companies: Top companies like Google, Facebook, and Netflix use pandas for data analysis, loading and exploring data from CSV, Excel, or SQL databases.

1. What is Pandas?
Pandas is a powerful open-source library used for data manipulation and analysis.
It provides easy-to-use data structures and functions to work with structured data
like tables (rows and columns). Pandas is widely used in data science and machine
learning for handling large datasets.

2. Installation
To install pandas, you can use the package installer pip:
bash
pip install pandas

This command installs pandas, allowing you to start using its features in your
projects.

3. Data Structures
Pandas provides two main data structures:
- Series: A one-dimensional array-like object (similar to a list or array). It
is labeled and can hold any type of data (e.g., integers, strings).
- DataFrame: A two-dimensional table with rows and columns, similar to a spreadsheet or SQL table. It can hold multiple data types and is the primary data structure in pandas (a short sketch of both structures follows this list).
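
A minimal sketch of how the two structures relate (made-up data, not from the examples below): each column of a DataFrame is itself a Series.

import pandas as pd

# A Series: a single labeled column of values
ages = pd.Series([25, 30, 35], name='Age')

# A DataFrame: a table built from one or more columns
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Selecting one column of a DataFrame returns a Series
print(type(df['Age']))  # <class 'pandas.core.series.Series'>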

4. Basic Operations
With pandas, you can perform a variety of operations:
- Creating DataFrames: You can create a DataFrame from lists, dictionaries, or
reading data from files (like CSV or Excel).
- Viewing Data: You can view parts of the DataFrame using methods like .head(), .tail(), and .info() to understand the structure of the data (a .tail() sketch follows this list).
- Manipulating Data: This includes selecting specific rows or columns, filtering
data, adding or removing columns, and performing operations on the data.
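
The numbered examples later in this chapter cover .head() and .info(); as a quick sketch with made-up data, .tail() works the same way but shows the last rows:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], 'Age': [25, 30, 35, 28, 22]}
df = pd.DataFrame(data)
print(df.tail(2))  # Viewing the last 2 rows of the DataFrame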

Real-World Use in Companies:


Top companies like Google, Facebook, and Netflix use pandas for data analysis. They
load data from sources like CSV, Excel, or SQL databases and use pandas to explore,
clean, and manipulate the data before building machine learning models or
generating insights. For instance, Netflix might load user behavior data to analyze
and improve their recommendation system using pandas.

1. Install Pandas
(Run this in your terminal or command prompt)
bash
pip install pandas
2. Create a Series

import pandas as pd

data = [10, 20, 30, 40]

series = pd.Series(data)  # Creating a Series
print(series)

3. Create a DataFrame from a Dictionary

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)  # Creating a DataFrame from a dictionary
print(df)

4. View First 5 Rows of a DataFrame

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], 'Age': [25, 30, 35, 28, 22]}
df = pd.DataFrame(data)
print(df.head())  # Viewing the first 5 rows of the DataFrame

5. View DataFrame Information

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)
df.info()  # Viewing information about the DataFrame (info() prints its summary directly)

6. Select a Column from DataFrame

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print(df['Name'])  # Selecting the 'Name' column

7. Filter Rows Based on a Condition

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)
filtered_df = df[df['Age'] > 25]  # Filtering rows where Age > 25
print(filtered_df)

8. Add a New Column to DataFrame

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df['City'] = ['New York', 'Los Angeles', 'Chicago']  # Adding a new column 'City'
print(df)

9. Drop a Column from DataFrame

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CHI']}
df = pd.DataFrame(data)
df = df.drop('City', axis=1)  # Dropping the 'City' column
print(df)

10. Save DataFrame to CSV

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}

df = pd.DataFrame(data)
df.to_csv('output.csv', index=False)  # Saving the DataFrame to a CSV file

Chapter 2: Loading Data


- Reading Data: Loading data from CSV, Excel, JSON, SQL, and APIs using functions like pd.read_csv(), pd.read_excel(), pd.read_sql(), etc.
- Writing Data: Saving DataFrames to different formats (CSV, Excel, JSON).
- Use in Companies: Data ingestion is crucial for firms like Amazon, which process large datasets from different formats and sources in their analytics pipelines.

1. Reading Data
Data is often stored in different formats like CSV, Excel, JSON, SQL, or APIs, and
it needs to be loaded into Python for analysis. Pandas provides easy-to-use
functions for this.
- Loading CSV Files (pd.read_csv()): This function reads data from CSV files
into a DataFrame.
- Loading Excel Files (pd.read_excel()): This function reads data from Excel
files into a DataFrame.
- Loading JSON Files (pd.read_json()): This function reads JSON data into a
DataFrame.
- Loading Data from SQL Databases (pd.read_sql()): This function reads data from
SQL databases into a DataFrame.
- Loading Data from APIs: Data can also be loaded from web APIs by making HTTP
requests and converting the response into a DataFrame.

2. Writing Data
After analyzing or manipulating data, it’s often saved back to a file or sent to
another system. Pandas can write data to multiple formats like CSV, Excel, or JSON.
- Saving Data as CSV (DataFrame.to_csv()): This function saves DataFrame data to
a CSV file.
- Saving Data as Excel (DataFrame.to_excel()): This function saves DataFrame
data to an Excel file.
- Saving Data as JSON (DataFrame.to_json()): This function saves DataFrame data
to a JSON file.

Real-World Use in Companies:


Companies like Amazon need to ingest and process large datasets from various
formats such as CSV, Excel, and JSON for analytics. For example, sales data may
come from CSV files, inventory data from Excel, and customer feedback data from a
JSON API. Data ingestion is a critical part of their analytics pipeline to gather
insights and make data-driven decisions.

1. Read Data from CSV File


python
import pandas as pd

df = pd.read_csv('data.csv')  # Reading a CSV file
print(df.head())  # Display the first 5 rows

2. Read Data from Excel File


python
import pandas as pd

df = pd.read_excel('data.xlsx')  # Reading an Excel file
print(df.head())  # Display the first 5 rows

3. Read Data from JSON File


python
import pandas as pd

df = pd.read_json('data.json')  # Reading a JSON file
print(df.head())  # Display the first 5 rows

4. Read Data from SQL Database


python
import pandas as pd
import sqlite3

conn = sqlite3.connect('database.db')  # Connect to the SQL database

df = pd.read_sql('SELECT * FROM users', conn)  # Reading from SQL
print(df.head())  # Display the first 5 rows

5. Read Data from API


python
import pandas as pd
import requests

response = requests.get('https://api.example.com/data')  # Make the API request

df = pd.DataFrame(response.json())  # Convert the API response to a DataFrame
print(df.head())  # Display the first 5 rows
6. Save Data to CSV File
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}

df = pd.DataFrame(data)
df.to_csv('output.csv', index=False)  # Save the DataFrame to a CSV file

7. Save Data to Excel File


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}

df = pd.DataFrame(data)
df.to_excel('output.xlsx', index=False)  # Save the DataFrame to an Excel file

8. Save Data to JSON File


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}

df = pd.DataFrame(data)
df.to_json('output.json')  # Save the DataFrame to a JSON file

9. Read Specific Columns from CSV File


python
import pandas as pd

df = pd.read_csv('data.csv', usecols=['Name', 'Age'])  # Load only the 'Name' and 'Age' columns
print(df.head())  # Display the first 5 rows

10. Load Data from Excel with Specific Sheet


python
import pandas as pd

df = pd.read_excel('data.xlsx', sheet_name='Sheet2')  # Load data from 'Sheet2'
print(df.head())  # Display the first 5 rows

Chapter 3: DataFrame Manipulation


- DataFrame Indexing: Selecting rows/columns using .loc[], .iloc[], and conditional
filtering.
- Adding/Deleting Columns: Modifying data by adding or dropping columns.
- Renaming Columns and Index: Using .rename().
- Use in Companies: Companies like Airbnb use this for cleaning and transforming data from user listings to analyze trends and improve services.

1. DataFrame Indexing
Indexing allows selecting specific rows or columns from a DataFrame. It's important
for working with specific parts of your data.
- Selecting Rows/Columns with .loc[]: Used to select data by label (index names or column names); a combined row-and-column selection is sketched after this list.
- Selecting Rows/Columns with .iloc[]: Used to select data by position (row or
column numbers).
- Conditional Filtering: Used to filter rows based on a condition (e.g., select
rows where age > 30).
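
As a small sketch (same kind of made-up data as the numbered examples below), .loc[] can also select rows and columns in a single call by label:

python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}
df = pd.DataFrame(data)

# Rows with index labels 0 through 1, and only the 'Name' column
subset = df.loc[0:1, ['Name']]
print(subset)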

2. Adding/Deleting Columns
We can easily modify data by adding new columns or removing unwanted ones.
- Adding Columns: New columns can be added by assigning values to a new column
name.
- Deleting Columns: Columns can be removed using the .drop() function.

3. Renaming Columns and Index


Renaming is useful when column names or index labels need to be changed for clarity
or consistency.
- Using .rename(): This function allows you to rename column names or row index
labels.

Real-World Use in Companies:


Companies like Airbnb use DataFrame manipulation to clean and transform user
listing data. For example, Airbnb might rename columns, filter listings based on
location, or add new columns to calculate metrics like price per night. These steps
help in analyzing trends and improving services for hosts and guests.

1. Select Rows by Index Name (.loc[])


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
selected_row = df.loc[1]  # Select the row with index label 1 (Anna)
print(selected_row)

2. Select Rows by Position (.iloc[])


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
selected_row = df.iloc[2]  # Select the row at position 2 (Peter)
print(selected_row)

3. Conditional Filtering
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
filtered_data = df[df['Age'] > 30]  # Select rows where Age > 30
print(filtered_data)
4. Add a New Column
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
df['City'] = ['New York', 'Boston', 'Chicago']  # Adding a new column
print(df)

5. Delete a Column
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35], 'City': ['New York', 'Boston', 'Chicago']}
df = pd.DataFrame(data)
df = df.drop('City', axis=1)  # Remove the 'City' column
print(df)

6. Rename Columns
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
df = df.rename(columns={'Name': 'Full Name', 'Age': 'Years'})  # Renaming columns
print(df)

7. Select Multiple Columns by Label (.loc[])


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35], 'City': ['New York', 'Boston', 'Chicago']}
df = pd.DataFrame(data)
selected_columns = df.loc[:, ['Name', 'City']]  # Select the 'Name' and 'City' columns
print(selected_columns)

8. Select Multiple Columns by Position (.iloc[])


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35], 'City': ['New York', 'Boston', 'Chicago']}
df = pd.DataFrame(data)
selected_columns = df.iloc[:, [0, 2]]  # Select the first and third columns
print(selected_columns)

9. Filter Rows Based on Multiple Conditions


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35], 'City': ['New York', 'Boston', 'Chicago']}
df = pd.DataFrame(data)
filtered_data = df[(df['Age'] > 25) & (df['City'] == 'New York')]  # Age > 25 and lives in New York
print(filtered_data)

10. Rename Index Labels


python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35]}


df = pd.DataFrame(data)
df = df.rename(index={0: 'A', 1: 'B', 2: 'C'})  # Renaming row index labels
print(df)

Chapter 4: Data Cleaning


- Handling Missing Data: Detecting and filling missing values with isna(),
fillna(), and dropna().
- Duplicates: Identifying and removing duplicate rows.
- Data Type Conversion: Converting data types with .astype().
- Use in Companies: Data cleaning is essential for data-driven firms like Uber to
ensure the quality of their datasets (e.g., handling missing data from rider or
driver inputs).

1. Handling Missing Data


When data is incomplete, it often has missing values that need to be managed. This
is important because missing data can affect the results of analysis.
- Detecting Missing Values (isna()): This function helps to find if any data is
missing (shows True where data is missing).
- Filling Missing Values (fillna()): This function allows you to fill missing
data with a specific value (e.g., fill with 0 or the average).
- Removing Missing Values (dropna()): This function removes rows or columns with
missing data.

2. Duplicates
Sometimes data may have duplicate rows, which can lead to inaccurate results.
- Identifying Duplicates: We can find duplicate rows using the .duplicated()
function (shows True for duplicates).
- Removing Duplicates: We can remove duplicates using the .drop_duplicates() function (a subset-based variation is sketched after this list).
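
As a sketch of a common variation (the subset and keep arguments are standard pandas options; the data here is made up), duplicates can be judged on selected columns only:

import pandas as pd

data = {'Name': ['John', 'Anna', 'John'], 'Age': [28, 22, 30]}
df = pd.DataFrame(data)

# Treat rows as duplicates when 'Name' matches, keeping the first occurrence
df_unique = df.drop_duplicates(subset=['Name'], keep='first')
print(df_unique)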

3. Data Type Conversion


Data in different columns may not always be in the correct type (e.g., numbers
stored as text). Converting data types is important to ensure calculations and
operations work as expected.
- Converting Data Types (astype()): This function is used to change the data
type of a column (e.g., convert text to numbers or dates).

Real-World Use in Companies:


Companies like Uber need to ensure data quality for better decision-making. For
example, rider and driver data may have missing or incorrect entries (like missing
pickup locations or incorrect driver ratings). Data cleaning helps fix these issues
so that analysis and predictions are accurate.
1. Detect Missing Values

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', None], 'Age': [28, 22, 35, 30]}
df = pd.DataFrame(data)
print(df.isna())

2. Fill Missing Values with 0

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', None], 'Age': [28, 22, 35, 30]}
df = pd.DataFrame(data)
df_filled = df.fillna(0)
print(df_filled)

3. Fill Missing Values with a Specific Value

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', None], 'Age': [28, 22, 35, 30]}
df = pd.DataFrame(data)
df_filled = df.fillna('Unknown')
print(df_filled)

4. Remove Rows with Missing Values

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', None], 'Age': [28, 22, 35, 30]}
df = pd.DataFrame(data)
df_cleaned = df.dropna()
print(df_cleaned)

5. Identify Duplicates

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'John'], 'Age': [28, 22, 35, 28]}
df = pd.DataFrame(data)
print(df.duplicated())

6. Remove Duplicates

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'John'], 'Age': [28, 22, 35, 28]}
df = pd.DataFrame(data)
df_no_duplicates = df.drop_duplicates()
print(df_no_duplicates)

7. Convert Data Type (Text to Numbers)


import pandas as pd

data = {'Name': ['John', 'Anna'], 'Age': ['28', '22']}


df = pd.DataFrame(data)
df['Age'] = df['Age'].astype(int)
print(df)

8. Convert Data Type (Text to Date)

import pandas as pd

data = {'Date': ['2023-01-01', '2023-02-01']}


df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
print(df)

9. Fill Missing Values with Column Average

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Lucy'], 'Age': [28, 22, None, 30]}
df = pd.DataFrame(data)
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)

10. Remove Columns with Missing Values

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 22, 35], 'Location': [None, 'NY', 'CA']}
df = pd.DataFrame(data)
df_cleaned = df.dropna(axis=1)
print(df_cleaned)

Chapter 5: Data Exploration


- Descriptive Statistics: Summarizing data
with .describe(), .sum(), .mean(), .count(), .median(), etc.
- Value Counts and Sorting: Using .value_counts() and .sort_values().
- Group By: Grouping data with .groupby() for aggregating statistics.
- Use in Companies: Firms like Spotify analyze user data using groupby to compute insights like top artists by region or time.

Data exploration is a critical step in understanding your dataset. It involves summarizing and analyzing the data to derive insights, detect patterns, and identify potential issues before proceeding to modeling or further analysis.
1. Descriptive Statistics

Descriptive statistics provide a summary of the central tendency, dispersion, and shape of a dataset's distribution. They help in understanding the characteristics of the data.

- **Summarizing Data (.describe()):** This function generates descriptive statistics for numerical columns, including count, mean, standard deviation, min and max values, and quartiles.

- **Sum of Values (.sum()):** This function returns the sum of the values in a specified column, useful for computing totals (e.g., total sales).

- **Mean Value (.mean()):** This function calculates the average of the values in a
specified column, providing insight into the central tendency.

- **Count of Values (.count()):** This function counts the number of non-null entries in a specified column, useful for understanding how much data is present.

- **Median Value (.median()):** This function calculates the median, which is the
middle value when data is sorted, offering a robust measure of central tendency.

2. Value Counts and Sorting

Understanding the frequency of unique values in a column is essential for categorical data analysis.

- **Value Counts (.value_counts()):** This function counts the occurrences of each unique value in a specified column, helping identify the distribution of categorical data.

- **Sorting Values (.sort_values()):** This function sorts the data based on the values in a specified column, making it easier to identify trends and outliers (a descending sort is sketched after this list).
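
For example, a descending sort (a minimal sketch with made-up data) uses the ascending parameter:

python
import pandas as pd

data = {'Age': [28, 22, 35, 30]}
df = pd.DataFrame(data)
print(df.sort_values(by='Age', ascending=False))  # Largest ages first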

3. Group By

Grouping data is crucial for aggregating statistics and performing operations on subsets of the dataset.

- **Grouping Data (.groupby()):** This function allows you to group the data based
on one or more columns and apply aggregate functions to summarize the data. For
example, it can compute the mean or count for each group, providing insights into
various segments.

Real-World Use in Companies

Companies like Spotify analyze user data using groupby to compute insights like top
artists by region or time. By grouping data, they can identify user preferences,
trends, and areas for targeted marketing.
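
A minimal sketch along those lines (the 'Region' and 'Artist' columns and their values are made up for illustration, not real Spotify data): grouping by more than one column and averaging within each group.

python
import pandas as pd

data = {
    'Region': ['US', 'US', 'EU', 'EU'],
    'Artist': ['A', 'B', 'A', 'B'],
    'Streams': [100, 150, 200, 250]
}
df = pd.DataFrame(data)

# Average streams per artist within each region
grouped = df.groupby(['Region', 'Artist'])['Streams'].mean()
print(grouped)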

1. **Descriptive Statistics**
python
import pandas as pd
data = {'Age': [28, 22, 35, 30, 25]}
df = pd.DataFrame(data)
print(df.describe())

2. **Sum of Values**
python
import pandas as pd

data = {'Sales': [200, 150, 300]}


df = pd.DataFrame(data)
print(df['Sales'].sum())

3. **Mean Value**
python
import pandas as pd

data = {'Age': [28, 22, 35, 30]}


df = pd.DataFrame(data)
print(df['Age'].mean())

4. **Count of Values**
python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter']}


df = pd.DataFrame(data)
print(df['Name'].count())

5. **Median Value**
python
import pandas as pd

data = {'Age': [28, 22, 35, 30]}


df = pd.DataFrame(data)
print(df['Age'].median())

6. **Value Counts**
python
import pandas as pd

data = {'Color': ['Red', 'Blue', 'Red', 'Green']}


df = pd.DataFrame(data)
print(df['Color'].value_counts())

7. **Sorting Values**
python
import pandas as pd

data = {'Age': [28, 22, 35, 30]}


df = pd.DataFrame(data)
print(df.sort_values(by='Age'))
8. **Group By Example**
python
import pandas as pd

data = {
'Artist': ['A', 'B', 'A', 'C', 'B'],
'Streams': [100, 150, 200, 300, 250]
}
df = pd.DataFrame(data)
grouped = df.groupby('Artist')['Streams'].sum()
print(grouped)
