Pandas
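Pandas is a Python library for data manipulation and analysis. Commonly used functions,
attributes, and indexing patterns include the following (pd is the usual alias from
"import pandas as pd", and df is any DataFrame):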
1. DataFrame and Series Creation
• pd.DataFrame(data): Create a DataFrame from a dictionary, list, or other data structures.
• pd.Series(data): Create a Series from a list or dictionary.
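A minimal sketch of both constructors follows. The column names and values are made-up sample data, not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
data = {"name": ["Ana", "Ben", "Cy"], "score": [90, 85, 77]}

# Dictionary keys become column names; each list becomes a column.
df = pd.DataFrame(data)

# A list gets a default integer index (0, 1, 2).
s_list = pd.Series([10, 20, 30])

# Dictionary keys become the Series index.
s_dict = pd.Series({"a": 1, "b": 2})
```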
2. Data Exploration & Inspection
• df.head(n): View the first n rows of the DataFrame.
• df.tail(n): View the last n rows.
• df.info(): Print an overview of the DataFrame, including data types, non-null counts, and memory usage.
• df.describe(): Generate summary statistics for numerical columns.
• df.shape: Get the number of rows and columns as a (rows, columns) tuple.
• df.columns: List column names.
• df.dtypes: Get data types of each column.
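The inspection calls above can be tried on a small frame. This is only a sketch; the city and temp_c columns are invented for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune", "Kyiv"], "temp_c": [4.0, 19.5, 31.0, 8.5]}
)

first_two = df.head(2)         # first 2 rows
last_one = df.tail(1)          # last row
df.info()                      # prints a summary to stdout and returns None
stats = df.describe()          # summary statistics for numeric columns only
n_rows, n_cols = df.shape      # shape is an attribute, not a method
column_names = list(df.columns)
column_types = df.dtypes       # Series mapping column name to dtype
```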
3. Selecting & Filtering Data
• df['column_name']: Select a single column.
• df[['col1', 'col2']]: Select multiple columns.
• df.loc[row_label, column_name]: Select data by index label and column name.
• df.iloc[row_position, column_position]: Select data by zero-based integer position.
• df[df['column'] > value]: Filter rows based on a condition.
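The difference between label-based and position-based selection is easiest to see with a non-default index. The sketch below assumes a small made-up product table with string row labels.

```python
import pandas as pd

# Hypothetical sample data with string row labels, used only for illustration.
df = pd.DataFrame(
    {"product": ["pen", "ink", "pad"], "price": [1.5, 4.0, 2.5], "qty": [10, 3, 7]},
    index=["a", "b", "c"],
)

prices = df["price"]               # single column, returned as a Series
subset = df[["product", "qty"]]    # list of names, returned as a DataFrame
by_label = df.loc["b", "price"]    # row labelled "b", column "price"
by_position = df.iloc[0, 2]        # first row, third column ("qty")
expensive = df[df["price"] > 2.0]  # boolean mask keeps rows where it is True
```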
4. Handling Missing Data
• df.isnull(): Detect missing values.
• df.notnull(): Detect non-missing values.
• df.dropna(): Remove rows that contain missing values (use axis=1 to remove columns instead).
• df.fillna(value): Fill missing values with a specified value.
• df.interpolate(): Fill missing values by interpolating between neighboring values (linear by default).
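A short sketch of the missing-data methods, assuming a made-up two-column frame with one gap in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]})

missing_mask = df.isnull()               # True where a value is missing
missing_per_column = df.isnull().sum()   # count of missing values per column
complete_rows = df.dropna()              # keeps only rows with no missing values
filled = df.fillna(0)                    # replaces every missing value with 0

# Linear interpolation fills the gap in "x" with 2.0. With the default
# settings the leading gap in "y" has no earlier value and stays missing.
interpolated = df.interpolate()
```

None of these calls modify df; each returns a new object.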
5. Data Cleaning & Transformation
• df.rename(columns={'old_name': 'new_name'}): Rename columns.
• df.replace(to_replace, value): Replace values in a DataFrame.
• df.astype({'col': dtype}): Convert data types.
• df.drop(columns=['col1', 'col2']): Drop specific columns.
• df.drop(index=[0, 1]): Drop specific rows.
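The cleaning steps can be chained, since each one returns a new DataFrame and leaves the original untouched. The sketch below uses invented column names (id, status, tmp).

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"id": ["1", "2", "3"], "status": ["ok", "n/a", "ok"], "tmp": [0, 0, 0]}
)

renamed = df.rename(columns={"status": "state"})   # rename one column
replaced = renamed.replace("n/a", "unknown")       # swap a placeholder value
typed = replaced.astype({"id": "int64"})           # strings "1", "2", "3" to integers
trimmed = typed.drop(columns=["tmp"])              # drop a column by name
without_first = trimmed.drop(index=[0])            # drop the row with index label 0
```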
6. Sorting & Ordering
• df.sort_values(by='column_name', ascending=True): Sort by a specific column.
• df.sort_index(): Sort by index.
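A brief sketch of both sorting methods, assuming a made-up frame whose index is deliberately out of order:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["c", "a", "b"], "rank": [2, 3, 1]}, index=[10, 30, 20])

by_rank = df.sort_values(by="rank", ascending=True)        # smallest rank first
by_rank_desc = df.sort_values(by="rank", ascending=False)  # largest rank first
by_index = df.sort_index()                                 # index order 10, 20, 30
```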
7. Aggregation & Grouping
• df.groupby('column_name').sum(): Group rows by a column and sum the remaining columns within each group (other aggregations such as mean() or count() work the same way).
• df.agg({'col1': 'mean', 'col2': 'sum'}): Apply a different aggregation function to each column.
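• df.pivot(index='row_col', columns='col_col', values='val_col'): Reshape from long to wide format without aggregating; each index/columns pair must be unique. Use df.pivot_table(...) when duplicates need to be aggregated.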
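The three operations above can be compared on one small table. This is a sketch with invented sales figures; the column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "region": ["N", "N", "S", "S"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "sales": [100, 150, 80, 120],
        "units": [10, 15, 8, 12],
    }
)

# Sum the numeric columns within each region.
totals = df.groupby("region")[["sales", "units"]].sum()

# One aggregation per column over the whole frame; the result is a Series.
summary = df.agg({"sales": "mean", "units": "sum"})

# Reshape to one row per region and one column per quarter (no aggregation).
wide = df.pivot(index="region", columns="quarter", values="sales")
```

Selecting the numeric columns before sum() avoids trying to add up the text column quarter.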
8. Merging & Joining
• pd.concat([df1, df2]): Concatenate DataFrames by stacking rows (use axis=1 to place them side by side).
• df1.merge(df2, on='key', how='inner'): Merge DataFrames using a key column.
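A minimal sketch of stacking versus key-based merging, using two made-up frames that share a key column:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Stack rows; ignore_index=True renumbers the result from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

# Inner join keeps only keys present in both frames (2 and 3).
inner = df1.merge(df2, on="key", how="inner")

# Left join keeps every row of df1; unmatched rows get missing values.
left = df1.merge(df2, on="key", how="left")
```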
9. Working with Time Series
• pd.to_datetime(df['date_column']): Convert a column to datetime format.
• df['date_column'].dt.year: Extract the year from a date column.
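• df.resample('M').sum(): Resample data by month and sum each month. This requires a datetime index (or the on='date_column' argument). From pandas 2.2 the 'M' alias is deprecated in favor of 'ME' (month end); 'MS' labels each group by the month start.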
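The time series steps can be sketched end to end as below. The dates and amounts are invented, and the example uses the 'MS' (month start) alias rather than 'M' so that it does not depend on which month-end alias the installed pandas version accepts.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "date_column": ["2024-01-15", "2024-01-20", "2024-02-10"],
        "amount": [10, 20, 5],
    }
)

# Parse the strings into datetime64 values.
df["date_column"] = pd.to_datetime(df["date_column"])

# The .dt accessor exposes date parts on a datetime column.
years = df["date_column"].dt.year

# resample needs a datetime index, so set it first, then sum per month.
monthly = df.set_index("date_column")["amount"].resample("MS").sum()
```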
10. Exporting & Importing Data
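• df.to_csv('filename.csv'): Save DataFrame to CSV. The index is written as the first column unless index=False is passed.
• df.to_excel('filename.xlsx'): Save DataFrame to Excel (requires an Excel engine such as openpyxl to be installed).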
• pd.read_csv('filename.csv'): Read a CSV file.
• pd.read_excel('filename.xlsx'): Read an Excel file (requires an Excel engine such as openpyxl to be installed).
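A CSV round trip might look like the sketch below. It writes to a temporary directory so no file is left behind, and it covers CSV only, since the Excel functions depend on an optional engine being installed. The frame contents are made up for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "filename.csv")

    # index=False keeps the row index out of the file.
    df.to_csv(csv_path, index=False)

    # Read the file back into a new DataFrame.
    loaded = pd.read_csv(csv_path)
```

Without index=False, the saved file gains an extra unnamed first column that shows up as "Unnamed: 0" when read back.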