

Pandas Definitions Summary

The document provides an overview of Pandas, an open-source library for data analysis, highlighting its key features, data structures (Series and DataFrame), and comparisons with NumPy. It defines essential concepts in data science and data processing, and explains the functionalities of Series and DataFrames, including their operations and attributes. The document serves as a foundational guide for understanding how to utilize Pandas for data manipulation and analysis.

Pandas: Series and DataFrame Summary

Pandas Definitions and Key Concepts

1. Data Science:

- Field involving data collection, cleaning, standardization, analysis, visualization, and reporting.

2. Data Processing:

- Prepares data through cleaning, merging, and restructuring before analysis.

3. Python Modules and Libraries:

- Libraries contain modules with pre-defined functions.

- Common libraries: NumPy, Pandas, Matplotlib.

4. Pandas:

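- Open-source library for data analysis; development was started by Wes McKinney in 2008.

- The name is derived from "panel data", an econometrics term for multidimensional data sets.

- Built on NumPy; its plotting methods use Matplotlib, which is an optional dependency.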

5. Key Features of Pandas:

- Handles missing data

- Efficient and flexible

- Tabular data representation

- Supports file formats, reshaping, sorting, and merging
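
A few of these features can be sketched as follows. This is only an illustration: the tables and the column names (region, units, manager) are invented, and CSV stands in for the supported file formats.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: one missing value in the "units" column.
sales = pd.DataFrame({
    "region": ["north", "south", "east"],
    "units": [10.0, np.nan, 7.0],
})
regions = pd.DataFrame({
    "region": ["north", "south", "east"],
    "manager": ["Asha", "Ben", "Chen"],
})

# Handle missing data: replace NaN in "units" with 0.
filled = sales.fillna({"units": 0.0})

# Sort by a column, largest value first.
ranked = filled.sort_values("units", ascending=False)

# Merge two tables on a shared key column.
merged = ranked.merge(regions, on="region")

# File formats: write the table as CSV text and read it back.
csv_text = merged.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```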

6. Pandas vs NumPy:

- Pandas: Tabular data, DataFrame/Series; typically uses more memory, and indexing individual elements is slower than in NumPy.

- NumPy: Numerical data, arrays, efficient memory, fast indexing.
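
The contrast can be sketched with one set of values held both ways. The subject names and marks are invented for illustration, and the memory comparison is only indicative, since exact byte counts depend on the pandas version and index type.

```python
import numpy as np
import pandas as pd

# A NumPy array is addressed by integer position only.
arr = np.array([72.5, 68.0, 90.0])
second = arr[1]

# A Series wraps the same values and adds an index of labels.
marks = pd.Series(arr, index=["maths", "physics", "chemistry"])
physics = marks["physics"]      # label-based access
also_physics = marks.iloc[1]    # position-based access still works

# The underlying values are available as a NumPy array again.
back_to_numpy = marks.to_numpy()

# The index is extra data stored alongside the values, so the Series
# occupies at least as much memory as the bare array.
array_bytes = arr.nbytes
series_bytes = marks.memory_usage(index=True)
print(array_bytes, series_bytes)
```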

7. Pandas Data Structures:

- Series: 1D labelled array (homogeneous data)


- DataFrame: 2D labelled structure (heterogeneous data)

- Panel: 3D data structure (deprecated in pandas 0.20 and removed in 0.25; a DataFrame with a MultiIndex is the usual replacement)
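
A small sketch of the difference between the two main structures, using invented values: a Series has one dimension and a single dtype, while each column of a DataFrame can carry its own dtype.

```python
import pandas as pd

# Series: one dimension, one dtype shared by all values.
ages = pd.Series([31, 45, 28], index=["a", "b", "c"])

# DataFrame: two dimensions; each column can have its own dtype.
people = pd.DataFrame(
    {"age": [31, 45, 28], "height_m": [1.70, 1.82, 1.65]},
    index=["a", "b", "c"],
)

print(ages.ndim, ages.dtype)
print(people.ndim, people.shape)
print(people.dtypes)

# Selecting a single column of a DataFrame gives back a Series.
age_column = people["age"]
```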

8. Series:

- 1D labelled array, homogeneous data.

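- Values are mutable. Size is often described as immutable, but this is only a convention: most methods (e.g. drop()) return a new Series, while del and assignment to a new label resize it in place.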

- Created from list, dict, array, scalar.

- Supports indexing (positional and labelled) and slicing.

- Missing values shown as NaN.
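
The points above can be sketched as follows, with invented labels and values. The sketch uses .loc for label-based access and .iloc for positional access, which is one common way to keep the two apart.

```python
import numpy as np
import pandas as pd

# Four ways to create a Series.
from_list = pd.Series([10, 20, 30], index=["a", "b", "c"])
from_dict = pd.Series({"a": 10, "b": 20, "c": 30})   # keys become labels
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))    # default index 0, 1, 2
from_scalar = pd.Series(5, index=["x", "y", "z"])    # scalar repeated per label

# Indexing: by label with .loc, by position with .iloc.
by_label = from_list.loc["b"]
by_position = from_list.iloc[1]

# Slicing: positional slices exclude the stop, label slices include it.
first_two = from_list.iloc[0:2]
a_to_b = from_list.loc["a":"b"]

# Values can be changed in place.
from_scalar.loc["x"] = 7

# A requested label with no data is filled with a missing value (NaN).
with_gap = pd.Series({"a": 1.0, "b": 2.0}, index=["a", "b", "c"])
print(with_gap)
```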

9. Series Operations:

- Supports vector and binary operations.

- Binary operations align on index labels; a label present in only one operand produces NaN.

- Use add(), sub() with fill_value to avoid NaN.
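
A minimal sketch of these operations on two invented Series whose indexes only partly overlap:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Vectorised operation: applied to every element at once.
doubled = s1 * 2

# Binary operation: values are matched by label, not by position.
# Labels found in only one operand ("a" and "d") become NaN.
total = s1 + s2

# add() and sub() with fill_value treat a missing label as 0 instead.
total_filled = s1.add(s2, fill_value=0)
difference = s1.sub(s2, fill_value=0)
print(total)
print(total_filled)
```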

10. Series Attributes & Methods:

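- Inspect the first or last elements with head() and tail()

- Boolean indexing for conditional filtering

- Delete elements with drop(), or with the del statement (del s[label]); del is a Python statement, not a Series method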
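
These methods can be sketched on an invented Series of scores. Note that del is used here as a Python statement (del s[label]) rather than as a method call.

```python
import pandas as pd

scores = pd.Series(
    [55, 82, 47, 91, 68],
    index=["ann", "raj", "li", "sam", "eva"],
)

first_two = scores.head(2)     # first 2 elements
last_two = scores.tail(2)      # last 2 elements

# Boolean indexing: keep only the elements where the condition is True.
passed = scores[scores >= 60]

# drop() returns a new Series; the original is left unchanged.
without_li = scores.drop("li")

# The del statement removes a label from the Series it is applied to.
trimmed = scores.copy()
del trimmed["ann"]
print(passed)
```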

11. DataFrame:

- 2D data structure (rows and columns)

- Three components: data, rows, columns

- Value- and size-mutable, with labelled axes; supports arithmetic on rows and columns

- Created from list, list of lists, dict of lists, dict of series, series, numpy arrays, or another DataFrame
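
Several of these construction routes are sketched below with invented marks; each route is meant to build the same small table. The last lines illustrate mutability and column arithmetic.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels.
from_dict = pd.DataFrame({"maths": [80, 65], "physics": [72, 90]},
                         index=["ann", "raj"])

# From a list of lists: each inner list is one row.
from_rows = pd.DataFrame([[80, 72], [65, 90]],
                         index=["ann", "raj"], columns=["maths", "physics"])

# From a dict of Series: rows are aligned on the Series labels.
from_series = pd.DataFrame({
    "maths": pd.Series([80, 65], index=["ann", "raj"]),
    "physics": pd.Series([72, 90], index=["ann", "raj"]),
})

# From a NumPy array.
from_array = pd.DataFrame(np.array([[80, 72], [65, 90]]),
                          index=["ann", "raj"], columns=["maths", "physics"])

# The three components: data, row labels, column labels.
print(from_dict.to_numpy(), from_dict.index, from_dict.columns)

# Mutable: add a column computed by arithmetic on existing columns.
from_dict["total"] = from_dict["maths"] + from_dict["physics"]

# Aggregation down the rows: the mean of each column.
column_means = from_dict.mean()
print(column_means)
```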
