unit 5
2-marks
1. Define Matplotlib.
Matplotlib is a popular Python library used for creating static, animated, and interactive
visualizations, such as plots, charts, and graphs. It provides an object-oriented API for embedding
plots into applications.
5. Define Histogram.
A histogram is a type of bar chart that represents the distribution of a dataset by showing the
frequency of data points within specific ranges (bins). The x-axis represents the data intervals, and
the y-axis shows the frequency or count of values within each interval.
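Below is a minimal sketch of drawing a histogram with Matplotlib's hist() function. The sample data, the choice of 4 bins, and the output file name are assumptions made for illustration. The non-interactive "Agg" backend is used so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt

# Hypothetical sample data (10 values), used only for illustration
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

fig, ax = plt.subplots()
# Split the range 1..4 into 4 equal-width bins and count the values falling in each
counts, bin_edges, patches = ax.hist(data, bins=4, edgecolor="black")
ax.set_xlabel("Value (bins)")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of sample data")
fig.savefig("histogram.png")  # in an interactive session, plt.show() displays it instead

print(counts)     # [1. 2. 3. 4.] -> frequency of each bin
print(bin_edges)  # 5 edges delimit the 4 bins
```

Each bar's height is the count returned for its bin, which is the frequency shown on the y-axis. Changing bins changes how finely the data are grouped.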
5-marks
1. Explain the concept of DataFrame and Series.
DataFrame and Series:
In the context of Pandas, a popular Python library for data manipulation, DataFrame and Series are
the core data structures used to handle and analyze data.
1. Series:
• A Series is a one-dimensional array-like object in Pandas that can hold any data type
(integers, floats, strings, etc.).
• A Series has an associated index, which labels each element in the Series. If an index is not
explicitly provided, Pandas assigns a default integer index starting from 0.
2. DataFrame:
• A DataFrame is a two-dimensional table (or 2D array) with rows and columns. It is essentially
a collection of Series that share a common index.
• It can be thought of as a table, where each column is a Series and all columns have the same
index.
• A DataFrame allows for more complex data manipulation than a Series because it can hold
multiple columns with potentially different data types.
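Key Differences:
• Dimensions: a Series is one-dimensional, while a DataFrame is two-dimensional (rows and columns).
• Structure: a Series is a single labeled column of values; a DataFrame is a collection of Series
that share a common index.
• Data types: a Series holds a single column of values, while a DataFrame can hold multiple columns
with different data types.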
Use Cases:
• Series is useful when you need to work with a single column of data.
• DataFrame is ideal for handling tabular data, where you have multiple variables (columns)
and potentially need to perform complex operations across rows and columns.
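The points above can be sketched with pandas as follows. The temperature and rainfall values and the column names are hypothetical, chosen only to show a default index, an explicit index, and a DataFrame assembled from Series that share one index.

```python
import pandas as pd

# A Series with no explicit index: pandas assigns 0, 1, 2, ...
s = pd.Series([10, 20, 30])

# Series with an explicit index of labels (hypothetical readings)
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"])
rain = pd.Series([0, 5, 2], index=["Mon", "Tue", "Wed"])

# A DataFrame built from Series sharing a common index: each Series becomes a column
df = pd.DataFrame({"temp_c": temps, "rain_mm": rain})

print(s.index.tolist())    # [0, 1, 2]
print(df)
print(type(df["temp_c"]))  # selecting a single column gives back a Series
print(df.dtypes)           # each column keeps its own data type (float vs integer here)
```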
Matplotlib is one of the most popular Python libraries for creating static, animated, and interactive
data visualizations. It provides an easy-to-use interface for generating a wide range of plots and
charts, allowing data scientists and analysts to gain insights from data. The most common
visualizations include line plots, bar charts, scatter plots, histograms, and more. Here's a brief
overview of how you can use Matplotlib to perform basic data visualization.
Before starting, you need to install the library if you haven't already:
pip install matplotlib
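As a rough sketch of the basic workflow, the script below draws a line plot, a bar chart, and a scatter plot side by side using Matplotlib's object-oriented interface (plt.subplots). The monthly sales figures, x/y points, colors, and output file name are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# One figure with three side-by-side axes
fig, (ax_line, ax_bar, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

ax_line.plot(months, sales, marker="o")  # line plot: trend over an ordered axis
ax_line.set_title("Line plot")

ax_bar.bar(months, sales, color="skyblue")  # bar chart: compare categories
ax_bar.set_title("Bar chart")

ax_scatter.scatter(x, y, color="red")  # scatter plot: relationship between two variables
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
fig.savefig("basic_plots.png")  # use plt.show() instead in an interactive session
```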
1. Introduction to NumPy
NumPy provides an object called ndarray, which stands for N-dimensional array. This array
is a fast, flexible container for large datasets. It allows mathematical operations on large
datasets efficiently, and it is much faster than Python's built-in list structure for numerical
data.
The core features of NumPy include:
• Multi-dimensional arrays: NumPy arrays (ndarray) can be 1D (vectors), 2D
(matrices), or even higher-dimensional.
• Vectorized operations: You can apply operations on entire arrays rather than having
to iterate through individual elements.
• Mathematical functions: NumPy provides many functions for linear algebra,
statistical operations, random sampling, and more.
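A short sketch touching each core feature listed above. The matrix values are arbitrary examples, and the seeded generator (np.random.default_rng, available in NumPy 1.17 and later) is one assumed way to show reproducible random sampling.

```python
import numpy as np

# Multi-dimensional array: a 2x2 matrix (example values)
m = np.array([[4.0, 7.0], [2.0, 6.0]])
print(m.ndim)            # 2

# Vectorized operation: every element is multiplied, no Python loop needed
print(m * 10)

# Mathematical functions: statistics and linear algebra
print(np.mean(m))        # 4.75
print(np.linalg.det(m))  # about 10.0 (4*6 - 7*2), up to floating-point rounding

# Random sampling: a seeded generator gives reproducible results
rng = np.random.default_rng(seed=0)
sample = rng.integers(0, 10, size=5)  # five integers from 0 to 9
print(sample)
```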
2. Installing NumPy
If you don't have NumPy installed, you can install it via pip:
pip install numpy
3. NumPy Arrays
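The central feature of NumPy is the ndarray object, which represents an n-dimensional
array. All elements of an array share a single data type (dtype), which can be numeric, boolean,
string, or even a generic Python object, but arrays are most commonly used for numerical data
(e.g., integers or floats).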
You can create NumPy arrays in various ways, such as from lists, tuples, or using built-in
functions.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])  # create an array from a Python list
print(arr)
OUTPUT:
[1 2 3 4 5]
Explanation: The np.array() function converts a Python list into a NumPy array.
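Besides lists, arrays can be built from tuples or with NumPy's built-in constructors. The sketch below uses np.zeros, np.ones, np.arange and np.linspace; the particular sizes and ranges are example values.

```python
import numpy as np

from_tuple = np.array((1.5, 2.5, 3.5))  # from a tuple -> float array
zeros = np.zeros((2, 3))                # 2 rows x 3 columns, all 0.0
ones = np.ones(4, dtype=int)            # [1 1 1 1]
evens = np.arange(0, 10, 2)             # like range(0, 10, 2): [0 2 4 6 8]
points = np.linspace(0, 1, 5)           # 5 evenly spaced values from 0 to 1, inclusive

print(from_tuple, zeros, ones, evens, points, sep="\n")
```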
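Array Attributes
Array attributes describe an array's structure; for example, shape gives the size of each dimension.
Example:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # example 3x3 array
print(arr_2d.shape) # Output: (3, 3) -> 3 rows and 3 columns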
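Other commonly used attributes are ndim, size, dtype and itemsize. A small sketch on an example 3x3 integer array; the exact integer dtype depends on the platform, so it is only described in a comment.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # example 3x3 integer array

print(arr_2d.shape)     # (3, 3): size of each dimension
print(arr_2d.ndim)      # 2: number of dimensions (axes)
print(arr_2d.size)      # 9: total number of elements (3 * 3)
print(arr_2d.dtype)     # element type; commonly int64 on 64-bit Linux/macOS
print(arr_2d.itemsize)  # bytes per element (8 for int64)
```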
One of the key benefits of NumPy arrays is that they support vectorized operations, which
means that operations can be applied to entire arrays, rather than having to loop through
individual elements.
Element-wise Operations
You can perform mathematical operations like addition, subtraction, multiplication, and
division directly on arrays.
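Example 1: Addition
import numpy as np
arr1 = np.array([1, 2, 3])  # example input
arr2 = np.array([4, 5, 6])  # example input
result = arr1 + arr2  # element-wise addition
print(result)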
Output:
[5 7 9]
Example 2: Multiplication
arr = np.array([1, 2, 3, 4])  # example input
result = arr * 2  # every element is multiplied by 2
print(result)
Output:
[2 4 6 8]
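Subtraction and division work the same way. The sketch below (example values) also shows the explicit loop that plain Python lists would need for the same subtraction, which is the work vectorization removes.

```python
import numpy as np

a = np.array([10, 20, 30])  # example values
b = np.array([1, 2, 3])

print(a - b)  # [ 9 18 27]
print(a / b)  # [10. 10. 10.]  ("/" always produces floats)

# The same subtraction on plain Python lists needs an explicit loop or comprehension
diff_list = [x - y for x, y in zip([10, 20, 30], [1, 2, 3])]
print(diff_list)  # [9, 18, 27]
```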
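NumPy arrays support a powerful feature called broadcasting: when two arrays have compatible
shapes (compared dimension by dimension from the right, each pair of sizes must be equal or one
of them must be 1), the smaller array is automatically stretched to match the shape of the larger
one for element-wise operations, without copying data.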
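Example:
arr1 = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])  # example input, shape (3, 3)
arr2 = np.array([1, 2, 3])  # example input, shape (3,)
result = arr1 + arr2
print(result)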
Output:
[[2 3 4]
[3 4 5]
[4 5 6]]
• Explanation: The smaller array arr2 is broadcast across each row of the larger array
arr1, enabling element-wise addition.
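Broadcasting can stretch both operands at once, and it fails when shapes are incompatible. A sketch with example arrays: a (3, 1) column plus a (3,) row gives a (3, 3) result, while shapes (3,) and (2,) raise a ValueError.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1): one column
row = np.array([1, 2, 3])          # shape (3,): treated as (1, 3)

# Shapes (3, 1) and (1, 3) are compatible: each size-1 dimension is stretched,
# so the result has shape (3, 3)
grid = col + row
print(grid)

# Shapes (3,) and (2,) are not compatible (3 != 2 and neither is 1)
try:
    np.array([1, 2, 3]) + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("Cannot broadcast:", err)
```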
5. Reshaping Arrays
NumPy allows reshaping arrays into different dimensions using the reshape() method,
provided that the total number of elements remains the same.
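Example:
arr = np.array([1, 2, 3, 4, 5, 6])  # 6 elements
reshaped_arr = arr.reshape(2, 3)  # 2 rows x 3 columns = 6 elements
print(reshaped_arr)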
Output:
[[1 2 3]
[4 5 6]]
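Two practical details of reshape(): passing -1 for one dimension lets NumPy infer it, and a shape whose total size differs from the array's raises a ValueError. A sketch on an example 12-element array:

```python
import numpy as np

arr = np.arange(12)        # 12 elements: 0, 1, ..., 11
grid = arr.reshape(3, -1)  # -1 asks NumPy to infer this dimension: 12 / 3 = 4
print(grid.shape)          # (3, 4)

flat = grid.reshape(-1)    # back to one dimension
print(flat.shape)          # (12,)

try:
    arr.reshape(5, 2)      # needs 10 elements, but arr has 12
    reshape_failed = False
except ValueError:
    reshape_failed = True
```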
You can index and slice NumPy arrays much like Python lists, but NumPy also supports
advanced indexing techniques.
Basic Indexing and Slicing
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Indexing
print(arr[2]) # Output: 3
# Slicing
print(arr[1:4]) # Output: [2 3 4]
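A few more basic cases, sketched on example arrays: negative indices, step slicing, row/column indexing on a 2D array, and the fact that NumPy slices are views, so writing through a slice changes the original array (unlike slicing a Python list, which makes a copy).

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr[-1])    # 5: negative indices count from the end
print(arr[::2])   # [1 3 5]: every second element

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(m[1, 2])    # 6: row 1, column 2 (both zero-based)
print(m[:, 0])    # [1 4 7]: column 0 of every row
print(m[:2, 1:])  # first two rows, columns 1 onward

# NumPy slices are views that share memory with the original array
view = arr[1:3]
view[0] = 99
print(arr)        # [ 1 99  3  4  5]
```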
Advanced Indexing
NumPy allows more advanced indexing, like using Boolean arrays or selecting elements by
index positions.
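Example:
arr = np.array([10, 20, 30, 40, 50])
print(arr[arr > 25])   # Boolean array as index. Output: [30 40 50]
print(arr[[0, 2, 4]])  # list of index positions. Output: [10 30 50]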
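• Mean of Array: np.mean(arr) returns the average of the elements; for arr = np.array([10, 20, 30, 40, 50]) it gives 30.0.
• Matrix Multiplication: np.dot(a, b) multiplies two matrices; with the example matrices
a = np.array([[1, 2], [3, 4]]) and b = np.array([[5, 6], [7, 8]]), print(np.dot(a, b)) gives:
Output:
[[19 22]
[43 50]]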
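A sketch of related aggregation and matrix operations on example 2x2 matrices: the axis argument chooses whether a function works down columns (axis=0) or across rows (axis=1), and the @ operator computes the same matrix product as np.dot for 2D arrays, which differs from the element-wise product a * b.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # example matrices
b = np.array([[5, 6], [7, 8]])

print(a.sum())         # 10: sum of all elements
print(a.mean(axis=0))  # [2. 3.]: mean of each column
print(a.mean(axis=1))  # [1.5 3.5]: mean of each row

# Matrix product: the @ operator matches np.dot for 2D arrays
product = a @ b
print(product)         # rows [19 22] and [43 50]

# Element-wise product is a different operation
print(a * b)           # rows [5 12] and [21 32]
```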
10. Conclusion
NumPy is an essential tool for scientific computing in Python, providing support for large,
multi-dimensional arrays and matrices, along with a wide range of mathematical operations
to manipulate these arrays. It is fast, efficient, and highly optimized for numerical
computations, making it a foundation for other scientific libraries like Pandas, SciPy, and
Matplotlib.
By mastering NumPy, you can handle and process large datasets, perform complex
mathematical operations, and optimize performance, all with a simple and concise syntax.
DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure. It is essentially a table where data is organized in rows and columns. The
DataFrame allows both row and column indexing.
• A DataFrame can be created from lists, dictionaries, or even from a Series.
Example of creating a DataFrame from a dictionary:
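import pandas as pd
data = {"Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [24, 27, 22, 32],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"]}
df = pd.DataFrame(data)  # each dictionary key becomes a column
print(df)
OUTPUT
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston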
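The other construction routes mentioned above, plus row and column indexing, can be sketched as follows. The rows and ages are hypothetical; loc selects by label and iloc by integer position.

```python
import pandas as pd

# From a list of rows, with column names given separately (hypothetical data)
rows = [["Alice", 24], ["Bob", 27]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age"])

# From a Series: the Series becomes a single column named after it
ages = pd.Series([24, 27], name="Age")
df_series = ages.to_frame()

# Row and column indexing
print(df_rows["Name"])         # one column (returned as a Series)
print(df_rows.loc[0, "Name"])  # Alice: row label 0, column label "Name"
print(df_rows.iloc[1, 1])      # 27: second row, second column, by position
```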
6. Conclusion
Pandas is a highly versatile and powerful library for data analysis in Python. It is widely used
by data analysts, data scientists, and researchers for tasks such as:
• Data manipulation (cleaning, transformation, aggregation).
• Time series analysis.
• Statistical analysis.
• Merging, reshaping, and handling missing data.
• Visualizing data with minimal effort.
4. Explain the concept of data wrangling using pandas (loading a dataset,
selecting columns and rows from a DataFrame, adding and deleting data in a
DataFrame).