unit 5

Uploaded by

rohith96kum

UNIT-V

2-marks
1. Define Matplotlib.
Matplotlib is a popular Python library used for creating static, animated, and interactive
visualizations, such as plots, charts, and graphs. It provides an object-oriented API for embedding
plots into applications.

2. What is a Scatter plot?


A scatter plot is a graphical representation of data points on a two-dimensional axis, where each
point represents a pair of values, typically used to show the relationship or correlation between two
variables.

3. What is a Line plot?


A line plot is a type of chart that displays data points connected by straight lines, typically used to
show trends or changes in a variable over time.

4. Define Bar chart


A bar chart is a graphical representation of data where rectangular bars are used to represent the
frequency or value of different categories, with the length of each bar proportional to the data it
represents.

5. Define Histogram.
A histogram is a type of bar chart that represents the distribution of a dataset by showing the
frequency of data points within specific ranges (bins). The x-axis represents the data intervals, and
the y-axis shows the frequency or count of values within each interval.

6. Define Box plot.


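A box plot is a graphical representation that displays the distribution of a dataset, showing the
median, quartiles, and potential outliers. The box spans the interquartile range (IQR), and the
"whiskers" extend to the smallest and largest values that are not treated as outliers (commonly
those within 1.5 times the IQR of the box). When there are no outliers, the whiskers reach the
minimum and maximum values.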

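The chart types defined above can each be drawn with a single Matplotlib call. The following is a minimal sketch, not a definitive recipe: the sample data is invented for illustration, and the non-interactive "Agg" backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for illustration
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 30]          # 30 is deliberately far from the rest
categories = ["A", "B", "C"]
counts = [5, 9, 3]

fig, axes = plt.subplots(1, 4, figsize=(16, 4))

axes[0].scatter(x, y)            # one point per (x, y) pair
axes[0].set_title("Scatter plot")

axes[1].bar(categories, counts)  # bar height proportional to each count
axes[1].set_title("Bar chart")

# hist() returns the count per bin, the bin edges, and the drawn bars
bin_counts, bin_edges, _ = axes[2].hist(y, bins=3)
axes[2].set_title("Histogram")

axes[3].boxplot(y)               # the value 30 is drawn as an outlier point
axes[3].set_title("Box plot")

# In an interactive session call plt.show(); fig.savefig(...) writes an image file.
```

With three bins the histogram has four bin edges, and the bin counts add up to the number of data points.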
5-marks
1. Explain the concept of DataFrame and Series.
DataFrame and Series:

In the context of Pandas, a popular Python library for data manipulation, DataFrame and Series are
the core data structures used to handle and analyze data.

1. Series:
• A Series is a one-dimensional array-like object in Pandas that can hold any data type
(integers, floats, strings, etc.).

• It is similar to a column in a spreadsheet or a single column in a database table.

• A Series has an associated index, which labels each element in the Series. If an index is not
explicitly provided, Pandas assigns a default integer index starting from 0.

2. DataFrame:

• A DataFrame is a two-dimensional table (or 2D array) with rows and columns. It is essentially
a collection of Series that share a common index.

• It can be thought of as a table, where each column is a Series and all columns have the same
index.

• A DataFrame allows for more complex data manipulation than a Series because it can hold
multiple columns with potentially different data types.

Key Differences:

• Series: One-dimensional (single column), indexed collection of data.

• DataFrame: Two-dimensional (multiple columns), indexed collection of Series.

Use Cases:

• Series is useful when you need to work with a single column of data.

• DataFrame is ideal for handling tabular data, where you have multiple variables (columns)
and potentially need to perform complex operations across rows and columns.

In summary, a Series is a one-dimensional data structure, while a DataFrame is a two-dimensional
structure that can hold multiple Series with different data types. Both are essential for data
manipulation and analysis in Pandas.
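
The relationship between the two structures can be sketched in a few lines. This is an illustrative example only; the names and values are invented, and the variable names are not part of Pandas.

```python
import pandas as pd

# Hypothetical data, invented for illustration
names = pd.Series(["Alice", "Bob", "Charlie"])  # default integer index 0, 1, 2
ages = pd.Series([24, 27, 22])                  # same default index

# Series that share an index can be combined into a DataFrame
df = pd.DataFrame({"Name": names, "Age": ages})

# Selecting a single column gives back a Series
age_column = df["Age"]

print(type(age_column).__name__)  # Series
print(df.shape)                   # (3, 2)
print(df.dtypes)                  # one data type per column
```

Each column keeps its own data type, which is why a DataFrame can mix text and numbers while a single Series cannot do so without falling back to a generic object type.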

2. Brief the concept of basic data visualization using Python in Matplotlib.
Basic Data Visualization Using Python in Matplotlib

Matplotlib is one of the most popular Python libraries for creating static, animated, and interactive
data visualizations. It provides an easy-to-use interface for generating a wide range of plots and
charts, allowing data scientists and analysts to gain insights from data. The most common
visualizations include line plots, bar charts, scatter plots, histograms, and more. Here's a brief
overview of how you can use Matplotlib to perform basic data visualization.

1. Installing and Importing Matplotlib

Before starting, install the library if you haven't already:

pip install matplotlib

Then import the plotting interface in your Python script:

import matplotlib.pyplot as plt


• Explanation: Here, x represents the data for the x-axis, and y represents the data for the
y-axis. The plot() function draws a line connecting the data points.
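
The line plot that this explanation describes is not reproduced in these notes. A minimal sketch of such a plot might look as follows; the values of x and y, the labels, and the "Agg" backend are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: x values for the x-axis, y values for the y-axis
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

lines = plt.plot(x, y)        # draws one line through the (x, y) points
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple line plot")

# In an interactive session, plt.show() displays the figure.
```

plot() returns a list with one line object per plotted series, which is useful when the line needs to be restyled later.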
10-MARKS
1. Explain basic data analysis using Python.
Conclusion
Basic data analysis in Python involves a series of steps that allow analysts to extract
meaningful insights from raw data. By leveraging libraries like Pandas for data manipulation,
Matplotlib and Seaborn for visualization, and SciPy for statistical analysis, Python provides a
comprehensive ecosystem for data exploration and analysis.
The core steps in the analysis process include:
1. Data Collection: Reading data into Python.
2. Data Cleaning: Handling missing values, duplicates, and type conversions.
3. Data Exploration: Summarizing the data and looking for patterns.
4. Data Visualization: Plotting various charts to better understand the data.
5. Statistical Analysis: Performing statistical tests and measures.
6. Feature Engineering: Creating new features for better analysis or modeling.
7. Data Aggregation: Grouping and summarizing data for insights.
8. Exporting Data: Saving cleaned and analyzed data for future use.
This workflow is fundamental for any data analysis task and sets the stage for further
modeling or decision-making processes.
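
Most of the steps listed above can be sketched with Pandas alone. The example below is a rough illustration under stated assumptions: the CSV text, the column names, and the derived "revenue" feature are all invented, and step 4 (visualization) is left out to keep it short.

```python
import io
import pandas as pd

# Hypothetical raw data standing in for a CSV file on disk
raw_csv = io.StringIO(
    "region,units,price\n"
    "North,10,2.5\n"
    "South,,3.0\n"
    "North,10,2.5\n"
    "East,7,4.0\n"
)

# 1. Data collection: read the data into a DataFrame
df = pd.read_csv(raw_csv)

# 2. Data cleaning: drop the duplicate row, fill the missing value, fix the type
df = df.drop_duplicates()
df["units"] = df["units"].fillna(0).astype(int)

# 3. Data exploration: summary statistics of the numeric columns
summary = df.describe()

# 5. Statistical analysis: a simple measure, the mean price
mean_price = df["price"].mean()

# 6. Feature engineering: derive a new column from existing ones
df["revenue"] = df["units"] * df["price"]

# 7. Data aggregation: total revenue per region
revenue_by_region = df.groupby("region")["revenue"].sum()

# 8. Exporting data: to_csv() without a path returns the CSV as a string
cleaned_csv = df.to_csv(index=False)

print(revenue_by_region)
```

Passing a file name to to_csv() instead would write the cleaned data to disk.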

2. Brief the concept of NumPy in Python with example.

Concept of NumPy in Python
NumPy (Numerical Python) is one of the most important libraries for scientific computing in
Python. It provides support for large, multi-dimensional arrays and matrices, along with a
collection of mathematical functions to operate on these arrays. NumPy is especially useful
in data analysis, machine learning, numerical simulations, and scientific computing due to its
fast execution, ease of use, and efficient memory management.
In this section, we will explore the core concepts of NumPy, its key features, and how to use
it with examples.

1. Introduction to NumPy
NumPy provides an object called ndarray, which stands for N-dimensional array. This array
is a fast, flexible container for large datasets. Because its elements share one data type and
its operations run in compiled code, mathematical operations on large datasets are typically
much faster than the same work done with Python's built-in list structure.
The core features of NumPy include:
• Multi-dimensional arrays: NumPy arrays (ndarray) can be 1D (vectors), 2D
(matrices), or even higher-dimensional.
• Vectorized operations: You can apply operations on entire arrays rather than having
to iterate through individual elements.
• Mathematical functions: NumPy provides many functions for linear algebra,
statistical operations, random sampling, and more.

2. Installing NumPy

If you don't have NumPy installed, you can install it via pip:

pip install numpy

3. NumPy Arrays

The central feature of NumPy is the ndarray object, which represents an n-dimensional
array. All elements of an array share a single data type (its dtype); many data types are
supported, but arrays are most commonly used for numerical data (e.g., integers or floats).

Creating NumPy Arrays

You can create NumPy arrays in various ways, such as from lists, tuples, or using built-in
functions.

Example 1: Creating an Array from a List

import numpy as np

# Create a 1D array from a Python list

arr = np.array([1, 2, 3, 4, 5])

print(arr)

OUTPUT:

[1 2 3 4 5]

Explanation: The np.array() function converts a Python list into a NumPy array.
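
Arrays can also be built from tuples or with NumPy's built-in creation functions, as mentioned above. A short sketch, with values chosen only for illustration:

```python
import numpy as np

from_tuple = np.array((1.5, 2.5, 3.5))  # a tuple works the same way as a list
zeros = np.zeros((2, 3))                # 2 rows, 3 columns, all 0.0
evens = np.arange(0, 10, 2)             # start, stop (excluded), step
grid = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values, both ends included

print(evens)  # [0 2 4 6 8]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```

arange() takes a step size, while linspace() takes the number of points, so linspace() is usually the safer choice for floating-point ranges.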

Array Attributes

NumPy arrays have several useful attributes:

• shape: Returns a tuple representing the dimensions of the array.


• ndim: Returns the number of dimensions of the array.
• dtype: Returns the data type of the array elements.

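Example:
# Assumes numpy has been imported as np, as in Example 1
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # a 3x3 array

print(arr_2d.shape) # Output: (3, 3) -> 3 rows and 3 columns

print(arr_2d.ndim) # Output: 2 (2D array)

print(arr_2d.dtype) # Output: int64 on most platforms (data type of array elements)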

4. NumPy Array Operations

One of the key benefits of NumPy arrays is that they support vectorized operations, which
means that operations can be applied to entire arrays, rather than having to loop through
individual elements.

Element-wise Operations

You can perform mathematical operations like addition, subtraction, multiplication, and
division directly on arrays.

Example 1: Addition of Arrays

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

result = arr1 + arr2

print(result)

Output:

[5 7 9]

• Explanation: The arrays are added element-wise (1+4, 2+5, 3+6).

Example 2: Multiplication

arr = np.array([1, 2, 3, 4])

result = arr * 2 # Scalar multiplication

print(result)

Output:

[2 4 6 8]

• Explanation: Each element in the array is multiplied by 2.


Broadcasting

NumPy arrays support a powerful feature called broadcasting, where smaller arrays are
automatically expanded to match the shape of larger arrays for element-wise operations.

Example:

arr1 = np.array([1, 2, 3])

arr2 = np.array([[1], [2], [3]])

result = arr1 + arr2

print(result)

Output:

[[2 3 4]

[3 4 5]

[4 5 6]]

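• Explanation: arr1 has shape (3,) and arr2 has shape (3, 1). Broadcasting stretches both to
a common shape of (3, 3): arr1 is repeated down the rows and arr2 across the columns, so
each row of the result is arr1 plus the corresponding element of arr2.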

5. Reshaping Arrays

NumPy allows reshaping arrays into different dimensions using the reshape() method,
provided that the total number of elements remains the same.

Example:

arr = np.array([1, 2, 3, 4, 5, 6])

reshaped_arr = arr.reshape(2, 3) # 2 rows and 3 columns

print(reshaped_arr)

Output:

[[1 2 3]

[4 5 6]]

• Explanation: The array is reshaped into a 2x3 matrix.
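
Two related behaviours of reshape() are worth seeing side by side: one dimension can be given as -1 so that NumPy infers it, and a shape with the wrong number of elements raises a ValueError. A small sketch (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(6)          # [0 1 2 3 4 5], six elements

auto = arr.reshape(3, -1)   # -1 lets NumPy work out the second dimension (2)
print(auto.shape)           # (3, 2)

try:
    arr.reshape(4, 2)       # 4 * 2 = 8 elements, but the array only has 6
    failed = False
except ValueError:
    failed = True

print(failed)               # True
```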


6. Indexing and Slicing

You can index and slice NumPy arrays similar to Python lists, but NumPy also supports
advanced indexing techniques.
Basic Indexing and Slicing
arr = np.array([1, 2, 3, 4, 5])

# Indexing
print(arr[2]) # Output: 3

# Slicing
print(arr[1:4]) # Output: [2 3 4]
Advanced Indexing
NumPy allows more advanced indexing, like using Boolean arrays or selecting elements by
index positions.
Example:
arr = np.array([10, 20, 30, 40, 50])

# Select elements by index positions
indices = np.array([0, 2, 4])
print(arr[indices]) # Output: [10 30 50]
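
The Boolean-array form of advanced indexing mentioned above can be sketched the same way. The threshold of 25 is an arbitrary value chosen for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A comparison produces a Boolean array with one True/False per element
mask = arr > 25

# Indexing with the mask keeps only the elements where the mask is True
selected = arr[mask]
print(selected)  # [30 40 50]
```

The mask must have the same shape as the array it indexes, and the result is a new array rather than a view of the original.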

7. Mathematical Functions in NumPy


NumPy provides a wide range of mathematical functions to operate on arrays. These include
operations like summing elements, finding the mean, and performing trigonometric
operations.
• Sum of Elements:
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr)) # Output: 15

• Mean of Array:

print(np.mean(arr)) # Output: 3.0

• Element-wise Trigonometric Functions:

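arr = np.array([0, np.pi / 2, np.pi])

# Output is approximately [0. 1. 0.]; because of floating-point rounding,
# the last value is printed as about 1.22e-16 rather than exactly 0.
print(np.sin(arr))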

8. Random Number Generation


NumPy has a module numpy.random that provides functions for generating random numbers,
which is widely used in simulations and Monte Carlo methods.
Example:
# Generate an array of random integers between 0 and 9
random_arr = np.random.randint(0, 10, size=5)
print(random_arr)
# Generate an array of random floats between 0 and 1
random_floats = np.random.random(5)
print(random_floats)
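
For simulations that need to be repeatable, a seeded generator gives the same numbers on every run. The sketch below uses np.random.default_rng(), available in NumPy 1.17 and later; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator: the same seed reproduces the same sequence
rng = np.random.default_rng(seed=42)

ints = rng.integers(0, 10, size=5)  # integers from 0 to 9 (10 is excluded)
floats = rng.random(5)              # floats in the half-open interval [0.0, 1.0)

# A second generator with the same seed yields identical values
rng_again = np.random.default_rng(seed=42)
same_ints = rng_again.integers(0, 10, size=5)

print(ints)
print(floats)
```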
9. Linear Algebra in NumPy
NumPy also provides functions for linear algebra operations like matrix multiplication,
determinant, eigenvalues, etc.
Example:
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
result = np.dot(A, B)
print(result)

Output:
[[19 22]
[43 50]]
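
The determinant and eigenvalues mentioned above live in the numpy.linalg module. A brief sketch using the same matrices A and B:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B                     # the @ operator is matrix multiplication, like np.dot here
det_A = np.linalg.det(A)            # 1*4 - 2*3 = -2, up to floating-point rounding
eigenvalues = np.linalg.eigvals(A)  # the two eigenvalues of A

print(product)
print(round(det_A, 6))  # -2.0
print(eigenvalues)
```

As a sanity check, the eigenvalues of a matrix sum to its trace (here 1 + 4 = 5) and multiply to its determinant.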
10. Conclusion
NumPy is an essential tool for scientific computing in Python, providing support for large,
multi-dimensional arrays and matrices, along with a wide range of mathematical operations
to manipulate these arrays. It is fast, efficient, and highly optimized for numerical
computations, making it a foundation for other scientific libraries like Pandas, SciPy, and
Matplotlib.
By mastering NumPy, you can handle and process large datasets, perform complex
mathematical operations, and optimize performance, all with a simple and concise syntax.

3. Brief about the Pandas library.

Pandas Library: An Overview
Pandas is one of the most powerful and widely used libraries in Python for data manipulation,
analysis, and cleaning. It provides easy-to-use data structures and data analysis tools for
handling structured data (such as tabular data in the form of CSV files, Excel files, or
databases). It is particularly useful for working with time series data, large datasets, and
performing complex data wrangling tasks.
The main data structures in Pandas are:
• Series: A one-dimensional labeled array.
• DataFrame: A two-dimensional labeled data structure, similar to a table or a
spreadsheet.
In this section, we will go over the core features of Pandas, along with examples of how to
use them.
1. Installation of Pandas
To install Pandas, use the following pip command:
pip install pandas

2. Core Data Structures in Pandas


Series
A Series is a one-dimensional array-like object that can hold any data type (integers, strings,
floats, Python objects, etc.). It is similar to a list or a column in a DataFrame but has labels
(called index) for each element.
• A Series is created from lists, NumPy arrays, or dictionaries.
Example of creating a Series:
import pandas as pd

# Creating a Series from a list


data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
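
A Series can also carry meaningful labels instead of the default integer index. One way to get them, sketched here with invented subject names and marks, is to build the Series from a dictionary:

```python
import pandas as pd

# Hypothetical data: dictionary keys become the index labels
marks = pd.Series({"maths": 82, "physics": 75, "chemistry": 91})

print(marks["physics"])      # 75, looked up by label
print(marks.index.tolist())  # ['maths', 'physics', 'chemistry']
print(marks.mean())          # average of the three values
```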

DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure. It is essentially a table where data is organized in rows and columns. The
DataFrame allows both row and column indexing.
• A DataFrame can be created from lists, dictionaries, or even from a Series.
Example of creating a DataFrame from a dictionary:

import pandas as pd

# Creating a DataFrame from a dictionary


data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Edward'],
'Age': [24, 27, 22, 32, 29],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df)

OUTPUT
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4   Edward   29      Phoenix
6. Conclusion
Pandas is a highly versatile and powerful library for data analysis in Python. It is widely used
by data analysts, data scientists, and researchers for tasks such as:
• Data manipulation (cleaning, transformation, aggregation).
• Time series analysis.
• Statistical analysis.
• Merging, reshaping, and handling missing data.
• Visualizing data with minimal effort.
4. Explain the concept of Data wrangling using pandas (Loading a dataset,
Selecting Columns & Rows from a dataframe, Add & delete data in a
dataframe)?

Concept of Data Wrangling Using Pandas


Data wrangling (also known as data cleaning or data munging) is the
process of transforming and mapping raw data into a more usable format.
It involves a series of steps that ensure the dataset is ready for analysis,
such as handling missing values, filtering, adding new data, and reshaping
datasets. Pandas, a powerful Python library, is widely used for data
wrangling tasks due to its easy-to-use data structures (such as DataFrame
and Series) and built-in functions for data manipulation.
This section explains how to perform basic data wrangling tasks using
Pandas, including:
1. Loading a dataset into a Pandas DataFrame.
2. Selecting columns and rows from a DataFrame.
3. Adding and deleting data in a DataFrame.

1. Loading a Dataset Using Pandas


The first step in data wrangling is to load the dataset into a Pandas
DataFrame. This can be done from various data sources such as CSV files,
Excel files, SQL databases, JSON files, etc.
Loading Data from CSV
The most common method of loading data is from a CSV file using
pd.read_csv().
import pandas as pd

# Loading a CSV file into a DataFrame
df = pd.read_csv('data.csv')

# Display the first 5 rows of the DataFrame
print(df.head())
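
The other two tasks named at the start of this section, selecting columns and rows and adding and deleting data, can be sketched as follows. This is an illustrative example under stated assumptions: the CSV text is invented and read from memory instead of a file, and the "Senior" column and its age threshold are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, read from memory so the example is self-contained
csv_text = io.StringIO(
    "Name,Age,City\n"
    "Alice,24,New York\n"
    "Bob,27,Los Angeles\n"
    "Charlie,22,Chicago\n"
)
df = pd.read_csv(csv_text)

# Selecting columns
names = df["Name"]              # one column -> Series
subset = df[["Name", "Age"]]    # list of columns -> DataFrame

# Selecting rows
first_row = df.iloc[0]          # by position
bob_city = df.loc[1, "City"]    # by index label and column name
over_23 = df[df["Age"] > 23]    # by condition

# Adding a column and a row
df["Senior"] = df["Age"] > 25
new_row = pd.DataFrame(
    [{"Name": "David", "Age": 32, "City": "Houston", "Senior": True}]
)
df = pd.concat([df, new_row], ignore_index=True)

# Deleting a column and a row
df = df.drop(columns=["City"])
df = df.drop(index=2)           # removes the row labelled 2 (Charlie)

print(df)
```

drop() returns a new DataFrame by default, which is why the result is assigned back to df.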
Conclusion
Data wrangling is an essential part of data analysis and involves various
tasks, such as loading datasets, selecting specific columns and rows,
adding or removing data, handling missing values, and transforming data.
With Pandas, these operations become straightforward and efficient,
making it easier to prepare your data for analysis, visualization, and
machine learning.
By mastering these basic data wrangling techniques using Pandas, you
can efficiently clean and manipulate datasets, ensuring they are in the right
format for deeper analysis and modeling.
