Class 12 Informatics Practices - Unit 1 Notes
Unit 1 - Pandas and Matplotlib
UNIT 1 - DATA HANDLING USING PANDAS (Full Notes - Part 1 and 2)
== PART 1: Python Libraries, Series, DataFrames ==
== Introduction to Python Libraries ==
- A library is a collection of pre-written code that you can use.
- Pandas: for data handling.
- Matplotlib: for plotting graphs.
- NumPy: for numerical operations (used by Pandas internally).
== What is Pandas? ==
- Name: derived from "panel data"; also read as "Python Data Analysis" library.
- Built on top of NumPy.
- Used for data analysis and manipulation.
Importing:
import pandas as pd
== Series in Pandas ==
A one-dimensional labeled array. Each element has an index and a value.
-> Creating a Series:
1. From List:
s = pd.Series([10, 20, 30])
2. From Dictionary:
s = pd.Series({"a": 1, "b": 2})
3. From Scalar:
s = pd.Series(5, index=[1, 2, 3])
-> Series Attributes/Methods:
- s.index, s.values, s.head(), s.tail(), s.dtype, s.shape, s.size
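The three ways of creating a Series and a few of the attributes listed above can be tried together. This is a minimal sketch; the variable names are only illustrative.

```python
import pandas as pd

# Three ways to build a Series, mirroring the list above
from_list = pd.Series([10, 20, 30])            # default index 0, 1, 2
from_dict = pd.Series({"a": 1, "b": 2})        # dictionary keys become the index
from_scalar = pd.Series(5, index=[1, 2, 3])    # the scalar is repeated for every label

print(from_list.index.tolist())     # [0, 1, 2]
print(from_dict.index.tolist())     # ['a', 'b']
print(from_scalar.values.tolist())  # [5, 5, 5]
print(from_list.shape)              # (3,)
print(from_list.size)               # 3
print(from_list.head(2))            # first two elements
print(from_list.tail(1))            # last element
```

Note that s.index, s.values, s.dtype, s.shape and s.size are attributes (no brackets), while s.head() and s.tail() are methods.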
-> Operations:
- s + 2, s * 3 (applied to every element)
- s1 + s2 (values are matched by index label; a label present in only one Series gives NaN)
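Adding two Series whose indexes only partly overlap shows how values are matched by label. This is a small sketch with made-up labels and values; the last line uses Series.add with fill_value as one possible way to avoid NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

scaled = s1 * 3    # the scalar is applied to every element
total = s1 + s2    # matched by label; "a" and "d" have no partner, so they become NaN

print(scaled.tolist())   # [3, 6, 9]
print(total)             # a: NaN, b: 12.0, c: 23.0, d: NaN

# Treat a missing label as 0 instead of producing NaN
filled = s1.add(s2, fill_value=0)
print(filled.tolist())   # [1.0, 12.0, 23.0, 30.0]
```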
== DataFrames in Pandas ==
A 2D labeled data structure with rows and columns, like a table.
-> Creating a DataFrame:
1. From Dictionary of Lists:
df = pd.DataFrame({"Name": ["Amit", "Riya"], "Marks": [80, 75]})
2. From List of Dictionaries:
df = pd.DataFrame([{"Name": "Amit", "Marks": 80}, {"Name": "Riya", "Marks": 75}])
3. From Dictionary of Series:
df = pd.DataFrame({"Name": pd.Series(["Amit", "Riya"]), "Marks": pd.Series([80, 75])})
4. From CSV File:
df = pd.read_csv("file.csv")
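To try read_csv without a file on disk, the CSV text can be passed through io.StringIO. In this sketch the CSV content is hypothetical and simply stands in for "file.csv"; it is compared with the dictionary and list forms shown above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "file.csv"
csv_text = "Name,Marks\nAmit,80\nRiya,75\n"

from_csv = pd.read_csv(io.StringIO(csv_text))   # first line becomes the column names
from_dict = pd.DataFrame({"Name": ["Amit", "Riya"], "Marks": [80, 75]})
from_rows = pd.DataFrame([{"Name": "Amit", "Marks": 80}, {"Name": "Riya", "Marks": 75}])

print(from_csv)
print(from_dict.equals(from_rows))   # True: both forms build the same table
```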
-> DataFrame Attributes/Methods:
- df.shape, df.columns, df.index, df.dtypes, df.info(), df.describe(), df.head(), df.tail()
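A quick look at what these attributes and methods return, using the two-row table from the examples above (a minimal sketch):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Amit", "Riya"], "Marks": [80, 75]})

print(df.shape)            # (2, 2) -> (rows, columns)
print(list(df.columns))    # ['Name', 'Marks']
print(list(df.index))      # [0, 1]
print(df.dtypes)           # data type of each column
df.info()                  # prints a summary: column names, non-null counts, dtypes

summary = df.describe()    # statistics for numeric columns only (by default)
print(summary.loc["mean", "Marks"])   # 77.5
print(df.head(1))          # first row
print(df.tail(1))          # last row
```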
-> Selecting Data:
- Columns: df["Name"], df[["Name", "Marks"]]
- Rows: df.loc[0] (by index label), df.iloc[1] (by integer position)
- Slicing: df[1:3] (rows at positions 1 and 2; the end position is excluded)
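The difference between loc and iloc is easier to see when the index is not 0, 1, 2. In this sketch the row labels "r1" to "r4" and the extra rows are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Amit", "Riya", "Karan", "Neha"], "Marks": [80, 75, 90, 60]},
    index=["r1", "r2", "r3", "r4"],   # hypothetical row labels
)

names = df["Name"]               # one column -> Series
subset = df[["Name", "Marks"]]   # list of columns -> DataFrame
by_label = df.loc["r2"]          # row whose label is "r2"
by_position = df.iloc[1]         # second row (positions start at 0)
sliced = df[1:3]                 # rows at positions 1 and 2

print(by_label["Name"])          # Riya
print(by_position["Name"])       # Riya
print(sliced["Name"].tolist())   # ['Riya', 'Karan']
```

With the default index 0, 1, 2, ... the label and the position of a row are the same number, which is why df.loc[0] and df.iloc[0] return the same row in the earlier examples.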
== PART 2: Data Visualization using Matplotlib ==
== What is Data Visualization? ==
- Graphical representation of information and data using charts.
- Helps identify patterns, trends, and comparisons.
== What is Matplotlib? ==
- A popular Python library for data visualization.
- Its pyplot module provides the plotting functions.
Importing:
import matplotlib.pyplot as plt
== Line Plot ==
Used to show a trend over time or across continuous data.
Example:
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.show()
== Bar Graph ==
Used to compare data among categories.
Example:
x = ['A', 'B', 'C']
y = [5, 10, 3]
plt.bar(x, y)
plt.show()
== Histogram ==
Used to show frequency distribution of numeric data.
Example:
data = [20, 25, 30, 35, 40, 45, 25, 30]
plt.hist(data, bins=5)
plt.show()
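plt.hist also returns the count in each bin and the bin edges, which helps in checking what the picture shows. A minimal sketch on the same data follows; the "Agg" backend line is an assumption so that the script runs without a display window, and it can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

data = [20, 25, 30, 35, 40, 45, 25, 30]

# hist returns (counts, bin edges, drawn bars)
counts, edges, _ = plt.hist(data, bins=5)

print(edges.tolist())    # [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
print(counts.tolist())   # [1.0, 2.0, 2.0, 1.0, 2.0]
plt.close()
```

bins=5 divides the range 20 to 45 into five equal intervals of width 5. Each interval includes its left edge but not its right edge, except the last one, which includes both (so 40 and 45 fall in the same bin).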
== Customizing the Plot ==
- plt.title("Title") - sets the title
- plt.xlabel("X-axis label") - sets X-axis label
- plt.ylabel("Y-axis label") - sets Y-axis label
- plt.legend(["Label"]) - adds a legend
- plt.grid(True) - adds a grid
- plt.savefig("filename.png") - saves the plot as an image (call it before plt.show())
Example with all customizations:
x = [1, 2, 3, 4]
y = [10, 20, 15, 25]
plt.plot(x, y)
plt.title("Sample Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.legend(["Values"])
plt.savefig("plot.png")
plt.show()
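When a plot has more than one line, a common alternative is to pass label= to each plt.plot call and then call plt.legend() with no arguments. The sketch below assumes two made-up data sets ("Section A" and "Section B" are hypothetical names); the "Agg" backend line is again an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
plt.plot(x, [10, 20, 15, 25], label="Section A")   # hypothetical data
plt.plot(x, [12, 18, 22, 20], label="Section B")   # hypothetical data
plt.title("Two Line Plots")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)

legend = plt.legend()   # picks up the label= text of each line
labels = [text.get_text() for text in legend.get_texts()]
print(labels)           # ['Section A', 'Section B']
plt.close()
```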