Data Science Python Theory + Code Notes
8 Mark Questions and Answers
1. How to handle missing values?
- Mean Imputation: Replace missing values in a numeric column with the mean of that column.
- Dropping Rows: Remove rows that contain any missing value.
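The two strategies above can be sketched with pandas as follows. The DataFrame and its column names (`age`, `city`) are made up for illustration and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({
    'age': [25.0, np.nan, 35.0, 40.0],
    'city': ['Pune', 'Delhi', None, 'Chennai'],
})

# Mean imputation: fill missing ages with the mean of the observed ages
imputed = df.copy()
imputed['age'] = imputed['age'].fillna(imputed['age'].mean())

# Dropping rows: keep only rows with no missing value in any column
dropped = df.dropna()

print(imputed)
print(dropped)
```

Mean imputation keeps every row but only makes sense for numeric columns. Dropping rows is simpler but shrinks the dataset, so it is usually preferred only when few rows are affected.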
2. Python code for log transformation and z-score standardization:
import numpy as np
from sklearn.preprocessing import StandardScaler
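data = np.array([1, 10, 100, 1000])
log_data = np.log(data)  # natural log; inputs must be strictly positive
scaler = StandardScaler()
# StandardScaler expects a 2D array, so reshape to a single column
standardized = scaler.fit_transform(log_data.reshape(-1, 1))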
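As a cross-check, the same standardization can be written by hand with NumPy using z = (x - mean) / std. This is a minimal sketch that assumes the population standard deviation (NumPy's default, `ddof=0`), which is also what `StandardScaler` uses.

```python
import numpy as np

data = np.array([1, 10, 100, 1000])
log_data = np.log(data)  # natural log compresses the wide range of values

# z-score: subtract the mean, then divide by the population standard deviation
z = (log_data - log_data.mean()) / log_data.std()

print(z)
```

After this step the values have mean 0 and standard deviation 1, up to floating-point error.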
3. Code for 2x2 subplot:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(2, 2)
axs[0, 0].plot([1, 2], [3, 4])
axs[0, 1].bar([1, 2], [3, 4])
axs[1, 0].scatter([1, 2], [3, 4])
axs[1, 1].hist([1, 2, 2, 3])
plt.tight_layout()
plt.show()
4. Code for year vs sales (line) and year vs products (bar):
import matplotlib.pyplot as plt
year = [2020, 2021, 2022]
sales = [200, 250, 300]
products = [20, 30, 25]
plt.plot(year, sales, label='Sales')
plt.bar(year, products, alpha=0.5, label='Products')
plt.xticks(year)  # show only whole years on the x-axis
plt.legend()
plt.show()
16 Mark Questions and Answers
1. 3D Plot in Python:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection; not required on recent Matplotlib versions
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
ax.plot_surface(X, Y, Z, cmap='viridis')
plt.show()
2. Data cleaning & filtering code:
import pandas as pd
df = pd.DataFrame({'Name': ['Nina', ' Alex ', 'Nate', 'Sam'], 'Division': ['north', 'east', 'south', 'west']})
df['Name'] = df['Name'].str.strip()
starts_with_N = df[df['Name'].str.startswith('N')]
df['Division'] = df['Division'].str.upper()
# Outlier removal using IQR
# 'some_column' is a placeholder: replace it with a numeric column of your DataFrame
Q1 = df['some_column'].quantile(0.25)
Q3 = df['some_column'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['some_column'] >= Q1 - 1.5 * IQR) & (df['some_column'] <= Q3 + 1.5 * IQR)]
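Since the DataFrame above has no numeric column, here is a self-contained sketch of the IQR rule. The `Sales` column and its values are hypothetical, chosen so that one value is an obvious outlier.

```python
import pandas as pd

# Hypothetical numeric column; 500 is an obvious outlier
df = pd.DataFrame({'Sales': [10, 12, 11, 13, 12, 500]})

Q1 = df['Sales'].quantile(0.25)
Q3 = df['Sales'].quantile(0.75)
IQR = Q3 - Q1

# Values outside [lower, upper] are treated as outliers
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# between() is inclusive on both ends by default
filtered = df[df['Sales'].between(lower, upper)]
print(filtered)
```

The 1.5 multiplier is the conventional choice; a larger multiplier removes fewer points.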
Blackboard Questions Code
1. y = x^2 from -10 to 10:
import matplotlib.pyplot as plt
x = list(range(-10, 11))
y = [i**2 for i in x]
plt.plot(x, y)
plt.title('y = x^2')
plt.grid()
plt.show()
2. Bar chart of subjects and scores:
import matplotlib.pyplot as plt
subjects = ['Math', 'English', 'History', 'Science']
scores = [90, 75, 88, 92]
plt.bar(subjects, scores)
plt.title('Scores by Subject')
plt.show()
3. Sine and Cosine curves with legend:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 2*np.pi, 100)
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()
plt.grid()
plt.show()
4. Seaborn pairplot with Iris:
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')
sns.pairplot(df, hue='species')
plt.show()
5. Random scatter plot with numpy:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y)
plt.title('Random Scatter Plot')
plt.show()
Basic Pandas Theory
- Series: 1D labeled array (like a column).
- DataFrame: 2D labeled data (like an Excel sheet).
- Read CSV: pd.read_csv('file.csv')
- Head/Tail: df.head(), df.tail()
- Selection: df['column'], df.iloc[0], df.loc[0, 'col']
- Missing Values: df.dropna(), df.fillna(), df.isnull()
- Mean Imputation: df['col'] = df['col'].fillna(df['col'].mean())
- Grouping/Aggregation: df.groupby('col')['value'].mean(), df['col'].sum()
- Text Ops: df['Name'].str.startswith('N'), df['Name'].str.strip()
- Outlier Removal: IQR method using quantile(); keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
- Uppercase Transformation: df['Division'] = df['Division'].str.upper()
- Merge: pd.merge(df1, df2, on='col')
- Concatenate: pd.concat([df1, df2])
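The grouping, merge, and concatenate entries above can be tried together in one short sketch. All table contents and the names `sales`, `managers`, and `more_sales` are hypothetical, used only to make the calls runnable.

```python
import pandas as pd

# Hypothetical tables for illustration
sales = pd.DataFrame({
    'Division': ['NORTH', 'NORTH', 'SOUTH'],
    'Amount': [100, 150, 200],
})
managers = pd.DataFrame({
    'Division': ['NORTH', 'SOUTH'],
    'Manager': ['Nina', 'Sam'],
})

# Grouping: mean Amount per Division
mean_by_division = sales.groupby('Division')['Amount'].mean()

# Merge: attach the manager to each sale through the shared 'Division' key
merged = pd.merge(sales, managers, on='Division')

# Concatenate: stack two frames that have the same columns
more_sales = pd.DataFrame({'Division': ['EAST'], 'Amount': [50]})
all_sales = pd.concat([sales, more_sales], ignore_index=True)

print(mean_by_division)
print(merged)
print(all_sales)
```

Selecting the column before calling `mean()` avoids errors when the frame also contains text columns. `ignore_index=True` renumbers the rows of the concatenated result from 0.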