Python DataScience Theory and Codes

Uploaded by

gobikaa.om
Data Science Python Theory + Code Notes

8 Mark Questions and Answers

1. How to handle missing values?

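- Mean Imputation: Replace missing values in a numeric column with that column's mean. This keeps every row but reduces the column's variance.

- Dropping Rows: Remove rows that contain any missing value. This is simple but discards the other values in those rows.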
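
The two approaches above can be sketched in pandas as follows. The column names and values are hypothetical, chosen only to give each column one gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in each column
df = pd.DataFrame({"age": [20.0, np.nan, 40.0], "score": [1.0, 2.0, np.nan]})

# Mean imputation: fill each gap with the mean of its own column
imputed = df.fillna(df.mean())

# Dropping rows: keep only rows that have no missing value
dropped = df.dropna()

print(imputed)
print(dropped)
```

Here imputation keeps all three rows, while dropping leaves only the first row.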

2. Python code for log transformation and z-score standardization:

import numpy as np

from sklearn.preprocessing import StandardScaler

data = np.array([1, 10, 100, 1000])

log_data = np.log(data)

scaler = StandardScaler()

standardized = scaler.fit_transform(log_data.reshape(-1, 1))
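
As a cross-check, the same z-scores can be computed by hand with NumPy alone. This is a minimal sketch that assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np

data = np.array([1, 10, 100, 1000])

# Natural log; every value must be strictly positive
log_data = np.log(data)

# z-score: subtract the mean, divide by the population std (ddof=0)
z = (log_data - log_data.mean()) / log_data.std()

print(z)
```

After standardization the values have mean 0 and standard deviation 1.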

3. Code for 2x2 subplot:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2)

axs[0, 0].plot([1, 2], [3, 4])

axs[0, 1].bar([1, 2], [3, 4])

axs[1, 0].scatter([1, 2], [3, 4])

axs[1, 1].hist([1, 2, 2, 3])

plt.tight_layout()

plt.show()

4. Code for year vs sales (line) and year vs products (bar):

import matplotlib.pyplot as plt

year = [2020, 2021, 2022]

sales = [200, 250, 300]

products = [20, 30, 25]

plt.plot(year, sales, label='Sales')

plt.bar(year, products, alpha=0.5, label='Products')


plt.legend()

plt.show()

16 Mark Questions and Answers

1. 3D Plot in Python:

# Needed only on Matplotlib older than 3.2 to register the '3d' projection

from mpl_toolkits.mplot3d import Axes3D

import matplotlib.pyplot as plt

import numpy as np

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

x = np.linspace(-5, 5, 100)

y = np.linspace(-5, 5, 100)

X, Y = np.meshgrid(x, y)

Z = np.sin(np.sqrt(X**2 + Y**2))

ax.plot_surface(X, Y, Z, cmap='viridis')

plt.show()

2. Data cleaning & filtering code:

import pandas as pd

df = pd.DataFrame({'Name': ['Nina', ' Alex ', 'Nate', 'Sam'], 'Division': ['north', 'east', 'south', 'west']})

df['Name'] = df['Name'].str.strip()

starts_with_N = df[df['Name'].str.startswith('N')]

df['Division'] = df['Division'].str.upper()

# Outlier removal using IQR ('some_column' is a placeholder: substitute a numeric column, since the example df above has none)

Q1 = df['some_column'].quantile(0.25)

Q3 = df['some_column'].quantile(0.75)

IQR = Q3 - Q1

df = df[(df['some_column'] >= Q1 - 1.5 * IQR) & (df['some_column'] <= Q3 + 1.5 * IQR)]
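
Since the DataFrame above has no numeric column, here is a small self-contained sketch of the IQR filter. The column name "value" and its data are hypothetical.

```python
import pandas as pd

# Hypothetical numeric data; 100 is the obvious outlier
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 100]})

q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1

# Keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], bounds included
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
filtered = df[df["value"].between(lower, upper)]

print(lower, upper)
print(filtered)
```

The bounds come out as 9.0 and 15.0, so only the row with 100 is removed.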


Blackboard Questions Code

1. y = x^2 from -10 to 10:

import matplotlib.pyplot as plt

x = list(range(-10, 11))

y = [i**2 for i in x]

plt.plot(x, y)

plt.title('y = x^2')

plt.grid()

plt.show()

2. Bar chart of subjects and scores:

import matplotlib.pyplot as plt

subjects = ['Math', 'English', 'History', 'Science']

scores = [90, 75, 88, 92]

plt.bar(subjects, scores)

plt.title('Scores by Subject')

plt.show()

3. Sine and Cosine curves with legend:

import matplotlib.pyplot as plt

import numpy as np

x = np.linspace(0, 2*np.pi, 100)

plt.plot(x, np.sin(x), label='Sine')

plt.plot(x, np.cos(x), label='Cosine')

plt.legend()

plt.grid()

plt.show()

4. Seaborn pairplot with Iris:

import matplotlib.pyplot as plt

import seaborn as sns

df = sns.load_dataset('iris')

sns.pairplot(df, hue='species')

plt.show()

5. Random scatter plot with numpy:

import matplotlib.pyplot as plt

import numpy as np

x = np.random.rand(50)

y = np.random.rand(50)

plt.scatter(x, y)

plt.title('Random Scatter Plot')

plt.show()

Basic Pandas Theory

- Series: 1D labeled array (like a column).

- DataFrame: 2D labeled data (like an Excel sheet).

- Read CSV: pd.read_csv('file.csv')

- Head/Tail: df.head(), df.tail()

- Selection: df['column'] (one column), df.iloc[0] (row by position), df.loc[0, 'col'] (value by row label and column name)

- Missing Values: df.dropna(), df.fillna(value), df.isnull()

- Mean Imputation: df['col'].fillna(df['col'].mean())

- Grouping and Aggregation: df.groupby('col').mean(numeric_only=True), df['col'].sum()

- Text Ops: df['Name'].str.startswith('N'), df['Name'].str.strip()

- Outlier Removal: IQR method using quantile()

- Uppercase Transformation: df['Division'] = df['Division'].str.upper()

- Merge: pd.merge(df1, df2, on='col')

- Concatenate: pd.concat([df1, df2])
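
The grouping, merge, and concatenate entries above can be tried on two tiny tables. This is only an illustrative sketch; the table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share a 'region' column
sales = pd.DataFrame({"region": ["north", "south", "north"],
                      "amount": [100, 200, 300]})
managers = pd.DataFrame({"region": ["north", "south"],
                         "manager": ["Nina", "Sam"]})

# Grouping: mean amount per region
mean_by_region = sales.groupby("region")["amount"].mean()

# Merge: inner join on the shared 'region' column
merged = pd.merge(sales, managers, on="region")

# Concatenate: stack rows; ignore_index renumbers them from 0
stacked = pd.concat([sales, sales], ignore_index=True)

print(mean_by_region)
print(merged)
print(stacked)
```

Merge adds columns by matching key values, while concatenate simply appends rows.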
