NumPy Handbook
for Machine Learning
Explore NumPy with beginner-friendly projects and
practical applications
First Edition
Samriddha Pathak
NumPy Handbook for Machine Learning
By Samriddha Pathak
Preface
Welcome to the "NumPy Handbook for Machine Learning." This book is intended to be a
comprehensive handbook to learning and utilizing NumPy, the basic package for scientific
computing in Python, and how it is applied in machine learning.
As we enter an age dominated by artificial intelligence and information, the ability to manipulate and analyze numerical data efficiently becomes ever more important. At the center of this skill is NumPy, which provides efficient tools that simplify tedious numerical computations and greatly improve computational performance.
This book is intended for students, researchers, and practitioners who aspire to use the full capabilities of NumPy in their machine learning endeavors. Whether you are beginning your data science journey or seeking a deeper understanding of the numerical computation behind machine learning algorithms, it provides the necessary background and implementation examples to get started.
In this book, we will cover not only the fundamentals and operations of NumPy but also its real-world usage through detailed projects. By the time you finish this book, you will have
a strong understanding of NumPy that will enable you to solve real-world machine learning
problems with confidence.
I sincerely hope this handbook will be a useful companion on your machine learning journey.
Acknowledgement
The release of this book, NumPy Handbook for Machine Learning, would not have been possible without the constant encouragement, guidance, and assistance of several individuals. I would like to take this opportunity to thank those who played a significant role in this endeavor.
First and foremost, I would like to thank Mr. Rasum Subedi for his excellent editorial service. His keen eye for detail and his dedication have been invaluable in turning my manuscript into a readable, professionally structured book, and his efforts have significantly enhanced its clarity and comprehensibility.
I am highly grateful to my parents, Kumar Pathak and Radhika Pathak, whose belief, encouragement,
and confidence in my potential have been the driving force of my determination. Their encouragement
and constant assurance kept me firm and resolute during this difficult process.
I would also like to express my gratitude to my close friends Shrijan Shrestha, Kriti Dahal, Amnisha
Luitel, Rasum Subedi, Amit Rai, and Binod Basnet for their constant support, stimulating discussions,
and inspiring words. Their support of my idea and their constructive criticism have been vital in the
creation of this book.
A genuine word of thanks to my sister Jyoti Pokhrel and my brother Saroj Dahal, whose inspiration, support, and encouragement have been driving forces throughout this work. Their faith in me has kept me going even when everything seemed to be against me. I also want to express my gratitude to the whole data science and machine learning community, whose collective research, expertise, and shared resources have educated and inspired much of what is presented in this book. Their open-source contributions have been invaluable to my own growth and to the production of this book.
Lastly, I would like to thank all the readers and students interested in expanding their knowledge of NumPy and its applications in machine learning. My sincere hope is that this book serves as a helpful guide to your studies and helps you grow in this ever-evolving field.
With sincerest thanks,
Samriddha Pathak
Table of Contents
1. Introduction to NumPy
o What is NumPy?
o Why NumPy for Machine Learning?
o Installation and Setup
2. NumPy Fundamentals
o Creating NumPy Arrays
o Array Attributes
o Array Indexing and Slicing
o Array Reshaping
o Array Iteration
3. Mathematical Operations in NumPy
o Statistical Functions
o Linear Algebra Operations
4. Advanced NumPy Operations
o Broadcasting
o Array Manipulation
o Sorting and Searching
o File I/O with NumPy
5. Project 1: K-Nearest Neighbors from Scratch
o Theory and Concepts
o Implementation
o Testing and Evaluation
6. Project 2: Principal Component Analysis (PCA) Implementation
o Theory and Background
o Implementation with NumPy
o Visualization and Application
7. NumPy Best Practices
o Performance Optimization
o Memory Management
o Common Pitfalls and Solutions
8. NumPy and the Machine Learning Ecosystem
o NumPy with Pandas
o NumPy with Scikit-learn
o NumPy with TensorFlow and PyTorch
9. Appendix
o Quick Reference
o Further Resources
o Glossary of Terms
Chapter 1: Introduction to NumPy
What is NumPy?
NumPy (Numerical Python) is a high-performance library for scientific computing in Python.
It provides support for large, multi-dimensional arrays and matrices, along with a vast
collection of mathematical and numerical functions to efficiently manipulate these arrays.
NumPy serves as the foundation for many scientific computing and machine learning libraries,
including Pandas, SciPy, TensorFlow, and PyTorch. It offers significant advantages over
Python’s built-in lists due to its faster computation, lower memory consumption, and efficient
vectorized operations.
Why NumPy for Machine Learning?
Machine learning and artificial intelligence involve handling and processing large datasets
efficiently. NumPy plays a crucial role in optimizing this process by providing:
• High-performance computations: NumPy is implemented in C and Fortran, making it
significantly faster than using Python lists for numerical operations.
• Memory efficiency: NumPy arrays are stored in contiguous memory locations, reducing
the overhead of Python objects and enabling faster access.
• Vectorized operations: Unlike traditional Python loops, NumPy supports vectorization, which allows operations to be performed on entire arrays at once. This eliminates explicit loops and speeds up execution (see the short example after this list).
• Broadcasting: Enables element-wise operations on arrays of different shapes without
the need for explicit reshaping.
• Interoperability: NumPy integrates seamlessly with other libraries, such as:
o Pandas (data manipulation)
o Matplotlib (data visualization)
o Scikit-learn (machine learning)
o TensorFlow & PyTorch (deep learning)
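To make the benefit of vectorization concrete, here is a minimal sketch that squares one million numbers with a Python loop and then with a single NumPy expression. Exact timings depend on your machine, but the vectorized version is typically far faster:
import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

# Pure Python loop
start = time.time()
squares_list = [x * x for x in py_list]
print("Python loop:", time.time() - start, "seconds")

# Vectorized NumPy operation
start = time.time()
squares_arr = np_arr * np_arr
print("NumPy vectorized:", time.time() - start, "seconds")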
Key Features of NumPy:
Here are some key features that make NumPy an indispensable tool for machine learning:
1. N-dimensional array (ndarray): A powerful object that allows for handling of large
datasets efficiently.
2. Mathematical and statistical functions: Includes operations like mean, median, standard
deviation, and linear algebra functions for numerical analysis.
3. Random number generation: Used in machine learning for initializing weights,
bootstrapping, and creating synthetic data.
4. Broadcasting: Allows operations on arrays of different sizes without the need for
manual reshaping.
5. File handling: NumPy allows reading and writing from CSV, binary, and other formats,
making it easy to store and manipulate datasets.
6. Memory efficiency: Uses less memory than Python lists due to optimized storage.
Installation and Setup
Before using NumPy, you need to install it in your Python environment.
Installing NumPy using pip (Recommended Method)
You can install NumPy using the Python package manager pip by running:
pip install numpy
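Once the installation finishes, you can confirm that NumPy is available and check its version from a Python shell:
import numpy as np
print("NumPy version:", np.__version__)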
Chapter 2: NumPy Fundamentals
Creating NumPy Arrays
A NumPy array is the fundamental building block of the NumPy library. Unlike Python lists,
NumPy arrays provide a more efficient, faster, and memory-optimized way to store and
manipulate numerical data.
1. Creating a NumPy Array from a List
import numpy as np
# Creating a 1D array from a list
arr_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr_1d)
print("Type of the array:", type(arr_1d)) # Verify it's a NumPy array
💡Note:
Unlike Python lists, NumPy arrays provide faster numerical operations and take up less
memory.
2. Creating Multi-Dimensional Arrays
You can also create 2D and 3D arrays in NumPy:
# Creating a 2D NumPy array (Matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)
# Creating a 3D NumPy array (Tensor)
arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print("3D Array:\n", arr_3d)
📌 Key Differences:
1D Array: A simple list of numbers
2D Array: A matrix (rows & columns)
3D Array: A collection of 2D arrays
3. Creating Special NumPy Arrays
NumPy provides several built-in functions to create arrays quickly:
a) Creating an Array of Zeros
zeros_arr = np.zeros((3, 3)) # Creates a 3x3 matrix filled with zeros
print("Zeros Array:\n", zeros_arr)
b) Creating an Array of Ones
ones_arr = np.ones((2, 5)) # Creates a 2x5 matrix filled with ones
print("Ones Array:\n", ones_arr)
c) Creating an Identity Matrix
identity_matrix = np.eye(4) # Creates a 4x4 identity matrix
print("Identity Matrix:\n", identity_matrix)
d) Creating an Array with a Range of Values
range_arr = np.arange(1, 11, 2) # Creates an array with values from 1 to 10 with a step of 2
print("Range Array:", range_arr)
e) Creating an Array with Evenly Spaced Values
linspace_arr = np.linspace(0, 1, 5) # Creates 5 evenly spaced values between 0 and 1
print("Linspace Array:", linspace_arr)
f) Creating a Random Array
random_arr = np.random.rand(3, 3) # Generates a 3x3 array of random values between 0 and 1
print("Random Array:\n", random_arr)
🚀 Tip:
These functions are useful for machine learning, especially when initializing weights and
datasets.
Array Attributes
NumPy arrays have several attributes that provide important information about their properties.
# Creating a sample NumPy array
arr = np.array([[10, 20, 30], [40, 50, 60]])
print("Array:\n", arr)
print("Shape of array:", arr.shape) # Returns (rows, columns)
print("Size of array:", arr.size) # Returns total number of elements
print("Number of dimensions:", arr.ndim) # Returns 2 (for 2D array)
print("Data type of elements:", arr.dtype) # Data type of elements
print("Item size in bytes:", arr.itemsize) # Size of each element in bytes
print("Total memory consumed:", arr.nbytes, "bytes") # Total memory usage
✅ Key Takeaways:
.shape tells the (rows, columns) of an array
.size gives the total number of elements
.dtype shows the data type of elements
.ndim returns the number of dimensions
Array Indexing and Slicing
Just like Python lists, NumPy arrays support indexing and slicing.
1. Accessing Elements in a 1D Array
arr_1d = np.array([10, 20, 30, 40, 50])
print("First Element:", arr_1d[0]) # Access first element
print("Last Element:", arr_1d[-1]) # Access last element
print("Elements from index 1 to 3:", arr_1d[1:4]) # Slicing
print("Every second element:", arr_1d[::2]) # Step slicing
2. Accessing Elements in a 2D Array
arr_2d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print("Element at row 1, column 2:", arr_2d[1, 2]) # Accessing element
print("First row:", arr_2d[0, :]) # Accessing entire row
print("First column:", arr_2d[:, 0]) # Accessing entire column
Array Reshaping
Reshaping allows you to change the shape of an array without altering the data.
arr = np.arange(12) # Create an array with 12 elements
reshaped_arr = arr.reshape(3, 4) # Reshape it into a 3x4 matrix
print("Original Array:", arr)
print("Reshaped Array:\n", reshaped_arr)
✅ Tip:
Use -1 to automatically infer one dimension:
auto_reshaped = arr.reshape(4, -1) # NumPy automatically calculates the missing dimension
print(auto_reshaped.shape) # Output: (4, 3)
Array Iteration
1. Iterating Over a 1D Array
arr_1d = np.array([10, 20, 30, 40])
for num in arr_1d:
    print(num)
2. Iterating Over a 2D Array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Iterating row by row
for row in arr_2d:
    print("Row:", row)
# Iterating element by element
for element in np.nditer(arr_2d):
    print("Element:", element)
Chapter 3: Mathematical Operations in
NumPy
NumPy provides an extensive library of numerical functions for efficient computation on arrays and matrices. These operations are far faster than equivalent loops in plain Python because they are vectorized and executed in optimized, compiled code.
This chapter covers:
✅ Element-wise operations
✅ Universal functions (ufuncs)
✅ Statistical computations
✅ Linear algebra operations
✅ Advanced mathematical functions
1. Element-wise Arithmetic Operations
NumPy allows fast and efficient element-wise operations on arrays.
import numpy as np
# Creating two NumPy arrays
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Element-wise arithmetic operations
print("Addition:", a + b) # [6 8 10 12]
print("Subtraction:", a - b) # [-4 -4 -4 -4]
print("Multiplication:", a * b) # [5 12 21 32]
print("Division:", a / b) # [0.2 0.333 0.428 0.5]
print("Exponentiation:", a ** 2) # [1 4 9 16] (Square each element)
print("Modulus:", b % a) # [0 0 1 0] (Remainder of division)
📌 Key Takeaways:
Element-wise operations are performed without using explicit loops.
Efficient and faster than regular Python lists.
2. Universal Functions (ufuncs)
NumPy provides vectorized functions that perform element-wise operations efficiently. These
are called Universal Functions (ufuncs).
# Creating a sample array
arr = np.array([1, 2, 3, 4, 5])
# Applying universal functions
print("Square root:", np.sqrt(arr))
print("Exponential (e^x):", np.exp(arr))
print("Natural logarithm (ln):", np.log(arr))
print("Base-10 logarithm (log10):", np.log10(arr))
print("Sine:", np.sin(arr))
print("Cosine:", np.cos(arr))
print("Tangent:", np.tan(arr))
🚀 Advantages of ufuncs:
✅ Faster execution using vectorized operations
✅ Automatically handles broadcasting for arrays of different shapes
✅ Implemented in optimized C code; some backing routines (notably linear algebra) can use multiple cores
3. Statistical Functions
NumPy provides various functions to calculate statistical metrics for datasets.
# Creating a dataset
data = np.array([10, 20, 30, 40, 50])
# Calculating key statistics
print("Mean (Average):", np.mean(data)) # 30.0
print("Median (Middle value):", np.median(data)) # 30.0
print("Standard Deviation:", np.std(data)) # Measures data spread
print("Variance:", np.var(data)) # Square of standard deviation
print("Minimum value:", np.min(data)) # Smallest value
print("Maximum value:", np.max(data)) # Largest value
print("Index of Minimum Value:", np.argmin(data)) # Index of smallest value
print("Index of Maximum Value:", np.argmax(data)) # Index of largest value
✅ Applications in Machine Learning:
Used to normalize data for machine learning models.
Helps in data preprocessing and feature engineering.
Essential for data analysis and insights.
4. Linear Algebra Operations
NumPy provides a powerful module called numpy.linalg for linear algebra operations, which
are essential in machine learning, physics, and engineering.
Matrix Creation
# Creating two matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix A:\n", A)
print("Matrix B:\n", B)
Matrix Multiplication (Dot Product)
# Matrix multiplication
dot_product = np.dot(A, B)
print("Dot Product:\n", dot_product)
📌 Alternative:
You can also use the @ operator for matrix multiplication.
print("Dot Product using @ operator:\n", A @ B)
Matrix Transpose
# Swaps rows and columns
print("Transpose of A:\n", A.T)
Determinant of a Matrix
# Computing determinant
print("Determinant of A:", np.linalg.det(A))
Inverse of a Matrix
# Computing inverse
print("Inverse of A:\n", np.linalg.inv(A))
Eigenvalues and Eigenvectors
# Computing eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
✅ Why is Linear Algebra Important?
Used in machine learning models, especially Principal Component Analysis (PCA).
Essential for neural networks and deep learning.
Helps in solving systems of linear equations (see the short example below).
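For example, a small system of linear equations, 3x + y = 9 and x + 2y = 8, can be solved in one call with np.linalg.solve. A minimal sketch:
# Coefficient matrix and right-hand side of the system
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)
print("Solution [x, y]:", x)  # [2. 3.]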
5. Advanced Mathematical Functions
Summation & Cumulative Sum
arr = np.array([1, 2, 3, 4])
print("Sum of elements:", np.sum(arr)) # 10
print("Cumulative sum:", np.cumsum(arr)) # [1 3 6 10]
Finding Unique Elements & Counting Occurrences
arr = np.array([1, 2, 3, 1, 2, 3, 4, 4, 4])
unique_elements, counts = np.unique(arr, return_counts=True)
print("Unique elements:", unique_elements)
print("Counts:", counts)
Sorting an Array
arr = np.array([3, 1, 4, 1, 5, 9])
print("Sorted array:", np.sort(arr))
print("Indices of sorted elements:", np.argsort(arr))
Finding Percentiles & Quantiles
data = np.array([10, 20, 30, 40, 50])
print("25th percentile:", np.percentile(data, 25))
print("50th percentile (median):", np.percentile(data, 50))
print("75th percentile:", np.percentile(data, 75))
Chapter 4: Advanced NumPy Operations
NumPy provides several advanced operations that optimize numerical computing, including:
✅ Broadcasting for efficient arithmetic operations on different-sized arrays.
✅ Sorting and searching for handling large datasets.
✅ File I/O operations to store and retrieve data efficiently.
These advanced features make NumPy an essential tool for data analysis, machine learning,
and scientific computing.
1. Broadcasting
What is Broadcasting?
Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes
without the need for manual reshaping or looping.
Instead of repeating values, NumPy automatically expands smaller arrays to match the
dimensions of larger ones.
Example: Broadcasting a 1D and 2D Array
import numpy as np
# Creating two arrays of different shapes
a = np.array([1, 2, 3])
b = np.array([[1], [2], [3]])
# Broadcasting: NumPy automatically expands 'b' to match 'a'
result = a + b
print("Array a:\n", a)
print("Array b:\n", b)
print("Broadcasted Addition Result:\n", result)
✅ How does this work?
a has a shape of (3,) → [1, 2, 3]
b has a shape of (3, 1) → [[1], [2], [3]]
NumPy broadcasts both arrays to a common shape of (3, 3) and adds them element-wise.
More Examples of Broadcasting
Broadcasting a Scalar to a Matrix
matrix = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
result = matrix + scalar # The scalar gets "broadcasted" to each element
print(result)
💡 Real-World Use Case:
Broadcasting is widely used in machine learning for normalization, feature scaling, and
batch operations.
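For instance, standardizing every feature column of a small (hypothetical) dataset is a one-line broadcasting operation. A minimal sketch:
# Hypothetical feature matrix: 4 samples, 3 features
X = np.array([[1.0, 200.0, 0.5],
              [2.0, 180.0, 0.7],
              [3.0, 220.0, 0.2],
              [4.0, 210.0, 0.9]])

# Per-column statistics have shape (3,)
mean = X.mean(axis=0)
std = X.std(axis=0)

# Broadcasting stretches the (3,) statistics across every row of the (4, 3) matrix
X_standardized = (X - mean) / std
print("Standardized features:\n", X_standardized)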
2. Sorting and Searching
Sorting Arrays
Sorting is essential for data analysis, ranking, and organizing datasets.
Sorting a 1D Array
a = np.array([3, 1, 4, 1, 5])
print("Original Array:", a)
print("Sorted Array:", np.sort(a)) # Sorts in ascending order
Sorting a 2D Array
matrix = np.array([[3, 2, 1], [6, 5, 4]])
print("Original Matrix:\n", matrix)
print("Sorted along each row:\n", np.sort(matrix, axis=1)) # Row-wise sorting
print("Sorted along each column:\n", np.sort(matrix, axis=0)) # Column-wise
sorting
Searching for Elements in an Array
NumPy provides fast searching operations to find values in an array.
Finding the Index of the Maximum and Minimum Values
a = np.array([3, 1, 4, 1, 5, 9, 2])
print("Index of Maximum Value:", np.argmax(a)) # Returns index of max value
print("Index of Minimum Value:", np.argmin(a)) # Returns index of min value
Finding Elements that Satisfy a Condition
a = np.array([10, 20, 30, 40, 50])
# Find elements greater than 25
filtered_elements = a[a > 25]
print("Elements greater than 25:", filtered_elements)
✅ Why is Searching Important?
Used in data preprocessing to filter values in large datasets.
Helps in feature selection and anomaly detection in machine learning.
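Alongside boolean indexing, np.where is another common search tool. A minimal sketch of its two forms:
a = np.array([10, 20, 30, 40, 50])

# Form 1: indices where a condition holds
indices = np.where(a > 25)
print("Indices of elements greater than 25:", indices[0])

# Form 2: element-wise selection (keep values > 25, replace the rest with 0)
clipped = np.where(a > 25, a, 0)
print("Small values zeroed out:", clipped)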
3. File I/O Operations with NumPy
Saving and Loading Arrays
NumPy allows us to save and retrieve arrays efficiently in binary or text format.
a) Saving and Loading a NumPy Array (Binary Format)
# Create an array
a = np.array([1, 2, 3, 4, 5])
# Save array to a binary file
np.save('data.npy', a)
# Load the array from the file
b = np.load('data.npy')
print("Loaded Array:", b)
✅ Why Use .npy Format?
Stores data in a compact binary format.
Faster loading and saving compared to CSV or text files.
Retains data type and structure of arrays.
b) Saving and Loading Multiple Arrays
We can save multiple arrays in a single file using np.savez().
# Create multiple arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([10, 20, 30])
# Save both arrays in a single file
np.savez('multi_data.npz', array1=arr1, array2=arr2)
# Load the arrays from file
loaded_data = np.load('multi_data.npz')
print("Array 1:", loaded_data['array1'])
print("Array 2:", loaded_data['array2'])
✅ When to Use .npz Format?
When working with large datasets where multiple arrays need to be stored efficiently.
Reduces disk space usage while maintaining fast I/O speeds.
c) Saving and Loading CSV Files
Many real-world datasets are stored in CSV format. NumPy makes it easy to save and load
CSV files.
Saving a NumPy Array as a CSV File
# Create a sample array
data = np.array([[1, 2, 3], [4, 5, 6]])
# Save to CSV
np.savetxt('data.csv', data, delimiter=',')
Loading Data from a CSV File
# Load CSV file
loaded_data = np.loadtxt('data.csv', delimiter=',')
print("Loaded CSV Data:\n", loaded_data)
✅ Use Case:
CSV files are widely used in data science, machine learning, and finance.
Chapter 5: Project 1 - K-Nearest Neighbors
from Scratch
Theory and Concepts
The K-Nearest Neighbors (KNN) algorithm is a simple, yet powerful supervised learning
algorithm used for both classification and regression tasks. The main idea behind KNN is that
similar data points tend to be found near each other in feature space. The algorithm works in three steps:
1. Calculate Distance: Compute the distance between the test point and all points in the
training dataset (commonly using Euclidean distance).
2. Find Neighbors: Select the K closest points based on the computed distances.
3. Make a Decision:
o Classification: Assign the most common label among the K nearest neighbors.
o Regression: Take the average of the values of the K nearest neighbors.
Implementation
import numpy as np
from collections import Counter
def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # KNN is a lazy learner: fitting simply stores the training data
        self.X_train = X_train
        self.y_train = y_train

    def predict(self, X_test):
        predictions = [self._predict(x) for x in X_test]
        return np.array(predictions)

    def _predict(self, x):
        # Distance from x to every training sample
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Indices of the k closest samples
        k_indices = np.argsort(distances)[:self.k]
        # Majority vote among their labels
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        return Counter(k_nearest_labels).most_common(1)[0][0]
Testing and Evaluation
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
data = load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNN(k=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Chapter 6: Project 2 - Principal Component
Analysis (PCA) Implementation
Theory and Background
Principal Component Analysis (PCA) is a dimensionality reduction technique used to
transform high-dimensional data into a lower-dimensional space while retaining as much
variance as possible. It is widely used in data preprocessing, visualization, and noise reduction. The algorithm works in five steps:
1. Compute the Mean: Center the data by subtracting the mean of each feature.
2. Calculate the Covariance Matrix: Measures how different features vary with one
another.
3. Compute Eigenvalues and Eigenvectors: Solve for principal components.
4. Select Top Principal Components: Choose the top k eigenvectors based on eigenvalues.
5. Transform the Data: Project data onto the new principal component axes.
Implementation
import numpy as np
def pca(X, n_components):
    # Center the data
    X_meaned = X - np.mean(X, axis=0)

    # Compute the covariance matrix
    cov_matrix = np.cov(X_meaned, rowvar=False)

    # Compute eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort eigenvectors by eigenvalues in descending order
    sorted_indices = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, sorted_indices]
    eigenvalues = eigenvalues[sorted_indices]

    # Select top n_components eigenvectors
    selected_eigenvectors = eigenvectors[:, :n_components]

    # Transform data
    X_reduced = np.dot(X_meaned, selected_eigenvectors)
    return X_reduced
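A natural companion to this function is the explained variance ratio, which tells you how much of the total variance each principal component captures. The helper below is an illustrative sketch built from the same quantities; it is not part of the project code above:
def explained_variance_ratio(X):
    # Same centering and covariance computation as in pca()
    X_meaned = X - np.mean(X, axis=0)
    cov_matrix = np.cov(X_meaned, rowvar=False)
    eigenvalues, _ = np.linalg.eigh(cov_matrix)
    eigenvalues = np.sort(eigenvalues)[::-1]  # Largest eigenvalue first
    return eigenvalues / np.sum(eigenvalues)
On the Iris dataset used below, the first two components capture most of the total variance, which is why a two-dimensional projection remains so informative.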
Visualization and Application
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load dataset
data = load_iris()
X, y = data.data, data.target
# Apply PCA
X_pca = pca(X, 2)
# Plot the transformed data
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA on Iris Dataset')
plt.colorbar(label='Target Label')
plt.show()
Chapter 7: NumPy Best Practices
Performance Optimization
NumPy provides several techniques to optimize performance when working with large
datasets:
• Vectorization: Use NumPy's built-in functions instead of Python loops.
• Avoiding Copies: Use view() instead of copy() when possible to save memory (see the sketch after this list).
• Memory Layout: Use np.ascontiguousarray() for better performance in C-optimized functions.
• Parallelization: Operations such as np.dot() are backed by optimized BLAS libraries that can use multiple CPU cores.
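A minimal sketch of the view-versus-copy distinction mentioned above:
a = np.arange(5)

b = a.view()   # Shares the same underlying data as 'a'
c = a.copy()   # Allocates new, independent memory

b[0] = 99
print("a after modifying the view:", a)   # First element is now 99
print("The copy is unaffected:", c)       # Still [0 1 2 3 4]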
Memory Management
Efficient memory usage is crucial for handling large datasets:
• Preallocate Arrays: Use np.empty() instead of repeatedly appending to lists (sketched after this list).
• Use Appropriate Data Types: Convert float64 arrays to float32 if precision is not critical.
• Sparse Matrices: For large, sparse datasets, consider using scipy.sparse to save memory.
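A minimal sketch of the first two ideas, assuming single precision is acceptable for your application:
# Preallocate instead of growing a Python list in a loop
n = 1000
results = np.empty(n)            # Uninitialized float64 array of length n
for i in range(n):
    results[i] = i ** 2

# Downcast to float32 when full double precision is not needed
results_32 = results.astype(np.float32)
print("float64 size:", results.nbytes, "bytes")     # 8000 bytes
print("float32 size:", results_32.nbytes, "bytes")  # 4000 bytes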
Common Pitfalls and Solutions
• Floating Point Precision Errors: Due to limited precision, avoid direct equality checks for
floating-point numbers.
if np.isclose(a, b): # Instead of a == b
• Unexpected Shape Changes: Ensure correct array shapes when performing operations to avoid broadcasting issues.
• Indexing Errors: Be cautious when slicing a 2D array: arr[1] returns a 1D row, while arr[1:2] keeps the 2D shape, which can lead to shape mismatches downstream (see the example below).
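For example, the shape difference that typically causes such mismatches:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[1].shape)    # (3,)   -> a 1D row
print(arr[1:2].shape)  # (1, 3) -> still a 2D array with a single row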
Chapter 8: NumPy and the Machine
Learning Ecosystem
NumPy with Pandas
Pandas is built on top of NumPy and provides DataFrame and Series objects for handling
structured data. Some key integrations include:
Creating DataFrames from NumPy Arrays:
import pandas as pd
import numpy as np
data = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(data, columns=['A', 'B'])
print(df)
Converting Pandas DataFrames to NumPy Arrays:
np_array = df.to_numpy()
NumPy with Scikit-learn
Scikit-learn heavily relies on NumPy for numerical computations in machine learning models.
Some common applications include:
Feature Scaling using NumPy:
from sklearn.preprocessing import StandardScaler
X = np.array([[1, 2], [3, 4], [5, 6]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Using NumPy Arrays in Model Training:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, np.array([1, 2, 3]))
NumPy with TensorFlow and PyTorch
Deep learning frameworks like TensorFlow and PyTorch also integrate well with NumPy:
Converting NumPy Arrays to Tensors in TensorFlow:
import tensorflow as tf
np_array = np.array([1, 2, 3, 4])
tensor = tf.convert_to_tensor(np_array)
Converting NumPy Arrays to Tensors in PyTorch:
import torch
torch_tensor = torch.tensor(np_array)
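Converting Tensors back to NumPy Arrays (assuming eager execution in TensorFlow and a CPU tensor in PyTorch):
np_from_tf = tensor.numpy()           # TensorFlow EagerTensor -> NumPy array
np_from_torch = torch_tensor.numpy()  # PyTorch CPU tensor -> NumPy array (shares memory)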
Chapter 9: Appendix
Quick Reference
• Creating an array: np.array([1, 2, 3])
• Generating random numbers: np.random.rand(3, 3)
• Reshaping an array: arr.reshape(2, 3)
• Computing mean: np.mean(arr)
• Matrix multiplication: np.dot(A, B)
Further Resources
• NumPy Documentation: https://numpy.org/doc/
• Machine Learning with Python: https://scikit-learn.org/
• Deep Learning Frameworks: https://www.tensorflow.org/ | https://pytorch.org/
Glossary of Terms
• Array: A multi-dimensional container for numerical data.
• Broadcasting: A technique to perform operations on arrays of different shapes.
• Vectorization: The process of replacing explicit loops with array-based operations.
• Tensor: A multi-dimensional array used in deep learning frameworks.
Conclusion
This handbook has been an in-depth guide to NumPy, covering both its fundamentals and its advanced features, as well as how it fits into the wider machine learning ecosystem. Applying NumPy effectively can markedly improve the performance and efficiency of your machine learning pipelines.
Thank you for working through NumPy with me, and I hope this guide remains a useful companion on your machine learning journey!