Lecture: Machine Learning with Scikit-learn & TensorFlow
https://www.geeksforgeeks.org/deep-learning/differences-between-scikit-learn-keras-and-pytorch/
1. Scikit-learn
Definition:
Scikit-learn is an open-source Python library for machine learning, built on
NumPy, SciPy, and Matplotlib. It provides tools for supervised &
unsupervised learning, preprocessing, and model selection.
Example:
from sklearn import datasets

# Load the built-in Iris dataset (150 samples, 4 features, 3 classes)
iris = datasets.load_iris()
X = iris.data     # feature matrix
y = iris.target   # class labels
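A minimal follow-on sketch, assuming the Iris data loaded above; the estimator choice (KNeighborsClassifier) is just one example of the consistent fit/predict API:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Hold out a test set, fit a k-nearest-neighbors classifier, and score it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on the held-out data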
Why & Where Used:
- Why: Quick implementation of machine learning algorithms without
coding from scratch.
- Where: Academic research, industry ML pipelines, prototyping models.
Advantages:
- Easy to use & consistent API
- Large collection of ML algorithms
- Good documentation & active community
Disadvantages:
- Not optimized for deep learning
- Limited GPU acceleration
- Less suited for extremely large datasets
2. Data Representation in Scikit-learn
Definition:
Scikit-learn represents data as:
- Features (X): 2D array `(n_samples, n_features)`
- Labels (y): 1D array `(n_samples,)`.
Example:
print(X.shape) # (150, 4)
print(y.shape) # (150,)
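One implication of this format worth noting: even a single feature must be a 2D column, not a 1D array. A small sketch (the variable names are only illustrative; the reshape is plain NumPy):
import numpy as np
heights = np.array([1.60, 1.72, 1.85])   # 1D array, shape (3,)
X_col = heights.reshape(-1, 1)           # 2D column, shape (3, 1), as estimators expect
print(X_col.shape)                       # (3, 1)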
Why & Where Used:
- Why: Every estimator expects this layout, so data prepared once can be reused across Scikit-learn algorithms.
- Where: Any ML task in Scikit-learn.
Advantages:
- Standardized structure makes it compatible across algorithms.
Disadvantages:
- Cannot directly handle raw images, audio, or text (needs preprocessing).
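As the last point notes, raw text must first be turned into the (n_samples, n_features) layout. A minimal sketch, assuming Scikit-learn's CountVectorizer as one such preprocessing step:
from sklearn.feature_extraction.text import CountVectorizer
docs = ["machine learning is fun", "deep learning with tensorflow"]   # raw text samples
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(docs)   # sparse 2D matrix of word counts
print(X_text.shape)                       # (2, number of distinct words)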
3. TensorFlow Basics
Definition:
TensorFlow is an open-source framework by Google for numerical
computation and large-scale machine learning.
Example:
import tensorflow as tf

a = tf.constant([[1, 2], [3, 4]])   # 2x2 constant tensor
b = tf.constant([[5, 6], [7, 8]])
print(tf.matmul(a, b))              # matrix product: [[19 22] [43 50]]
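Beyond tensor arithmetic, TensorFlow's core machinery for training models is automatic differentiation. A minimal sketch using tf.GradientTape (the variable and function are chosen only for illustration):
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2                   # y = x^2
dy_dx = tape.gradient(y, x)      # dy/dx = 2x = 6.0
print(dy_dx)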
Why & Where Used:
- Why: Efficient for deep learning with GPU/TPU support.
- Where: Image recognition, NLP, time series forecasting.
Advantages:
- Scales from small devices to cloud clusters
- GPU/TPU acceleration
- Large ecosystem (TF Hub, TF Lite)
Disadvantages:
- Steeper learning curve
- More verbose than some alternatives like PyTorch
5. Keras APIs
Definition:
Keras is a high-level API for building and training deep learning models in
TensorFlow.
Example:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
model = Sequential([
    Flatten(input_shape=(28, 28)),     # flatten 28x28 images into 784-element vectors
    Dense(128, activation='relu'),     # fully connected hidden layer
    Dense(10, activation='softmax')    # output layer: probabilities over 10 classes
])
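A minimal continuation sketch showing how the model above could be compiled and trained, assuming the built-in MNIST digits dataset:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))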
Why & Where Used:
- Why: Makes deep learning easier and faster to implement.
- Where: Academic projects, production-ready ML services.
Advantages:
- Simple and readable code
- Large community support
- Integrated with TensorFlow
Disadvantages:
- Less low-level control compared to raw TensorFlow
- May be slower for very large models with extreme customization
6. TensorBoard Visualization
Definition:
TensorBoard is TensorFlow’s visualization tool for tracking metrics, model
graphs, and training progress.
Example:
from tensorflow.keras.callbacks import TensorBoard
import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-
%H%M%S")
tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1)
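A sketch of how the callback is typically wired in, assuming a compiled Keras model like the one from the previous section; the logged runs are then viewed by launching TensorBoard from a terminal:
# Pass the callback to fit() so metrics and histograms are written to log_dir
model.fit(x_train, y_train, epochs=5,
          validation_split=0.2,
          callbacks=[tensorboard_callback])
# View the dashboard (run in a terminal, then open the printed URL):
#   tensorboard --logdir logs/fit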
Why & Where Used:
- Why: Monitor training, debug models, and compare runs.
- Where: Research labs, ML production pipelines.
Advantages:
- Real-time visualization
- Tracks multiple experiments
- Works with cloud platforms
Disadvantages:
- Requires storage for logs
- Initial setup might confuse beginners