Parameters and Hyperparameters notes
A hyperparameter is a configuration variable that is set manually before the model is trained;
its value cannot be estimated from the data. Hyperparameters influence the values of the model
parameters that a learning algorithm learns. The prefix 'hyper' means that these are 'top-level'
parameters that control the learning process, and they have a significant impact on the
performance of the model being trained. Hyperparameters are said to be external to the model
because the model cannot change their values during learning/training.
During the learning process, hyperparameters are used by the learning algorithm, but they are not
part of the resulting model. At the end of the learning process, we are left with the trained
model parameters.
Some key attributes of hyperparameters are:
They are often used in processes to help estimate model parameters.
They can often be set using heuristics.
They are often tuned for a given predictive modelling problem.
Some examples of hyperparameters are:
Train-test split ratio
Learning rate in optimization algorithms (e.g., gradient descent)
Choice of activation function in a neural network layer (e.g., Sigmoid, ReLU, Tanh)
The choice of cost or loss function.
Number of activation units in each layer.
Number of clusters in a clustering task.
Kernel or filter size in convolutional layers.
Pooling size
Batch size
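To make the distinction concrete, here is a minimal sketch (the data, values, and variable names are illustrative, not from these notes): the learning rate and epoch count are hyperparameters fixed by hand before training, while w and b are parameters the algorithm learns from the data.

```python
import numpy as np

# Hyperparameters: chosen manually before training starts.
LEARNING_RATE = 0.1   # step size for gradient descent
N_EPOCHS = 200        # number of passes over the data

# Toy data generated from y = 3x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * X + 1.0

# Parameters: initialized, then learned from the data during training.
w, b = 0.0, 0.0
for _ in range(N_EPOCHS):
    y_pred = w * X + b
    grad_w = 2.0 * np.mean((y_pred - y) * X)
    grad_b = 2.0 * np.mean(y_pred - y)
    w -= LEARNING_RATE * grad_w
    b -= LEARNING_RATE * grad_b

print(round(w, 2), round(b, 2))
```

Changing LEARNING_RATE or N_EPOCHS changes how (and whether) w and b converge, but neither value is stored in the trained model itself.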
Parameters
A model parameter is a variable whose value is estimated from historical data. Model parameters
are not set manually; rather, they are calculated using the training data.
In contrast to hyperparameters, parameters are internal to the model, as they are learned from
the data during training.
We start model training with the parameters initialized to some values (random values or zeros).
Some examples of parameters are:
Coefficients or weights of linear and logistic regression models
Weights and biases of a Neural Network
Cluster centroids in clustering
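As a sketch of parameters being calculated from data rather than set by hand (the toy data below is illustrative): a linear model's weight and bias can be computed directly from the training set via least squares.

```python
import numpy as np

# Training data generated from y = 2x + 5.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([7.0, 9.0, 11.0, 13.0])

# Append a column of ones so the bias is fitted alongside the weight.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# The parameters (weight and bias) fall out of the data via least squares;
# nothing here is chosen manually.
(weight, bias), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(weight, 2), round(bias, 2))
```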
Hyperparameter Tuning
Hyperparameter tuning is the process of determining the combination of hyperparameters that
maximizes the model's performance. We typically do this by running multiple training trials,
each with a different hyperparameter configuration, and comparing the results.
Hyperparameter Tuning Methods
We can do hyperparameter tuning through the following methods:
Random Search
Grid Search
Bayesian Optimization
Tree-structured Parzen Estimators
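The simplest of these, grid search, can be sketched in a few lines (the trainer, data, and candidate values below are illustrative assumptions): every combination of candidate hyperparameter values is tried, and the combination with the lowest error is kept.

```python
import itertools
import numpy as np

def train(X, y, lr, n_epochs):
    """Toy gradient-descent trainer; returns mean squared error on the data."""
    w, b = 0.0, 0.0
    for _ in range(n_epochs):
        y_pred = w * X + b
        w -= lr * 2.0 * np.mean((y_pred - y) * X)
        b -= lr * 2.0 * np.mean(y_pred - y)
    return np.mean((w * X + b - y) ** 2)

# Toy data generated from y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0

# Grid search: exhaustively evaluate every hyperparameter combination.
grid = {"lr": [0.01, 0.05, 0.1], "n_epochs": [50, 200]}
best = min(
    itertools.product(grid["lr"], grid["n_epochs"]),
    key=lambda combo: train(X, y, *combo),
)
print(best)
```

Random search samples combinations from the grid at random instead of enumerating them all, which scales better when there are many hyperparameters; Bayesian optimization and Tree-structured Parzen Estimators go further by using past trial results to decide which combination to try next.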