Hyperparameter (machine learning)

In machine learning, a hyperparameter is a parameter, such as the learning rate or choice of optimizer, which specifies details of the learning process, hence the name hyperparameter. This is in contrast to parameters which determine the model itself. An additional contrast is that hyperparameters typically cannot be inferred while fitting the machine to the training set because the objective function is typically non-differentiable with respect to them. As a result, gradient based optimization methods cannot be applied directly.

Hyperparameters can be classified as model hyperparameters or algorithm hyperparameters. An example of a model hyperparameter is the topology and size of a neural network. Examples of algorithm hyperparameters are learning rate and batch size as well as mini-batch size. Batch size can refer to the full data sample where mini-batch size would be a smaller sample set.

Different model training algorithms require different hyperparameters, some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. For instance, LASSO is an algorithm that adds a regularization hyperparameter to ordinary least squares regression, which has to be set before estimating the parameters through the training algorithm.[1]

  1. ^ Yang, Li; Shami, Abdallah (2020-11-20). "On hyperparameter optimization of machine learning algorithms: Theory and practice". Neurocomputing. 415: 295–316. arXiv:2007.15745. doi:10.1016/j.neucom.2020.07.061. ISSN 0925-2312. S2CID 220919678.