L1 norm

Manhattan Distance

L1-norm is also known as least absolute deviations (LAD) or least absolute errors (LAE). It minimizes the sum of the absolute differences (S) between the target values and the estimated values:

$$S = \sum_{i=1}^{n} |y_i - f(x_i)|$$
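
As a quick illustration, S can be computed directly; a minimal NumPy sketch with made-up values:

```python
import numpy as np

def l1_loss(y_true, y_pred):
    """Sum of absolute differences between targets and estimates (LAD)."""
    return np.sum(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative target values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # illustrative estimates
print(l1_loss(y_true, y_pred))             # 2.0
```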

L2 norm

Euclidean Distance

L2-norm is also known as least squares. It minimizes the sum of the squares of the differences (S) between the target values and the estimated values:

$$S = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
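
The same sketch for the squared version (same illustrative values as above):

```python
import numpy as np

def l2_loss(y_true, y_pred):
    """Sum of squared differences between targets and estimates (least squares)."""
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(l2_loss(y_true, y_pred))             # 1.5
```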

Early Stopping

Early stopping rules provide guidance on how many iterations can be run before the learner begins to over-fit, and stop the algorithm at that point.
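
A runnable toy sketch of one such rule (patience-based stopping on a held-out split); the data and hyper-parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: noisy linear data, split into train and validation.
X = rng.normal(size=(60, 8))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.5 * rng.normal(size=60)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

w = np.zeros(8)
best_loss, best_w, bad_iters, patience = np.inf, w.copy(), 0, 10

for step in range(5000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of training MSE
    w -= 0.01 * grad                                    # one gradient-descent step
    val_loss = np.mean((X_va @ w - y_va) ** 2)          # held-out loss
    if val_loss < best_loss:                            # still improving: remember w
        best_loss, best_w, bad_iters = val_loss, w.copy(), 0
    else:
        bad_iters += 1
        if bad_iters >= patience:                       # no improvement for a while:
            break                                       # stop before over-fitting
print(step, best_loss)
```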

Dropout

Dropout is a regularization technique for reducing overfitting in neural networks by preventing complex co-adaptations on training data. It is a very efficient way of performing model averaging with neural networks. The term “dropout” refers to dropping out units (both hidden and visible) in a neural network.
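
A minimal sketch of the common “inverted dropout” formulation in NumPy (survivors are rescaled at train time so no rescaling is needed at test time); the drop probability is illustrative:

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return x                              # identity at test time
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p_drop      # keep each unit with prob 1 - p_drop
    return x * mask / (1.0 - p_drop)

h = np.ones((2, 4))
print(dropout(h, p_drop=0.5))  # roughly half the entries zeroed, the rest scaled to 2.0
```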

Sparse regularizer on columns

This regularizer defines an L2 norm on each column and an L1 norm over all columns; the resulting problem can be solved by proximal methods.

$$R(W) = \|W\|_{2,1} = \sum_{i=1}^{D} \|w_i\|_2$$

where $w_i$ is the $i$-th column of $W$ and $D$ is the number of columns.
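
A sketch of this norm and of the corresponding proximal step (block soft-thresholding), assuming the tasks’ weights are stored column-wise in a matrix `W`:

```python
import numpy as np

def l21_norm(W):
    """L2 norm of each column, then an L1 sum over the column norms."""
    return np.sum(np.linalg.norm(W, axis=0))

def prox_l21(W, t):
    """Proximal step for t * ||W||_{2,1}: shrink each column toward zero,
    zeroing out entire columns whose norm is below t (column sparsity)."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [2.0, 0.0]])
print(l21_norm(W))       # sqrt(5) + 2 ~= 4.236
print(prox_l21(W, 2.1))  # first column shrunk, second column zeroed entirely
```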

Nuclear norm regularization

$$R(W) = \|W\|_{*} = \sum_{i} \sigma_i(W)$$

where $\sigma_i(W)$ are the singular values of $W$, obtained from its singular value decomposition.
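
A sketch computing the nuclear norm from the singular values (NumPy also exposes it directly via `ord="nuc"`):

```python
import numpy as np

def nuclear_norm(W):
    """Sum of the singular values of W."""
    return np.sum(np.linalg.svd(W, compute_uv=False))

W = np.array([[3.0, 0.0],
              [0.0, 4.0]])
print(nuclear_norm(W))                # 7.0 (singular values 4 and 3)
print(np.linalg.norm(W, ord="nuc"))   # same value via NumPy's built-in
```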

Mean-constrained regularization

This regularizer constrains the functions learned for each task to be similar to the overall average of the functions across all tasks. This is useful for expressing prior information that each task is expected to share similarities with each other task. An example is predicting blood iron levels measured at different times of the day, where each task represents a different person.

$$\sum_{t=1}^{T} \left\| f_t - \frac{1}{T} \sum_{s=1}^{T} f_s \right\|_{\mathcal{H}_k}^2$$

where $f_t$ is the function learned for task $t$ out of $T$ tasks.
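
A sketch of this penalty, assuming each task’s function is represented by a row of a weight matrix `W` (so the RKHS norm reduces to a Euclidean norm on weights):

```python
import numpy as np

def mean_constrained_penalty(W):
    """Sum of squared distances of each task's weights from the mean
    of all tasks' weights. W has one row per task."""
    mean = W.mean(axis=0)
    return np.sum((W - mean) ** 2)

# 3 tasks (e.g. 3 people), 2 parameters each -- illustrative values only
W = np.array([[1.0, 2.0],
              [1.5, 2.5],
              [0.5, 1.5]])
print(mean_constrained_penalty(W))
```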

Clustered mean-constrained regularization

This regularizer is similar to the mean-constrained regularizer, but instead enforces similarity between tasks within the same cluster. This can capture more complex prior information. This technique has been used to predict Netflix recommendations.

$$\sum_{r=1}^{C} \sum_{t \in I(r)} \left\| f_t - \frac{1}{|I(r)|} \sum_{s \in I(r)} f_s \right\|_{\mathcal{H}_k}^2$$

where $I(r)$ is a cluster of tasks and $C$ is the number of clusters.
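
The clustered variant under the same weight-vector assumption; the cluster assignment is illustrative:

```python
import numpy as np

def clustered_mean_penalty(W, clusters):
    """Like the mean-constrained penalty, but each task is pulled toward
    the mean of its own cluster. `clusters` maps cluster -> task indices."""
    total = 0.0
    for tasks in clusters.values():
        Wc = W[tasks]                       # weights of the tasks in this cluster
        total += np.sum((Wc - Wc.mean(axis=0)) ** 2)
    return total

W = np.array([[1.0, 2.0],
              [1.1, 2.1],
              [5.0, 6.0],
              [5.2, 5.8]])
clusters = {0: [0, 1], 1: [2, 3]}           # two clusters of two tasks each
print(clustered_mean_penalty(W, clusters))
```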

Graph-based similarity

More general than above, similarity between tasks can be defined by a function. The regularizer encourages the model to learn similar functions for similar tasks.

$$\sum_{t,s=1}^{T} \| f_t - f_s \|^2 M_{ts}$$

for a given symmetric similarity matrix $M$.
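
A sketch under the same weight-vector assumption, with an illustrative symmetric similarity matrix `M`:

```python
import numpy as np

def graph_similarity_penalty(W, M):
    """Sum over task pairs of M[t, s] * ||f_t - f_s||^2, with each task's
    function represented by a row of W and M a symmetric similarity matrix."""
    T = W.shape[0]
    total = 0.0
    for t in range(T):
        for s in range(T):
            diff = W[t] - W[s]
            total += M[t, s] * np.dot(diff, diff)
    return total

W = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
M = np.array([[0.0, 1.0, 0.1],    # tasks 0 and 1 are deemed similar,
              [1.0, 0.0, 0.1],    # task 2 less so -- illustrative values
              [0.1, 0.1, 0.0]])
print(graph_similarity_penalty(W, M))
```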