Maximum Likelihood Estimation (MLE)
Many cost functions are the result of applying Maximum Likelihood. For instance, the Least Squares cost function can be obtained via Maximum Likelihood under the assumption of Gaussian noise; Cross-Entropy is another example, arising from Maximum Likelihood for Bernoulli or categorical outputs.
The likelihood of a parameter value (or vector of parameter values), θ, given outcomes x, is equal to the probability (density) assumed for those observed outcomes given those parameter values, that is

$$\mathcal{L}(\theta \mid x) = P(x \mid \theta)$$
The natural logarithm of the likelihood function, called the log-likelihood, is more convenient to work with. Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation and related techniques.
In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the “agreement” of the selected model with the observed data, and for discrete random variables it indeed maximizes the probability of the observed data under the resulting distribution. Maximum-likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems.
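As a concrete illustration (a minimal sketch not taken from the original notes; the data and function names are made up), the MLE of a Gaussian's mean and standard deviation can be recovered either by numerically maximizing the log-likelihood or from the closed-form sample statistics:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # observed outcomes

def neg_log_likelihood(params, data):
    """Negative Gaussian log-likelihood; minimizing it maximizes the likelihood."""
    mu, log_sigma = params          # optimize log(sigma) so that sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu)**2 / (2 * sigma**2))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Closed-form MLE for comparison: sample mean and (biased) sample standard deviation
print(mu_hat, sigma_hat)
print(x.mean(), x.std())
```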
Cross-Entropy
Cross-entropy can be used to define the loss function in machine learning and optimization. The true probability $p_i$ is the true label, and the given distribution $q_i$ is the predicted value of the current model:

$$H(p, q) = -\sum_i p_i \log q_i$$

For binary logistic regression with targets $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i$, this yields the cross-entropy error function

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
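A minimal numpy sketch of the binary cross-entropy above (function name and example values are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; eps avoids taking log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.99])
print(binary_cross_entropy(y_true, y_pred))  # small value for good predictions
```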
Logistic
The logistic loss function, for a label $y \in \{-1, +1\}$ and classifier score $f(x)$, is defined as:

$$V(f(x), y) = \frac{1}{\ln 2} \ln\left(1 + e^{-y f(x)}\right)$$
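A short numpy sketch (the $1/\ln 2$ factor only rescales the loss to base-2 logarithms; `np.logaddexp` is used for numerical stability):

```python
import numpy as np

def logistic_loss(y, score):
    """Logistic loss for labels y in {-1, +1} and raw classifier scores."""
    return np.logaddexp(0.0, -y * score) / np.log(2)

print(logistic_loss(np.array([1, -1]), np.array([2.0, -3.0])))
```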
Quadratic
The use of a quadratic loss function is common, for example when using least-squares techniques. It is often more mathematically tractable than other loss functions because of the properties of variances, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is $t$, then a quadratic loss function is:

$$\lambda(x) = C (t - x)^2$$

for some constant $C$; setting $C = 1$ gives the familiar squared error $(t - x)^2$.
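Averaged over a dataset this is the mean squared error; a minimal sketch:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the quadratic loss averaged over all samples."""
    return np.mean((y_true - y_pred) ** 2)

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.3])))
```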
0-1 Loss
In statistics and decision theory, a frequently used loss function is the 0-1 loss function:

$$L(\hat{y}, y) = I(\hat{y} \neq y)$$

where $I$ is the indicator function.
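In numpy the averaged 0-1 loss is simply the misclassification rate (a sketch with illustrative values):

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Fraction of misclassified samples (mean of the 0-1 loss)."""
    return np.mean(y_true != y_pred)

print(zero_one_loss(np.array([1, 0, 1, 1]), np.array([1, 1, 1, 0])))  # 0.5
```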
Hinge Loss
The hinge loss is a loss function used for training classifiers. For an intended output $t = \pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as:

$$\ell(y) = \max(0, 1 - t \cdot y)$$
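A minimal numpy sketch, averaging the hinge loss over a batch:

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss for labels t in {-1, +1} and raw classifier scores y."""
    return np.mean(np.maximum(0.0, 1.0 - t * y))

print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -2.0, -0.3])))
```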
Exponential
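The exponential loss commonly used for classification (for example by AdaBoost) is

$$V(f(x), y) = e^{-\beta y f(x)}$$

for some $\beta > 0$, with labels $y \in \{-1, +1\}$.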
Hellinger Distance
It is used to quantify the similarity between two probability distributions. It is a type of f-divergence.
To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures that are absolutely continuous with respect to a third probability measure λ. The square of the Hellinger distance between P and Q is defined as the quantity

$$H^2(P, Q) = \frac{1}{2} \int \left( \sqrt{\frac{dP}{d\lambda}} - \sqrt{\frac{dQ}{d\lambda}} \right)^2 d\lambda$$
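For discrete distributions this reduces to $H(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_i \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}$; a minimal numpy sketch:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
print(hellinger(p, q))  # 0 for identical distributions, at most 1
```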
Kullback–Leibler Divergence
It is a measure of how one probability distribution diverges from a second, expected probability distribution. Applications include characterizing the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference.
Discrete:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

Continuous:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$
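A sketch of the discrete form in numpy (the small `eps` guard is an implementation convenience, not part of the definition):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(P || Q); eps guards against log(0) and division by zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log((p + eps) / (q + eps)))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
print(kl_divergence(p, q))  # 0 only when P == Q; note it is not symmetric
```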
Itakura–Saito distance
The Itakura–Saito distance is a measure of the difference between an original spectrum $P(\omega)$ and an approximation $\hat{P}(\omega)$ of that spectrum. Although it is not a perceptual measure, it is intended to reflect perceptual (dis)similarity:

$$D_{IS}(P, \hat{P}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \frac{P(\omega)}{\hat{P}(\omega)} - \log \frac{P(\omega)}{\hat{P}(\omega)} - 1 \right] d\omega$$
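A sketch of a discretized, element-wise version of this divergence (as used, for example, in non-negative matrix factorization); the spectra here are illustrative positive vectors:

```python
import numpy as np

def itakura_saito(p, p_hat):
    """Itakura-Saito divergence between two positive spectra, averaged over bins."""
    ratio = p / p_hat
    return np.mean(ratio - np.log(ratio) - 1.0)

p = np.array([1.0, 2.0, 0.5])
p_hat = np.array([1.1, 1.8, 0.6])
print(itakura_saito(p, p_hat))  # 0 only when the spectra match exactly
```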