Bayes' Formula
References:
- https://baike.baidu.com/item/%E8%B4%9D%E5%8F%B6%E6%96%AF%E5%85%AC%E5%BC%8F
- https://en.wikipedia.org/wiki/Bayes%27_theorem
- https://en.wikipedia.org/wiki/Normalizing_constant
Bayes' formula, also known as Bayes' theorem or Bayes' rule, is the standard method in probability and statistics for using observed evidence to revise a subjective judgment about the relevant probability distribution (i.e., the prior probability). Intuitively, as the sample being analyzed grows large enough to approach the whole population, the frequency of an event in the sample approaches the probability of that event in the population, and Bayes' formula prescribes how probability judgments should be updated as such evidence accumulates.
In general, the probability of event A given that event B has occurred differs from the probability of event B given that event A has occurred; the two are, however, related in a definite way, and Bayes' theorem is the statement of that relationship.
Bayes' theorem relates the conditional probabilities and marginal probabilities of random events A and B.
For two events $A$ and $B$ with $P(B) > 0$, the probability that $A$ occurs given that $B$ has occurred can be expressed as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
- $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$ without regard to any other condition. They are also known as marginal probabilities or prior probabilities; "prior" here means that no other factors are taken into account.
- $P(B)$ also serves as the normalizing constant.
- $P(A \mid B)$ is the conditional probability of $A$ once $B$ is known to have occurred; because it is derived from the observed value of $B$, it is also called the posterior probability of $A$.
- $P(B \mid A)$ is the conditional probability of $B$ once $A$ is known to have occurred; because it is derived from the observed value of $A$, it is also called the posterior probability of $B$.
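This identity follows directly from the definition of conditional probability, since both conditional forms equal the same joint probability:
$$P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A) \quad\Longrightarrow\quad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$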
Bayes' theorem can therefore be stated as:
$$\text{posterior probability} = \frac{\text{likelihood} \times \text{prior probability}}{\text{normalizing constant}}$$
That is, the posterior probability is proportional to the product of the prior probability and the likelihood function.
The ratio $P(B \mid A)/P(B)$ is also sometimes called the standardised likelihood, so Bayes' theorem can equally be stated as:
$$\text{posterior probability} = \text{standardised likelihood} \times \text{prior probability}$$
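As a minimal numerical sketch of the rule in Python (the prevalence, sensitivity, and false-positive rate below are hypothetical numbers chosen for illustration, not taken from the references):

```python
# A = "person has the condition", B = "test is positive".
p_a = 0.01              # prior P(A): assumed 1% prevalence
p_b_given_a = 0.99      # likelihood P(B|A): assumed test sensitivity
p_b_given_not_a = 0.05  # assumed false-positive rate P(B|not A)

# Normalizing constant P(B), obtained by splitting B over A and not-A.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) = P(B|A) * P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.4f}")  # ~0.1667
```

Note how the small prior dominates: even with a 99% sensitive test, the posterior is only about 17%.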
Law of Total Probability
The law of total probability is an important formula in probability theory: it reduces the problem of computing the probability of a complex event to summing the probabilities of simpler events that occur under different circumstances.
If the events $B_1, B_2, \ldots, B_n$ form a complete system of events, that is, they are pairwise mutually exclusive, their union is the whole sample space, and each occurs with positive probability:
$$B_i \cap B_j = \varnothing \ (i \neq j), \qquad \bigcup_{i=1}^{n} B_i = \Omega, \qquad P(B_i) > 0 \ (i = 1, \ldots, n),$$
then for any event $A$:
$$P(A) = \sum_{i=1}^{n} P(B_i)\, P(A \mid B_i)$$
An important part of probability theory is the study of how to compute the probabilities of more complex events from those of simpler ones; the law of total probability and Bayes' formula serve precisely this purpose.
For a relatively complex event $A$, if one can find an accompanying complete system of events $B_1, \ldots, B_n$ such that each $P(B_i)$ and each conditional probability $P(A \mid B_i)$ is comparatively easy to compute, then probabilities involving $A$ can be obtained via the law of total probability and Bayes' formula.
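As a minimal sketch of the two formulas working together, consider a hypothetical three-factory quality-control example (all shares and defect rates are invented for illustration):

```python
# B1..B3: item comes from factory i; A: item is defective.
shares = [0.5, 0.3, 0.2]           # P(B_i): assumed production shares
defect_rates = [0.01, 0.02, 0.03]  # P(A|B_i): assumed defect rates

# Law of total probability: P(A) = sum_i P(B_i) * P(A|B_i).
p_a = sum(p_bi * p_a_bi for p_bi, p_a_bi in zip(shares, defect_rates))
print(f"P(A) = {p_a:.4f}")  # 0.0170

# Bayes' formula: P(B_i|A), i.e., which factory a defective item came from.
posteriors = [p_bi * p_a_bi / p_a for p_bi, p_a_bi in zip(shares, defect_rates)]
print([round(p, 3) for p in posteriors])  # [0.294, 0.353, 0.353]
```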
Likelihood and Probability
References:
- https://yangfangs.github.io/2018/04/06/the-different-of-likelihood-and-probability/
- https://stats.stackexchange.com/a/2647
In frequentist inference, the likelihood function (often simply called the likelihood) is a function of a model's parameters given the observed data. In informal usage, "likelihood" is often treated as a synonym for "probability".
In mathematical statistics, however, the two terms have distinct meanings.
- "Probability" describes how plausible an outcome is given the model parameters, without reference to any observed data. Toss a fair coin 20 times: how likely are 15 heads? That quantity is a "probability": the fair coin is the given parameter ($p = 0.5$), and "15 heads in 20 tosses" is the outcome whose probability we compute.
- "Likelihood" describes how plausible the model parameters are given a particular observation. Toss a coin 20 times and observe 15 heads: how plausible is it that the coin is fair? That quantity is a "likelihood": "15 heads in 20 tosses" is the known observation, the parameter $p$ is unknown, and we look for the value of $p$ that maximizes the likelihood.
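The contrast can be made concrete with a short sketch using only the standard library (the 0.01-step grid of candidate values for $p$ is an illustrative choice):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k heads in n tosses | heads probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# "Probability": the parameter is fixed (fair coin, p = 0.5), the outcome varies.
print(f"P(15 heads in 20 | p=0.5) = {binom_pmf(15, 20, 0.5):.4f}")  # ~0.0148

# "Likelihood": the observation is fixed (15 heads in 20), the parameter varies.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binom_pmf(15, 20, p))
print(f"p maximizing L(p | 15 heads in 20) = {best_p}")  # 0.75 = 15/20
```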
The answer depends on whether you are dealing with discrete or continuous random variables. So, I will split my answer accordingly. I will assume that you want some technical details and not necessarily an explanation in plain English.
Discrete Random Variables
Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, the number of customers who arrive at a store in 10 minutes, etc.).
In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., the probability of the coin landing heads is $p$ and that coin tosses are independent).
Denote the observed outcomes by $O$ and the set of parameters that describe the stochastic process as $\theta$. Thus, when we speak of probability we want to calculate $P(O \mid \theta)$.
In other words, given specific values for $\theta$, $P(O \mid \theta)$ is the probability that we would observe the outcomes represented by $O$.
However, when we model a real-life stochastic process, we often do not know $\theta$.
We simply observe $O$, and the goal then is to arrive at an estimate for $\theta$ that would be a plausible choice given the observed outcomes $O$.
We know that given a value of $\theta$, the probability of observing $O$ is $P(O \mid \theta)$.
Thus, a 'natural' estimation process is to choose that value of $\theta$ that would maximize the probability that we would actually observe $O$.
In other words, we find the parameter values $\hat{\theta}$ that maximize the following function:
$$L(\theta \mid O) = P(O \mid \theta)$$
$L(\theta \mid O)$ is called the likelihood function.
Notice that by definition the likelihood function is conditioned on the observed $O$ and that it is a function of the unknown parameters $\theta$.
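For instance, taking the store-customer example above and assuming, purely for illustration, Poisson arrivals with invented counts, $L(\theta \mid O)$ can be maximized by a simple grid search:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(k arrivals in a window | arrival rate lam)."""
    return lam**k * exp(-lam) / factorial(k)

# O: hypothetical customer counts in five 10-minute windows.
observed = [3, 5, 4, 6, 2]

def likelihood(lam: float) -> float:
    """L(lam | O) = P(O | lam), assuming independent windows."""
    result = 1.0
    for k in observed:
        result *= poisson_pmf(k, lam)
    return result

# Grid search over candidate rates; for i.i.d. Poisson data the
# analytical maximizer is the sample mean, which the search recovers.
grid = [i / 100 for i in range(1, 1001)]
lam_hat = max(grid, key=likelihood)
print(lam_hat, sum(observed) / len(observed))  # 4.0 4.0
```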
Continuous Random Variables
In the continuous case the situation is similar with one important difference.
We can no longer talk about the probability that we observed $O$ given $\theta$, because in the continuous case $P(O \mid \theta) = 0$.
Without getting into technicalities, the basic idea is as follows:
Denote the probability density function (PDF) associated with the outcomes $O$ as $f(O \mid \theta)$.
Thus, in the continuous case we estimate $\theta$ given observed outcomes $O$ by maximizing the following function:
$$L(\theta \mid O) = f(O \mid \theta)$$
In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability of observing $O$; instead, we maximize the PDF associated with the observed outcomes $O$.
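A minimal sketch of the continuous case, assuming, again purely for illustration, i.i.d. normal outcomes with known variance and invented data; the log of the density product is maximized for numerical stability:

```python
from math import exp, log, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Density f(x | mu, sigma) of the normal distribution."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# O: hypothetical continuous observations.
observed = [1.2, 0.8, 1.5, 1.1, 0.9]

def log_likelihood(mu: float) -> float:
    """log L(mu | O): a sum of log densities, not a probability."""
    return sum(log(normal_pdf(x, mu)) for x in observed)

# Grid search; for normal data with known sigma, the analytical
# maximizer is the sample mean.
grid = [i / 100 for i in range(-500, 501)]
mu_hat = max(grid, key=log_likelihood)
print(mu_hat, sum(observed) / len(observed))  # 1.1 1.1
```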