Bayes' Formula
References:
- https://baike.baidu.com/item/%E8%B4%9D%E5%8F%B6%E6%96%AF%E5%85%AC%E5%BC%8F
- https://en.wikipedia.org/wiki/Bayes%27_theorem
- https://en.wikipedia.org/wiki/Normalizing_constant
Bayes' formula, also known as Bayes' theorem or Bayes' rule, is the standard method in probability and statistics for using observed evidence to revise a subjective judgment about the relevant probability distribution (i.e., the prior probability). Intuitively, as the sample being analyzed grows large enough to approach the whole population, the frequency of an event in the sample approaches the probability of that event in the population, and Bayes' formula prescribes how probability judgments should be updated as such evidence accumulates.
In general, the probability of event A given that event B has occurred differs from the probability of event B given that event A has occurred; the two are, however, related in a definite way, and Bayes' theorem is the statement of that relationship.
Bayes' theorem relates the conditional probabilities and marginal probabilities of random events A and B.
For two events $A$ and $B$ with $P(B) > 0$, the probability that $A$ occurs given that $B$ has occurred can be expressed as follows:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
- $P(A)$ and $P(B)$ are the probabilities of observing $A$ and $B$ without regard to any other condition. They are also known as marginal probabilities or prior probabilities; "prior" here means that no other factors are taken into account.
- $P(B)$ also serves as the normalizing constant.
- $P(A \mid B)$ is the conditional probability of $A$ once $B$ is known to have occurred; because it is derived from the observed value of $B$, it is also called the posterior probability of $A$.
- $P(B \mid A)$ is the conditional probability of $B$ once $A$ is known to have occurred; because it is derived from the observed value of $A$, it is also called the posterior probability of $B$.
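This identity follows directly from the definition of conditional probability, since both conditional forms equal the same joint probability:
$$P(A \mid B)\,P(B) = P(A \cap B) = P(B \mid A)\,P(A) \quad\Longrightarrow\quad P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$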
Bayes' theorem can therefore be stated as:
$$\text{posterior probability} = \frac{\text{likelihood} \times \text{prior probability}}{\text{normalizing constant}}$$
That is, the posterior probability is proportional to the product of the prior probability and the likelihood function.
The ratio $P(B \mid A)/P(B)$ is also sometimes called the standardised likelihood, so Bayes' theorem can equally be stated as:
$$\text{posterior probability} = \text{standardised likelihood} \times \text{prior probability}$$
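As a minimal numerical sketch of the rule in Python (the prevalence, sensitivity, and false-positive rate below are hypothetical numbers chosen for illustration, not taken from the references):

```python
# A = "person has the condition", B = "test is positive".
p_a = 0.01              # prior P(A): assumed 1% prevalence
p_b_given_a = 0.99      # likelihood P(B|A): assumed test sensitivity
p_b_given_not_a = 0.05  # assumed false-positive rate P(B|not A)

# Normalizing constant P(B), obtained by splitting B over A and not-A.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) = P(B|A) * P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.4f}")  # ~0.1667
```

Note how the small prior dominates: even with a 99% sensitive test, the posterior is only about 17%.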
Law of Total Probability
The law of total probability is an important formula in probability theory: it reduces the problem of computing the probability of a complex event to summing the probabilities of simpler events that occur under different circumstances.
If the events $B_1, B_2, \ldots, B_n$ form a complete system of events, that is, they are pairwise mutually exclusive, their union is the whole sample space, and each occurs with positive probability:
$$B_i \cap B_j = \varnothing \ (i \neq j), \qquad \bigcup_{i=1}^{n} B_i = \Omega, \qquad P(B_i) > 0 \ (i = 1, \ldots, n),$$
then for any event $A$:
$$P(A) = \sum_{i=1}^{n} P(B_i)\, P(A \mid B_i)$$
An important part of probability theory is the study of how to compute the probabilities of more complex events from those of simpler ones; the law of total probability and Bayes' formula serve precisely this purpose.
For a relatively complex event $A$, if one can find an accompanying complete system of events $B_1, \ldots, B_n$ such that each $P(B_i)$ and each conditional probability $P(A \mid B_i)$ is comparatively easy to compute, then probabilities involving $A$ can be obtained via the law of total probability and Bayes' formula.
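As a minimal sketch of the two formulas working together, consider a hypothetical three-factory quality-control example (all shares and defect rates are invented for illustration):

```python
# B1..B3: item comes from factory i; A: item is defective.
shares = [0.5, 0.3, 0.2]           # P(B_i): assumed production shares
defect_rates = [0.01, 0.02, 0.03]  # P(A|B_i): assumed defect rates

# Law of total probability: P(A) = sum_i P(B_i) * P(A|B_i).
p_a = sum(p_bi * p_a_bi for p_bi, p_a_bi in zip(shares, defect_rates))
print(f"P(A) = {p_a:.4f}")  # 0.0170

# Bayes' formula: P(B_i|A), i.e., which factory a defective item came from.
posteriors = [p_bi * p_a_bi / p_a for p_bi, p_a_bi in zip(shares, defect_rates)]
print([round(p, 3) for p in posteriors])  # [0.294, 0.353, 0.353]
```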
Likelihood and Probability
References:
- https://yangfangs.github.io/2018/04/06/the-different-of-likelihood-and-probability/
- https://stats.stackexchange.com/a/2647
In frequentist inference, the likelihood function (often simply called the likelihood) is a function of a model's parameters given the observed data. In informal usage, "likelihood" is often treated as a synonym for "probability".
In mathematical statistics, however, the two terms have distinct meanings.
- "Probability" describes how plausible an outcome is given the model parameters, without reference to any observed data. Toss a fair coin 20 times: how likely are 15 heads? That quantity is a "probability": the fair coin is the given parameter ($p = 0.5$), and "15 heads in 20 tosses" is the outcome whose probability we compute.
- "Likelihood" describes how plausible the model parameters are given a particular observation. Toss a coin 20 times and observe 15 heads: how plausible is it that the coin is fair? That quantity is a "likelihood": "15 heads in 20 tosses" is the known observation, the parameter $p$ is unknown, and we look for the value of $p$ that maximizes the likelihood.
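The contrast can be made concrete with a short sketch using only the standard library (the 0.01-step grid of candidate values for $p$ is an illustrative choice):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k heads in n tosses | heads probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# "Probability": the parameter is fixed (fair coin, p = 0.5), the outcome varies.
print(f"P(15 heads in 20 | p=0.5) = {binom_pmf(15, 20, 0.5):.4f}")  # ~0.0148

# "Likelihood": the observation is fixed (15 heads in 20), the parameter varies.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binom_pmf(15, 20, p))
print(f"p maximizing L(p | 15 heads in 20) = {best_p}")  # 0.75 = 15/20
```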
The answer depends on whether you are dealing with discrete or continuous random variables. So, I will split my answer accordingly. I will assume that you want some technical details and not necessarily an explanation in plain English.
Discrete Random Variables
Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, the number of customers who arrive at a store in 10 minutes, etc.).
In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., the probability of the coin landing heads is $p$ and that coin tosses are independent).
Denote the observed outcomes by $O$ and the set of parameters that describe the stochastic process as $\theta$. Thus, when we speak of probability we want to calculate $P(O \mid \theta)$.
In other words, given specific values for $\theta$, $P(O \mid \theta)$ is the probability that we would observe the outcomes represented by $O$.
However, when we model a real-life stochastic process, we often do not know $\theta$.
We simply observe $O$, and the goal then is to arrive at an estimate for $\theta$ that would be a plausible choice given the observed outcomes $O$.
We know that given a value of $\theta$, the probability of observing $O$ is $P(O \mid \theta)$.
Thus, a 'natural' estimation process is to choose that value of $\theta$ that would maximize the probability that we would actually observe $O$.
In other words, we find the parameter values $\hat{\theta}$ that maximize the following function:
$$L(\theta \mid O) = P(O \mid \theta)$$
$L(\theta \mid O)$ is called the likelihood function.
Notice that by definition the likelihood function is conditioned on the observed $O$ and that it is a function of the unknown parameters $\theta$.
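For instance, taking the store-customer example above and assuming, purely for illustration, Poisson arrivals with invented counts, $L(\theta \mid O)$ can be maximized by a simple grid search:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(k arrivals in a window | arrival rate lam)."""
    return lam**k * exp(-lam) / factorial(k)

# O: hypothetical customer counts in five 10-minute windows.
observed = [3, 5, 4, 6, 2]

def likelihood(lam: float) -> float:
    """L(lam | O) = P(O | lam), assuming independent windows."""
    result = 1.0
    for k in observed:
        result *= poisson_pmf(k, lam)
    return result

# Grid search over candidate rates; for i.i.d. Poisson data the
# analytical maximizer is the sample mean, which the search recovers.
grid = [i / 100 for i in range(1, 1001)]
lam_hat = max(grid, key=likelihood)
print(lam_hat, sum(observed) / len(observed))  # 4.0 4.0
```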
Continuous Random Variables
In the continuous case the situation is similar with one important difference.
We can no longer talk about the probability that we observed $O$ given $\theta$, because in the continuous case $P(O \mid \theta) = 0$.
Without getting into technicalities, the basic idea is as follows:
Denote the probability density function (PDF) associated with the outcomes $O$ as $f(O \mid \theta)$.
Thus, in the continuous case we estimate $\theta$ given observed outcomes $O$ by maximizing the following function:
$$L(\theta \mid O) = f(O \mid \theta)$$
In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability of observing $O$; instead, we maximize the PDF associated with the observed outcomes $O$.
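A minimal sketch of the continuous case, assuming, again purely for illustration, i.i.d. normal outcomes with known variance and invented data; the log of the density product is maximized for numerical stability:

```python
from math import exp, log, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Density f(x | mu, sigma) of the normal distribution."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# O: hypothetical continuous observations.
observed = [1.2, 0.8, 1.5, 1.1, 0.9]

def log_likelihood(mu: float) -> float:
    """log L(mu | O): a sum of log densities, not a probability."""
    return sum(log(normal_pdf(x, mu)) for x in observed)

# Grid search; for normal data with known sigma, the analytical
# maximizer is the sample mean.
grid = [i / 100 for i in range(-500, 501)]
mu_hat = max(grid, key=log_likelihood)
print(mu_hat, sum(observed) / len(observed))  # 1.1 1.1
```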