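This document is background material for Stanford University's CS229 machine learning course; the original file is available for download.

Original authors: Arian Maleki, Tom Do

Translation: 石振宇

Review, revision, and production: 黄海广

Note: please follow the GitHub repository for updates.

CS229 Machine Learning Review Material: Probability Theory

Probability Theory Review and Reference

Probability theory is the study of uncertainty. Throughout this course we rely on concepts from probability theory to derive machine learning algorithms. These notes attempt to cover the basics of probability theory at the level needed for CS229. The mathematical theory of probability is sophisticated and draws on measure theory, a branch of analysis. These notes give a basic treatment of probability and do not address those finer details.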

1. Elements of probability

To define a probability on a set, we need a few basic elements:

  • Sample space $\Omega$: the set of all outcomes of a random experiment. Each outcome $\omega \in \Omega$ can be thought of as a complete description of the state of the real world at the end of the experiment.

  • Set of events (event space) $\mathcal{F}$: a set whose elements $A \in \mathcal{F}$ (called events) are subsets of $\Omega$ (that is, each $A \subseteq \Omega$ is a collection of possible outcomes of an experiment).
    Note: $\mathcal{F}$ must satisfy the following three conditions:
    (1) $\emptyset \in \mathcal{F}$
    (2) $A \in \mathcal{F} \Longrightarrow \Omega \setminus A \in \mathcal{F}$
    (3) $A_{1}, A_{2}, \ldots \in \mathcal{F} \Longrightarrow \cup_{i} A_{i} \in \mathcal{F}$

  • Probability measure $P$: a function $P : \mathcal{F} \rightarrow \mathbb{R}$ that satisfies the following properties:

    • $P(A) \geq 0$ for every $A \in \mathcal{F}$,

    • $P(\Omega) = 1$,

    • if $A_{1}, A_{2}, \ldots$ are disjoint events (that is, $A_{i} \cap A_{j}=\emptyset$ whenever $i \neq j$), then:

$$
P\left(\cup_{i} A_{i}\right)=\sum_{i} P\left(A_{i}\right)
$$

These three properties are called the axioms of probability.

Example

Consider the event of tossing a six-sided die. The sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. The simplest event space is the trivial event space $\mathcal{F} = \{\emptyset, \Omega\}$. Another event space is the set of all subsets of $\Omega$. For the first event space, the unique probability measure satisfying the requirements above is given by $P(\emptyset) = 0$, $P(\Omega) = 1$. For the second event space, one valid probability measure assigns each event the probability $\frac{i}{6}$, where $i$ is the number of elements in that event; for example, $P(\{1,2,3,4\}) = 4/6$ and $P(\{1,2,3\}) = 3/6$.
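The die example can be checked by brute force. The following is a minimal sketch, assuming a fair die and the event space of all subsets; the helper names `power_set` and `prob` are illustrative and not part of the notes. Additivity is only checked for pairs of disjoint events, which is enough on a finite sample space.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one roll of a six-sided die.
OMEGA = frozenset(range(1, 7))

def power_set(s):
    """All subsets of s, i.e. the largest event space on s."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

def prob(event):
    """Uniform measure P(A) = |A| / 6 (assumes a fair die)."""
    return Fraction(len(event), len(OMEGA))

events = power_set(OMEGA)

# Axiom 1: non-negativity. Axiom 2: P(Omega) = 1.
nonnegative = all(prob(a) >= 0 for a in events)
total = prob(OMEGA)

# Axiom 3, checked for every pair of disjoint events.
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events for b in events if not (a & b)
)

print(len(events), nonnegative, total, additive)
print(prob(frozenset({1, 2, 3, 4})))  # 4/6, shown in lowest terms
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.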

Properties:

  • If $A \subseteq B$, then $P(A) \leq P(B)$.
  • $P(A \cap B) \leq \min(P(A), P(B))$.
  • (Boole's inequality, also called the union bound) $P(A \cup B) \leq P(A) + P(B)$.
  • $P(\Omega \setminus A) = 1 - P(A)$.
  • (Law of total probability) If $A_{1}, \ldots, A_{k}$ are disjoint events whose union is $\Omega$, then $\sum_{i=1}^{k} P(A_{i}) = 1$.

1.1 Conditional probability and independence

Let $B$ be an event with non-zero probability. The conditional probability of an event $A$ given $B$ is defined as:

$$
P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}
$$

In other words, $P(A \mid B)$ measures the probability of event $A$ after event $B$ has been observed to occur. Two events are called independent if and only if $P(A \cap B) = P(A)P(B)$ (or, equivalently, $P(A \mid B) = P(A)$). Independence therefore amounts to saying that observing $B$ has no effect on the probability of $A$.
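As an illustration of these two definitions, here is a minimal sketch on a fair six-sided die (the die, the chosen events, and the helper names are assumptions made for the example).

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die (assumed)

def prob(event):
    """Uniform probability of an event, a subset of OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)

def independent(a, b):
    """True when P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = frozenset({2, 4, 6})
at_most_4 = frozenset({1, 2, 3, 4})
at_most_3 = frozenset({1, 2, 3})

# Observing "at most 4" leaves the chance of an even roll unchanged.
print(cond_prob(even, at_most_4), independent(even, at_most_4))
# Observing "at most 3" changes it, so these events are dependent.
print(cond_prob(even, at_most_3), independent(even, at_most_3))
```

The first pair is independent because half of the outcomes in $\{1,2,3,4\}$ are even, the same proportion as in the whole sample space.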

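2. Random variables

Consider an experiment in which we flip 10 coins and want to know the number of coins that come up heads. Here, the elements of the sample space $\Omega$ are length-10 sequences of heads and tails. For example, we might have $\omega_{0} = \langle H, H, T, H, T, H, H, T, T, T \rangle \in \Omega$. In practice, however, we usually do not care about the probability of any particular sequence of heads and tails. Instead, we usually care about real-valued functions of outcomes, such as the number of heads in our 10 tosses or the length of the longest run of tails. Under some technical conditions, these functions are known as random variables.

More formally, a random variable $X$ is a function $X : \Omega \rightarrow \mathbb{R}$. Typically, we denote random variables with upper-case letters, $X(\omega)$ or more simply $X$ (where the dependence on the random outcome $\omega$ is implied). We denote the value that a random variable takes with a lower-case letter $x$.

Example:
In the experiment above, suppose that $X(\omega)$ is the number of heads that occur in the sequence of tosses $\omega$. Since only 10 coins are tossed, $X(\omega)$ can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with the random variable $X$ taking some specific value $k$ is:

$$
P(X=k) := P(\{\omega : X(\omega) = k\})
$$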
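The definition $P(X=k) := P(\{\omega : X(\omega) = k\})$ can be evaluated directly by enumerating the sample space. The sketch below assumes fair, independent coin flips, so that all $2^{10}$ sequences are equally likely; that assumption is not stated in the notes.

```python
from fractions import Fraction
from itertools import product
from math import comb

N_FLIPS = 10

# Sample space: every length-10 sequence of H/T, assumed equally likely.
omega = list(product("HT", repeat=N_FLIPS))

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

def prob_X_equals(k):
    """P(X = k) = P({w : X(w) = k}) under the uniform measure."""
    favourable = sum(1 for w in omega if X(w) == k)
    return Fraction(favourable, len(omega))

pmf = {k: prob_X_equals(k) for k in range(N_FLIPS + 1)}

# Under the fairness assumption the counts are binomial coefficients.
matches_binomial = all(
    pmf[k] == Fraction(comb(N_FLIPS, k), 2**N_FLIPS) for k in pmf
)
print(pmf[3], matches_binomial)
```

The random variable is an ordinary function on outcomes; the randomness lives entirely in the probability measure on the sample space.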

Example:
Suppose that $X(\omega)$ is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, $X(\omega)$ takes infinitely many possible values, so it is called a continuous random variable. We denote the probability that $X$ takes a value between two real constants $a$ and $b$ (where $a < b$) as:

$$
P(a \leq X \leq b) := P(\{\omega : a \leq X(\omega) \leq b\})
$$

2.1 Cumulative distribution functions

To specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs). In this section and the next two, we describe each of these types of functions in turn.

A cumulative distribution function (CDF) is a function $F_{X} : \mathbb{R} \rightarrow [0, 1]$ that specifies a probability measure as:

$$
F_{X}(x) \triangleq P(X \leq x)
$$

Using this function, we can calculate the probability of any event. Figure 1 shows a sample CDF.

[Figure 1: a sample CDF]

Properties:

  • $0 \leq F_{X}(x) \leq 1$
  • $\lim_{x \rightarrow -\infty} F_{X}(x) = 0$
  • $\lim_{x \rightarrow \infty} F_{X}(x) = 1$
  • $x \leq y \Longrightarrow F_{X}(x) \leq F_{X}(y)$
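The CDF properties can be checked on the heads-count variable from the coin example. This is a minimal sketch, assuming fair independent flips so that $P(X=k)=\binom{10}{k}/2^{10}$; the grid of test points is arbitrary.

```python
from fractions import Fraction
from math import comb

N = 10
# Assumed distribution of the number of heads in 10 fair flips.
pmf = {k: Fraction(comb(N, k), 2**N) for k in range(N + 1)}

def cdf(x):
    """F_X(x) = P(X <= x), summing probabilities of values k <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

grid = [-1, 0, 2.5, 5, 10, 11]
values = [cdf(x) for x in grid]

# Non-decreasing, 0 far to the left, 1 far to the right.
is_monotone = all(a <= b for a, b in zip(values, values[1:]))

# P(a < X <= b) = F_X(b) - F_X(a)
prob_3_to_5 = cdf(5) - cdf(2)
print(values[0], values[-1], is_monotone, prob_3_to_5)
```

For a discrete variable the CDF is a step function, which is why `cdf(2.5)` equals `cdf(2)`.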

2.2 Probability mass functions

When a random variable $X$ takes a finite set of possible values (that is, $X$ is a discrete random variable), a simpler way to represent the probability measure associated with it is to specify directly the probability of each value that the random variable can assume. In particular, a probability mass function (PMF) is a function $p_{X} : \Omega \rightarrow \mathbb{R}$ such that:

$$
p_{X}(x) \triangleq P(X=x)
$$

In the case of a discrete random variable, we use the notation $Val(X)$ for the set of possible values that the random variable $X$ may assume. For example, if $X(\omega)$ is a random variable indicating the number of heads in ten coin tosses, then $Val(X) = \{0, 1, 2, \ldots, 10\}$.

Properties:

  • $0 \leq p_{X}(x) \leq 1$
  • $\sum_{x \in Val(X)} p_{X}(x) = 1$
  • $\sum_{x \in A} p_{X}(x) = P(X \in A)$

2.3 Probability density functions

For some continuous random variables, the cumulative distribution function $F_{X}(x)$ is differentiable everywhere. In these cases, we define the probability density function (PDF) as the derivative of the CDF, that is:

$$
f_{X}(x) \triangleq \frac{d F_{X}(x)}{d x}
$$

Note that the PDF of a continuous random variable may not always exist (namely, when $F_{X}(x)$ is not differentiable everywhere).

By the properties of differentiation, for very small $\Delta x$:

$$
P(x \leq X \leq x+\Delta x) \approx f_{X}(x) \Delta x
$$

Both CDFs and PDFs (when they exist!) can be used to calculate the probabilities of different events. It should be emphasized, however, that the value of the PDF at any given point $x$ is not the probability of that event, that is, $f_{X}(x) \neq P(X = x)$. For example, $f_{X}(x)$ can take values larger than one (but the integral of $f_{X}(x)$ over any subset of $\mathbb{R}$ is at most 1).
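To make the point about densities larger than one concrete, the sketch below uses a uniform distribution on $[0, 0.5]$ (an assumed example, not taken from the notes), differentiates its CDF numerically, and compares $f_X(x)\Delta x$ with the exact probability of a small interval.

```python
def uniform_cdf(x, a=0.0, b=0.5):
    """CDF of a Uniform(a, b) random variable."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def numeric_pdf(x, h=1e-6):
    """Central-difference approximation of dF/dx at x."""
    return (uniform_cdf(x + h) - uniform_cdf(x - h)) / (2 * h)

density = numeric_pdf(0.25)   # close to 1 / (b - a) = 2, which exceeds 1
dx = 0.01
approx = density * dx         # f_X(x) * dx
exact = uniform_cdf(0.25 + dx) - uniform_cdf(0.25)  # P(x <= X <= x + dx)

print(density, approx, exact)
```

The density is about 2 on the whole interval, yet every probability computed from it stays at or below 1 because the interval has length 0.5.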

Properties:

  • $f_{X}(x) \geq 0$
  • $\int_{-\infty}^{\infty} f_{X}(x)\, dx = 1$
  • $\int_{x \in A} f_{X}(x)\, dx = P(X \in A)$

2.4 Expectation

Suppose that $X$ is a discrete random variable with PMF $p_{X}(x)$ and $g : \mathbb{R} \rightarrow \mathbb{R}$ is an arbitrary function. In this case, $g(X)$ can be considered a random variable, and we define the expectation, or expected value, of $g(X)$ as:

$$
E[g(X)] \triangleq \sum_{x \in Val(X)} g(x) p_{X}(x)
$$

If $X$ is a continuous random variable with PDF $f_{X}(x)$, then the expected value of $g(X)$ is defined as:

$$
E[g(X)] \triangleq \int_{-\infty}^{\infty} g(x) f_{X}(x)\, dx
$$

Intuitively, the expectation of $g(X)$ can be thought of as a "weighted average" of the values that $g(x)$ can take for different values of $x$, where the weights are given by $p_{X}(x)$ or $f_{X}(x)$. As a special case, the expectation of the random variable itself is found by letting $g(x) = x$; this is also known as the mean of the random variable.

Properties:

  • $E[a] = a$ for any constant $a \in \mathbb{R}$.
  • $E[af(X)] = aE[f(X)]$ for any constant $a \in \mathbb{R}$.
  • (Linearity of expectation) $E[f(X)+g(X)] = E[f(X)] + E[g(X)]$.
  • For a discrete random variable $X$, $E[1\{X=k\}] = P(X=k)$.
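These properties are easy to confirm on a small discrete distribution. The following sketch assumes a fair six-sided die as the example distribution; `expect` is an illustrative helper that implements the discrete definition $E[g(X)] = \sum_{x} g(x) p_X(x)$.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an assumed example distribution).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g):
    """E[g(X)] = sum over x in Val(X) of g(x) * p_X(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                         # E[X]
const = expect(lambda x: 5)                        # E[a] = a
scaled = expect(lambda x: 3 * x**2)                # E[a f(X)]
summed = expect(lambda x: x**2 + x)                # E[f(X) + g(X)]
indicator = expect(lambda x: 1 if x == 4 else 0)   # E[1{X = 4}]

print(mean, const, scaled, summed, indicator)
```

Here `scaled` equals three times $E[X^2]$ and `summed` equals $E[X^2] + E[X]$, which are the scaling and linearity properties respectively.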

2.5 Variance

The variance of a random variable $X$ is a measure of how concentrated the distribution of $X$ is around its mean. Formally, the variance of a random variable $X$ is defined as:

$$
Var[X] \triangleq E\left[(X-E[X])^{2}\right]
$$

Using the properties in the previous section, we can derive an alternative expression for the variance:

$$
\begin{aligned}
E\left[(X-E[X])^{2}\right] &= E\left[X^{2}-2 E[X] X+E[X]^{2}\right] \\
&= E\left[X^{2}\right]-2 E[X] E[X]+E[X]^{2} \\
&= E\left[X^{2}\right]-E[X]^{2}
\end{aligned}
$$

where the second equality follows from linearity of expectation and the fact that $E[X]$ is actually a constant with respect to the outer expectation.

Properties:

  • $Var[a] = 0$ for any constant $a \in \mathbb{R}$.
  • $Var[af(X)] = a^{2} Var[f(X)]$ for any constant $a \in \mathbb{R}$.

Example:

Calculate the mean and the variance of the uniform random variable $X$ with PDF $f_{X}(x) = 1$ for all $x \in [0, 1]$, and 0 elsewhere.

$$
E[X]=\int_{-\infty}^{\infty} x f_{X}(x)\, dx=\int_{0}^{1} x\, dx=\frac{1}{2}
$$

$$
E\left[X^{2}\right]=\int_{-\infty}^{\infty} x^{2} f_{X}(x)\, dx=\int_{0}^{1} x^{2}\, dx=\frac{1}{3}
$$

$$
Var[X]=E\left[X^{2}\right]-E[X]^{2}=\frac{1}{3}-\frac{1}{4}=\frac{1}{12}
$$
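The two integrals in this example can also be approximated numerically as a sanity check. The sketch below uses a plain midpoint rule; the helper `integrate` and the number of sub-intervals are choices made for illustration, not part of the notes.

```python
def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def f_X(x):
    """PDF of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# The density vanishes outside [0, 1], so integrating there is enough.
mean = integrate(lambda x: x * f_X(x), 0.0, 1.0)
second_moment = integrate(lambda x: x**2 * f_X(x), 0.0, 1.0)
variance = second_moment - mean**2   # Var[X] = E[X^2] - E[X]^2

print(mean, second_moment, variance)
```

The results should agree with $1/2$, $1/3$ and $1/12$ up to the discretization error of the midpoint rule.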

Example:

Suppose that $g(x) = 1\{x \in A\}$ for some subset $A \subseteq \Omega$. What is $E[g(X)]$?

Discrete case:

$$
E[g(X)]=\sum_{x \in Val(X)} 1\{x \in A\}\, p_{X}(x)=\sum_{x \in A} p_{X}(x)=P(X \in A)
$$

Continuous case:

$$
E[g(X)]=\int_{-\infty}^{\infty} 1\{x \in A\}\, f_{X}(x)\, dx=\int_{x \in A} f_{X}(x)\, dx=P(X \in A)
$$

2.6 Some common random variables

Discrete random variables

  • Bernoulli distribution: one if a coin with heads probability $p$ (where $0 \leq p \leq 1$) comes up heads, zero otherwise.

$$
p(x)=\begin{cases} p & \text{if } x=1 \\ 1-p & \text{if } x=0 \end{cases}
$$

  • Binomial distribution: the number of heads in $n$ independent flips of a coin with heads probability $p$ (where $0 \leq p \leq 1$).

$$
p(x)=\binom{n}{x} p^{x}(1-p)^{n-x}
$$

  • Geometric distribution: the number of flips of a coin with heads probability $p$ (where $p > 0$) until the first heads.

$$
p(x)=p(1-p)^{x-1}
$$

  • Poisson distribution: a probability distribution over the non-negative integers used for modeling the frequency of rare events (where $\lambda > 0$).

$$
p(x)=e^{-\lambda} \frac{\lambda^{x}}{x !}
$$

Continuous random variables

  • Uniform distribution: equal probability density at every point between $a$ and $b$ (where $a < b$).

$$
f(x)=\begin{cases} \frac{1}{b-a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}
$$

  • Exponential distribution: a decaying probability density over the non-negative reals (where $\lambda > 0$).

$$
f(x)=\begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}
$$

  • Normal distribution: also known as the Gaussian distribution.

$$
f(x)=\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}}
$$

The shapes of the PDFs and CDFs of some of these random variables are shown in Figure 2.

[Figure 2: PDFs and CDFs of some random variables]

| Distribution | PDF or PMF | Mean | Variance |
| --- | --- | --- | --- |
| $Bernoulli(p)$ | $p$ if $x = 1$; $1-p$ if $x = 0$ | $p$ | $p(1-p)$ |
| $Binomial(n,p)$ | $\binom{n}{k} p^{k}(1-p)^{n-k}$, where $0 \leq k \leq n$ | $np$ | $np(1-p)$ |
| $Geometric(p)$ | $p(1-p)^{k-1}$, where $k = 1, 2, \ldots$ | $\frac{1}{p}$ | $\frac{1-p}{p^{2}}$ |
| $Poisson(\lambda)$ | $e^{-\lambda} \frac{\lambda^{k}}{k!}$, where $k = 0, 1, 2, \ldots$ | $\lambda$ | $\lambda$ |
| $Uniform(a,b)$ | $\frac{1}{b-a}$ for all $x \in (a,b)$ | $\frac{a+b}{2}$ | $\frac{(b-a)^{2}}{12}$ |
| $Gaussian(\mu,\sigma^{2})$ | $\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2 \sigma^{2}}(x-\mu)^{2}}$ | $\mu$ | $\sigma^{2}$ |
| $Exponential(\lambda)$ | $\lambda e^{-\lambda x}$, where $x \geq 0$, $\lambda > 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^{2}}$ |
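The mean and variance entries for the discrete distributions can be recomputed from the PMFs themselves. The sketch below does this exactly for a binomial distribution and approximately for a Poisson distribution; the parameter values and the truncation point of the Poisson support are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def mean_and_variance(pmf):
    """Mean and variance of a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    return mean, second - mean**2

# Binomial(n, p), using exact rational arithmetic.
n, p = 8, Fraction(1, 4)
binomial = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
b_mean, b_var = mean_and_variance(binomial)

# Poisson(lam): infinite support, truncated where the tail is negligible.
lam = 3.0
poisson = {k: exp(-lam) * lam**k / factorial(k) for k in range(60)}
p_mean, p_var = mean_and_variance(poisson)

print(b_mean, b_var)   # compare with n*p and n*p*(1-p)
print(p_mean, p_var)   # compare with lam and lam
```

The variance is computed with the identity $Var[X] = E[X^2] - E[X]^2$ derived in Section 2.5.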

3. Two random variables

Thus far, we have considered single random variables. In many situations, however, there may be more than one quantity that we are interested in during a random experiment. For instance, in an experiment where we flip a coin ten times, we may care about both $X(\omega) =$ the number of heads that come up and $Y(\omega) =$ the length of the longest run of consecutive heads. In this section, we consider the setting of two random variables.

3.1 Joint and marginal distributions

Suppose that we have two random variables $X$ and $Y$. One way to work with them is to consider each of them separately. If we do that, we only need $F_{X}(x)$ and $F_{Y}(y)$. But if we want to know the values that $X$ and $Y$ assume simultaneously during outcomes of a random experiment, we need a more complicated structure known as the joint cumulative distribution function of $X$ and $Y$, defined by:

$$
F_{XY}(x, y)=P(X \leq x, Y \leq y)
$$

It can be shown that, by knowing the joint cumulative distribution function, the probability of any event involving $X$ and $Y$ can be calculated.

The joint CDF $F_{XY}(x, y)$ and the distribution functions $F_{X}(x)$ and $F_{Y}(y)$ of each variable separately are related by:

$$
F_{X}(x)=\lim_{y \rightarrow \infty} F_{XY}(x, y)
$$

$$
F_{Y}(y)=\lim_{x \rightarrow \infty} F_{XY}(x, y)
$$

Here, we call $F_{X}(x)$ and $F_{Y}(y)$ the marginal cumulative distribution functions of $F_{XY}(x, y)$.

Properties:

  • $0 \leq F_{XY}(x, y) \leq 1$
  • $\lim_{x, y \rightarrow \infty} F_{XY}(x, y)=1$
  • $\lim_{x, y \rightarrow -\infty} F_{XY}(x, y)=0$
  • $F_{X}(x)=\lim_{y \rightarrow \infty} F_{XY}(x, y)$

3.2 Joint and marginal probability mass functions

If $X$ and $Y$ are discrete random variables, then the joint probability mass function $p_{XY} : \mathbb{R} \times \mathbb{R} \rightarrow [0, 1]$ is defined by:

$$
p_{XY}(x, y)=P(X=x, Y=y)
$$

Here, $0 \leq p_{XY}(x, y) \leq 1$ for all $x$, $y$, and $\sum_{x \in Val(X)} \sum_{y \in Val(Y)} p_{XY}(x, y)=1$.

How does the joint PMF over two variables relate to the probability mass function of each variable separately? It turns out that:

$$
p_{X}(x)=\sum_{y} p_{XY}(x, y)
$$

and similarly for $p_{Y}(y)$. In this case, we refer to $p_{X}(x)$ as the marginal probability mass function of $X$. In statistics, the process of forming the marginal distribution of one variable by summing out the other is often known as "marginalization".
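Marginalization is a one-line sum once the joint PMF is stored as a table. The sketch below uses a made-up joint PMF over $X \in \{0,1\}$ and $Y \in \{0,1,2\}$; the numbers and the helper `marginal` are purely illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF p_XY(x, y); the six entries sum to 1.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

def marginal(joint_pmf, axis):
    """Sum out the other variable: axis 0 keeps X, axis 1 keeps Y."""
    out = defaultdict(Fraction)
    for pair, p in joint_pmf.items():
        out[pair[axis]] += p
    return dict(out)

p_X = marginal(joint, 0)   # p_X(x) = sum over y of p_XY(x, y)
p_Y = marginal(joint, 1)   # p_Y(y) = sum over x of p_XY(x, y)

print(p_X)
print(p_Y)
```

Each marginal is itself a valid PMF, so its values sum to one.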

3.3 Joint and marginal probability density functions

Let $X$ and $Y$ be two continuous random variables with joint distribution function $F_{XY}$. In the case that $F_{XY}(x, y)$ is everywhere differentiable in both $x$ and $y$, we can define the joint probability density function:

$$
f_{XY}(x, y)=\frac{\partial^{2} F_{XY}(x, y)}{\partial x\, \partial y}
$$

As in the one-dimensional case, $f_{XY}(x, y) \neq P(X = x, Y = y)$, but rather:

$$
\iint_{(x, y) \in A} f_{XY}(x, y)\, dx\, dy=P((X, Y) \in A)
$$

Note that the values of the probability density function $f_{XY}(x, y)$ are always non-negative, but they may be greater than 1. Nonetheless, it must be the case that $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy=1$.

Analogous to the discrete case, we define:

$$
f_{X}(x)=\int_{-\infty}^{\infty} f_{XY}(x, y)\, dy
$$

as the marginal probability density function (or marginal density) of $X$, and similarly for $f_{Y}(y)$.

3.4 Conditional distributions

Conditional distributions seek to answer the question: what is the probability distribution over $Y$ when we know that $X$ must take a certain value $x$? In the discrete case, the conditional probability mass function of $Y$ given $X$ is simply:

$$
p_{Y \mid X}(y \mid x)=\frac{p_{XY}(x, y)}{p_{X}(x)}
$$

assuming that $p_{X}(x) \neq 0$.

In the continuous case, the situation is technically a little more complicated, because the probability that a continuous random variable $X$ takes a specific value $x$ is equal to zero. Ignoring this technical point, we simply define, by analogy with the discrete case, the conditional probability density of $Y$ given $X = x$ to be:

$$
f_{Y \mid X}(y \mid x)=\frac{f_{XY}(x, y)}{f_{X}(x)}
$$

provided that $f_{X}(x) \neq 0$.

3.5 Bayes' theorem

A useful formula that often arises when trying to derive an expression for the conditional probability of one variable given another is Bayes' theorem.

For discrete random variables $X$ and $Y$:

$$
P_{Y \mid X}(y \mid x)=\frac{P_{XY}(x, y)}{P_{X}(x)}=\frac{P_{X \mid Y}(x \mid y) P_{Y}(y)}{\sum_{y^{\prime} \in Val(Y)} P_{X \mid Y}\left(x \mid y^{\prime}\right) P_{Y}\left(y^{\prime}\right)}
$$

For continuous random variables $X$ and $Y$:

$$
f_{Y \mid X}(y \mid x)=\frac{f_{XY}(x, y)}{f_{X}(x)}=\frac{f_{X \mid Y}(x \mid y) f_{Y}(y)}{\int_{-\infty}^{\infty} f_{X \mid Y}\left(x \mid y^{\prime}\right) f_{Y}\left(y^{\prime}\right) d y^{\prime}}
$$
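The discrete form of the formula can be applied mechanically once a prior $P_Y$ and a likelihood $P_{X \mid Y}$ are given. The sketch below uses invented numbers for a diagnostic-test scenario; the labels, probabilities, and the function name `posterior` are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical prior P_Y(y) over a hidden label Y.
prior = {"sick": Fraction(1, 100), "healthy": Fraction(99, 100)}

# Hypothetical likelihood P_{X|Y}(x | y) of a test result X.
likelihood = {
    "sick": {"pos": Fraction(9, 10), "neg": Fraction(1, 10)},
    "healthy": {"pos": Fraction(5, 100), "neg": Fraction(95, 100)},
}

def posterior(x):
    """P_{Y|X}(y | x); the denominator sums over all labels y'."""
    evidence = sum(likelihood[y][x] * prior[y] for y in prior)
    return {y: likelihood[y][x] * prior[y] / evidence for y in prior}

post = posterior("pos")
print(post["sick"], post["healthy"])
```

With these numbers the posterior probability of "sick" after a positive result is $2/13$, far below the test's 90% hit rate, because the prior probability of "sick" is small.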

3.6 Independence

Two random variables $X$ and $Y$ are independent if $F_{XY}(x, y) = F_{X}(x) F_{Y}(y)$ for all values of $x$ and $y$. Equivalently,

  • For discrete random variables, $p_{XY}(x, y) = p_{X}(x) p_{Y}(y)$ for all $x \in Val(X)$, $y \in Val(Y)$.
  • For discrete random variables, $p_{Y \mid X}(y \mid x) = p_{Y}(y)$ for all $y \in Val(Y)$ whenever $p_{X}(x) \neq 0$.
  • For continuous random variables, $f_{XY}(x, y) = f_{X}(x) f_{Y}(y)$ for all $x, y \in \mathbb{R}$.
  • For continuous random variables, $f_{Y \mid X}(y \mid x) = f_{Y}(y)$ for all $y \in \mathbb{R}$ whenever $f_{X}(x) \neq 0$.

Informally, two random variables $X$ and $Y$ are independent if "knowing" the value of one variable never has any effect on the conditional probability distribution of the other. That is, knowing $f(x)$ and $f(y)$ is enough to know everything about the pair $(X, Y)$. The following lemma formalizes this observation:

Lemma 3.1

If $X$ and $Y$ are independent, then for any subsets $A, B \subseteq \mathbb{R}$ we have:

$$
P(X \in A, Y \in B)=P(X \in A) P(Y \in B)
$$

Using the above lemma, one can prove that if $X$ is independent of $Y$, then any function of $X$ is independent of any function of $Y$.

3.7 Expectation and covariance

Suppose that we have two discrete random variables $X$, $Y$ and that $g : \mathbb{R}^{2} \rightarrow \mathbb{R}$ is a function of these two random variables. Then the expected value of $g$ is defined as:

$$
E[g(X, Y)] \triangleq \sum_{x \in Val(X)} \sum_{y \in Val(Y)} g(x, y) p_{XY}(x, y)
$$

For continuous random variables $X$, $Y$, the analogous expression is:

$$
E[g(X, Y)]=\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y)\, dx\, dy
$$

We can use the concept of expectation to study the relationship between two random variables. In particular, the covariance of two random variables $X$ and $Y$ is defined as:

$$
Cov[X, Y] \triangleq E[(X-E[X])(Y-E[Y])]
$$

Using an argument similar to that for variance, we can rewrite this as:

$$
\begin{aligned}
Cov[X, Y] &= E[(X-E[X])(Y-E[Y])] \\
&= E[XY - X E[Y] - Y E[X] + E[X] E[Y]] \\
&= E[XY] - E[X] E[Y] - E[Y] E[X] + E[X] E[Y] \\
&= E[XY] - E[X] E[Y]
\end{aligned}
$$

Here, the key step in showing that the two forms of covariance are equal is the third equality, where we use the fact that $E[X]$ and $E[Y]$ are actually constants that can be pulled out of the expectation. When $Cov[X, Y] = 0$, we say that $X$ and $Y$ are uncorrelated.

Properties:

  • (Linearity of expectation) $E[f(X, Y) + g(X, Y)] = E[f(X, Y)] + E[g(X, Y)]$.
  • $Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]$.
  • If $X$ and $Y$ are independent, then $Cov[X, Y] = 0$.
  • If $X$ and $Y$ are independent, then $E[f(X) g(Y)] = E[f(X)] E[g(Y)]$.
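The two forms of the covariance and the variance-of-a-sum identity can be verified on a small joint PMF. The table below is invented for illustration, and `expect` and `var` are hypothetical helper names; exact fractions keep the equality checks free of rounding error.

```python
from fractions import Fraction

# Hypothetical joint PMF p_XY(x, y) with dependent X and Y.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(3, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_XY(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var(g):
    """Var[g(X, Y)] = E[g^2] - E[g]^2."""
    return expect(lambda x, y: g(x, y) ** 2) - expect(g) ** 2

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

cov_def = expect(lambda x, y: (x - e_x) * (y - e_y))   # definition
cov_alt = expect(lambda x, y: x * y) - e_x * e_y       # E[XY] - E[X]E[Y]

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
var_sum = var(lambda x, y: x + y)

print(cov_def, cov_alt)
print(var_sum, var_x + var_y + 2 * cov_def)
```

Because the covariance here is non-zero, $Var[X+Y]$ differs from $Var[X] + Var[Y]$ by exactly $2\,Cov[X, Y]$.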

4. Multiple random variables

The notions and ideas introduced in the previous section can be generalized to more than two random variables. In particular, suppose that we have $n$ continuous random variables, $X_{1}(\omega), X_{2}(\omega), \ldots, X_{n}(\omega)$. In this section, for simplicity of presentation, we focus only on the continuous case; the generalization to discrete random variables works similarly.

4.1 Basic properties

We can define the joint cumulative distribution function of $X_{1}, X_{2}, \ldots, X_{n}$, their joint probability density function, the marginal probability density function of $X_{1}$, and the conditional probability density function of $X_{1}$ given $X_{2}, \ldots, X_{n}$ as:

$$
F_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right)=P\left(X_{1} \leq x_{1}, X_{2} \leq x_{2}, \ldots, X_{n} \leq x_{n}\right)
$$

$$
f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right)=\frac{\partial^{n} F_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right)}{\partial x_{1} \ldots \partial x_{n}}
$$

$$
f_{X_{1}}\left(x_{1}\right)=\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right) d x_{2} \ldots d x_{n}
$$

$$
f_{X_{1} \mid X_{2}, \ldots, X_{n}}\left(x_{1} \mid x_{2}, \ldots, x_{n}\right)=\frac{f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right)}{f_{X_{2}, \ldots, X_{n}}\left(x_{2}, \ldots, x_{n}\right)}
$$

To calculate the probability of an event $A \subseteq \mathbb{R}^{n}$, we have:

$$
P\left(\left(x_{1}, x_{2}, \ldots, x_{n}\right) \in A\right)=\int_{\left(x_{1}, x_{2}, \ldots, x_{n}\right) \in A} f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right) d x_{1} d x_{2} \ldots d x_{n}
$$

Chain rule:

From the definition of conditional probabilities for multiple random variables, one can show that:

$$
\begin{aligned}
f\left(x_{1}, x_{2}, \ldots, x_{n}\right) &= f\left(x_{n} \mid x_{1}, x_{2}, \ldots, x_{n-1}\right) f\left(x_{1}, x_{2}, \ldots, x_{n-1}\right) \\
&= f\left(x_{n} \mid x_{1}, x_{2}, \ldots, x_{n-1}\right) f\left(x_{n-1} \mid x_{1}, x_{2}, \ldots, x_{n-2}\right) f\left(x_{1}, x_{2}, \ldots, x_{n-2}\right) \\
&= \cdots = f\left(x_{1}\right) \prod_{i=2}^{n} f\left(x_{i} \mid x_{1}, \ldots, x_{i-1}\right)
\end{aligned}
$$
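The chain rule holds for probability mass functions as well as densities, and the discrete version is easy to verify exhaustively. The sketch below builds an arbitrary joint PMF over three binary variables (the weights are made up) and rebuilds each joint probability from the factors $f(x_1)$, $f(x_2 \mid x_1)$, $f(x_3 \mid x_1, x_2)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint PMF over three binary variables; weights sum to 36.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {
    xs: Fraction(w, sum(weights))
    for xs, w in zip(product((0, 1), repeat=3), weights)
}

def prefix_prob(prefix):
    """Marginal probability that the first len(prefix) variables equal prefix."""
    return sum(p for xs, p in joint.items() if xs[:len(prefix)] == prefix)

def chain_rule(xs):
    """f(x1) * f(x2 | x1) * ... * f(xn | x1, ..., x_{n-1})."""
    result = prefix_prob(xs[:1])
    for i in range(2, len(xs) + 1):
        # Conditional factor: f(x_i | x_1..x_{i-1}) as a ratio of marginals.
        result *= prefix_prob(xs[:i]) / prefix_prob(xs[:i - 1])
    return result

holds = all(chain_rule(xs) == joint[xs] for xs in joint)
print(holds)
```

Each conditional factor is a ratio of two marginals, so the product telescopes back to the joint probability.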

Independence: For multiple events $A_{1}, \ldots, A_{k}$, we say that $A_{1}, \ldots, A_{k}$ are mutually independent if for any subset $S \subseteq\{1,2, \ldots, k\}$ we have:

$$P\left(\cap_{i \in S} A_{i}\right)=\prod_{i \in S} P\left(A_{i}\right)$$

Likewise, we say that random variables $X_{1}, \ldots, X_{n}$ are independent if:

$$f\left(x_{1}, \ldots, x_{n}\right)=f\left(x_{1}\right) f\left(x_{2}\right) \cdots f\left(x_{n}\right)$$

Here, the definition of mutual independence is simply the natural generalization of the independence of two random variables to multiple random variables.
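The requirement that the product rule hold for every subset $S$ is stronger than requiring it only for pairs. A minimal sketch, assuming two fair coin flips and three illustrative events (the names `A`, `B`, `C` and the helper `factorizes` are chosen here and do not come from the notes):

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin flips, each outcome has probability 1/4.
omega = list(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

events = {
    "A": {w for w in omega if w[0] == "H"},   # first flip is heads
    "B": {w for w in omega if w[1] == "H"},   # second flip is heads
    "C": {w for w in omega if w[0] == w[1]},  # both flips agree
}

def factorizes(names):
    """Check P(intersection of the named events) == product of their probabilities."""
    intersection = set(omega)
    product_of_probs = Fraction(1)
    for name in names:
        intersection &= events[name]
        product_of_probs *= prob(events[name])
    return prob(intersection) == product_of_probs

pairwise = all(factorizes(pair) for pair in combinations(events, 2))
triple = factorizes(tuple(events))
print(pairwise, triple)  # prints "True False"
```

Every pair of events factorizes, but the triple does not, so these events are pairwise independent without being mutually independent.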

Independent random variables arise often in machine learning algorithms, where we assume that the training examples in a training set are independent samples from some unknown probability distribution. To see why independence matters, consider a "bad" training set in which we first draw a single training example $\left(x^{(1)}, y^{(1)}\right)$ from some unknown distribution and then add $m-1$ copies of that exact same example to the training set. In this case, we have:

$$P\left(\left(x^{(1)}, y^{(1)}\right), \ldots,\left(x^{(m)}, y^{(m)}\right)\right) \neq \prod_{i=1}^{m} P\left(x^{(i)}, y^{(i)}\right)$$

Although the training set has size $m$, the examples are not independent! The procedure described here is obviously not a sensible way to build a training set for a machine learning algorithm, but non-independence of samples does come up often in practice, and it has the effect of reducing the "effective size" of the training set.
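The inequality can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that each "training example" is the outcome of a fair die and that $m = 5$; the function names are hypothetical.

```python
from fractions import Fraction

m = 5                                          # training set size (illustrative)
p = {x: Fraction(1, 6) for x in range(1, 7)}   # assumed distribution: a fair die

def joint_duplicated(sample):
    """Joint probability of a training set built by drawing once and copying."""
    first = sample[0]
    if any(x != first for x in sample):
        return Fraction(0)   # copies can never disagree with the first draw
    return p[first]          # only the first draw is random

def product_of_marginals(sample):
    """What the joint would be if the m examples were independent."""
    result = Fraction(1)
    for x in sample:
        result *= p[x]
    return result

sample = (3,) * m
print(joint_duplicated(sample), product_of_marginals(sample))  # prints "1/6 1/7776"
```

The duplicated set carries exactly as much randomness as a single draw, which is one way to read the remark about a reduced "effective size".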

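4.2 Random vectors

Suppose that we have $n$ random variables. When working with all of these random variables together, we will often find it convenient to put them in a vector $X=\left[X_{1}\ X_{2}\ \ldots\ X_{n}\right]^{T}$. We call the resulting vector a random vector (more formally, a random vector is a mapping from $\Omega$ to $\mathbb{R}^{n}$). It should be clear that random vectors are simply an alternative notation for dealing with $n$ random variables, so the notions of joint PDF and joint CDF apply to random vectors as well.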

Expectation:

Consider an arbitrary function $g: \mathbb{R}^{n} \rightarrow \mathbb{R}$. The expected value of this function is defined as

$$E[g(X)]=\int_{\mathbb{R}^{n}} g\left(x_{1}, x_{2}, \ldots, x_{n}\right) f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right) d x_{1} d x_{2} \ldots d x_{n}$$

where $\int_{\mathbb{R}^{n}}$ denotes $n$ consecutive integrations from $-\infty$ to $\infty$. If $g$ is a function from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$, then the expected value of $g$ is the vector of element-wise expected values of its output, i.e., if $g$ is:

$$g(x)=\left[\begin{array}{c}{g_{1}(x)} \\ {g_{2}(x)} \\ {\vdots} \\ {g_{m}(x)}\end{array}\right]$$

then,

$$E[g(X)]=\left[\begin{array}{c}{E\left[g_{1}(X)\right]} \\ {E\left[g_{2}(X)\right]} \\ {\vdots} \\ {E\left[g_{m}(X)\right]}\end{array}\right]$$
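As a small worked illustration of the element-wise definition, the sketch below replaces the integral with a sum over a discrete distribution. The distribution (two independent fair dice) and the function `g` are assumptions made only for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed distribution: X = (X1, X2) are two independent fair dice.
pmf = {x: Fraction(1, 36) for x in product(range(1, 7), repeat=2)}

def g(x):
    """Hypothetical g: R^2 -> R^2 with g1(x) = x1 + x2 and g2(x) = x1 * x2."""
    return (x[0] + x[1], x[0] * x[1])

def expectation(func, pmf):
    """Element-wise expectation E[g(X)] = [E[g1(X)], ..., E[gm(X)]]."""
    m = len(func(next(iter(pmf))))
    return tuple(sum(func(x)[j] * p for x, p in pmf.items()) for j in range(m))

result = expectation(g, pmf)
print(result)
```

Here the two components are $E[X_1 + X_2] = 7$ and $E[X_1 X_2] = 49/4$, each computed independently of the other.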

Covariance matrix: For a given random vector $X: \Omega \rightarrow \mathbb{R}^{n}$, its covariance matrix $\Sigma$ is the $n \times n$ square matrix whose entries are given by $\Sigma_{i j}=\operatorname{Cov}\left[X_{i}, X_{j}\right]$. From the definition of covariance, we have:

$$\begin{aligned}
\Sigma &=\left[\begin{array}{ccc}{\operatorname{Cov}\left[X_{1}, X_{1}\right]} & {\cdots} & {\operatorname{Cov}\left[X_{1}, X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {\operatorname{Cov}\left[X_{n}, X_{1}\right]} & {\cdots} & {\operatorname{Cov}\left[X_{n}, X_{n}\right]}\end{array}\right] \\
&=\left[\begin{array}{ccc}{E\left[X_{1}^{2}\right]-E\left[X_{1}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{1} X_{n}\right]-E\left[X_{1}\right] E\left[X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n} X_{1}\right]-E\left[X_{n}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{n}^{2}\right]-E\left[X_{n}\right] E\left[X_{n}\right]}\end{array}\right] \\
&=\left[\begin{array}{ccc}{E\left[X_{1}^{2}\right]} & {\cdots} & {E\left[X_{1} X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n} X_{1}\right]} & {\cdots} & {E\left[X_{n}^{2}\right]}\end{array}\right]-\left[\begin{array}{ccc}{E\left[X_{1}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{1}\right] E\left[X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{n}\right] E\left[X_{n}\right]}\end{array}\right] \\
&=E\left[X X^{T}\right]-E[X] E[X]^{T}=\ldots=E\left[(X-E[X])(X-E[X])^{T}\right]
\end{aligned}$$

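where the matrix expectation is defined in the obvious way, entry by entry.
The covariance matrix has a number of useful properties:

  • $\Sigma \succeq 0$; that is, $\Sigma$ is positive semidefinite.
  • $\Sigma=\Sigma^{T}$; that is, $\Sigma$ is symmetric.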
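The two expressions for $\Sigma$ and both properties can be checked on a small example. This is a sketch under assumptions: the discrete distribution below is made up for illustration, and testing $z^{T} \Sigma z \geq 0$ on a grid of integer vectors $z$ is only a spot check of positive semidefiniteness, not a proof.

```python
from fractions import Fraction
from itertools import product

# Assumed distribution: X1 is a fair die and X2 = X1 + C,
# where C is an independent fair coin taking values in {0, 1}.
pmf = {(d, d + c): Fraction(1, 12) for d, c in product(range(1, 7), [0, 1])}
n = 2

def E(func):
    """Expectation of a scalar function of the random vector X."""
    return sum(func(x) * p for x, p in pmf.items())

mean = [E(lambda x, i=i: x[i]) for i in range(n)]

# Sigma = E[X X^T] - E[X] E[X]^T
sigma_a = [[E(lambda x, i=i, j=j: x[i] * x[j]) - mean[i] * mean[j]
            for j in range(n)] for i in range(n)]

# Sigma = E[(X - E[X]) (X - E[X])^T]
sigma_b = [[E(lambda x, i=i, j=j: (x[i] - mean[i]) * (x[j] - mean[j]))
            for j in range(n)] for i in range(n)]

def quadratic_form(z, sigma):
    """z^T Sigma z, which equals Var[z^T X] and is therefore nonnegative."""
    return sum(z[i] * sigma[i][j] * z[j] for i in range(n) for j in range(n))

assert sigma_a == sigma_b                                    # both formulas agree
assert all(sigma_a[i][j] == sigma_a[j][i]
           for i in range(n) for j in range(n))              # symmetric
assert all(quadratic_form(z, sigma_a) >= 0
           for z in product(range(-3, 4), repeat=n))         # PSD spot check
```

The identity $z^{T} \Sigma z=\operatorname{Var}\left[z^{T} X\right]$ noted in the docstring is the reason positive semidefiniteness holds in general.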

4.3 The multivariate Gaussian distribution

One particularly important example of a probability distribution over random vectors is the multivariate Gaussian, or multivariate normal, distribution. A random vector $X \in \mathbb{R}^{n}$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbb{R}^{n}$ and covariance matrix $\Sigma \in \mathbb{S}_{++}^{n}$ (where $\mathbb{S}_{++}^{n}$ denotes the space of symmetric positive definite $n \times n$ matrices) if its joint density is

$$f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n} ; \mu, \Sigma\right)=\frac{1}{(2 \pi)^{n / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right)$$

We write this as $X \sim \mathcal{N}(\mu, \Sigma)$. Note that in the case $n=1$, this reduces to the ordinary normal distribution with mean parameter $\mu_{1}$ and variance $\Sigma_{11}$.
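The density formula can be evaluated directly for small $n$. The sketch below hard-codes the $n=2$ case (so the determinant and inverse can be written out by hand) and checks that, for a diagonal $\Sigma$, the joint density equals the product of two ordinary normal densities. The function names and test point are illustrative assumptions.

```python
import math

def gaussian_pdf_2d(x, mu, sigma):
    """Multivariate normal density for n = 2, with sigma a 2x2 SPD matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    # (2*pi)^(n/2) with n = 2 is simply 2*pi.
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def gaussian_pdf_1d(x, mu, var):
    """The n = 1 case: the ordinary normal density."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

x, mu = (0.3, -1.2), (0.0, 1.0)
sigma = [[2.0, 0.0], [0.0, 0.5]]   # diagonal covariance: uncorrelated components
joint = gaussian_pdf_2d(x, mu, sigma)
factored = gaussian_pdf_1d(x[0], mu[0], 2.0) * gaussian_pdf_1d(x[1], mu[1], 0.5)
assert math.isclose(joint, factored)
```

For a general $n$, a numerical library would normally be used to compute $|\Sigma|$ and $\Sigma^{-1}$; the closed-form $2 \times 2$ inverse here just keeps the example dependency-free.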

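Generally speaking, Gaussian random variables are extremely useful in machine learning and statistics for two main reasons:

First, they are very common when modeling "noise" in statistical algorithms. Noise can often be regarded as the accumulation of a large number of small, independent random perturbations affecting the measurement process; by the Central Limit Theorem, sums of independent random variables tend to "look Gaussian".

Second, Gaussian random variables are convenient for many analytical manipulations, because many of the integrals involving Gaussian distributions that arise in practice have simple closed-form solutions. We will encounter this later in the course.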
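The Central Limit Theorem remark above can be illustrated without simulation. The sketch below assumes a sum of $n = 30$ independent fair dice (an arbitrary choice), computes its exact pmf by repeated convolution, and compares it with the normal density that has the same mean and variance.

```python
import math

def pmf_of_dice_sum(n):
    """Exact pmf of the sum of n independent fair dice, by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + p / 6
        pmf = new
    return pmf

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

n = 30
pmf = pmf_of_dice_sum(n)
mu, var = n * 3.5, n * 35 / 12   # mean and variance of a sum of n fair dice
max_gap = max(abs(p - normal_pdf(s, mu, var)) for s, p in pmf.items())
print(max_gap)
```

The largest pointwise gap between the exact pmf and the matching Gaussian density is small compared with the peak probability of roughly 0.04, even though a single die is far from Gaussian.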

5. Other resources

A good textbook on probability at the level needed for CS229 is A First Course in Probability by Sheldon Ross.