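This article is background material for Stanford University's CS229 machine learning course.

Original authors: Arian Maleki, Tom Do

Translation: 石振宇

Review, revision, and production: 黄海广

Note: please follow the GitHub repository for updates.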

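CS229 Machine Learning Course Review Material: Probability Theory

Review of Probability Theory

Probability theory is the study of uncertainty. Throughout this course we rely on concepts from probability theory to derive machine learning algorithms. These notes attempt to cover the basics of probability theory at a level appropriate for CS229. The mathematical theory of probability is very sophisticated and delves into a branch of analysis known as measure theory. In these notes we provide a basic treatment of probability that does not address these finer details.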

1. Elements of probability

In order to define a probability on a set, we need a few basic elements:

  • Sample space $\Omega$: the set of all outcomes of a random experiment. Here, each outcome $\omega \in \Omega$ can be thought of as a complete description of the state of the real world at the end of the experiment.
  • Set of events (event space) $\mathcal{F}$: a set whose elements $A \in \mathcal{F}$ (called events) are subsets of $\Omega$ (i.e., each $A \subseteq \Omega$ is a collection of possible outcomes of an experiment).
    Note: $\mathcal{F}$ must satisfy the following three conditions:
    (1) $\emptyset \in \mathcal{F}$
    (2) $A \in \mathcal{F} \Longrightarrow \Omega \setminus A \in \mathcal{F}$
    (3) $A_1, A_2, \ldots \in \mathcal{F} \Longrightarrow \cup_{i} A_{i} \in \mathcal{F}$
  • Probability measure $P$: a function $P: \mathcal{F} \rightarrow \mathbb{R}$ that satisfies the following properties:
    • $P(A) \geq 0$ for every $A \in \mathcal{F}$,
    • $P(\Omega) = 1$,
    • if $A_1, A_2, \ldots$ are disjoint events (i.e., $A_{i} \cap A_{j} = \emptyset$ whenever $i \neq j$), then:

$$P\left(\cup_{i} A_{i}\right) = \sum_{i} P\left(A_{i}\right)$$

These three properties are called the Axioms of Probability.

Example

Consider the event of tossing a six-sided die. The sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. The simplest event space is the trivial event space $\mathcal{F} = \{\emptyset, \Omega\}$. Another event space is the set of all subsets of $\Omega$. For the first event space, the unique probability measure satisfying the requirements above is given by $P(\emptyset) = 0$, $P(\Omega) = 1$. For the second event space, one valid probability measure is to assign each event the probability $\frac{i}{6}$, where $i$ is the number of elements in that event; for example, $P(\{1,2,3,4\}) = 4/6$ and $P(\{1,2,3\}) = 3/6$.

Properties:

  • If $A \subseteq B$, then $P(A) \leq P(B)$.
  • $P(A \cap B) \leq \min(P(A), P(B))$.
  • (Boole's inequality) $P(A \cup B) \leq P(A) + P(B)$.
  • $P(\Omega \setminus A) = 1 - P(A)$.
  • (Law of Total Probability) If $A_1, \ldots, A_k$ are disjoint events whose union is $\Omega$, then $\sum_{i=1}^{k} P(A_i) = 1$.
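The axioms and properties above can be checked exhaustively on the die example. The following is a minimal sketch, assuming the uniform measure $P(A) = |A|/6$ on the event space of all subsets; the names `prob`, `power_set`, and `check_axioms_and_properties` are illustrative and do not come from the original notes.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))  # sample space of a six-sided die


def prob(event):
    """Uniform probability measure (an assumption): P(A) = |A| / 6."""
    return Fraction(len(event), len(OMEGA))


def power_set(s):
    """All subsets of s, i.e. the largest event space on a finite sample space."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


EVENTS = power_set(OMEGA)


def check_axioms_and_properties():
    """Return True if the axioms and the listed properties hold for all events."""
    if prob(OMEGA) != 1:
        return False
    for a in EVENTS:
        # Non-negativity and the complement rule P(Omega \ A) = 1 - P(A).
        if prob(a) < 0 or prob(OMEGA - a) != 1 - prob(a):
            return False
        for b in EVENTS:
            if a <= b and prob(a) > prob(b):            # monotonicity
                return False
            if prob(a & b) > min(prob(a), prob(b)):     # intersection bound
                return False
            if prob(a | b) > prob(a) + prob(b):         # Boole's inequality
                return False
            if not (a & b) and prob(a | b) != prob(a) + prob(b):  # additivity
                return False
    return True
```

Using `Fraction` keeps every probability exact, so the equalities are tested without floating-point tolerance.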

1.1 Conditional probability and independence

Let $B$ be an event with non-zero probability. The conditional probability of any event $A$ given $B$ is defined as:

$$P(A \mid B) \triangleq \frac{P(A \cap B)}{P(B)}$$

In other words, $P(A \mid B)$ is the probability of event $A$ after observing that event $B$ has occurred. Two events are called independent if and only if $P(A \cap B) = P(A)P(B)$ (or equivalently, when $P(B) > 0$, $P(A \mid B) = P(A)$). Independence is therefore equivalent to saying that observing $B$ has no effect on the probability of $A$.
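As a small illustration of these two definitions, the sketch below reuses the six-sided die with the uniform measure (an assumption) and tests two pairs of events; the event names and helper functions are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # six-sided die, uniform measure assumed


def prob(event):
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(a & b) / prob(b)


def independent(a, b):
    """A and B are independent iff P(A ∩ B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


EVEN = frozenset({2, 4, 6})
AT_MOST_FOUR = frozenset({1, 2, 3, 4})
AT_MOST_THREE = frozenset({1, 2, 3})
```

Here `EVEN` and `AT_MOST_FOUR` are independent, so conditioning on `AT_MOST_FOUR` leaves the probability of `EVEN` at 1/2, whereas conditioning on `AT_MOST_THREE` changes it to 1/3.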

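2. Random variables

Consider an experiment in which we flip 10 coins, and we want to know the number of coins that come up heads. Here, the elements of the sample space $\Omega$ are sequences of heads and tails of length 10. For example, we might have $\omega_0 = \langle H, H, T, H, T, H, H, T, T, T \rangle \in \Omega$. In practice, however, we usually do not care about the probability of obtaining any particular sequence of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number of heads that appear among our 10 tosses, or the length of the longest run of tails. Under some technical conditions, these functions are known as random variables.

More formally, a random variable $X$ is a function $X: \Omega \rightarrow \mathbb{R}$. Typically, we denote random variables using upper case letters $X(\omega)$ or more simply $X$ (where the dependence on the random outcome $\omega$ is implied). We denote the value that a random variable may take on using lower case letters $x$.

Example:
In our experiment above, suppose that $X(\omega)$ is the number of heads that occur in the sequence of tosses $\omega$. Given that only 10 coins are tossed, $X(\omega)$ can take only a finite number of values, so it is known as a discrete random variable. Here, the probability of the set associated with the random variable $X$ taking on some specific value $k$ is:

$$P(X = k) := P(\{\omega : X(\omega) = k\})$$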
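The definition $P(X = k) := P(\{\omega : X(\omega) = k\})$ can be evaluated directly by enumerating the sample space. This is a sketch that assumes a fair coin, so that each of the $2^{10}$ sequences is equally likely; `prob_X_equals` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

N_FLIPS = 10
# Sample space: every length-10 sequence of heads (H) and tails (T).
OMEGA = list(product("HT", repeat=N_FLIPS))


def X(omega):
    """Random variable: number of heads in the outcome omega."""
    return omega.count("H")


def prob_X_equals(k):
    """P(X = k) = P({omega : X(omega) = k}), assuming equally likely outcomes."""
    favourable = [omega for omega in OMEGA if X(omega) == k]
    return Fraction(len(favourable), len(OMEGA))
```

For example, `prob_X_equals(5)` counts the 252 sequences with exactly five heads out of 1024.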

Example:
Suppose that $X(\omega)$ is a random variable indicating the amount of time it takes for a radioactive particle to decay. In this case, $X(\omega)$ takes on infinitely many possible values, so it is called a continuous random variable. We denote the probability that $X$ takes on a value between two real constants $a$ and $b$ (where $a < b$) as:

$$P(a \leq X \leq b) := P(\{\omega : a \leq X(\omega) \leq b\})$$

2.1 Cumulative distribution functions

In order to specify the probability measures used when dealing with random variables, it is often convenient to specify alternative functions (CDFs, PDFs, and PMFs). In this section and the next two sections, we describe each of these types of functions in turn.

A cumulative distribution function (CDF) is a function $F_X: \mathbb{R} \rightarrow [0, 1]$ that specifies a probability measure as:

$$F_{X}(x) \triangleq P(X \leq x)$$

By using this function, one can calculate the probability of any event. Figure 1 shows a sample CDF.

[Figure 1: a sample cumulative distribution function (CDF).]

Properties:

  • $0 \leq F_{X}(x) \leq 1$
  • $\lim_{x \rightarrow -\infty} F_{X}(x) = 0$
  • $\lim_{x \rightarrow \infty} F_{X}(x) = 1$
  • $x \leq y \Longrightarrow F_{X}(x) \leq F_{X}(y)$
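To make the CDF concrete, the sketch below builds $F_X$ for the number of heads in 10 flips of a fair coin (the fairness and the helper names `prob_equals` and `cdf` are assumptions for illustration). It also shows one standard use of a CDF: $P(a < X \leq b) = F_X(b) - F_X(a)$.

```python
from fractions import Fraction
from math import comb, floor

N = 10  # number of fair coin flips (assumed example)


def prob_equals(k):
    """P(X = k) for the number of heads in N fair flips."""
    return Fraction(comb(N, k), 2 ** N) if 0 <= k <= N else Fraction(0)


def cdf(x):
    """F_X(x) = P(X <= x), defined for every real x."""
    if x < 0:
        return Fraction(0)
    upper = min(N, floor(x))
    return sum((prob_equals(k) for k in range(upper + 1)), Fraction(0))


# Probability of an interval from the CDF: P(3 < X <= 6) = F_X(6) - F_X(3).
p_interval = cdf(6) - cdf(3)
```

The function is a step function: it is 0 below the smallest value, 1 above the largest value, and never decreases in between, matching the four properties listed above.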

2.2 Probability mass functions

When a random variable $X$ takes on a finite set of possible values (i.e., $X$ is a discrete random variable), a simpler way to represent the probability measure associated with the random variable is to directly specify the probability of each value that the random variable can assume. In particular, a probability mass function (PMF) is a function $p_X: \mathbb{R} \rightarrow [0, 1]$ such that:

$$p_{X}(x) \triangleq P(X = x)$$

In the case of a discrete random variable, we use the notation $Val(X)$ for the set of possible values that the random variable $X$ may assume. For example, if $X(\omega)$ is a random variable indicating the number of heads out of ten tosses of a coin, then $Val(X) = \{0, 1, 2, \ldots, 10\}$.

Properties:

  • $0 \leq p_{X}(x) \leq 1$
  • $\sum_{x \in Val(X)} p_{X}(x) = 1$
  • $\sum_{x \in A} p_{X}(x) = P(X \in A)$

2.3 Probability density functions

For some continuous random variables, the cumulative distribution function $F_X(x)$ is differentiable everywhere. In these cases, we define the probability density function (PDF) as the derivative of the CDF, i.e.:

$$f_{X}(x) \triangleq \frac{d F_{X}(x)}{d x}$$

Note that the PDF of a continuous random variable may not always exist (i.e., if $F_X(x)$ is not differentiable everywhere).

According to the properties of differentiation, for very small $\Delta x$:

$$P(x \leq X \leq x + \Delta x) \approx f_{X}(x) \Delta x$$

Both CDFs and PDFs (when they exist!) can be used to calculate the probabilities of different events. It should be emphasized, however, that the value of the PDF at any given point $x$ is not the probability of that event, i.e., $f_X(x) \neq P(X = x)$. For example, $f_X(x)$ can take on values larger than 1 (but the integral of $f_X(x)$ over any subset of $\mathbb{R}$ is at most 1).

Properties:

  • $f_X(x) \geq 0$
  • $\int_{-\infty}^{\infty} f_{X}(x) \, dx = 1$
  • $\int_{x \in A} f_{X}(x) \, dx = P(X \in A)$
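The point that a density is not a probability is easy to see numerically. The sketch below uses a uniform density on $[0, 0.5]$ (an assumed example, chosen because its density equals 2 on that interval) and a simple midpoint-rule integrator; both helper names are illustrative.

```python
def uniform_pdf(x, a=0.0, b=0.5):
    """Density of a uniform distribution on [a, b]; equals 2 when b - a = 0.5."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a=0.0, b=0.5):
    """CDF of the same uniform distribution."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


density_at_point = uniform_pdf(0.2)               # larger than 1
total_mass = integrate(uniform_pdf, -1.0, 1.0)    # still integrates to about 1

# Small-interval approximation: P(x <= X <= x + dx) is close to f_X(x) * dx.
x, dx = 0.2, 1e-3
interval_prob = uniform_cdf(x + dx) - uniform_cdf(x)
approximation = uniform_pdf(x) * dx
```

For a uniform density the small-interval approximation is exact up to rounding; for other densities it improves as `dx` shrinks.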

2.4 Expectation

Suppose that $X$ is a discrete random variable with PMF $p_X(x)$ and $g: \mathbb{R} \rightarrow \mathbb{R}$ is an arbitrary function. In this case, $g(X)$ can be considered a random variable, and we define the expectation or expected value of $g(X)$ as:

$$E[g(X)] \triangleq \sum_{x \in Val(X)} g(x) \, p_{X}(x)$$

If $X$ is a continuous random variable with PDF $f_X(x)$, then the expected value of $g(X)$ is defined as:

$$E[g(X)] \triangleq \int_{-\infty}^{\infty} g(x) \, f_{X}(x) \, dx$$

Intuitively, the expectation of $g(X)$ can be thought of as a "weighted average" of the values that $g(x)$ can take on for different values of $x$, where the weights are given by $p_X(x)$ or $f_X(x)$. As a special case of the above, note that the expectation $E[X]$ of a random variable itself is found by letting $g(x) = x$; this is also known as the mean of the random variable $X$.

Properties:

  • $E[a] = a$ for any constant $a \in \mathbb{R}$.
  • $E[a f(X)] = a E[f(X)]$ for any constant $a \in \mathbb{R}$.
  • (Linearity of expectation) $E[f(X) + g(X)] = E[f(X)] + E[g(X)]$.
  • For a discrete random variable $X$, $E[1\{X = k\}] = P(X = k)$.
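The discrete definition and its properties translate almost line for line into code. This is a minimal sketch using the PMF of a fair six-sided die as an assumed example; `expectation` is an illustrative helper name.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed example): p_X(x) = 1/6 for x in 1..6.
PMF = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(g, pmf):
    """E[g(X)] = sum over x in Val(X) of g(x) * p_X(x)."""
    return sum((g(x) * p for x, p in pmf.items()), Fraction(0))


mean = expectation(lambda x: x, PMF)                # E[X], the mean
second_moment = expectation(lambda x: x * x, PMF)   # E[X^2]

# Properties from the list above.
constant = expectation(lambda x: 5, PMF)                    # E[a] = a
scaled = expectation(lambda x: 3 * x, PMF)                  # E[aX] = a E[X]
summed = expectation(lambda x: x + x * x, PMF)              # linearity
indicator = expectation(lambda x: 1 if x == 3 else 0, PMF)  # E[1{X=3}] = P(X=3)
```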

2.5 Variance

The variance of a random variable $X$ is a measure of how concentrated the distribution of $X$ is around its mean. Formally, the variance of a random variable $X$ is defined as:

$$Var[X] \triangleq E\left[(X - E[X])^{2}\right]$$

Using the properties in the previous section, we can derive an alternative expression for the variance:

$$\begin{aligned} E\left[(X - E[X])^{2}\right] &= E\left[X^{2} - 2 E[X] X + E[X]^{2}\right] \\ &= E\left[X^{2}\right] - 2 E[X] E[X] + E[X]^{2} \\ &= E\left[X^{2}\right] - E[X]^{2} \end{aligned}$$

where the second equality follows from the linearity of expectation and the fact that $E[X]$ is actually a constant with respect to the outer expectation.

Properties:

  • $Var[a] = 0$ for any constant $a \in \mathbb{R}$.
  • $Var[a f(X)] = a^2 Var[f(X)]$ for any constant $a \in \mathbb{R}$.

Example:

Calculate the mean and the variance of the uniform random variable $X$ with PDF $f_X(x) = 1$ for all $x \in [0, 1]$, and 0 elsewhere.

$$E[X] = \int_{-\infty}^{\infty} x f_{X}(x) \, dx = \int_{0}^{1} x \, dx = \frac{1}{2}$$

$$E\left[X^{2}\right] = \int_{-\infty}^{\infty} x^{2} f_{X}(x) \, dx = \int_{0}^{1} x^{2} \, dx = \frac{1}{3}$$

$$Var[X] = E\left[X^{2}\right] - E[X]^{2} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$$
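The closed-form result for the uniform example can be sanity-checked by simulation. The sketch below is a Monte Carlo estimate, not part of the original derivation; the sample size and seed are arbitrary assumptions, and the estimates only approximate $1/2$ and $1/12$.

```python
import random


def sample_moments(n_samples=100_000, seed=0):
    """Estimate E[X] and Var[X] for X uniform on [0, 1) from random samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    samples = [rng.random() for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    second_moment = sum(x * x for x in samples) / n_samples
    # Alternative variance expression: Var[X] = E[X^2] - E[X]^2.
    return mean, second_moment - mean ** 2


estimated_mean, estimated_variance = sample_moments()
```

The estimator deliberately uses the alternative expression $E[X^2] - E[X]^2$ derived in this section.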

Example:

Suppose that $g(x) = 1\{x \in A\}$ for some subset $A \subseteq \mathbb{R}$. What is $E[g(X)]$?

Discrete case:

$$E[g(X)] = \sum_{x \in Val(X)} 1\{x \in A\} \, p_{X}(x) = \sum_{x \in A} p_{X}(x) = P(X \in A)$$

Continuous case:

$$E[g(X)] = \int_{-\infty}^{\infty} 1\{x \in A\} \, f_{X}(x) \, dx = \int_{x \in A} f_{X}(x) \, dx = P(X \in A)$$

2.6 Some common random variables

Discrete random variables

  • $X \sim Bernoulli(p)$ (where $0 \leq p \leq 1$): one if a coin with heads probability $p$ comes up heads, zero otherwise.

$$p(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

  • $X \sim Binomial(n, p)$ (where $0 \leq p \leq 1$): the number of heads in $n$ independent flips of a coin with heads probability $p$.

$$p(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$$

  • $X \sim Geometric(p)$ (where $p > 0$): the number of flips of a coin with heads probability $p$ until the first heads.

$$p(x) = p (1 - p)^{x - 1}$$

  • $X \sim Poisson(\lambda)$ (where $\lambda > 0$): a probability distribution over the nonnegative integers used for modeling the frequency of rare events.

$$p(x) = e^{-\lambda} \frac{\lambda^{x}}{x!}$$

Continuous random variables

  • $X \sim Uniform(a, b)$ (where $a < b$): equal probability density at every point between $a$ and $b$.

$$f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$

  • $X \sim Exponential(\lambda)$ (where $\lambda > 0$): decaying probability density over the nonnegative reals.

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

  • $X \sim Normal(\mu, \sigma^2)$: also known as the Gaussian distribution.

$$f(x) = \frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2 \sigma^{2}} (x - \mu)^{2}}$$

The shapes of the PDFs and CDFs of some of these random variables are shown in Figure 2.

[Figure 2: PDFs and CDFs of some of these random variables.]

| Distribution | PDF or PMF | Mean | Variance |
| --- | --- | --- | --- |
| $Bernoulli(p)$ | $p$ if $x = 1$; $1 - p$ if $x = 0$ | $p$ | $p(1-p)$ |
| $Binomial(n, p)$ | $\binom{n}{k} p^{k} (1-p)^{n-k}$ for $0 \leq k \leq n$ | $np$ | $np(1-p)$ |
| $Geometric(p)$ | $p(1-p)^{k-1}$ for $k = 1, 2, \ldots$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| $Poisson(\lambda)$ | $e^{-\lambda} \lambda^{k} / k!$ for $k = 0, 1, 2, \ldots$ | $\lambda$ | $\lambda$ |
| $Uniform(a, b)$ | $\frac{1}{b-a}$ for all $x \in (a, b)$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| $Gaussian(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{1}{2 \sigma^{2}} (x - \mu)^{2}}$ | $\mu$ | $\sigma^2$ |
| $Exponential(\lambda)$ | $\lambda e^{-\lambda x}$ for $x \geq 0$, $\lambda > 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
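The mean and variance entries for the discrete distributions can be cross-checked by summing the PMFs directly. The sketch below does this under assumed parameter values ($n = 10$, $p = 0.3$, $\lambda = 4$); the infinite supports of the geometric and Poisson distributions are truncated where the remaining tail is negligible, so the results are approximations.

```python
from math import comb, exp, factorial

N, P, LAM = 10, 0.3, 4.0  # assumed parameter values for illustration


def binomial_pmf(k):
    return comb(N, k) * P ** k * (1 - P) ** (N - k)


def geometric_pmf(k):
    return P * (1 - P) ** (k - 1)


def poisson_pmf(k):
    return exp(-LAM) * LAM ** k / factorial(k)


def mean_and_variance(pmf, support):
    """Mean and variance by direct summation, using Var[X] = E[X^2] - E[X]^2."""
    mean = sum(k * pmf(k) for k in support)
    second_moment = sum(k * k * pmf(k) for k in support)
    return mean, second_moment - mean ** 2


binomial_stats = mean_and_variance(binomial_pmf, range(0, N + 1))
geometric_stats = mean_and_variance(geometric_pmf, range(1, 200))  # truncated
poisson_stats = mean_and_variance(poisson_pmf, range(0, 100))      # truncated
```

With these parameters the sums should agree with $np$ and $np(1-p)$, $1/p$ and $(1-p)/p^2$, and $\lambda$ and $\lambda$ up to floating-point error.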

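3. Two random variables

Thus far, we have considered single random variables. In many situations, however, there may be more than one quantity that we are interested in during a random experiment. For instance, in an experiment where we flip a coin ten times, we may care about both $X(\omega) =$ the number of heads that come up and $Y(\omega) =$ the length of the longest run of consecutive heads. In this section, we consider the setting of two random variables.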

3.1 Joint and marginal distributions

Suppose that we have two random variables $X$ and $Y$. One way to work with them is to consider each of them separately. If we do that, we only need $F_X(x)$ and $F_Y(y)$. But if we want to know about the values that $X$ and $Y$ assume simultaneously during outcomes of a random experiment, we require a more complicated structure known as the joint cumulative distribution function of $X$ and $Y$, defined by:

$$F_{XY}(x, y) = P(X \leq x, Y \leq y)$$

It can be shown that by knowing the joint cumulative distribution function, the probability of any event involving $X$ and $Y$ can be calculated.

The joint CDF $F_{XY}(x, y)$ and the distribution functions $F_X(x)$ and $F_Y(y)$ of each variable separately are related by:

$$F_{X}(x) = \lim_{y \rightarrow \infty} F_{XY}(x, y)$$

$$F_{Y}(y) = \lim_{x \rightarrow \infty} F_{XY}(x, y)$$

Here, we call $F_X(x)$ and $F_Y(y)$ the marginal cumulative distribution functions of $F_{XY}(x, y)$.

Properties:

  • $0 \leq F_{XY}(x, y) \leq 1$
  • $\lim_{x, y \rightarrow \infty} F_{XY}(x, y) = 1$
  • $\lim_{x, y \rightarrow -\infty} F_{XY}(x, y) = 0$
  • $F_{X}(x) = \lim_{y \rightarrow \infty} F_{XY}(x, y)$

3.2 Joint and marginal probability mass functions

If $X$ and $Y$ are discrete random variables, then the joint probability mass function $p_{XY}: \mathbb{R} \times \mathbb{R} \rightarrow [0, 1]$ is defined by:

$$p_{XY}(x, y) = P(X = x, Y = y)$$

Here, $0 \leq p_{XY}(x, y) \leq 1$ for all $x, y$, and $\sum_{x \in Val(X)} \sum_{y \in Val(Y)} p_{XY}(x, y) = 1$.

How does the joint PMF over two variables relate to the probability mass function of each variable separately? It turns out that:

$$p_{X}(x) = \sum_{y} p_{XY}(x, y)$$

and similarly for $p_Y(y)$. In this case, we refer to $p_X(x)$ as the marginal probability mass function of $X$. In statistics, the process of forming the marginal distribution with respect to one variable by summing out the other variable is often known as "marginalization".
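Marginalization is just a sum over one coordinate of the joint table. The sketch below builds the joint PMF of $X$ (number of heads) and $Y$ (longest run of consecutive heads) by enumeration; to keep the table small it assumes three fair flips instead of ten, and all function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import groupby, product


def longest_heads_run(omega):
    """Length of the longest run of consecutive heads in omega."""
    return max((len(list(run)) for face, run in groupby(omega) if face == "H"),
               default=0)


def joint_pmf(n_flips=3):
    """p_XY(x, y) for X = number of heads, Y = longest run of heads (fair coin)."""
    outcomes = list(product("HT", repeat=n_flips))
    pmf = defaultdict(Fraction)
    for omega in outcomes:
        pmf[(omega.count("H"), longest_heads_run(omega))] += Fraction(1, len(outcomes))
    return dict(pmf)


def marginal_x(pmf):
    """p_X(x) = sum over y of p_XY(x, y)."""
    out = defaultdict(Fraction)
    for (x, _y), p in pmf.items():
        out[x] += p
    return dict(out)


def marginal_y(pmf):
    """p_Y(y) = sum over x of p_XY(x, y)."""
    out = defaultdict(Fraction)
    for (_x, y), p in pmf.items():
        out[y] += p
    return dict(out)


JOINT = joint_pmf()
```

Summing the joint table over $y$ recovers the familiar binomial PMF for the number of heads.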

3.3 Joint and marginal probability density functions

Let $X$ and $Y$ be two continuous random variables with joint distribution function $F_{XY}$. In the case that $F_{XY}(x, y)$ is everywhere differentiable in both $x$ and $y$, we can define the joint probability density function:

$$f_{XY}(x, y) = \frac{\partial^{2} F_{XY}(x, y)}{\partial x \, \partial y}$$

As in the one-dimensional case, $f_{XY}(x, y) \neq P(X = x, Y = y)$, but rather:

$$\iint_{(x, y) \in A} f_{XY}(x, y) \, dx \, dy = P((X, Y) \in A)$$

Note that the values of the probability density function $f_{XY}(x, y)$ are always nonnegative, but they may be greater than 1. Nonetheless, it must be the case that $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y) \, dx \, dy = 1$.

Analogous to the discrete case, we define:

$$f_{X}(x) = \int_{-\infty}^{\infty} f_{XY}(x, y) \, dy$$

as the marginal probability density function (or marginal density) of $X$, and similarly for $f_Y(y)$.

3.4 Conditional distributions

Conditional distributions seek to answer the question: what is the probability distribution over $Y$ when we know that $X$ must take on a certain value $x$? In the discrete case, the conditional probability mass function of $Y$ given $X$ is simply:

$$p_{Y \mid X}(y \mid x) = \frac{p_{XY}(x, y)}{p_{X}(x)}$$

assuming that $p_X(x) \neq 0$.

In the continuous case, the situation is technically a little more complicated, because the probability that a continuous random variable $X$ takes on a specific value $x$ is zero. Ignoring this technical point, we simply define, by analogy with the discrete case, the conditional probability density of $Y$ given $X = x$ to be:

$$f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_{X}(x)}$$

assuming that $f_X(x) \neq 0$.

3.5 Bayes' theorem

A useful formula that often arises when trying to derive an expression for the conditional probability of one variable given another is Bayes' theorem.

For discrete random variables $X$ and $Y$:

$$p_{Y \mid X}(y \mid x) = \frac{p_{XY}(x, y)}{p_{X}(x)} = \frac{p_{X \mid Y}(x \mid y) \, p_{Y}(y)}{\sum_{y' \in Val(Y)} p_{X \mid Y}(x \mid y') \, p_{Y}(y')}$$

For continuous random variables $X$ and $Y$:

$$f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_{X}(x)} = \frac{f_{X \mid Y}(x \mid y) \, f_{Y}(y)}{\int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y') \, f_{Y}(y') \, dy'}$$
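The discrete form of Bayes' theorem can be sketched in a few lines. In the hypothetical setup below, $Y$ is which of two coins was picked (a fair one or one biased towards heads) and $X$ is the observed face; the prior and likelihood values are made up for illustration.

```python
from fractions import Fraction


def posterior(prior, likelihood, x):
    """Return p_{Y|X}(y | x) for every y, computed with Bayes' theorem."""
    # Denominator: p_X(x) = sum over y' of p_{X|Y}(x | y') * p_Y(y').
    evidence = sum(likelihood[y][x] * prior[y] for y in prior)
    if evidence == 0:
        raise ValueError("p_X(x) must be non-zero")
    return {y: likelihood[y][x] * prior[y] / evidence for y in prior}


# Hypothetical numbers: p_Y(y) and p_{X|Y}(x | y).
PRIOR = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
LIKELIHOOD = {
    "fair": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(9, 10), "T": Fraction(1, 10)},
}

POSTERIOR_GIVEN_HEADS = posterior(PRIOR, LIKELIHOOD, "H")
```

Observing heads shifts the belief towards the biased coin, and the posterior probabilities still sum to 1 because of the normalizing denominator.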

3.6 Independence

Two random variables $X$ and $Y$ are independent if $F_{XY}(x, y) = F_X(x) F_Y(y)$ for all values of $x$ and $y$. Equivalently:

  • For discrete random variables, $p_{XY}(x, y) = p_X(x) p_Y(y)$ for all $x \in Val(X)$, $y \in Val(Y)$.
  • For discrete random variables, $p_{Y \mid X}(y \mid x) = p_Y(y)$ for all $y \in Val(Y)$ whenever $p_X(x) \neq 0$.
  • For continuous random variables, $f_{XY}(x, y) = f_X(x) f_Y(y)$ for all $x, y \in \mathbb{R}$.
  • For continuous random variables, $f_{Y \mid X}(y \mid x) = f_Y(y)$ for all $y \in \mathbb{R}$ whenever $f_X(x) \neq 0$.

Informally, two random variables $X$ and $Y$ are independent if "knowing" the value of one variable never has any effect on the conditional probability distribution of the other variable; that is, you know all the information about the pair $(X, Y)$ just by knowing $f(x)$ and $f(y)$. The following lemma formalizes this observation:

Lemma 3.1

If $X$ and $Y$ are independent, then for any subsets $A, B \subseteq \mathbb{R}$, we have:

$$P(X \in A, Y \in B) = P(X \in A) P(Y \in B)$$

Using the above lemma, one can prove that if $X$ is independent of $Y$, then any function of $X$ is independent of any function of $Y$.

3.7 Expectation and covariance

Suppose that we have two discrete random variables $X$, $Y$ and that $g: \mathbb{R}^2 \rightarrow \mathbb{R}$ is a function of these two random variables. Then the expected value of $g$ is defined as follows:

$$E[g(X, Y)] \triangleq \sum_{x \in Val(X)} \sum_{y \in Val(Y)} g(x, y) \, p_{XY}(x, y)$$

For continuous random variables $X$, $Y$, the analogous expression is:

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \, f_{XY}(x, y) \, dx \, dy$$

We can use the concept of expectation to study the relationship between two random variables. In particular, the covariance of two random variables $X$ and $Y$ is defined as:

$$Cov[X, Y] \triangleq E[(X - E[X])(Y - E[Y])]$$

Using an argument similar to that for variance, we can rewrite this as:

$$\begin{aligned} Cov[X, Y] &= E[(X - E[X])(Y - E[Y])] \\ &= E[XY - X E[Y] - Y E[X] + E[X] E[Y]] \\ &= E[XY] - E[X] E[Y] - E[Y] E[X] + E[X] E[Y] \\ &= E[XY] - E[X] E[Y] \end{aligned}$$

Here, the key step in showing the equality of the two forms of covariance is the third equality, where we use the fact that $E[X]$ and $E[Y]$ are actually constants that can be pulled out of the expectation. When $Cov[X, Y] = 0$, we say that $X$ and $Y$ are uncorrelated.

Properties:

  • (Linearity of expectation) $E[f(X, Y) + g(X, Y)] = E[f(X, Y)] + E[g(X, Y)]$.
  • $Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y]$.
  • If $X$ and $Y$ are independent, then $Cov[X, Y] = 0$.
  • If $X$ and $Y$ are independent, then $E[f(X) g(Y)] = E[f(X)] E[g(Y)]$.
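The covariance identities above can be verified exactly on small joint PMFs. The sketch below uses two hypothetical tables: `DEPENDENT` is the joint PMF of the number of heads and the longest run of heads in three fair flips, and `INDEPENDENT` is the joint PMF of two independent fair bits. The helper names are illustrative.

```python
from fractions import Fraction


def expect(g, joint):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_XY(x, y)."""
    return sum((g(x, y) * p for (x, y), p in joint.items()), Fraction(0))


def var(g, joint):
    """Variance of the random variable g(X, Y)."""
    mean = expect(g, joint)
    return expect(lambda x, y: (g(x, y) - mean) ** 2, joint)


def cov(joint):
    """Cov[X, Y] = E[(X - E[X]) (Y - E[Y])]."""
    ex = expect(lambda x, y: x, joint)
    ey = expect(lambda x, y: y, joint)
    return expect(lambda x, y: (x - ex) * (y - ey), joint)


# Assumed example tables (keys are (x, y), values are probabilities).
DEPENDENT = {(0, 0): Fraction(1, 8), (1, 1): Fraction(3, 8), (2, 1): Fraction(1, 8),
             (2, 2): Fraction(2, 8), (3, 3): Fraction(1, 8)}
INDEPENDENT = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

var_of_sum = var(lambda x, y: x + y, DEPENDENT)
sum_of_parts = (var(lambda x, y: x, DEPENDENT) + var(lambda x, y: y, DEPENDENT)
                + 2 * cov(DEPENDENT))
```

For the dependent table the covariance is positive, so the variance of the sum exceeds the sum of the variances; for the independent table the covariance is zero.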

4. Multiple random variables

The notions and ideas introduced in the previous section can be generalized to more than two random variables. In particular, suppose that we have $n$ continuous random variables $X_1(\omega), X_2(\omega), \ldots, X_n(\omega)$. In this section, for simplicity of presentation, we focus only on the continuous case; the generalization to discrete random variables works similarly.

4.1 Basic properties

We can define the joint cumulative distribution function of $X_1, X_2, \ldots, X_n$, their joint probability density function, the marginal probability density function of $X_1$, and the conditional probability density function of $X_1$ given $X_2, \ldots, X_n$ as:

$$F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \leq x_1, X_2 \leq x_2, \ldots, X_n \leq x_n)$$

$$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{\partial^{n} F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)}{\partial x_1 \ldots \partial x_n}$$

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) \, dx_2 \ldots dx_n$$

$$f_{X_1 \mid X_2, \ldots, X_n}(x_1 \mid x_2, \ldots, x_n) = \frac{f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)}{f_{X_2, \ldots, X_n}(x_2, \ldots, x_n)}$$

To calculate the probability of an event $A \subseteq \mathbb{R}^n$, we have:

$$P((x_1, x_2, \ldots, x_n) \in A) = \int_{(x_1, x_2, \ldots, x_n) \in A} f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) \, dx_1 \, dx_2 \ldots dx_n$$

Chain rule:

From the definition of conditional probabilities for multiple random variables, one can show that:

$$\begin{aligned} f(x_1, x_2, \ldots, x_n) &= f(x_n \mid x_1, x_2, \ldots, x_{n-1}) \, f(x_1, x_2, \ldots, x_{n-1}) \\ &= f(x_n \mid x_1, x_2, \ldots, x_{n-1}) \, f(x_{n-1} \mid x_1, x_2, \ldots, x_{n-2}) \, f(x_1, x_2, \ldots, x_{n-2}) \\ &= \cdots = f(x_1) \prod_{i=2}^{n} f(x_i \mid x_1, \ldots, x_{i-1}) \end{aligned}$$
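The chain rule is stated here for densities, but the same factorization holds for probability mass functions, which makes it easy to check exactly. The sketch below uses a made-up joint PMF over three binary variables (a discrete analogue, not the continuous case from the text) and multiplies the conditionals back together.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint PMF over (x1, x2, x3) in {0, 1}^3: weights 1..8, normalised.
_weights = {xs: i + 1 for i, xs in enumerate(product((0, 1), repeat=3))}
_total = sum(_weights.values())
JOINT = {xs: Fraction(w, _total) for xs, w in _weights.items()}


def marginal(prefix):
    """p(x_1, ..., x_k): sum the joint over all variables after the prefix."""
    k = len(prefix)
    return sum((p for xs, p in JOINT.items() if xs[:k] == prefix), Fraction(0))


def conditional(prefix):
    """p(x_k | x_1, ..., x_{k-1}) = p(x_1, ..., x_k) / p(x_1, ..., x_{k-1})."""
    return marginal(prefix) / marginal(prefix[:-1])


def chain_rule_product(xs):
    """p(x_1) * product over i >= 2 of p(x_i | x_1, ..., x_{i-1})."""
    result = Fraction(1)
    for k in range(1, len(xs) + 1):
        result *= conditional(xs[:k])
    return result
```

Because the empty prefix has marginal probability 1, the first factor `conditional((x1,))` is simply $p(x_1)$, matching the leading term of the chain rule.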

Independence: For multiple events $A_{1}, \ldots, A_{k}$, we say that $A_{1}, \ldots, A_{k}$ are mutually independent if, for any subset $S \subseteq\{1,2, \ldots, k\}$, we have:

$$
P\left(\cap_{i \in S} A_{i}\right)=\prod_{i \in S} P\left(A_{i}\right)
$$

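Likewise, we say that the random variables $X_{1}, \ldots, X_{n}$ are independent if:

$$
f\left(x_{1}, \ldots, x_{n}\right)=f\left(x_{1}\right) f\left(x_{2}\right) \cdots f\left(x_{n}\right)
$$

Here, the definition of mutual independence is simply the natural generalization of the independence of two random variables to multiple random variables.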
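The requirement that the product rule hold for every subset $S$ is stronger than requiring it only for pairs. A minimal sketch, assuming two fair coin flips with the uniform measure (the events `A1`, `A2`, `A3` and the helper `factorizes` are chosen for this illustration and do not come from the notes):

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin flips, every outcome equally likely.
omega = list(product("HT", repeat=2))

def prob(event):
    """Uniform probability measure on omega."""
    return Fraction(len(event), len(omega))

A1 = {w for w in omega if w[0] == "H"}   # first flip is heads
A2 = {w for w in omega if w[1] == "H"}   # second flip is heads
A3 = {w for w in omega if w[0] == w[1]}  # both flips agree
events = [A1, A2, A3]

def factorizes(subset):
    """Check P(intersection of the events) == product of their probabilities."""
    intersection = set(omega)
    product_of_probs = Fraction(1)
    for event in subset:
        intersection &= event
        product_of_probs *= prob(event)
    return prob(intersection) == product_of_probs

# Only subsets of size >= 2 carry information; singletons factorize trivially.
pairwise = all(factorizes(pair) for pair in combinations(events, 2))
mutual = all(
    factorizes(subset)
    for size in range(2, len(events) + 1)
    for subset in combinations(events, size)
)
print(pairwise, mutual)
```

Every pair of events factorizes, but the triple does not, since $P(A_1 \cap A_2 \cap A_3) = 1/4$ while the product of the three probabilities is $1/8$. These events are therefore pairwise independent but not mutually independent.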

Independent random variables arise often in machine learning algorithms, where we assume that the training examples in the training set are independent samples from some unknown probability distribution. To make the significance of independence clear, consider a "bad" training set in which we first sample a single training example $(x^{(1)}, y^{(1)})$ from some unknown distribution, and then add $m-1$ copies of the exact same training example to the training set. In this case, we have:

$$
P\left(\left(x^{(1)}, y^{(1)}\right), \ldots,\left(x^{(m)}, y^{(m)}\right)\right) \neq \prod_{i=1}^{m} P\left(x^{(i)}, y^{(i)}\right)
$$

Even though the training set has size $m$, the examples are not independent! While the procedure described here is clearly not a sensible way to build a training set for a machine learning algorithm, it turns out that in practice non-independence of samples does come up often, and it has the effect of reducing the "effective size" of the training set.
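The inequality for the duplicated training set can be made concrete with a toy distribution. The sketch below assumes a made-up distribution `p` over just two possible examples and $m = 5$; both are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Hypothetical distribution over two possible training examples (x, y).
p = {("x_a", 0): Fraction(1, 2), ("x_b", 1): Fraction(1, 2)}
m = 5

def joint_prob_duplicated(dataset):
    """Probability of the dataset under 'sample once, then copy m-1 times'."""
    first = dataset[0]
    if any(example != first for example in dataset):
        return Fraction(0)  # the copying procedure never yields differing examples
    return p[first]         # only the single initial draw is random

def product_of_marginals(dataset):
    """What the joint probability would be if the m examples were independent."""
    result = Fraction(1)
    for example in dataset:
        result *= p[example]
    return result

dataset = [("x_a", 0)] * m
joint = joint_prob_duplicated(dataset)
independent = product_of_marginals(dataset)
print(joint, independent)
```

Each individual example still has marginal distribution `p`, yet the joint probability of the whole set stays at that of a single draw instead of shrinking like a product of $m$ terms.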

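4.2 Random vectors

Suppose that we have $n$ random variables. When working with all of these random variables together, we will often find it convenient to put them in a vector… We call the resulting vector a random vector (more formally, a random vector is a mapping from $\Omega$ to $\mathbb{R}^{n}$). It should be clear that a random vector is simply an alternative notation for dealing with $n$ random variables, so the notions of joint probability density function and joint cumulative distribution function apply to random vectors as well.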

Expectation:

Consider an arbitrary function $g: \mathbb{R}^{n} \rightarrow \mathbb{R}$. The expected value of this function is defined as

$$
E[g(X)]=\int_{\mathbb{R}^{n}} g\left(x_{1}, x_{2}, \ldots, x_{n}\right) f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n}\right) d x_{1}\, d x_{2} \ldots d x_{n}
$$

where $\int_{\mathbb{R}^{n}}$ denotes $n$ consecutive integrations from $-\infty$ to $\infty$. If $g$ is a function from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$, then the expected value of $g$ is the vector of element-wise expected values of the output, i.e., if $g$ is:

$$
g(x)=\left[\begin{array}{c}{g_{1}(x)} \\ {g_{2}(x)} \\ {\vdots} \\ {g_{m}(x)}\end{array}\right]
$$

Then,

$$
E[g(X)]=\left[\begin{array}{c}{E\left[g_{1}(X)\right]} \\ {E\left[g_{2}(X)\right]} \\ {\vdots} \\ {E\left[g_{m}(X)\right]}\end{array}\right]
$$
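The element-wise definition is easy to see in a discrete analogue, where the integral becomes a finite sum. The following is a sketch under stated assumptions: $X = (X_1, X_2)$ is taken to be a pair of independent fair dice, and `g` is a made-up function from $\mathbb{R}^2$ to $\mathbb{R}^2$; neither appears in the original notes.

```python
from fractions import Fraction
from itertools import product

# Assumed toy distribution: X = (X1, X2) are two independent fair dice.
support = list(product(range(1, 7), repeat=2))
pmf = {x: Fraction(1, 36) for x in support}

def g(x):
    """A made-up vector-valued function: (sum of the dice, product of the dice)."""
    return (x[0] + x[1], x[0] * x[1])

def expectation(func, pmf):
    """E[func(X)], computed one output component at a time."""
    m = len(func(next(iter(pmf))))  # output dimension
    return [sum(p * func(x)[j] for x, p in pmf.items()) for j in range(m)]

expected = expectation(g, pmf)
print(expected)
```

The first component is $E[X_1 + X_2] = 7$ and the second is $E[X_1 X_2] = 49/4$, each obtained independently of the other component.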

Covariance matrix: For a given random vector $X: \Omega \rightarrow \mathbb{R}^{n}$, its covariance matrix $\Sigma$ is the $n \times n$ square matrix whose entries are given by $\Sigma_{i j}=\operatorname{Cov}\left[X_{i}, X_{j}\right]$. From the definition of covariance, we have:

$$
\begin{aligned}
\Sigma &=\left[\begin{array}{ccc}{\operatorname{Cov}\left[X_{1}, X_{1}\right]} & {\cdots} & {\operatorname{Cov}\left[X_{1}, X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {\operatorname{Cov}\left[X_{n}, X_{1}\right]} & {\cdots} & {\operatorname{Cov}\left[X_{n}, X_{n}\right]}\end{array}\right] \\
&=\left[\begin{array}{ccc}{E\left[X_{1}^{2}\right]-E\left[X_{1}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{1} X_{n}\right]-E\left[X_{1}\right] E\left[X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n} X_{1}\right]-E\left[X_{n}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{n}^{2}\right]-E\left[X_{n}\right] E\left[X_{n}\right]}\end{array}\right] \\
&=\left[\begin{array}{ccc}{E\left[X_{1}^{2}\right]} & {\cdots} & {E\left[X_{1} X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n} X_{1}\right]} & {\cdots} & {E\left[X_{n}^{2}\right]}\end{array}\right]-\left[\begin{array}{ccc}{E\left[X_{1}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{1}\right] E\left[X_{n}\right]} \\ {\vdots} & {\ddots} & {\vdots} \\ {E\left[X_{n}\right] E\left[X_{1}\right]} & {\cdots} & {E\left[X_{n}\right] E\left[X_{n}\right]}\end{array}\right] \\
&=E\left[X X^{T}\right]-E[X] E[X]^{T}=\ldots=E\left[(X-E[X])(X-E[X])^{T}\right]
\end{aligned}
$$

where the matrix expectation is defined in the obvious way, entry by entry.
The covariance matrix has a number of useful properties:

  • $\Sigma \succeq 0$; that is, $\Sigma$ is positive semidefinite.
  • $\Sigma=\Sigma^{T}$; that is, $\Sigma$ is symmetric.
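Both forms of the covariance matrix, and the two properties listed above, can be verified on a small example. This is a minimal sketch assuming a made-up distribution that puts equal mass on four points in $\mathbb{R}^2$; the positive semidefiniteness check relies on the fact that a symmetric $2 \times 2$ matrix is positive semidefinite exactly when its diagonal entries and its determinant are nonnegative.

```python
from fractions import Fraction

# Assumed toy distribution over X in R^2: four equally likely points.
points = [(0, 0), (1, 0), (1, 1), (2, 3)]
p = Fraction(1, len(points))
n = 2

mean = [sum(p * x[i] for x in points) for i in range(n)]

# Form 1: Sigma = E[X X^T] - E[X] E[X]^T
second_moment = [[sum(p * x[i] * x[j] for x in points) for j in range(n)]
                 for i in range(n)]
sigma_a = [[second_moment[i][j] - mean[i] * mean[j] for j in range(n)]
           for i in range(n)]

# Form 2: Sigma = E[(X - E[X]) (X - E[X])^T]
sigma_b = [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for x in points)
            for j in range(n)]
           for i in range(n)]

symmetric = all(sigma_a[i][j] == sigma_a[j][i]
                for i in range(n) for j in range(n))

# PSD test valid for symmetric 2x2 matrices only:
# nonnegative diagonal entries and nonnegative determinant.
det = sigma_a[0][0] * sigma_a[1][1] - sigma_a[0][1] * sigma_a[1][0]
positive_semidefinite = sigma_a[0][0] >= 0 and sigma_a[1][1] >= 0 and det >= 0

print(sigma_a, symmetric, positive_semidefinite)
```

For larger $n$ the determinant test no longer suffices, and one would typically inspect the eigenvalues of $\Sigma$ instead.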

4.3 The multivariate Gaussian distribution

One particularly important example of a probability distribution over random vectors is the multivariate Gaussian or multivariate normal distribution. A random vector $X \in \mathbb{R}^{n}$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbb{R}^{n}$ and covariance matrix $\Sigma \in \mathbb{S}_{++}^{n}$ (where $\mathbb{S}_{++}^{n}$ denotes the space of symmetric positive definite $n \times n$ matrices) if its joint density is

$$
f_{X_{1}, X_{2}, \ldots, X_{n}}\left(x_{1}, x_{2}, \ldots, x_{n} ; \mu, \Sigma\right)=\frac{1}{(2 \pi)^{n / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^{T} \Sigma^{-1}(x-\mu)\right)
$$

We write this as $X \sim \mathcal{N}(\mu, \Sigma)$. Notice that in the case $n=1$, this reduces to the ordinary normal distribution with mean parameter $\mu_{1}$ and variance $\Sigma_{11}$.
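The density above can be evaluated directly from its formula. The sketch below is one possible way to do so using only the Python standard library: it factors $\Sigma = L L^{T}$ (Cholesky), so that $|\Sigma|^{1/2}$ is the product of the diagonal of $L$ and the quadratic form is the squared norm of $L^{-1}(x-\mu)$. The function names `cholesky`, `mvn_pdf`, and `normal_pdf` are illustrative, and using a Cholesky factorization is a choice made here, not something prescribed by the notes.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("sigma is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, following the formula term by term."""
    n = len(x)
    L = cholesky(sigma)
    # Forward substitution: solve L y = (x - mu).
    # Then (x - mu)^T Sigma^{-1} (x - mu) equals y^T y.
    y = []
    for i in range(n):
        y.append((x[i] - mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    sqrt_det = math.prod(L[i][i] for i in range(n))  # |Sigma|^{1/2}
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * sqrt_det)

def normal_pdf(x, mu, var):
    """Ordinary univariate normal density, for comparison."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

# With n = 1 the multivariate formula agrees with the univariate one.
print(mvn_pdf([0.3], [1.0], [[2.0]]), normal_pdf(0.3, 1.0, 2.0))
```

When $\Sigma$ is diagonal, the same function returns the product of the univariate densities of the components, which gives another quick sanity check.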

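Generally speaking, Gaussian random variables are extremely useful in machine learning and statistics for two main reasons:

First, they are very common when modeling "noise" in statistical algorithms. Quite often, noise can be thought of as the accumulation of a large number of small independent random perturbations affecting the measurement process; by the central limit theorem, a sum of independent random variables tends to "look Gaussian".

Second, Gaussian random variables are convenient for many analytical manipulations, because many of the integrals involving Gaussian distributions that arise in practice have simple closed-form solutions. We will encounter this later in the course.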
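The central limit theorem argument in the first reason can be illustrated with a short simulation. This is a rough sketch under assumed settings: each perturbation is drawn uniformly from $[-0.5, 0.5]$, and the values `k = 50` and `trials = 20000` are arbitrary choices made for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def noise(k):
    """Sum of k small independent perturbations, each uniform on [-0.5, 0.5]."""
    return sum(random.uniform(-0.5, 0.5) for _ in range(k))

k, trials = 50, 20000
samples = [noise(k) for _ in range(trials)]

mean = statistics.fmean(samples)
std = statistics.pstdev(samples)
theory_std = (k / 12) ** 0.5  # the variance of Uniform[-0.5, 0.5] is 1/12

# For a Gaussian, about 68.3% of the mass lies within one standard deviation.
within_one_std = sum(abs(s - mean) <= std for s in samples) / trials

print(mean, std, theory_std, within_one_std)
```

Although no single perturbation is Gaussian, the fraction of sums falling within one standard deviation of the mean should come out close to the Gaussian value of roughly 0.683.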

5. Other resources

A good textbook on probability at the level needed for CS229 is A First Course in Probability by Sheldon Ross.