Terminology

**A running example**: Let’s assume that you are the owner of a computer shop. You may want to identify which customers buy a computer so that you can target your advertising. So you decide to record each customer’s age and credit rating, together with whether or not the customer buys a computer, for future predictions.

  • Evidence, X: a Bayesian term for an observed data tuple, described by measurements made on a set of n attributes.
    • E.g., a record of a customer’s information such as age and credit rating.
    • X = (x1, x2, …, xn)
    • Sometimes the probability P(X) is also called the evidence.
  • Hypothesis, H: the target of the classification, i.e., the hypothesis that X belongs to a specified class C.
    • E.g., H1 = the customer buys a computer, H2 = the customer does not buy a computer
  • Prior probability, P(H): the _a priori_ probability of H.
    • E.g., P(H1) = the probability that any given customer will buy a computer, regardless of age or credit rating.
  • Likelihood, P(X|H): the probability of observing the sample X given that the hypothesis holds.
    • E.g., given that a customer will buy a computer (H1), the probability that the customer is 35 years old and has a fair credit rating.
  • Posterior probability, P(H|X): the _a posteriori_ probability, that is, the probability that the hypothesis holds given the observed data X.
    • E.g., given that a customer X is 35 years old and has a fair credit rating, the probability that the customer will buy a computer, P(H1|X).

The prediction of a class for some new tuple X, for which the class is unknown, is determined by the class which has the highest posterior probability.

Bayes’ Theorem

In many cases, it is easy to estimate the posterior probability by estimating the prior and likelihood of the given problem from historical data (i.e., a training set).

  • E.g., to estimate the prior P(H1), we can count the number of customers who bought a computer and divide it by the total number of customers.
  • E.g., to estimate the likelihood P(X|H1), we can measure the proportion of customers who are 35 years old and have a fair credit rating among the customers who bought a computer.
  • E.g., to estimate the evidence P(X), we can measure the proportion of customers who are 35 years old and have a fair credit rating among all the customers, irrespective of whether they bought a computer.
  • The posterior probability can then be computed from the prior and likelihood through Bayes’ theorem.

Bayes’ theorem provides a way to relate the likelihood, prior, and posterior probabilities in the following way, when P(X) > 0:

    P(H|X) = P(X|H) × P(H) / P(X)

  • Informally, this equation can be interpreted as

    Posterior = likelihood x prior / evidence

  • Bayes’ theorem is used to predict that X belongs to class Ci iff the posterior P(Ci|X) is the highest among all P(Cj|X) for all the k classes. We can also state that the probability that X belongs to Ci is P(Ci|X). Because we can give this probability, we call Bayes classification a probabilistic classifier.

  • For determining the classification of some X, we are looking to find the Ci that maximises P(Ci|X), yet P(X) is the same for every Ci, so P(X) can be ignored in all the calculations as long as we don’t need to know the actual probability.
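The argmax-over-posteriors rule above can be sketched in a few lines of Python. The class names and probability values here are invented for illustration; they are not taken from the text.

```python
# A minimal sketch of classification with Bayes' theorem.

def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

priors = {"buys": 0.6, "not_buys": 0.4}       # P(H) for each class (assumed)
likelihoods = {"buys": 0.2, "not_buys": 0.1}  # P(X|H) for the observed X (assumed)

# P(X) by the law of total probability over the classes.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

posteriors = {c: posterior(priors[c], likelihoods[c], evidence) for c in priors}
prediction = max(posteriors, key=posteriors.get)  # class with the highest posterior
print(prediction)  # -> buys
```

Note that because the evidence is a shared denominator, `prediction` would be the same if we compared only `likelihood * prior` for each class.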

Example: with training data
Let’s assume that you are the owner of a computer shop. You may want to identify which customers buy a computer for targeted advertising. So the owner decided to record each customer’s age and credit rating, and whether or not the customer buys a computer. The following table shows a set of customer records in the computer shop. What is the probability that a customer who is youth and has a fair credit rating buys a computer?

age          credit     buys_computer
youth        fair       no
youth        fair       yes
middle_aged  excellent  yes
middle_aged  fair       no
youth        fair       no
middle_aged  excellent  no
middle_aged  fair       yes
  • Prior: probability of a customer buying a computer regardless of their information.
    • P(buys_computer = yes) = 3/7
    • P(buys_computer = no) = 4/7
  • Likelihood
    • P(age = youth, credit = fair | buys_computer = yes) = 1/3
    • P(age = youth, credit = fair | buys_computer = no) = 2/4 = 1/2
  • Evidence
    • P(age = youth, credit = fair) = 3/7
  • Posterior
    • P(yes | youth, fair) = (1/3 × 3/7) / (3/7) = 1/3
    • P(no | youth, fair) = (1/2 × 4/7) / (3/7) = 2/3
  • Therefore, the customer would not buy a computer.
    • When computing a posterior, the evidence term is the same for all hypothesis classes. Since our goal is to find the class with the highest posterior, the evidence term is often ignored in practice.
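The counting in this example can be checked with a short script (a sketch; the tuples mirror the table above row by row):

```python
# Re-deriving the worked example by counting over the training records.
records = [
    ("youth", "fair", "no"),
    ("youth", "fair", "yes"),
    ("middle_aged", "excellent", "yes"),
    ("middle_aged", "fair", "no"),
    ("youth", "fair", "no"),
    ("middle_aged", "excellent", "no"),
    ("middle_aged", "fair", "yes"),
]

x = ("youth", "fair")  # the new customer to classify
n = len(records)
evidence = sum(r[:2] == x for r in records) / n  # P(X) = 3/7

posteriors = {}
for cls in ("yes", "no"):
    in_class = [r for r in records if r[2] == cls]
    prior = len(in_class) / n                                       # P(H)
    likelihood = sum(r[:2] == x for r in in_class) / len(in_class)  # P(X|H)
    posteriors[cls] = likelihood * prior / evidence                 # P(H|X)

print(posteriors)  # yes: 1/3, no: 2/3 -> the customer would not buy
```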

Example: with estimated probabilities

You might be interested in finding out the probability of a patient having liver cancer given that they are an alcoholic. In this scenario, we discover by using Bayes’ theorem that “being an alcoholic” is a useful diagnostic examination for liver cancer.

  • Prior: C denotes the event “Patient has liver cancer.” Past data tells you that 1% of patients entering your clinic have liver cancer. ¬C denotes the event “Patient does not have liver cancer.”
    • P(C) = 0.01, P(¬C) = 0.99
  • Evidence: A denotes the examination “Patient is an alcoholic.” Five percent of the clinic’s patients are alcoholics.
    • P(A) = 0.05
  • Likelihood: You may also know from the medical literature that among those patients diagnosed with liver cancer, 70% are alcoholics.
    • P(A|C) = 0.7; the probability that a patient is an alcoholic, given that they have liver cancer, is 70%.
  • Bayes’ theorem tells you: if the patient is an alcoholic, their chance of having liver cancer is 0.14 (14%). This is much higher than the 1% prior probability suggested by past data.
    • P(C|A) = P(A|C) × P(C) / P(A) = 0.7 × 0.01 / 0.05 = 0.14
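The same calculation, with the stated numbers plugged into Bayes’ theorem (a minimal sketch):

```python
# The liver-cancer numbers from the text, plugged into Bayes' theorem.
p_cancer = 0.01            # prior P(C): 1% of patients have liver cancer
p_alcoholic = 0.05         # evidence P(A): 5% of patients are alcoholics
p_alc_given_cancer = 0.70  # likelihood P(A|C): 70% of cancer patients are alcoholics

# P(C|A) = P(A|C) * P(C) / P(A)
p_cancer_given_alc = p_alc_given_cancer * p_cancer / p_alcoholic
# p_cancer_given_alc is approximately 0.14 (14%), versus the 1% prior
```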