In this section, we show how to use a naive-Bayes classifier with a continuous (numerical) attribute. The same approach can be used for ordinal variables. However, depending on the application, and especially when the range of possible values is small, it may be more useful to treat ordinals as categorical, even though the ordering information is then not used for prediction.
It is common to assume that a continuous attribute follows a Gaussian distribution (also called normal, or bell curve).
- Two parameters define a Gaussian distribution: the mean $\mu$ and the standard deviation $\sigma$ (equivalently, the variance $\sigma^2$).
- Probability density function of the Gaussian:
  $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
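- The class-conditional likelihood of the $i$-th continuous attribute $X_i$ given class $C = c$ is
  $$P(X_i = x_i \mid C = c) = \frac{1}{\sqrt{2\pi}\,\sigma_{ic}}\exp\left(-\frac{(x_i-\mu_{ic})^2}{2\sigma_{ic}^2}\right)$$
  where $\mu_{ic}$ and $\sigma_{ic}$ are the mean and standard deviation of $X_i$ over the training examples of class $c$.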
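The density above translates directly into code. The following is a minimal sketch; the function name `gaussian_pdf` is hypothetical, and it is parameterized by the variance $\sigma^2$ rather than $\sigma$ because that is what is usually estimated from data.

```python
import math


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    """Normal density with mean mu and variance var, evaluated at x."""
    if var <= 0:
        raise ValueError("variance must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * var))


# At the mean of a standard normal the density is 1/sqrt(2*pi)
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

Note that the result is a density value, not a probability, so it can exceed 1 when the variance is small. In naive Bayes this is harmless because only the relative sizes of the class scores matter.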
To evaluate the class-conditional likelihood, we only need the mean $\mu_{ic}$ and standard deviation $\sigma_{ic}$ of attribute $i$ within class $c$, which are calculated as given earlier.
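Estimating these per-class parameters amounts to grouping the training values by class and taking a mean and variance of each group. The sketch below assumes the sample variance (denominator $N_c - 1$, via `statistics.variance`); `statistics.pvariance` would give the maximum-likelihood estimate (denominator $N_c$) instead. The helper name `fit_gaussian_params` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def fit_gaussian_params(values, labels):
    """Return {class: (mean, variance)} for one continuous attribute.

    Assumes the sample variance (N_c - 1 denominator), so every class
    needs at least two training examples.
    """
    grouped = defaultdict(list)
    for x, c in zip(values, labels):
        grouped[c].append(x)
    return {c: (mean(xs), variance(xs)) for c, xs in grouped.items()}


params = fit_gaussian_params([1.0, 3.0, 10.0, 14.0], ["a", "a", "b", "b"])
print(params)  # {'a': (2.0, 2.0), 'b': (12.0, 8.0)}
```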
Example
Let’s assume that the attribute age is not discretized in the following example:
age | credit_rating | buys_computer |
---|---|---|
22 | fair | no |
23 | fair | yes |
35 | excellent | yes |
31 | fair | no |
20 | excellent | no |
38 | excellent | no |
40 | fair | yes |
Let buys_computer be the class label; then $P(\text{buys\_computer} = \text{yes}) = 3/7$ and $P(\text{buys\_computer} = \text{no}) = 4/7$.
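The class priors are simply relative class frequencies in the table. This short sketch counts the buys_computer column and keeps the results as exact fractions.

```python
from collections import Counter
from fractions import Fraction

# buys_computer column of the example table, top to bottom
labels = ["no", "yes", "yes", "no", "no", "no", "yes"]

counts = Counter(labels)
priors = {c: Fraction(n, len(labels)) for c, n in counts.items()}
print(priors["yes"], priors["no"])  # 3/7 4/7
```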
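The class-conditional mean and variance of attribute age, using the sample variance with denominator $N_c - 1$ ($N_c$ is the number of examples in class $c$), are:
- yes (ages 23, 35, 40): $\mu_{\text{age},\text{yes}} = 98/3 \approx 32.67$, $\sigma^2_{\text{age},\text{yes}} \approx 152.67/2 \approx 76.33$
- no (ages 22, 31, 20, 38): $\mu_{\text{age},\text{no}} = 111/4 = 27.75$, $\sigma^2_{\text{age},\text{no}} = 208.75/3 \approx 69.58$

(Dividing by $N_c$ instead gives variances of about 50.89 and 52.19.)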
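These numbers can be checked directly from the table. The sketch below prints both variance estimators, since the choice of denominator ($N_c - 1$ for `statistics.variance`, $N_c$ for `statistics.pvariance`) is a modeling decision rather than something the formulas fix.

```python
from statistics import mean, pvariance, variance

# (age, buys_computer) pairs from the example table
rows = [(22, "no"), (23, "yes"), (35, "yes"), (31, "no"),
        (20, "no"), (38, "no"), (40, "yes")]

stats = {}
for c in ("yes", "no"):
    ages = [age for age, label in rows if label == c]
    stats[c] = (mean(ages), variance(ages), pvariance(ages))
    print(f"{c}: mean={stats[c][0]:.2f}  "
          f"var(N_c-1)={stats[c][1]:.2f}  var(N_c)={stats[c][2]:.2f}")
# yes: mean=32.67  var(N_c-1)=76.33  var(N_c)=50.89
# no: mean=27.75  var(N_c-1)=69.58  var(N_c)=52.19
```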
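Let age $= a$ and credit_rating $= r$ be the attributes of a future customer. Under the naive-Bayes independence assumption, the class-conditional likelihood of this customer is
$$P(a, r \mid c) = P(\text{age} = a \mid c)\,P(\text{credit\_rating} = r \mid c), \quad c \in \{\text{yes}, \text{no}\},$$
where the continuous factor is the Gaussian
$$P(\text{age} = a \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{\text{age},c}}\exp\left(-\frac{(a-\mu_{\text{age},c})^2}{2\sigma_{\text{age},c}^2}\right).$$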
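To make the Gaussian factor concrete, the sketch below evaluates it for a hypothetical customer aged 30 (this value is illustrative and not part of the original example), again assuming the sample variance with an $N_c - 1$ denominator.

```python
import math
from statistics import mean, variance

# (age, buys_computer) pairs from the example table
ROWS = [(22, "no"), (23, "yes"), (35, "yes"), (31, "no"),
        (20, "no"), (38, "no"), (40, "yes")]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


age = 30  # hypothetical future customer's age
likelihoods = {}
for c in ("yes", "no"):
    ages = [a for a, label in ROWS if label == c]
    likelihoods[c] = gaussian_pdf(age, mean(ages), variance(ages))
print({c: round(v, 4) for c, v in likelihoods.items()})
```

An age of 30 lies closer to the "no" mean (27.75) than to the "yes" mean (about 32.67), and the two variances are similar, so the "no" density comes out slightly higher.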
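This likelihood for each continuous attribute can be used directly in the class-conditional likelihood for naive Bayes, multiplied with the likelihoods of discrete attributes such as credit_rating, which are estimated as relative frequencies within each class. Via Bayes' theorem, $P(c \mid \text{age}, \text{credit\_rating}) \propto P(c)\,P(\text{age} \mid c)\,P(\text{credit\_rating} \mid c)$, we can then predict the probability of the customer buying a computer.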
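Putting the pieces together, here is a minimal end-to-end sketch on the example table. Assumptions: the sample variance for age, unsmoothed relative frequencies for credit_rating (a class with no matching rating would get probability 0; Laplace smoothing is a common remedy), and a hypothetical customer aged 30 with a fair rating. The function `posterior` is a hypothetical helper, not a library API.

```python
import math
from collections import Counter
from statistics import mean, variance

# (age, credit_rating, buys_computer) rows from the example table
ROWS = [
    (22, "fair", "no"), (23, "fair", "yes"), (35, "excellent", "yes"),
    (31, "fair", "no"), (20, "excellent", "no"), (38, "excellent", "no"),
    (40, "fair", "yes"),
]


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(age, credit_rating, rows=ROWS):
    """Return {class: P(class | age, credit_rating)} under naive Bayes."""
    class_counts = Counter(label for _, _, label in rows)
    scores = {}
    for c, n_c in class_counts.items():
        in_class = [(a, r) for a, r, label in rows if label == c]
        ages = [a for a, _ in in_class]
        prior = n_c / len(rows)
        # Continuous attribute: Gaussian class-conditional density
        p_age = gaussian_pdf(age, mean(ages), variance(ages))
        # Discrete attribute: relative frequency within the class (no smoothing)
        p_rating = sum(1 for _, r in in_class if r == credit_rating) / n_c
        scores[c] = prior * p_age * p_rating
    total = sum(scores.values())  # evidence term, used only to normalize
    return {c: s / total for c, s in scores.items()}


probs = posterior(30, "fair")  # hypothetical customer
print({c: round(p, 3) for c, p in probs.items()})
```

Normalizing by the sum of the class scores turns them into posterior probabilities. For picking the most likely class alone, comparing the unnormalized scores would suffice.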