Model Evaluation and Selection - Comparing classifiers - 《机器学习》

Classifier Models M1 vs. M2

Suppose we have 2 classifiers, M1 and M2. Which one is better?
Use 10-fold cross-validation to obtain mean error rates for M1 and M2,
It may seem intuitive to choose the model with the lowest error rate
However, these mean error rates are just estimates of error on the true population of future data cases
What if the difference between the 2 error rates is just attributed to chance?
- Use a test of statistical significance
- Obtain confidence limits for our error estimates

Estimating Confidence Intervals: Null Hypothesis

Null Hypothesis: M1 & M2 are the same
Test the null hypothesis with t-test
- Use t-distribution with k-1 degree of freedom
If we can reject null hypothesis, then
- we conclude that the difference between M1 & M2 is statistically significant.
- Chose model with lower error rate
Perform 10-fold cross-validation (k=10)
- For i-th round of 10-fold cross-validation, the same cross partitioning is used to obtain and .
- Average over 10 rounds to get and similarly for
- t-test computes t-statistic with k-1 degrees of freedom:
  
  where (using the variance for the population, as given in the text):
- To determine whether M1 and M2 are significantly different, we compute t-statistic and select a significance level.
  - 5% significance levels: The difference between M1 and M2 is significantly different for 95% of population.
  - 1% significance levels: The difference between M1 and M2 is significantly different for 99% of population.
- Based on t-statistics and significance level, we consult a table for the t-distribution.
  - We need to find the t -distribution value corresponding to k − 1 degrees of freedom (or 9 degrees of freedom for our example) from the table (Two-sided).
    Hidden from students:URL T-distribution tableURL
- If the t-statistic we calculated above is _not _between the corresponding value in the table and its negative (i.e. the corresponding value in the table multiplied by -1), then we reject the null hypothesis and conclude that M1 and M2 are significantly different (at the significance level we chose above).
- Alternatively, if the t-statistic we calculated above is between the corresponding value in the table and its negative, we conclude that M1 and M2 are essentially the same and any difference is attributed to chance.