Classifier Models M1 vs. M2

    • Suppose we have 2 classifiers, M1 and M2. Which one is better?
    • Use 10-fold cross-validation to obtain mean error rates for M1 and M2, $\overline{err}(M_1)$ and $\overline{err}(M_2)$
    • It may seem intuitive to choose the model with the lowest error rate
    • However, these mean error rates are just estimates of error on the true population of future data cases
    • What if the difference between the 2 error rates is due only to chance?
      • Use a test of statistical significance
      • Obtain confidence limits for our error estimates

    Estimating Confidence Intervals: Null Hypothesis

    • Null Hypothesis: M1 & M2 are the same, i.e. the true difference between their mean error rates is zero
    • Test the null hypothesis with a t-test
      • Use the t-distribution with k-1 degrees of freedom
    • If we can reject the null hypothesis, then
      • we conclude that the difference between M1 & M2 is statistically significant.
      • Choose the model with the lower error rate
    • Perform 10-fold cross-validation (k=10)
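      • For the i-th round of 10-fold cross-validation, the same cross-validation partitioning is used to obtain $err(M_1)_i$ and $err(M_2)_i$, the error rates of M1 and M2 on that round.
      • Average over the 10 rounds to get the mean error rate $\overline{err}(M_1)$, and similarly $\overline{err}(M_2)$
      • t-test computes t-statistic with k-1 degrees of freedom:
        $t = \frac{\overline{err}(M_1) - \overline{err}(M_2)}{\sqrt{var(M_1 - M_2)/k}}$
        where (using the variance for the population, as given in the text):
        $var(M_1 - M_2) = \frac{1}{k}\sum_{i=1}^{k}\left[err(M_1)_i - err(M_2)_i - \left(\overline{err}(M_1) - \overline{err}(M_2)\right)\right]^2$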
      • To determine whether M1 and M2 are significantly different, we compute the t-statistic and select a significance level.
        • 5% significance level: if M1 and M2 were really the same, there would be only a 5% chance of wrongly concluding that they differ (95% confidence).
        • 1% significance level: if M1 and M2 were really the same, there would be only a 1% chance of wrongly concluding that they differ (99% confidence).
      • Using the significance level and k-1 degrees of freedom, we look up the critical value in a table for the t-distribution. The t-distribution is symmetric and tables typically list only the upper tail, so for this two-sided test we look up the value at half the significance level (e.g. 0.025 for a 5% level).
      • If the t-statistic we calculated above is not between the corresponding value in the table and its negative (i.e. the corresponding value in the table multiplied by -1), then we reject the null hypothesis and conclude that M1 and M2 are significantly different (at the significance level we chose above).
      • Alternatively, if the t-statistic we calculated above is between the corresponding value in the table and its negative, we cannot reject the null hypothesis: any difference between M1 and M2 can be attributed to chance.
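
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the per-fold error rates are hypothetical, the variance divides by k (the population form used in these notes, not the k-1 sample form), and the two-sided critical values for 9 degrees of freedom (2.262 at 5%, 3.250 at 1%) are copied from a standard t-table.

```python
import math

def paired_t_statistic(err_m1, err_m2):
    """t-statistic for per-fold error rates obtained on the same partitioning.

    Uses the population variance (divide by k), following the notes.
    """
    if len(err_m1) != len(err_m2):
        raise ValueError("need one error rate per fold for each model")
    k = len(err_m1)
    diffs = [a - b for a, b in zip(err_m1, err_m2)]
    mean_diff = sum(diffs) / k  # equals mean err(M1) - mean err(M2)
    var = sum((d - mean_diff) ** 2 for d in diffs) / k
    if var == 0:
        raise ValueError("zero variance: t-statistic is undefined")
    return mean_diff / math.sqrt(var / k)

# Two-sided critical values for k - 1 = 9 degrees of freedom (standard t-table).
T_CRITICAL_DF9 = {0.05: 2.262, 0.01: 3.250}

def significantly_different(t, sig=0.05):
    """Reject the null hypothesis when t falls outside [-critical, +critical]."""
    return abs(t) > T_CRITICAL_DF9[sig]

# Hypothetical error rates from 10-fold cross-validation (made up for illustration).
err_m1 = [0.20, 0.22, 0.19, 0.25, 0.21, 0.23, 0.20, 0.24, 0.22, 0.21]
err_m2 = [0.18, 0.21, 0.20, 0.22, 0.19, 0.21, 0.19, 0.21, 0.20, 0.20]

t = paired_t_statistic(err_m1, err_m2)
print(f"t = {t:.2f}")
print("significant at 5%:", significantly_different(t, 0.05))
print("significant at 1%:", significantly_different(t, 0.01))
```

With these made-up numbers the t-statistic is roughly 4.5, which lies outside both critical ranges, so the null hypothesis would be rejected and M2 (the lower mean error rate) chosen. The lookup table only covers k = 10; for other values of k the critical value would have to be taken from the row for k-1 degrees of freedom.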