Overview

The figure below illustrates the case where the training tuples are linearly separable.
If the training tuples can be plotted as follows (the x-axis and y-axis represent the attributes $x_1$ and $x_2$, respectively), then the dataset is linearly separable:
[Figure: training tuples of classes +1 and -1 in the $x_1$-$x_2$ plane, with several candidate separating lines]

  • Because a straight line (hyperplane) can be drawn that separates all the tuples of class +1 from all the tuples of class -1.
  • There are infinitely many lines (hyperplanes) that separate the two classes, as the figure above shows; a quick check of several candidates is sketched after this list.
    • e.g., all of the dotted lines in the figure above separate the training tuples equally well.
  • We want to find the best one: the hyperplane that minimises classification error on unseen data. We therefore need a criterion for discriminating among the candidates.
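
The following is a minimal Python sketch of this point (the toy data set and the three hand-picked hyperplanes are hypothetical, chosen only for illustration): every candidate separates the training tuples perfectly, which is exactly why an extra criterion is needed to pick one.

```python
import numpy as np

# Hypothetical toy training set: two linearly separable classes in 2-D.
X = np.array([[2.0, 3.0], [3.0, 3.0], [3.0, 2.0],   # class +1
              [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Three different hyperplanes w.x + b = 0, chosen by hand.
candidates = [(np.array([1.0, 1.0]), -3.5),
              (np.array([1.0, 2.0]), -5.0),
              (np.array([2.0, 1.0]), -5.0)]

for w, b in candidates:
    # A hyperplane separates the data if every tuple lies on its own class's side.
    separates = np.all(y * (X @ w + b) > 0)
    print("w =", w, "b =", b, "-> separates:", separates)
```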

Maximum marginal hyperplane

SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).

  • Margin: draw a perpendicular line from the hyperplane to the closest training tuple; the distance between the hyperplane and that tuple is the margin of the hyperplane (a numeric sketch follows the figure below).
[Figure: the margin of a separating hyperplane, measured perpendicular to it]
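
The following is a minimal numeric sketch of this definition (the hyperplane $\mathbf{w} = (1, 1)$, $b = -3.5$ and the tuples are hypothetical, reused from the sketch above): the perpendicular distance from a tuple $\mathbf{x}$ to the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$ is $|\mathbf{w} \cdot \mathbf{x} + b| / \lVert \mathbf{w} \rVert$, and the closest tuple sets the margin.

```python
import numpy as np

# Hypothetical hyperplane w.x + b = 0 and toy tuples (illustrative values).
w = np.array([1.0, 1.0])
b = -3.5
X = np.array([[2.0, 3.0], [3.0, 3.0], [3.0, 2.0],
              [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])

# Perpendicular (geometric) distance from each tuple to the hyperplane:
# |w.x + b| / ||w||.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

# The margin of this hyperplane is set by the closest tuple(s).
print("distances:", np.round(distances, 3))
print("margin (distance to closest tuple):", round(distances.min(), 3))
```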

Support Vectors:

  • Support vectors: the training tuples that determine the largest-margin hyperplane. In the figure below, the red-circled tuples are the support vectors of the hyperplane.
[Figure: maximum marginal hyperplane with its support vectors circled in red]

Formal definition of hyperplanes and support vectors:

Two-dimensional training tuple case:

  • In two-dimensional space (the $x_1$-$x_2$ plane), a hyperplane corresponds to a line, and every such line can be written as:
    • $x_2 = a x_1 + c$
  • For a more general representation, if we replace $a$ and $c$ by $-w_1 / w_2$ and $-b / w_2$, then the above hyperplane can be rewritten as:
    • $w_1 x_1 + w_2 x_2 + b = 0$,
    • where $\mathbf{w} = (w_1, w_2)$ and $\mathbf{x} = (x_1, x_2)$, i.e., $\mathbf{w} \cdot \mathbf{x} + b = 0$.
    • We can represent any hyperplane (line) in two-dimensional space with $\mathbf{w}$ and $b$.
  • In the linearly separable case, every training tuple satisfies one of the following conditions:
    • H1 (positive class):
      • If $\mathbf{w} \cdot \mathbf{x} + b \geq 1$
    • H2 (negative class):
      • If $\mathbf{w} \cdot \mathbf{x} + b \leq -1$
  • Support vector: therefore, every training tuple that satisfies $\mathbf{w} \cdot \mathbf{x} + b = \pm 1$, i.e., that lies exactly on H1 or H2, is a support vector (a numeric check of these conditions is sketched below).
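
As a quick numeric check of these conditions, here is a short sketch with a hypothetical hyperplane already scaled so that the margin sides sit at $\mathbf{w} \cdot \mathbf{x} + b = \pm 1$:

```python
import numpy as np

# Hypothetical 2-D hyperplane, scaled so the margin sides are at w.x + b = +/-1.
w = np.array([1.0, 1.0])
b = -3.0

tuples = np.array([[3.0, 2.0],   # w.x + b =  2 -> beyond H1, class +1
                   [2.0, 2.0],   # w.x + b =  1 -> exactly on H1: a support vector
                   [1.0, 1.0],   # w.x + b = -1 -> exactly on H2: a support vector
                   [0.0, 1.0]])  # w.x + b = -2 -> beyond H2, class -1

for x in tuples:
    s = w @ x + b
    if s >= 1:
        side = "H1 (class +1)"
    elif s <= -1:
        side = "H2 (class -1)"
    else:
        side = "inside the margin"
    sv = "  <- support vector" if np.isclose(abs(s), 1.0) else ""
    print(f"x = {x}, w.x + b = {s:+.1f} -> {side}{sv}")
```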

N-dimensional training tuple case:

  • Let $\mathbf{X}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ be a training tuple with class label $y_i \in \{+1, -1\}$; then a separating hyperplane can be written as
  • $\mathbf{W} \cdot \mathbf{X} + b = 0$
  • where $\mathbf{W} = (w_1, w_2, \ldots, w_n)$ is a weight vector and $b$ a scalar (bias)
  • $\mathbf{W} \cdot \mathbf{X} = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$

  • The hyperplanes defining the sides of the margin:

    • H1: $\mathbf{W} \cdot \mathbf{X}_i + b \geq 1$ for $y_i = +1$, and
    • H2: $\mathbf{W} \cdot \mathbf{X}_i + b \leq -1$ for $y_i = -1$
  • These two inequalities can be combined into one:
    • $y_i \, (\mathbf{W} \cdot \mathbf{X}_i + b) \geq 1 \quad \forall i$
    • Training the SVM amounts to solving a constrained (convex) quadratic optimisation problem: maximise the margin $2 / \lVert \mathbf{W} \rVert$ (equivalently, minimise $\tfrac{1}{2} \lVert \mathbf{W} \rVert^2$) subject to this constraint, which estimates the weights $\mathbf{W}$ and bias $b$ from the training set. This is the SVM version of training the model; a solver sketch follows this list.
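
The following is a minimal sketch of this training step, posing the quadratic programme above directly and handing it to a general-purpose solver (scipy.optimize.minimize, which defaults to SLSQP when constraints are given; the toy data set is hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data: two 2-D classes.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Pack the variables as theta = [w1, w2, b].
def objective(theta):
    w = theta[:2]
    return 0.5 * w @ w   # minimising ||W||^2 / 2 maximises the margin 2 / ||W||

# One hard-margin constraint y_i (W . X_i + b) - 1 >= 0 per training tuple.
constraints = [
    {"type": "ineq",
     "fun": lambda theta, xi=xi, yi=yi: yi * (theta[:2] @ xi + theta[2]) - 1.0}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("W =", np.round(w, 3), " b =", round(b, 3))

# Support vectors are the tuples whose constraints are (numerically) tight.
tight = np.isclose(y * (X @ w + b), 1.0, atol=1e-3)
print("support vectors:\n", X[tight])
```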

Classifying a test tuple using the trained model:

During the testing phase, the trained model classifies a new tuple $\mathbf{X}'$ using the following rules:

  • Using the hyperplane:
    • H1 (positive class):
      • If $\mathbf{W} \cdot \mathbf{X}' + b \geq 1$,
      • then $\mathbf{X}'$ will be classified as the positive class.
    • H2 (negative class):
      • If $\mathbf{W} \cdot \mathbf{X}' + b \leq -1$,
      • then $\mathbf{X}'$ will be classified as the negative class.
    • (For a tuple that falls between H1 and H2, the sign of $\mathbf{W} \cdot \mathbf{X}' + b$ decides the class.)
  • Alternatively, we can use the support vectors $\mathbf{X}_i$, each with class label $y_i$, to classify test tuples. For test tuple $\mathbf{X}'$:
    • $d(\mathbf{X}') = \sum_{i=1}^{l} y_i \alpha_i \, \mathbf{X}_i \cdot \mathbf{X}' + b_0$
    • where $l$ is the number of support vectors, and $\alpha_i$ and $b_0$ are automatically determined by the optimisation/training algorithm.
    • If the sign of $d(\mathbf{X}')$ is positive, then $\mathbf{X}'$ is classified as H1; otherwise as H2.
    • Note that we need to keep only the support vectors for testing (a sketch follows this list).
      • This fact will be used in the non-linearly separable case.
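
The following is a minimal sketch of this support-vector form of the decision rule using scikit-learn (the training set and test tuple are hypothetical; a large C approximates the hard-margin case). SVC stores $y_i \alpha_i$ in dual_coef_, the support vectors in support_vectors_, and $b_0$ in intercept_, so $d(\mathbf{X}')$ can be reproduced by hand from the support vectors alone:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy training set.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.5], [0.5, 0.0], [1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

# Linear SVM; a large C approximates the hard-margin setting.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# d(X') = sum_i y_i * alpha_i * (X_i . X') + b_0, using only the support vectors.
x_test = np.array([2.0, 1.5])
d = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_test) + clf.intercept_[0]

print("support vectors:\n", clf.support_vectors_)
print("d(x') =", round(d, 4), "-> class", 1 if d > 0 else -1)
print("matches decision_function:",
      np.isclose(d, clf.decision_function([x_test])[0]))
```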

Why is SVM effective on high-dimensional data?

  • The complexity of a trained classifier is characterised by the number of support vectors rather than the dimensionality of the data
  • The support vectors are the essential or critical training examples; they lie closest to the decision boundary (MMH)
  • If all other training examples are removed and the training is repeated, the same separating hyperplane would be found from the support vectors alone
  • The number of support vectors found can be used to compute an (upper) bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality (one classical form of this bound is sketched after this list)
  • Thus, an SVM with a small number of support vectors can have good generalisation, even when the dimensionality of the data is high
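
One classical form of that bound, stated here as a sketch rather than as the exact theorem the source may have in mind: for an SVM trained on $\ell$ tuples, Vapnik's leave-one-out argument gives, in expectation over training sets,

$$\mathbb{E}[\,\text{error rate}\,] \;\leq\; \frac{\mathbb{E}[\,\#\text{support vectors}\,]}{\ell}$$

Note that the dimensionality $n$ does not appear: a classifier supported by few tuples is expected to generalise well even when $n$ is very large.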