Step 1: Training phase or learning step: Build a model from the labelled training set.
    Each tuple/sample/record/object/example/instance/feature vector of the training dataset is assumed to belong to a predefined class, as determined by the class label attribute. Ideally, the tuples are a random sample from the full population of data.

    • The set of tuples used for model construction is the training set:

    $D = \{(X_1, y_1), \ldots, (X_n, y_n)\}$, where each tuple $X_i = (x_{i1}, \ldots, x_{im})$ and each $x_{ij}$ is an attribute value, and $y_i \in \{C_1, \ldots, C_k\}$ for some $k \ge 2$ is the class label for $X_i$.

    • Commonly, each $X_i$ is assumed to belong to exactly one class $y_i$.
    • In the very common special case of exactly 2 classes, i.e. binary learning, the training classes are called the positive examples ($+$ or P) and the negative examples ($-$ or N).
    • The model is represented as classification rules, decision trees, mathematical formulae, or a “black box”. The model can be viewed as a function $y = f(X)$ that can predict the class label $y$ for some unlabelled tuple $X$.
    • For classification models, the built model may be called a classifier.
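The idea that training turns a labelled set into a function from tuples to class labels can be sketched in a few lines. This is a minimal illustration only: the 1-nearest-neighbour learner is chosen for brevity and is not the method these notes prescribe, and the function name `train_nearest_neighbour`, the two numeric attributes (age, income) and the data values are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]       # one tuple X: a vector of attribute values
Example = Tuple[Features, str]   # a labelled tuple (X, y)


def train_nearest_neighbour(D: List[Example]) -> Callable[[Features], str]:
    """Build a classifier f from the training set D (hypothetical helper).

    The returned f predicts the label of the training tuple closest to X
    under squared Euclidean distance.
    """
    if not D:
        raise ValueError("training set must not be empty")
    stored = list(D)  # the "model" here is simply the stored training set

    def f(X: Features) -> str:
        def squared_distance(example: Example) -> float:
            Xi, _ = example
            return sum((a - b) ** 2 for a, b in zip(Xi, X))

        _, y = min(stored, key=squared_distance)
        return y

    return f


# Made-up training data: (age, income in thousands) -> class label
D = [((25, 20), "risky"), ((45, 90), "safe"), ((60, 70), "safe")]
f = train_nearest_neighbour(D)

print(f((30, 25)))  # risky
print(f((50, 80)))  # safe
```

Any other learner (rules, a decision tree, a formula) would fit the same shape: it takes the training set and returns something that maps an unlabelled tuple to a class label.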

    Step 2: Use the model to classify unseen objects

    • Need to estimate the accuracy of the model
      • The known labels of a set of independent test samples are compared with the labels the model predicts for those same samples
      • Accuracy is the proportion of test set samples that are correctly classified by the model
    • If the accuracy and all other evaluation measures are acceptable, apply the model to classify real-world data objects whose class labels are not known.
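The accuracy estimate described above can be sketched as follows. The stand-in classifier `toy_classifier`, the test tuples and the 0.7 acceptance threshold are assumptions made for illustration; in practice the threshold depends on the application, and other evaluation measures would be checked as well.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Example = Tuple[Features, str]


def accuracy(f: Callable[[Features], str], test_set: List[Example]) -> float:
    """Proportion of test samples whose predicted label equals the known label."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(1 for X, y in test_set if f(X) == y)
    return correct / len(test_set)


def toy_classifier(X: Features) -> str:
    """Hypothetical stand-in model: 'safe' when the second attribute is at least 50."""
    return "safe" if X[1] >= 50 else "risky"


# Made-up test set, kept separate from whatever data trained the model.
test_set = [
    ((30, 25), "risky"),
    ((50, 80), "safe"),
    ((40, 60), "risky"),  # the stand-in model gets this one wrong
    ((55, 30), "risky"),
]

acc = accuracy(toy_classifier, test_set)
print(acc)  # 0.75

ACCEPTABLE = 0.7  # assumed threshold, application dependent
if acc >= ACCEPTABLE:
    print("model accepted for classifying unlabelled data")
```

The test samples must be independent of the training set; measuring accuracy on the training tuples themselves would usually give an over-optimistic estimate.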

    Example:
    The data classification process:
    (a) Learning: Training data is analysed by a classification algorithm. Here, the class label attribute is loan_decision, and the learned model or classifier is represented in the form of classification rules.
    (b) Classification: Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new, unlabelled data tuples.
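A classifier in the form of classification rules, as in step (a), might be represented as an ordered list of condition and label pairs. The sketch below is illustrative only: the attributes `age` and `income`, the three rules and the default label are hypothetical and are not read off the original figure; only the class label attribute `loan_decision` comes from the example text.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]                       # one data tuple: attribute -> value
Rule = Tuple[Callable[[Record], bool], str]   # (IF condition, THEN loan_decision)

# Hypothetical rules of the kind a learning algorithm might output.
RULES: List[Rule] = [
    (lambda r: r["age"] == "youth", "risky"),
    (lambda r: r["income"] == "high", "safe"),
    (lambda r: r["age"] == "middle_aged" and r["income"] == "low", "risky"),
]
DEFAULT_DECISION = "safe"  # assumed fallback when no rule fires


def classify(record: Record,
             rules: List[Rule] = RULES,
             default: str = DEFAULT_DECISION) -> str:
    """Return the loan_decision of the first rule whose condition matches."""
    for condition, decision in rules:
        if condition(record):
            return decision
    return default


# Step (b): once the rules are judged accurate enough on test data,
# apply them to a new, unlabelled tuple.
new_applicant = {"age": "middle_aged", "income": "low"}
print(classify(new_applicant))  # risky
```

Rules are tried in order, so the first match wins; a different conflict-resolution strategy (for example, voting among all matching rules) would be an equally valid design choice.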