In this section, we look at rule-based classifiers, where the learned model is represented as a set of IF-THEN rules.
IF-THEN rules for classification
Rules are a good way of representing information or bits of knowledge. A rule-based classifier uses a set of IF-THEN rules for classification.
How to represent knowledge in the form of IF-THEN rules
Example
R: IF age = youth AND student = yes THEN buys_computer = yes
- “IF” part
  - Rule antecedent / precondition
  - Condition consists of one or more attribute tests that are logically ANDed
- “THEN” part
  - Rule consequent
  - Contains a class prediction
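As an illustration of the antecedent/consequent split described above, here is a minimal Python sketch of rule R. The `Rule` class, its field names and the `covers` method are hypothetical names chosen for this example, and attribute tests are assumed to be simple equality tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Antecedent: (attribute, value) equality tests that are logically ANDed.
    antecedent: tuple
    # Consequent: the predicted class label.
    consequent: str

    def covers(self, tup: dict) -> bool:
        """Return True if the tuple satisfies every test in the antecedent."""
        return all(tup.get(attr) == value for attr, value in self.antecedent)


# R: IF age = youth AND student = yes THEN buys_computer = yes
R = Rule(antecedent=(("age", "youth"), ("student", "yes")), consequent="yes")

# Two made-up tuples: the first satisfies the antecedent, the second does not.
covered = R.covers({"age": "youth", "student": "yes", "income": "low"})
not_covered = R.covers({"age": "senior", "student": "yes", "income": "low"})
```

A rule only makes a prediction for the tuples it covers; for all other tuples it says nothing.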
Rule Evaluation
How to measure the quality of a single rule R? Use both coverage and accuracy. The key thing to note is that the accuracy of a rule is measured as a proportion of the tuples it covers, not of the dataset as a whole.
- n_covers = number of tuples covered by R, i.e. that satisfy the antecedent of R
- n_correct = number of tuples correctly classified by R, i.e. that satisfy both the antecedent and the consequent
- D = training data set
- coverage(R) = n_covers / |D|
  - the proportion of tuples that are covered by the rule
- accuracy(R) = n_correct / n_covers
  - the proportion of covered tuples that are correctly labelled
Evaluation measures other than accuracy can be adapted in the same way, by computing them over the tuples covered by the rule rather than over the whole dataset.
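The coverage and accuracy definitions above can be sketched in Python as follows. The function name `coverage_and_accuracy`, the dictionary representation of tuples and the five-tuple dataset are assumptions made up for illustration; the sketch also assumes a non-empty dataset and equality tests only.

```python
def coverage_and_accuracy(antecedent, consequent, data, class_attr):
    """Return (coverage, accuracy) of one rule on a non-empty dataset."""
    n_covers = 0   # tuples that satisfy the antecedent
    n_correct = 0  # covered tuples whose class matches the consequent
    for tup in data:
        if all(tup.get(attr) == value for attr, value in antecedent.items()):
            n_covers += 1
            if tup[class_attr] == consequent:
                n_correct += 1
    coverage = n_covers / len(data)
    # Accuracy is relative to the covered tuples, not to the whole dataset.
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# Toy training set D (invented for this example).
D = [
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "youth", "student": "yes", "buys_computer": "no"},
    {"age": "youth", "student": "no", "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
    {"age": "senior", "student": "no", "buys_computer": "no"},
]

coverage, accuracy = coverage_and_accuracy(
    {"age": "youth", "student": "yes"}, "yes", D, "buys_computer"
)
# The rule covers 2 of the 5 tuples and labels 1 of those 2 correctly.
```

Here coverage is 2/5 and accuracy is 1/2: the three uncovered tuples do not affect the accuracy of the rule at all.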
How to measure the quality of a set of rules?
Conventional accuracy (correctly classified tuples as a proportion of all tuples in the dataset) is used to evaluate a rule-based classifier comprising a set (or sequence) of rules. Other conventional classification quality measures can also be used in this way for a set of rules that together form a classifier.
Conflict Resolution
If more than one rule is triggered, conflict resolution is needed:
- Size ordering: assign the highest priority to the triggered rule that has the “toughest” requirement
  - i.e., the one with the most attribute tests
- Rule ordering: prioritise the rules beforehand, using either class-based or rule-based ordering
  - Class-based ordering: prioritise classes beforehand. If the triggered rules predict different classes for a tuple, choose the class that comes first in the class order.
  - Rule-based ordering: the rules are organised into one long priority list, according to some measure of rule quality
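The three strategies above can be compared on a single tuple with a small Python sketch. The rules, their quality scores and the class order are invented for illustration, and the function names are hypothetical; the quality score stands in for whatever rule quality measure is chosen.

```python
# Each rule is (antecedent, predicted class, quality score).
# The quality scores are made-up placeholders for some rule quality measure.
rules = [
    ({"age": "youth"}, "no", 0.60),
    ({"age": "youth", "student": "yes"}, "yes", 0.90),
    ({"student": "yes"}, "yes", 0.70),
]


def triggered(rules, tup):
    """Return every rule whose antecedent is satisfied by the tuple."""
    return [r for r in rules
            if all(tup.get(attr) == value for attr, value in r[0].items())]


def size_ordering(fired):
    """Prefer the triggered rule with the most attribute tests."""
    return max(fired, key=lambda r: len(r[0]))


def class_based_ordering(fired, class_order):
    """Prefer the rule whose class comes first in a fixed class order."""
    return min(fired, key=lambda r: class_order.index(r[1]))


def rule_based_ordering(fired):
    """Prefer the rule ranked highest by the quality measure."""
    return max(fired, key=lambda r: r[2])


tup = {"age": "youth", "student": "yes"}
fired = triggered(rules, tup)  # all three rules are triggered

by_size = size_ordering(fired)[1]
by_class = class_based_ordering(fired, ["no", "yes"])[1]
by_quality = rule_based_ordering(fired)[1]
```

With these made-up rules, size ordering and rule-based ordering both pick the two-test rule and predict "yes", while class-based ordering with "no" ranked first predicts "no". The choice of strategy can therefore change the prediction.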
Default Rule
A default rule can be applied when a tuple satisfies none of the rules.
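A minimal sketch of how a default rule fits into a complete classifier, assuming rule-based ordering (the first matching rule in a priority list wins). The rules, the default class and the dataset are invented for illustration, and the names `classify` and `classifier_accuracy` are hypothetical. The notes do not say how the default class is chosen; taking it to be a fixed class such as the majority class is an assumption of this sketch.

```python
def classify(ordered_rules, default_class, tup):
    """Return the class of the first rule that fires, else the default class."""
    for antecedent, label in ordered_rules:
        if all(tup.get(attr) == value for attr, value in antecedent.items()):
            return label
    # Default rule: no rule in the list is satisfied by this tuple.
    return default_class


def classifier_accuracy(ordered_rules, default_class, data, class_attr):
    """Conventional accuracy: correct tuples as a proportion of all tuples."""
    correct = sum(
        classify(ordered_rules, default_class, tup) == tup[class_attr]
        for tup in data
    )
    return correct / len(data)


# Priority list of rules (made up for this example).
ordered_rules = [
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "youth"}, "no"),
]
default_class = "yes"  # assumption: a fixed fallback class

D = [
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "youth", "student": "no", "buys_computer": "no"},
    {"age": "senior", "student": "no", "buys_computer": "yes"},
    {"age": "senior", "student": "yes", "buys_computer": "no"},
]

fallback = classify(ordered_rules, default_class, {"age": "senior", "student": "no"})
acc = classifier_accuracy(ordered_rules, default_class, D, "buys_computer")
```

Because the default rule guarantees that every tuple receives a prediction, the accuracy of the whole rule set is computed over all of D, unlike the accuracy of a single rule.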