For a set Distance-Based Outlier Detection: Nested loop method - 图1 of data points, start with a user-defined parameter Distance-Based Outlier Detection: Nested loop method - 图2 called the distance threshold 距离阈值 that defines a reasonable neighbourhood for each object. For each object o, examine the number of other objects in the r-neighbourhood of o. _If enough of the objects in Distance-Based Outlier Detection: Nested loop method - 图3 are beyond the r-neighbourhood of o, then o should be considered an outlier.
Formally, let Distance-Based Outlier Detection: Nested loop method - 图4 (Distance-Based Outlier Detection: Nested loop method - 图5 ) be a fraction threshold, a user-defined parameter that defines what proportion of objects in D are expected to be within the r-neighbourhood of every non-outlying object非离群对象.
Then an object Distance-Based Outlier Detection: Nested loop method - 图6 is a distance-based outlier Distance-Based Outlier Detection: Nested loop method - 图7 if
image.png
that is, if the proportion of objects in Distance-Based Outlier Detection: Nested loop method - 图9 that are as close as Distance-Based Outlier Detection: Nested loop method - 图10 is no more than Distance-Based Outlier Detection: Nested loop method - 图11.
_Equivalently,
one can check the distance between o and its k-nearest neighbour o, where k is defined by
image.png
In this case, o is an outlier if Distance-Based Outlier Detection: Nested loop method - 图13
N.B. the upper-square brackets here indicate the ceiling function that rounds any-noninteger value up to the whole number above.
_

Computation

The simple nested-loop algorithm below, although theoretically Distance-Based Outlier Detection: Nested loop method - 图14, is usually linear in practice because the inner loop terminates early when there are few outliers.
For any object o, calculate its distance from other objects, and count the number of other objects in the r-neighborhood.
If π∙n other objects are within r distance, terminate the inner loop, Otherwise, o is a DB(r, π) outlier.
image.png

ACTION: Note this pseudo-code, straight from the text, is ambiguous. The “exit” command must pass control out of the inner for loop back to the next iteration of the outer for loop, and thereby skipping the “print” command. If, like me, you find this odd, then an alternative correction would be to insert a test in front of the “print” command, i.e. insert “if not count >= π.n then”



ACTION: Check through this worked example of detecting distance-based outliers. You can watch the video or work through the paper-based working, or both.


Video for distance-based outlier example
Example: Distance-Based Outlier with Nested Loop