Example: Distance-Based Outlier Detection with the Nested Loop Method - 《机器学习》 (Machine Learning)

We define outliers as those points that have too small a proportion of the data (less than π) within distance r of them, inclusive. That is, for an outlier, too few of the other data points are close enough. Another way of seeing it is that the k-th nearest neighbour is too far away (at a distance greater than r), where k is the number of data points corresponding to the proportion π, that is, k = ⌈π·n⌉ for a dataset of n points.
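The definition can be checked directly, without any early exit, by counting for each point how many other points lie within distance r. The sketch below is a minimal Python illustration under stated assumptions: the function name `is_distance_outlier` is hypothetical, π is passed as a `fractions.Fraction` so that π·n is computed exactly, the point itself is not counted (as in the nested-loop trace below), and the comparison is strict (count < π·n), which is the test the worked trace applies.

```python
from fractions import Fraction


def is_distance_outlier(idx, data, r, pi):
    """Directly check the outlier condition for data[idx] (hypothetical helper).

    Counts the *other* points within distance r (inclusive) of data[idx]
    and flags data[idx] when that count is below pi * n. For an integer
    count c, c < pi * n holds exactly when c < ceil(pi * n) = k, which is
    why the k-th nearest-neighbour reading gives the same answer.
    """
    x = data[idx]
    count = sum(1 for j, y in enumerate(data) if j != idx and abs(x - y) <= r)
    return count < pi * len(data)


D = [1, 2, 10]
print([x for i, x in enumerate(D) if is_distance_outlier(i, D, r=1, pi=Fraction(1, 3))])
# [10]
```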
Consider the one-dimensional dataset D = {1, 2, 10}, with r = 1 and π = 1/3. Use the absolute difference of the values as the distance, e.g. dist(10, 1) = |10 - 1| = 9.
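Because plain subtraction would give negative values for some orderings (1 - 2 = -1), the distance must be the absolute difference. The following Python sketch tabulates all pairwise distances for D and marks which pairs are within r; the names `dist`, `table` and `within_r` are illustrative only, and comparing values with `a != b` assumes the values in D are distinct, as they are here.

```python
D = [1, 2, 10]
r = 1


def dist(a, b):
    # 1-D distance: absolute difference, so dist(a, b) == dist(b, a) >= 0.
    return abs(a - b)


# Distances between every ordered pair of distinct points in D
# (a != b compares values, which is fine because D has no duplicates).
table = {(a, b): dist(a, b) for a in D for b in D if a != b}

# Pairs that count as "close enough" (distance <= r, inclusive).
within_r = sorted(pair for pair, d in table.items() if d <= r)

for (a, b), d in sorted(table.items()):
    print(f"dist({a},{b}) = {d}")
print("pairs within r:", within_r)
```

Only the pairs (1, 2) and (2, 1) fall within r = 1; the value 10 is at distance 8 or 9 from everything else.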
Nested loop algorithm
NB: slight abuse of notation: here i denotes the value of the data point, not its position in the dataset as in the algorithm.
outer for loop
For i = 1: dist(1,2) = 1 ≤ r, so count = 1. Now count ≥ π·n = (1/3)·3 = 1, so exit the inner loop: 1 cannot be an outlier.
For i = 2: dist(2,1) = 1 ≤ r, so count = 1. Again count ≥ 1, so exit the inner loop: 2 cannot be an outlier.
For i = 10: dist(10,1) = 9, which is not ≤ r; dist(10,2) = 8, which is not ≤ r. The loop over j finishes with count = 0 < 1, so print(10): 10 is an outlier.
endfor
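The trace above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `nested_loop_outliers` is hypothetical, π is assumed to be a `fractions.Fraction` so the threshold π·n is exact, and Python's `for ... else` is used so that the `else` branch (the "print" step) runs only when the inner loop finishes without the early exit.

```python
from fractions import Fraction


def nested_loop_outliers(data, r, pi):
    """Nested-loop distance-based outlier detection (hypothetical sketch)."""
    n = len(data)
    threshold = pi * n  # a point with >= pi*n close neighbours is not an outlier
    outliers = []
    for i in range(n):  # outer loop over candidate points
        count = 0
        for j in range(n):  # inner loop over all other points
            if i != j and abs(data[i] - data[j]) <= r:
                count += 1
                if count >= threshold:
                    # Enough close neighbours: data[i] cannot be an outlier,
                    # so stop scanning the remaining j early.
                    break
        else:
            # Inner loop finished without breaking: too few neighbours within r.
            outliers.append(data[i])
    return outliers


print(nested_loop_outliers([1, 2, 10], r=1, pi=Fraction(1, 3)))  # [10]
```

The early `break` is what makes the nested loop cheaper than counting every neighbour: points 1 and 2 are settled after a single comparison each.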
k-th nearest neighbour (k-th NN) method
k = ⌈π·||D||⌉ = ⌈(1/3)·3⌉ = 1.
For data value 1: the 1st NN is 2, and dist(1,2) = 1, which is not greater than r, so 1 is not an outlier.
For data value 2: the 1st NN is 1, and dist(2,1) = 1, which is not greater than r, so 2 is not an outlier.
For data value 10: the 1st NN is 2, and dist(10,2) = 8, which is greater than r, so 10 is an outlier.
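A minimal Python sketch of the k-th NN formulation is shown below. Its assumptions are labelled explicitly: the function name `knn_outliers` is hypothetical, π is restricted to 0 < π ≤ 1, k = ⌈π·n⌉ is computed with `fractions.Fraction` to avoid floating-point rounding, and a point that has fewer than k other points is treated as having its k-th NN infinitely far away (an assumption of this sketch).

```python
import math
from fractions import Fraction


def knn_outliers(data, r, pi):
    """Flag points whose k-th nearest other point is farther than r (sketch)."""
    pi = Fraction(pi)
    if not 0 < pi <= 1:
        raise ValueError("pi must satisfy 0 < pi <= 1")
    n = len(data)
    k = math.ceil(pi * n)  # exact: ceil(Fraction(1, 3) * 3) == 1
    outliers = []
    for i, x in enumerate(data):
        # Distances from x to every *other* point, nearest first.
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        # Missing k-th neighbour is treated as infinitely far (assumption).
        kth = dists[k - 1] if k <= len(dists) else math.inf
        if kth > r:
            outliers.append(x)
    return outliers


print(knn_outliers([1, 2, 10], r=1, pi=Fraction(1, 3)))  # [10]
```

Sorting all distances is the simplest way to find the k-th NN; for larger datasets a partial selection such as `heapq.nsmallest(k, ...)` would avoid the full sort.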