Metric criteria

A distance measure is a metric if it satisfies four criteria: non-negativity, identity (d(a, b) = 0 only if a = b), symmetry (d(a, b) = d(b, a)), and the triangle inequality.

Euclidean distance
Manhattan distance

Minkowski distance

- The Euclidean (p = 2) and Manhattan (p = 1) distances are special cases of the Minkowski distance.

- The larger the value of p, the more emphasis is placed on features with large differences in values, because these differences are raised to the power of p.
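A minimal sketch of the Minkowski distance in plain Python (the helper name `minkowski` is illustrative), showing how Manhattan and Euclidean fall out as p = 1 and p = 2:

```python
def minkowski(a, b, p):
    """Minkowski distance between two feature vectors a and b.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance;
    larger p puts more emphasis on features with large differences.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(a, b, 1)   # |3| + |4| = 7.0
euclidean = minkowski(a, b, 2)   # sqrt(9 + 16) = 5.0
```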
Jaccard similarity (index)
- based on four counts over pairs of binary features: co-presence (CP), co-absence (CA), presence-absence (PA), and absence-presence (AP); the Jaccard similarity is CP / (CP + PA + AP), ignoring co-absences.
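A sketch of this count-based computation for binary feature vectors (the helper name `jaccard` is illustrative):

```python
def jaccard(a, b):
    """Jaccard similarity for binary feature vectors.

    Counts co-presences (CP), presence-absences (PA) and
    absence-presences (AP); co-absences (CA) are ignored.
    """
    cp = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    pa = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    ap = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    return cp / (cp + pa + ap)

# one co-presence, one mismatch in each direction -> 1 / 3
sim = jaccard([1, 1, 0, 0], [1, 0, 1, 0])
```

The shared absence in the last position has no effect on the result, which is what distinguishes Jaccard from simple matching.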
Cosine Similarity

- the cosine of the inner angle between the two vectors
- an especially useful measure of similarity when the descriptive features describing instances in a dataset are related to each other
- All instances are normalized so as to lie on a hypersphere of radius 1.0 with its center at the origin of the feature space
- This normalization is what makes cosine similarity so useful in scenarios in which we are interested in the relative spread of values across a set of descriptive features rather than the magnitudes of the values themselves.
- If both customers use about four times as many SMS messages as voice calls, the cosine similarity will be 1: even though the magnitudes of their feature values are different, the relationship between the feature values is the same for both instances.
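The SMS/voice example above can be sketched as follows (feature values are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# both customers send about four times as many SMS messages as voice
# calls, so the vectors point in the same direction even though their
# magnitudes differ
c1 = (40.0, 10.0)    # (sms, voice)
c2 = (160.0, 40.0)
sim = cosine_similarity(c1, c2)   # 1.0 (up to floating point)
```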
Mahalanobis Distance

- measures the similarity between instances with continuous descriptive features
- it allows us to take into account how spread out the instances in a dataset are when judging similarities
- uses covariance to scale distances so that distances along a direction where the dataset is very spread out are scaled down, and distances along directions where the dataset is tightly packed are scaled up
- Mahalanobis distance between B and A will be less than the Mahalanobis distance between C and A

(figure: three scatterplots) a) equally distributed in all directions; b) negative covariance; c) positive covariance
- in b), Mahalanobis distance(A, B) < Mahalanobis distance(A, C)
- Σ−1 represents the inverse covariance matrix computed across all instances in the dataset
- effects:
- the larger the variance of a feature, the less weight the difference between the values for that feature will contribute to the distance calculation.
- the larger the correlation between two features, the less weight they contribute to the distance
- The rotation and scaling of the axes are the result of the multiplication by the inverse covariance matrix of the dataset (Σ−1)
- points on the same ellipse are all at the same Mahalanobis distance from A
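The scaling behavior described above can be sketched with NumPy (the helper name `mahalanobis` and the toy dataset are made up for illustration):

```python
import numpy as np

def mahalanobis(x, y, data):
    """Mahalanobis distance between x and y, scaled by the inverse
    covariance matrix (Sigma^-1) of the whole dataset."""
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ inv_cov @ d))

# toy dataset: spread out along the first feature, tightly packed
# along the second (the covariance matrix is diagonal here)
data = np.array([[-2.0, -1.0], [-2.0, 1.0], [2.0, -1.0], [2.0, 1.0]])
A = [0.0, 0.0]

# a unit step along the spread-out direction is scaled down, while a
# unit step along the tightly packed direction is scaled up
d_spread = mahalanobis([1.0, 0.0], A, data)
d_packed = mahalanobis([0.0, 1.0], A, data)
```

Both candidate points are at Euclidean distance 1 from A, yet `d_spread < d_packed`, which is exactly the effect the bullets above describe.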

