\newcommand{\R}{\mathbb{R}} \newcommand{\E}{\mathbb{E}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\wv}{\mathbf{w}} \newcommand{\av}{\mathbf{\alpha}} \newcommand{\bv}{\mathbf{b}} \newcommand{\N}{\mathbb{N}} \newcommand{\id}{\mathbf{I}} \newcommand{\ind}{\mathbf{1}} \newcommand{\0}{\mathbf{0}} \newcommand{\unit}{\mathbf{e}} \newcommand{\one}{\mathbf{1}} \newcommand{\zero}{\mathbf{0}} \newcommand\rfrac[2]{^{#1}!/_{#2}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert}
距离度量
描述
不同的距离度量对于不同类型的分析是方便的。Flink ML为许多标准的距离度量提供了内置实现。您可以通过实现’DistanceMetric’特性来创建自定义的距离度量。
内置实现
当前FlinkML支持如下指标
度量 | 描述 |
---|---|
Euclidean Distance | $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$ |
Squared Euclidean Distance | $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$ |
Cosine Similarity | $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$ |
Chebyshev Distance | $$d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)$$ |
Manhattan Distance | $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$ |
Minkowski Distance | $$d(\x, \y) = \left( \sum_{i=1}^{n} \left( x_i - y_i \right)^p \right)^{\rfrac{1}{p}}$$ |
Tanimoto Distance | $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$ with $\x$ and $\y$ being bit-vectors |
自定义实现
您可以通过实现’DistanceMetric’特性来创建自己的距离度量。
class MyDistance extends DistanceMetric {
override def distance(a: Vector, b: Vector) = ... // your implementation for distance metric }
object MyDistance {
def apply() = new MyDistance()
}
val myMetric = MyDistance()