Density-Based Clustering
Model clusters as dense regions in the data space, separated by sparse regions. Does not attempt to assign every object to a cluster; many may be left out as “noise”. 将集群建模为数据空间中由稀疏区域分隔的密集区域。不尝试将每个对象分配给一个集群;许多可能被遗漏为“噪音”。

  • Major features:
    • Discovers clusters of arbitrary shape 发现任意形状的簇
      • Partitioning and hierarchical methods are designed to find spherical-shaped (convex) clusters 分区和分层方法被设计来寻找球形(凸形)聚类
    • Handles noise 处理噪音
    • One scan through the data only 只扫描一次数据
    • Needs parameters to define threshold dense-ness (but not for the number of clusters) 需要参数来定义阈值密集度(但不是针对集群的数量)

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) 带噪声应用的基于密度的空间聚类

  • Density of an object Density-Based Clustering - 图1 对象o的密度: the number of objects close to Density-Based Clustering - 图2 接近对象o的对象数量
  • Core objects(核心对象): Objects that have a dense neighbourhood 具有密集邻域的对象
  • DBSCAN: connects core objects and their neighbourhoods to form dense regions as cluster 连接核心对象和它们的邻居,形成密集的区域作为集群
  • Two parameters:
    • Density-Based Clustering - 图3: Maximum radius of the neighbourhood 邻居的最大半径
    • MinPts: Minimum number of points in an Density-Based Clustering - 图4-neighbourhood of that point 该点邻域内的最小点数
  • Density-Based Clustering - 图5: {Density-Based Clustering - 图6|Density-Based Clustering - 图7}
    • Number of neighbourhood objects including Density-Based Clustering - 图8 包含p点的邻居对象数量
    • If Density-Based Clustering - 图9 MinPts, then Density-Based Clustering - 图10 is core object
    • Density-Based Clustering - 图11 is a data set.
  • Directly density-reachable: A point Density-Based Clustering - 图12 is directly density-reachable from a core point Density-Based Clustering - 图13 if Density-Based Clustering - 图14 is within the Density-Based Clustering - 图15-neighbourhood of Density-Based Clustering - 图16 如果q点是核心点,且p点在q点的Density-Based Clustering - 图17-neighbourhood中, 则p点是q的直接密度可达。
    • By definition, no points are directly density-reachable from a non-core point.
  • Density-reachable: Density-Based Clustering - 图18 is density-reachable from a core point Density-Based Clustering - 图19 if there is a chain of objects Density-Based Clustering - 图20 such that Density-Based Clustering - 图21, Density-Based Clustering - 图22 and Density-Based Clustering - 图23 is directly density-reachable from Density-Based Clustering - 图24 with respect to Density-Based Clustering - 图25 and _MinPts. p和q是 密度可达的条件是:对象p和对象q 可以有锁链相连,其中锁链的相邻对象可 直接密度可达,且遵守最大半径、最小密度。
    • By definition, all the Density-Based Clustering - 图26 s other than Density-Based Clustering - 图27 are core points 对象p必须都是核心点
  • Density-connected: Two objects Density-Based Clustering - 图28 are density-connected if
    • there is an object Density-Based Clustering - 图29 such that both Density-Based Clustering - 图30 and Density-Based Clustering - 图31 are density-reachable from Density-Based Clustering - 图32 with respect to Density-Based Clustering - 图33 and MinPts. p1 和 p2 必须可以链接到同一个点q
    • By definition, Density-Based Clustering - 图34 must be a core point, and Density-Based Clustering - 图35 and Density-Based Clustering - 图36 must be in the neighbourhood of a core point, but may not be core points themselves. q必须是核心点, 但p1 和 p2 不必是。

Definition of Cluster in DBSCAN

A subset Density-Based Clustering - 图37 is a cluster if

  • All points within the cluster Density-Based Clustering - 图38 are mutually density-connected,聚类中的所有点都互相密度连接 and
  • There is no point outside Density-Based Clustering - 图39 that is density-connected to a point inside Density-Based Clustering - 图40. 没有聚类以外的点连接到聚类之内的点。

Example of density-reachable and density-connected:
image.png
> Let Density-Based Clustering - 图42 be the radius of the circles and MinPts 3.
> Density-Based Clustering - 图43 are core objects.
> Object Density-Based Clustering - 图44 is directly density-reachable from Density-Based Clustering - 图45.
> Object Density-Based Clustering - 图46 is directly density-reachable from Density-Based Clustering - 图47 and vice versa.
> Object Density-Based Clustering - 图48 is density-reachable from Density-Based Clustering - 图49 because Density-Based Clustering - 图50 is directly density reachable from Density-Based Clustering - 图51 and Density-Based Clustering - 图52 is directly density-reachable from Density-Based Clustering - 图53. However, Density-Based Clustering - 图54 is not density reachable from Density-Based Clustering - 图55 because Density-Based Clustering - 图56 is not a core object.
> Density-Based Clustering - 图57 and Density-Based Clustering - 图58 are density-reachable from Density-Based Clustering - 图59
> Density-Based Clustering - 图60 is density-reachable from Density-Based Clustering - 图61.
> Density-Based Clustering - 图62, Density-Based Clustering - 图63, and Density-Based Clustering - 图64 are all density-connected.
DBSCAN algorithm
image.png
Density-Based Clustering - 图66DBSCAN worked example