Density-Based Clustering
Density-based methods model clusters as dense regions of the data space, separated by sparse regions. They do not attempt to assign every object to a cluster; many objects may be left out as “noise”.
- Major features:
- Discovers clusters of arbitrary shape
- By contrast, partitioning and hierarchical methods are designed to find spherical (convex) clusters
- Handles noise
- Needs only one scan through the data
- Needs parameters that define the density threshold (but not the number of clusters)
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- Density of an object $o$: the number of objects close to $o$
- Core objects: objects that have a dense neighbourhood
- DBSCAN connects core objects and their neighbourhoods to form dense regions, which become the clusters
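- Two parameters:
- $\varepsilon$: the maximum radius of the neighbourhood, $N_\varepsilon(q) = \{\, p \in D \mid \mathrm{dist}(p, q) \le \varepsilon \,\}$, where $D$ is the data set
- $MinPts$: the minimum number of points required in the $\varepsilon$-neighbourhood of a core object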
- Directly density-reachable: a point $p$ is directly density-reachable from a core point $q$ if $p$ is within the $\varepsilon$-neighbourhood of $q$.
- By definition, no points are directly density-reachable from a non-core point.
- Density-reachable: $p$ is density-reachable from a core point $q$ if there is a chain of objects $p_1, \ldots, p_n$ such that $p_1 = q$, $p_n = p$, and $p_{i+1}$ is directly density-reachable from $p_i$ with respect to $\varepsilon$ and $MinPts$.
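- Density-connected: two objects $p$ and $q$ are density-connected if there is an object $o$ such that both $p$ and $q$ are density-reachable from $o$ with respect to $\varepsilon$ and $MinPts$.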
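
The definitions above translate fairly directly into code. The following is a minimal sketch, assuming Euclidean distance and a brute-force neighbourhood search; the function names (`neighbourhood`, `is_core`, `directly_reachable`, `reachable_from`, `density_connected`) and the four sample points are illustrative choices, not part of the original notes.

```python
import math
from collections import deque

def neighbourhood(points, q, eps):
    """Indices of all points within distance eps of points[q] (q itself included)."""
    return [i for i, p in enumerate(points) if math.dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """A point is core when its eps-neighbourhood holds at least min_pts points."""
    return len(neighbourhood(points, q, eps)) >= min_pts

def directly_reachable(points, p, q, eps, min_pts):
    """True when p is directly density-reachable from q (q must be core)."""
    return is_core(points, q, eps, min_pts) and p in neighbourhood(points, q, eps)

def reachable_from(points, q, eps, min_pts):
    """Indices density-reachable from q; empty when q is not a core point."""
    if not is_core(points, q, eps, min_pts):
        return set()
    seen, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        for nb in neighbourhood(points, cur, eps):
            if nb not in seen:
                seen.add(nb)
                # Only core points can extend the chain any further.
                if is_core(points, nb, eps, min_pts):
                    queue.append(nb)
    return seen

def density_connected(points, a, b, eps, min_pts):
    """True when some point o has both a and b density-reachable from it."""
    return any(
        {a, b} <= reachable_from(points, o, eps, min_pts)
        for o in range(len(points))
    )

# Four hypothetical points on a line, with eps = 1 and MinPts = 3.
pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(directly_reachable(pts, 0, 1, 1.0, 3))  # True: point 1 is core
print(directly_reachable(pts, 1, 0, 1.0, 3))  # False: point 0 is not core
print(density_connected(pts, 0, 3, 1.0, 3))   # True: both reachable from point 1
```

The asymmetry in the first two calls is the point to notice: direct density-reachability only holds when the starting point is a core point, whereas density-connectedness is symmetric.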
Definition of Cluster in DBSCAN
- All points within the cluster $C$ are mutually density-connected, and
- There is no point outside $C$ that is density-connected to a point inside $C$.
Example of density-reachable and density-connected:
> Let $\varepsilon$ be the radius of the circles and $MinPts = 3$.
> $m$, $p$, $o$, and $r$ are core objects.
> Object $q$ is directly density-reachable from $m$.
> Object $m$ is directly density-reachable from $p$ and vice versa.
> Object $q$ is density-reachable from $p$ because $q$ is directly density-reachable from $m$ and $m$ is directly density-reachable from $p$. However, $p$ is not density-reachable from $q$ because $q$ is not a core object.
> $r$ and $s$ are density-reachable from $o$.
> $o$ is density-reachable from $r$.
> $o$, $r$, and $s$ are all density-connected.
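
Relationships of this kind can be checked numerically. The sketch below is only an illustration: the labels `m`, `p`, `q`, `o`, `r`, `s`, the helper points `a` and `b`, and every coordinate are hypothetical, chosen so that a radius of 1 and MinPts of 3 produce two small groups with core and non-core objects. They are not taken from the original figure.

```python
import math

EPS, MIN_PTS = 1.0, 3

# Hypothetical layout (assumption): two groups of points on a line.
pts = {
    "a": (-1.0, 0.0), "p": (0.0, 0.0), "m": (1.0, 0.0), "q": (2.0, 0.0),
    "b": (9.0, 0.0), "o": (10.0, 0.0), "r": (11.0, 0.0), "s": (12.0, 0.0),
}

# eps-neighbourhood of every object (each object counts itself).
nbrs = {u: {v for v in pts if math.dist(pts[u], pts[v]) <= EPS} for u in pts}
core = {u for u in pts if len(nbrs[u]) >= MIN_PTS}

def reachable(src):
    """Objects density-reachable from src; empty if src is not core."""
    if src not in core:
        return set()
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in nbrs[u] - seen:
            seen.add(v)
            if v in core:          # only core objects extend the chain
                stack.append(v)
    return seen

print(sorted(core))                # ['m', 'o', 'p', 'r']
print("q" in nbrs["m"])            # True: q is directly density-reachable from m
print("q" in reachable("p"))       # True: chain p -> m -> q
print("p" in reachable("q"))       # False: q is not a core object
print({"r", "s"} <= reachable("o") and "o" in reachable("r"))  # True
```

Because `r` and `s` are both density-reachable from `o`, the three objects are density-connected in this layout.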
DBSCAN algorithm
DBSCAN worked example
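
One way to see how the definitions combine into a clustering procedure is a small, self-contained sketch run on a toy data set. It assumes Euclidean distance and a brute-force neighbourhood query; the function name `dbscan`, the `NOISE` label, and the sample points are choices made here for illustration, not the notes' own implementation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""
    labels = [None] * len(points)            # None means "not visited yet"

    def region(i):
        # Brute-force eps-neighbourhood of point i (i itself included).
        return [j for j, p in enumerate(points) if math.dist(points[i], p) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                # may later turn out to be a border point
            continue
        cluster += 1                         # i is core: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:         # j is core, so the cluster keeps growing
                queue.extend(nbrs)
    return labels

# Hypothetical toy data: two dense runs and one isolated point.
data = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (50, 50)]
print(dbscan(data, eps=1.0, min_pts=3))      # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each point is visited by the outer loop once, but the brute-force `region` query makes this sketch quadratic in the number of points; a spatial index would normally replace it on larger data. Note that the number of clusters is never supplied: it falls out of `eps` and `min_pts`.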