Hierarchical Clustering (AGNES and DIANA)

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

  • Agglomerative: a “bottom-up” approach. Each observation starts in its own cluster, and one pair of clusters is merged at each step up the hierarchy.
  • Divisive: a “top-down” approach. All observations start in one cluster, and one cluster is split into two at each step down the hierarchy.

In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.

Here is an example of the agglomerative and divisive hierarchical clustering approaches on data objects {a,b,c,d,e}.
[Figure: agglomerative and divisive hierarchical clustering on data objects {a,b,c,d,e}]
Initially, the agglomerative method places each object into a cluster of its own. The clusters are then merged step by step according to some criterion. The merging process is repeated until all the objects are eventually merged into one cluster.
The divisive method proceeds in the opposite direction. All the objects are used to form one initial cluster. The cluster is then split according to some principle. The splitting process repeats until each new cluster contains only a single object.

AGNES (AGglomerative NESting)

  • Uses the single-link method, together with a dissimilarity matrix, to determine the distance (dissimilarity) between clusters. Other linkage methods can be applied instead; see Distance between clusters.
  • Merges the nodes that have the least dissimilarity
  • Continues until all nodes are in the same cluster
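
The three steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the 1-D positions assigned to the objects a to e are hypothetical (the source does not give coordinates), and the helper names `single_link` and `agnes` are made up for this example.

```python
from itertools import combinations


def single_link(c1, c2, dist):
    """Single-link distance: the smallest pairwise dissimilarity between two clusters."""
    return min(dist[frozenset((x, y))] for x in c1 for y in c2)


def agnes(labels, dist):
    """Merge the two least dissimilar clusters until only one cluster is left."""
    clusters = [frozenset([label]) for label in labels]  # each object starts alone
    merges = []
    while len(clusters) > 1:
        # Greedy step: pick the pair of clusters with the least dissimilarity.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_link(pair[0], pair[1], dist),
        )
        merges.append((sorted(a), sorted(b), single_link(a, b, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical 1-D positions for the objects a..e (an assumption for illustration).
points = {"a": 0.0, "b": 1.0, "c": 4.0, "d": 6.0, "e": 6.5}

# Dissimilarity matrix stored as a dict keyed by unordered pairs.
dist = {
    frozenset((x, y)): abs(points[x] - points[y])
    for x, y in combinations(points, 2)
}

merges = agnes(list(points), dist)
for left, right, d in merges:
    print(left, "+", right, "at dissimilarity", d)
```

With these assumed positions, d and e merge first, then a and b, then c joins {d, e}, and the last step merges {a, b} with {c, d, e}. The recorded merge list is the information a dendrogram would display. Swapping `min` for `max` inside `single_link` would give complete-link instead.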

DIANA (DIvisive ANAlysis)

  • Works in the inverse order of AGNES: starts with all objects in one cluster and splits step by step
  • Continues until each distinct data object forms its own cluster
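
A top-down pass can be sketched in the same spirit. The text above does not say how a cluster is split, so this example assumes one common choice, the splinter-group heuristic: the cluster with the largest diameter is split, seeded by the object with the largest average dissimilarity to the rest. The helper names and the 1-D positions of a to e are hypothetical.

```python
from itertools import combinations


def avg_dissim(obj, group, dist):
    """Average dissimilarity from obj to the other members of group."""
    others = [o for o in group if o != obj]
    if not others:
        return 0.0
    return sum(dist[frozenset((obj, o))] for o in others) / len(others)


def diameter(cluster, dist):
    """Largest pairwise dissimilarity inside a cluster (0 for a singleton)."""
    return max((dist[frozenset(p)] for p in combinations(cluster, 2)), default=0.0)


def split(cluster, dist):
    """Split one cluster in two by growing a splinter group (assumed heuristic)."""
    rest = list(cluster)
    seed = max(rest, key=lambda o: avg_dissim(o, rest, dist))
    splinter = [seed]
    rest.remove(seed)
    while len(rest) > 1:
        # Gain > 0 means the object is, on average, closer to the splinter group.
        gains = {
            o: avg_dissim(o, rest, dist)
            - sum(dist[frozenset((o, s))] for s in splinter) / len(splinter)
            for o in rest
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return sorted(splinter), sorted(rest)


def diana(labels, dist):
    """Split clusters until every object forms its own cluster."""
    clusters = [sorted(labels)]  # all objects start in one cluster
    splits = []
    while any(len(c) > 1 for c in clusters):
        target = max(clusters, key=lambda c: diameter(c, dist))
        clusters.remove(target)
        left, right = split(target, dist)
        splits.append((left, right))
        clusters += [left, right]
    return splits


# Hypothetical 1-D positions for the objects a..e (an assumption for illustration).
points = {"a": 0.0, "b": 1.0, "c": 4.0, "d": 6.0, "e": 6.5}
dist = {
    frozenset((x, y)): abs(points[x] - points[y])
    for x, y in combinations(points, 2)
}

splits = diana(list(points), dist)
for left, right in splits:
    print(left, "|", right)
```

With these assumed positions, the first split separates {a, b} from {c, d, e}, and the loop stops once all five objects are singletons. Other splitting criteria would produce a different sequence of splits.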