What is Cluster Analysis?

  • Cluster: A collection of data objects
    • similar (or related) to one another within the same group
    • dissimilar (or unrelated) to the objects in other groups
  • Cluster analysis (or clustering, data segmentation, …)
    • Finding similarities between data objects according to their characteristics and grouping similar objects into clusters, that is, discovering groups within the data
  • Unsupervised learning: no predefined classes
  • Typical applications
    • As a stand-alone tool to get insight into data distribution
    • As a preprocessing step for other algorithms
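
To make "grouping similar data objects into clusters" concrete, here is a minimal sketch using k-means, one common clustering method (the notes above do not name a specific algorithm, so this choice is an assumption). The function name `kmeans`, the toy data, and the simplistic initialisation from the first k points are all illustrative, not a recommended implementation.

```python
import math

def kmeans(points, k, iters=20):
    """Group points into k clusters with a basic k-means (Lloyd-style) loop."""
    # Simplistic initialisation (assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical toy data: two visibly separated groups in 2-D.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
print(labels)     # one cluster label per point
print(centroids)  # one centre per cluster
```

No class labels are supplied anywhere in the sketch: the groups emerge only from distances between the objects, which is what makes clustering unsupervised.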

Clustering as a Preprocessing Tool

  • Summarisation:
    • Preprocessing for regression, principal components analysis, classification, and association analysis
  • Compression:
    • Image processing: vector quantisation
  • Finding k-nearest neighbours
    • Localising search to one or a small number of clusters
  • Outlier detection
    • Outliers are often viewed as those objects “far away” from any cluster
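
Two of the uses listed above, compression by vector quantisation and outlier detection, can be sketched with nothing more than a set of cluster centres. The centroids, points, and distance threshold below are hypothetical values chosen for illustration; in practice the centroids would come from a clustering run and the threshold would depend on the scale of the data.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    idx = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return idx, math.dist(point, centroids[idx])

# Hypothetical centroids, e.g. produced by an earlier clustering run.
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.1, 0.9), (7.9, 8.2), (4.5, 20.0)]

# Vector quantisation: store only a centroid index per point;
# the list of centroids acts as the codebook.
codes = [nearest(p, centroids)[0] for p in points]
reconstructed = [centroids[c] for c in codes]  # lossy approximation of points

# Outlier detection: flag points farther than a threshold from every centroid.
threshold = 3.0  # assumed cut-off, not a general-purpose value
outliers = [p for p in points if nearest(p, centroids)[1] > threshold]

print(codes)
print(outliers)
```

The same `nearest` lookup also hints at how a k-nearest-neighbour search can be localised: a query only needs to be compared against the members of its closest cluster or clusters rather than the whole data set.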

Clustering for Data Understanding and Applications

  • Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
  • Information retrieval: document clustering
  • Land use: identification of areas of similar land use in an earth observation database
  • Marketing: help marketers discover distinct groups in their customer bases, then use this knowledge to develop targeted marketing programs
  • City planning: identifying groups of houses according to their house type, value, and geographical location
  • Earthquake studies: observed earthquake epicentres should be clustered along continental faults
  • Climate: understanding Earth's climate by finding patterns in the atmosphere and ocean
  • Economic science: market research