Algorithmic Considerations

  • Partitioning criteria
    • Single-level vs. hierarchical partitioning (multi-level hierarchical partitioning is often desirable)
  • Separation of clusters
    • Exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
  • Similarity measure
    • Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity)
  • Clustering space
    • Full space (often when low dimensional) vs. subspaces (often in high-dimensional clustering)
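
The difference between exclusive and non-exclusive separation can be illustrated with a minimal sketch. It assumes a Euclidean distance measure and fixed cluster centers; the function names, the sample centers, and the `radius` threshold are hypothetical choices made for illustration, not part of any particular algorithm.

```python
import math

def exclusive_assign(point, centers):
    """Return the index of the single nearest center (exclusive membership)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def non_exclusive_assign(point, centers, radius):
    """Return the indices of every center within `radius` (non-exclusive membership)."""
    return [i for i, c in enumerate(centers) if math.dist(point, c) <= radius]

# Hypothetical data: two cluster centers and one point lying between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (2.5, 0.0)

nearest = exclusive_assign(point, centers)                      # one cluster only
memberships = non_exclusive_assign(point, centers, radius=3.0)  # possibly several
```

Under the exclusive rule the point belongs only to the closer center, while under the non-exclusive rule it belongs to both, because both centers lie within the chosen radius.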

Requirements and Challenges

  • Scalability
    • Clustering all the data rather than only samples
  • Ability to deal with different types of attributes
    • Numerical, binary, categorical, ordinal, linked, and mixtures of these
  • Constraint-based clustering
    • Users may specify constraints
    • Use domain knowledge to determine input parameters
  • Interpretability and usability
  • Others
    • Discovery of clusters with arbitrary shape
    • Ability to deal with noisy data
    • Incremental clustering and insensitivity to input order
    • High dimensionality
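
One way to handle mixed attribute types is to average a per-attribute dissimilarity, in the spirit of Gower's coefficient. The sketch below is only an illustration under stated assumptions: numeric attributes are scaled by a known value range, categorical and binary attributes contribute 0 on a match and 1 on a mismatch, and the function name, attribute kinds, and sample records are hypothetical.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1] (Gower-style sketch).

    kinds[i] is "num" or "cat"; ranges[i] is the value range of a numeric
    attribute and is ignored for categorical or binary attributes.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-normalized absolute difference; 0 if the range is unknown or zero.
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching for categorical and binary attributes.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

# Hypothetical records: (age, area type, is_member)
kinds = ["num", "cat", "cat"]
ranges = [50.0, None, None]
alice = (30, "urban", True)
bob = (40, "rural", True)

d = mixed_dissimilarity(alice, bob, kinds, ranges)  # roughly 0.4
```

Here the age difference contributes 10/50, the area type contributes 1, and the membership flag contributes 0, so the average is about 0.4. Ordinal attributes could be handled by first mapping ranks to numbers, which is a further assumption not shown here.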