Algorithmic Considerations
- Partitioning criteria
  - Single-level vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable)
- Separation of clusters
  - Exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
- Similarity measure
  - Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity)
- Clustering space
  - Full space (often when low-dimensional) vs. subspaces (often in high-dimensional clustering)
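The distinctions above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`euclidean`, `cosine_similarity`, `assign_exclusive`, `assign_non_exclusive`), the `dims` subspace argument, and the fixed-`radius` rule for non-exclusive membership are all hypothetical choices made for the example. It contrasts full-space with subspace distance, Euclidean distance with angle-based (vector) similarity, and exclusive with non-exclusive cluster membership.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector, dims: Optional[Sequence[int]] = None) -> float:
    """Euclidean distance in the full space, or only over the indices in `dims` (a subspace)."""
    idx = range(len(a)) if dims is None else dims
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Vector (angle-based) similarity: 1.0 means same direction, regardless of length.
    Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign_exclusive(p: Vector, centers: List[Vector]) -> int:
    """Exclusive: exactly one cluster per point, the nearest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: euclidean(p, centers[j]))


def assign_non_exclusive(p: Vector, centers: List[Vector], radius: float) -> List[int]:
    """Non-exclusive (hypothetical rule): a point joins every cluster whose center is within `radius`."""
    return [j for j, c in enumerate(centers) if euclidean(p, c) <= radius]


# Full space vs. subspace: one noisy third attribute dominates the full-space distance.
a, b = (1.0, 2.0, 100.0), (1.0, 2.0, 0.0)
print(euclidean(a, b))               # 100.0
print(euclidean(a, b, dims=(0, 1)))  # 0.0

# Distance vs. direction: (1, 0) and (3, 0) are 2.0 apart but point the same way.
print(euclidean((1.0, 0.0), (3.0, 0.0)), cosine_similarity((1.0, 0.0), (3.0, 0.0)))  # 2.0 1.0

# Exclusive vs. non-exclusive membership of a point midway between two centers.
centers = [(0.0, 0.0), (4.0, 0.0)]
print(assign_exclusive((2.0, 0.0), centers))           # 0
print(assign_non_exclusive((2.0, 0.0), centers, 2.5))  # [0, 1]
```

Practical non-exclusive methods (for example, fuzzy or probabilistic clustering) usually assign graded membership weights rather than applying a hard radius; the radius rule here is only the simplest way to let one point fall into two clusters.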
Requirements and Challenges
- Scalability
  - Clustering all the data instead of only samples
- Ability to deal with different types of attributes
  - Numerical, binary, categorical, ordinal, linked, and mixtures of these
- Constraint-based clustering
  - Users may specify constraints as input
  - Use domain knowledge to determine input parameters
- Interpretability and usability
- Others
  - Discovery of clusters with arbitrary shape
  - Ability to deal with noisy data
  - Incremental clustering and insensitivity to input order
  - High dimensionality
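Density-based clustering is one standard response to the challenges of arbitrary cluster shape and noisy data. The sketch below is a minimal, quadratic-time version of DBSCAN (a connectivity-based method): a point with at least `min_pts` neighbors (counting itself) within distance `eps` is a core point, clusters grow through chains of core points, and points reachable from no core point are labeled noise. The helper names, the `NOISE = -1` / `UNVISITED = -2` sentinels, and the toy data are assumptions for illustration; a production implementation would use a spatial index instead of the O(n^2) neighbor scan.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within `eps` of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN sketch: returns a cluster id (0, 1, ...) or NOISE for each point."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: grow through it
                seeds.extend(j_neighbors)
    return labels


# Toy data: a small blob enclosed by a square ring, plus one far-away outlier.
blob = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
ring = [(float(x), float(y)) for x in range(-3, 4) for y in range(-3, 4)
        if max(abs(x), abs(y)) == 3]
outlier = [(10.0, 10.0)]
labels = dbscan(blob + ring + outlier, eps=1.0, min_pts=3)
print(labels[:3], set(labels[3:-1]), labels[-1])  # [0, 0, 0] {1} -1
```

A centroid-based method such as k-means assigns every point to its nearest center, so each of its clusters is a convex region; it therefore cannot separate the ring from the blob it encloses, and it has no notion of noise. DBSCAN handles both cases here, but it still needs `eps` and `min_pts` as input parameters, and a border point reachable from two clusters goes to whichever cluster reaches it first, so some labels can depend on input order.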