What do we have so far?
- A feature space with a similarity measure
- This is a classic learning problem!
We can use standard classification or clustering methods to solve problems in:
- Keyword-based association analysis
- Automatic document classification
- Similarity detection
- Link analysis: unusual correlation between entities
- Cluster documents by a common author
- Cluster documents containing information from a common source
- Sequence analysis: predicting a recurring event
- Anomaly detection: find information that violates usual patterns
- Hypertext analysis
  - Patterns in anchors/links (for example, correlations between anchor text and the linked objects)
Applications: news article classification, automatic e-mail filtering, Web page classification, detection of hate blogs, etc.
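To make "a feature space with a similarity measure" concrete, here is a minimal sketch in Python. It assumes a bag-of-words feature space, cosine similarity, and a 1-nearest-neighbor classifier; these are illustrative choices, not the only standard ones. The function names and the toy training set are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a document to a sparse term-frequency vector (the feature space)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

def classify_nearest(text, labeled_docs):
    """1-nearest-neighbor: return the label of the most similar labeled document."""
    query = bag_of_words(text)
    best_label, _ = max(
        ((label, cosine_similarity(query, bag_of_words(doc)))
         for doc, label in labeled_docs),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical toy training set, invented for illustration only.
TRAINING = [
    ("stock markets fell as interest rates rose", "finance"),
    ("the central bank raised interest rates", "finance"),
    ("the team won the match in extra time", "sports"),
    ("the striker scored two goals in the match", "sports"),
]

if __name__ == "__main__":
    print(classify_nearest("bank interest rates", TRAINING))
    print(classify_nearest("goals scored in the match", TRAINING))
```

The same similarity function could drive a clustering method instead of a classifier, for example by grouping documents whose pairwise similarity exceeds a chosen threshold.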