What do we have so far?
- A feature space with a similarity measure
- This is a classic learning problem!
We can use a standard classification or clustering method to solve problems in:
- Keyword-based association analysis
- Automatic document classification
- Similarity detection
- Link analysis: unusual correlation between entities
  - Cluster documents by a common author
  - Cluster documents containing information from a common source
 
- Sequence analysis: predicting a recurring event
- Anomaly detection: find information that violates usual patterns
- Hypertext analysis
  - Patterns in anchors/links (for example, anchor text correlations with linked objects)
 
Applications: news article classification, automatic e-mail filtering, Web page classification, detection of hate blogs, etc.
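To make the "feature space plus similarity measure" idea concrete, here is a minimal sketch of automatic document classification. It assumes a bag-of-words feature space, cosine similarity as the similarity measure, and a 1-nearest-neighbour classifier; these are only one possible combination, not necessarily the one intended here. The function names and the toy training set are hypothetical and invented for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a document to a sparse term-frequency vector (the feature space)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def classify(text, labeled_docs):
    """1-nearest-neighbour: return the label of the most similar training document."""
    query = bag_of_words(text)
    best_label, _ = max(
        ((label, cosine_similarity(query, bag_of_words(doc)))
         for doc, label in labeled_docs),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy training set (assumption: made up purely for this example).
TRAINING = [
    ("win a free prize now claim your money", "spam"),
    ("cheap loans free money click now", "spam"),
    ("meeting agenda for the project review", "ham"),
    ("please review the attached project report", "ham"),
]
```

On this toy data, `classify("claim your free money", TRAINING)` returns `"spam"`, because the query shares the most terms with the first training document. The same similarity function could feed a clustering method instead, for example to group documents by a common author or source.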