Motivation
• Collect sets of keywords or terms that frequently occur together, then find the association or correlation relationships among them
Association analysis process
- Pre-process the text data by parsing, stemming, removing stop words, etc.
- Invoke association mining algorithms
• Consider each document as a transaction
• View a set of words in the document as a set of items in the transaction
- Term level association mining
- Can extract compound associations as entities or domain concepts (e.g. “New South Wales” or “big data”).
- Can replace human effort in tagging documents in databases.
- The number of meaningless results and the execution time are greatly reduced compared with word-based search or mining.
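The process above can be sketched in a few lines of Python. This is a minimal illustration, not a full mining algorithm: the corpus, the stop-word list, and the function names (to_transaction, frequent_pairs, confidence) are all hypothetical, stemming is omitted for brevity, and only term pairs are counted rather than itemsets of arbitrary size as Apriori or FP-growth would produce.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus and stop-word list, for illustration only.
DOCS = [
    "Big data tools are used in New South Wales",
    "The big data team moved to New South Wales",
    "Data mining is a big topic",
    "New tools for data mining",
]
STOP_WORDS = {"a", "are", "for", "in", "is", "the", "to", "used"}


def to_transaction(doc):
    """Parse one document into a transaction: its set of non-stop-word terms."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for term pairs whose support reaches min_support."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical (alphabetical) tuple form.
        counts.update(combinations(sorted(items), 2))
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing antecedent that also contain consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Each document becomes one transaction; its terms are the items.
transactions = [to_transaction(d) for d in DOCS]
pairs = frequent_pairs(transactions, min_support=0.5)

for pair, support in sorted(pairs.items()):
    print(pair, support)
```

In this toy corpus, pairs such as ("south", "wales") co-occur in every document that contains either term, which is the kind of signal that term-level mining can use to treat a compound like “New South Wales” as a single entity.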
Example: What is being said about Donald Trump this week?