Motivation

• Collect sets of keywords or terms that frequently occur together, then find the association or correlation relationships among them

Association analysis process

  • Pre-process the text data by parsing, stemming, removing stop words, etc.
  • Invoke association mining algorithms
    • Consider each document as a transaction
    • View the set of words in the document as the set of items in the transaction

  • Term level association mining
    • Can extract compound associations as entities or domain concepts (e.g. “New South Wales” or “big data”).
    • Can replace human effort for tagging documents in databases.
    • The number of meaningless results and the execution time are greatly reduced compared with word-based search or mining
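
The process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the toy corpus, the stop-word list, the compound-term table, and the function names (to_transaction, frequent_pairs, rules) are all hypothetical. Stemming is skipped for brevity, and only pairs of items are mined rather than full itemsets as in Apriori or FP-growth. Each document becomes a transaction, compound terms are merged into single items (term-level mining), and rules are scored by support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each string stands for one document.
DOCS = [
    "Big data tools are used in New South Wales",
    "New South Wales invests in big data",
    "Big data and the cloud",
    "The cloud is used in New South Wales",
]

# Illustrative stop words and compound terms (assumptions, not from the notes).
STOP_WORDS = {"are", "in", "and", "the", "is", "a", "of"}
COMPOUND_TERMS = {"new south wales": "new_south_wales", "big data": "big_data"}


def to_transaction(doc):
    """Turn one document into a transaction: a set of term-level items."""
    text = doc.lower()
    # Merge each known compound term into a single item before splitting.
    for phrase, token in COMPOUND_TERMS.items():
        text = text.replace(phrase, token)
    return frozenset(w for w in text.split() if w not in STOP_WORDS)


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support >= min_support."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def rules(transactions, min_support, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs, with support and confidence."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    out = []
    for (a, b), sup in frequent_pairs(transactions, min_support).items():
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs, rhs) / support(lhs)
            conf = sup / (item_count[lhs] / n)
            if conf >= min_confidence:
                out.append((lhs, rhs, sup, conf))
    return sorted(out)


TRANSACTIONS = [to_transaction(d) for d in DOCS]
RULES = rules(TRANSACTIONS, min_support=0.5, min_confidence=0.6)

for lhs, rhs, sup, conf in RULES:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Because "New South Wales" is merged into one item before counting, the miner never reports trivial associations such as "new" with "south", which is the reduction in meaningless results that term-level mining aims for.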

Example: What is being said about Donald Trump this week?