apriori.py
Motivating problems: grocery-store basket analysis, analysis of customer purchases, and so on.

Association analysis

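The Apriori algorithm: if an itemset is frequent, then all of its subsets are also frequent. Conversely, if an itemset is infrequent, then all of its supersets are also infrequent. The second statement is the one usually exploited: once an itemset is known to be infrequent, none of its supersets need to be counted, which saves a great deal of computation.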
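The pruning rule can be illustrated with a minimal sketch. The names `support`, `has_infrequent_subset` and the threshold `min_support = 0.5` are illustrative assumptions, not part of apriori.py; the four toy transactions are chosen only for demonstration.

```python
from itertools import combinations

# Toy transactions (illustrative data), each stored as a set of item IDs
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_support = 0.5  # assumed threshold for this sketch


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def has_infrequent_subset(candidate, infrequent):
    """True if any known-infrequent itemset is a subset of candidate."""
    return any(bad <= candidate for bad in infrequent)


items = sorted(set().union(*transactions))

# 1-itemsets that fail the threshold
infrequent_1 = [frozenset([i]) for i in items
                if support([i], transactions) < min_support]

# All 2-item candidates, split by whether the Apriori principle rules them out
pairs = [frozenset(p) for p in combinations(items, 2)]
pruned = [p for p in pairs if has_infrequent_subset(p, infrequent_1)]
survivors = [p for p in pairs if not has_infrequent_subset(p, infrequent_1)]

print("infrequent 1-itemsets:", infrequent_1)
print("pairs skipped without counting:", pruned)
print("pairs that still need counting:", survivors)
```

Here only item 4 falls below the threshold, so every pair containing 4 can be discarded without scanning the transactions again; counting them anyway confirms that each one is indeed infrequent.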

Finding frequent itemsets with the Apriori algorithm

    def loadDataSet():
        # Toy dataset: four transactions, each a list of item IDs
        return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

    def createC1(dataSet):
        # Build C1, the list of all candidate 1-itemsets
        C1 = []
        for transaction in dataSet:
            for item in transaction:
                if [item] not in C1:
                    C1.append([item])
        C1.sort()
        # use frozenset so each itemset can be used as a key in a dict
        return list(map(frozenset, C1))

    def scanD(D, Ck, minSupport):
        # D: list of transactions (as sets); Ck: list of candidate itemsets.
        # D is passed to len(), so it must be a list, not a lazy map object.
        ssCnt = {}
        for tid in D:
            for can in Ck:
                if can.issubset(tid):
                    if can not in ssCnt:
                        ssCnt[can] = 1
                    else:
                        ssCnt[can] += 1
        numItems = float(len(D))
        retList = []      # candidates that meet minSupport
        supportData = {}  # support of every candidate that was counted
        for key in ssCnt:
            support = ssCnt[key] / numItems
            if support >= minSupport:
                retList.insert(0, key)
            supportData[key] = support
        return retList, supportData
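As a sanity check on what `scanD` should report for the single-item candidates, the supports can be recomputed independently. The sketch below is a cross-check only, not part of apriori.py; it uses `collections.Counter` on the same four transactions that `loadDataSet` returns, with an assumed threshold of 0.5.

```python
from collections import Counter

# Same transactions as loadDataSet(); min_support is an assumed threshold
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
min_support = 0.5

# Count each item at most once per transaction
item_counts = Counter(item for t in transactions for item in set(t))
n = len(transactions)

# Support of every 1-itemset, keyed by frozenset as in scanD
support = {frozenset([item]): count / n for item, count in item_counts.items()}

# Keep the 1-itemsets that meet the threshold, ordered by item ID
frequent_1 = sorted((s for s, v in support.items() if v >= min_support), key=min)

print(frequent_1)
print(support)
```

Calling `scanD(list(map(set, loadDataSet())), createC1(loadDataSet()), 0.5)` should yield the same four itemsets ({1}, {2}, {3}, {5}), though not necessarily in the same order, with {4} excluded because its support is 0.25.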

(For some of the errors that came up while running this code, see https://blog.csdn.net/qq_36366757/article/details/81204492)