apriori.py
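Motivating problems: grocery-store basket analysis, analysis of customer purchases, and so on.
Association analysis
Apriori algorithm: if an itemset is frequent, then all of its subsets are also frequent. Conversely, if an itemset is infrequent, then all of its supersets are infrequent as well. The second statement is the one usually used to prune candidates and avoid a lot of computation.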
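The principle above can be checked on a small example. The following is a minimal sketch, not part of apriori.py: the helper names `support` and `proper_subsets` and the threshold `min_support = 0.5` are assumptions chosen for illustration, and the four transactions mirror the sample data used later in this file.

```python
from itertools import combinations

# Toy transactions (same four baskets as the sample data in this file).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def proper_subsets(itemset):
    """Yield every non-empty proper subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


min_support = 0.5  # illustrative threshold

# {2, 3, 5} is frequent, so each of its subsets must be frequent too.
frequent = frozenset({2, 3, 5})
subset_supports = {s: support(s, transactions) for s in proper_subsets(frequent)}

# {4} is infrequent, so no superset of {4} can be frequent;
# such candidates can be discarded without counting them.
superset_supports = {
    frozenset(s): support(s, transactions) for s in [{1, 4}, {3, 4}, {1, 3, 4}]
}

print(support(frequent, transactions), subset_supports)
print(support({4}, transactions), superset_supports)
```

Adding an item to an itemset can only shrink the set of transactions that contain it, so support never increases when an itemset grows. That is why an infrequent itemset rules out all of its supersets.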
Finding frequent itemsets with the Apriori algorithm
from numpy import *
def loadDataSet():
    # Four sample transactions; each inner list is one basket of item IDs.
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
def createC1(dataSet):
    """Build C1, the sorted list of all candidate 1-itemsets in dataSet."""
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # Use frozenset so each itemset is hashable and can serve as a dict key.
    return list(map(frozenset, C1))
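def scanD(D, Ck, minSupport):
    """Return the candidates in Ck whose support is >= minSupport, plus a support dict.

    D must be a sequence of transactions (for example list(map(set, dataSet))),
    because len(D) is taken below.
    """
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                # Count how many transactions contain this candidate.
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        # Record support for every counted candidate, not only the frequent ones.
        supportData[key] = support
    return retList, supportData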
(Some errors encountered along the way: https://blog.csdn.net/qq_36366757/article/details/81204492)
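As a cross-check on what createC1 and scanD should produce for the sample data, the single-item supports can be computed independently. This is a rough sketch using `collections.Counter` rather than the functions above. The names `item_counts`, `supports`, `frequent_1` and the threshold `min_support = 0.5` are illustrative assumptions. It also shows one pitfall that is certain under Python 3: `map()` returns a lazy iterator without a length, so the transactions should be wrapped in `list(...)` before being handed to code that calls `len()` on them, as scanD does.

```python
from collections import Counter

data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

# In Python 3, map() returns an iterator, and len() on it raises TypeError.
lazy = map(set, data)
try:
    len(lazy)
    map_has_len = True
except TypeError:
    map_has_len = False

# Materialise the transactions so they can be measured and scanned repeatedly.
D = list(map(set, data))

# Count in how many transactions each single item appears.
item_counts = Counter(item for t in D for item in t)

min_support = 0.5  # illustrative threshold
supports = {frozenset([item]): n / len(D) for item, n in item_counts.items()}

# Keep only the 1-itemsets that meet the threshold, in a stable order.
frequent_1 = sorted((s for s, v in supports.items() if v >= min_support), key=sorted)

print(map_has_len)
print(frequent_1)
```

With this data, item 4 appears in only one of the four transactions (support 0.25) and is the only single item that falls below the 0.5 threshold.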