apriori.py
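Motivating problem: analyzing grocery store purchases, analyzing customer spending, and similar tasks.
Association analysis
Apriori algorithm: if an itemset is frequent, then all of its subsets are also frequent. Conversely, if an itemset is infrequent, then all of its supersets are also infrequent. The second statement is the one usually exploited to cut down the computation: once an itemset is known to be infrequent, none of its supersets needs to be counted.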
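To make the principle concrete, here is a minimal sketch that checks both statements on a small example. The helper name `support`, the threshold `min_support = 0.5`, and the four sample transactions are assumptions chosen for illustration (the transactions are the same toy data returned by loadDataSet in apriori.py).

```python
from itertools import combinations

# Toy transactions, assumed for illustration
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


min_support = 0.5  # assumed threshold

# {2, 3, 5} is frequent, so every non-empty subset must be frequent too.
frequent = {2, 3, 5}
subsets = [
    set(c)
    for r in range(1, len(frequent) + 1)
    for c in combinations(sorted(frequent), r)
]
all_subsets_frequent = all(
    support(s, transactions) >= min_support for s in subsets
)

# {4} is infrequent, so its supersets can be skipped without counting them.
# Here we count them anyway, only to confirm the claim.
supersets_of_4 = [{1, 4}, {3, 4}, {1, 3, 4}]
no_superset_frequent = all(
    support(s, transactions) < min_support for s in supersets_of_4
)

print(all_subsets_frequent, no_superset_frequent)
```

In a real run the supersets of an infrequent itemset are never counted at all; that skipped work is where the savings come from.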
Finding frequent itemsets with the Apriori algorithm
from numpy import *

def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if not [item] in C1:
                C1.append([item])
    C1.sort()
    # use frozen set so we can use it as a key in a dict
    return list(map(frozenset, C1))

def scanD(D, Ck, minSupport):
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        supportData[key] = support
    return retList, supportData
(For some of the errors that can come up when running this code, see https://blog.csdn.net/qq_36366757/article/details/81204492)
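As a cross-check on what the first pass over the data should produce, the following self-contained sketch recomputes the single-item supports with collections.Counter instead of calling createC1 and scanD. The function name `single_item_supports`, the variable `frequent_1`, and the threshold of 0.5 are hypothetical choices for illustration, not part of apriori.py.

```python
from collections import Counter


def single_item_supports(transactions):
    """Return {frozenset({item}): support} for every item seen in the data."""
    n = len(transactions)
    # set(t) makes sure an item is counted at most once per transaction
    counts = Counter(item for t in transactions for item in set(t))
    return {frozenset([item]): c / n for item, c in counts.items()}


# Same toy data as loadDataSet() in apriori.py
data = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
supports = single_item_supports(data)

min_support = 0.5  # assumed threshold
# Keep only the 1-itemsets that meet the threshold, in item order
frequent_1 = sorted(
    (s for s, v in supports.items() if v >= min_support),
    key=lambda s: sorted(s),
)
print(frequent_1)
```

With this threshold, items 1, 2, 3 and 5 are kept, while item 4 (support 0.25) is dropped. Running scanD over the candidates from createC1 with minSupport=0.5 should select the same four itemsets, although scanD returns them in a different order because it inserts at the front of the list.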
