Discussion (11) - Apriori algorithm, FP Growth algorithm

1, Apriori algorithm

  The Apriori algorithm is a commonly used algorithm for mining association rules from data. It identifies item sets that occur frequently in a data set; finding these frequent patterns helps us make certain decisions.

  The Apriori algorithm works iteratively. It first scans the data for candidate 1-itemsets and their supports, prunes the itemsets whose support falls below the threshold, and obtains the frequent 1-itemsets. It then joins the remaining frequent 1-itemsets to generate candidate 2-itemsets, filters out the candidates whose support is below the threshold to obtain the true frequent 2-itemsets, and so on. The iteration continues until no frequent (k+1)-itemsets can be found, at which point the frequent k-itemsets are the output of the algorithm.

  The algorithm is therefore quite simple: the i-th iteration consists of three steps, namely scanning the data to compute the support of the candidate frequent i-itemsets, pruning them to obtain the true frequent i-itemsets, and joining those to generate the candidate frequent (i+1)-itemsets.

  The support of an itemset is defined as the proportion of records in the data set that contain the itemset. For example, if {milk} appears in 4 of 5 records, its support is 4/5. Because support is defined for itemsets, we can set a minimum support and keep only the itemsets that meet it. Confidence is defined for an association rule such as {diapers} -> {wine}: the confidence of this rule is support({diapers, wine}) / support({diapers}).
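
  As a quick illustration (a minimal sketch, not code from the original post; the transactions and item names are made up), support and confidence can be computed like this in Python:

```python
# Toy transaction list; each transaction is a set of items (illustrative data).
transactions = [
    {"milk", "bread"},
    {"milk", "diapers", "wine"},
    {"diapers", "wine"},
    {"milk", "diapers", "bread"},
    {"milk", "diapers", "wine", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

print(support({"milk"}, transactions))                  # 4/5 = 0.8
print(confidence({"diapers"}, {"wine"}, transactions))  # 3/4 = 0.75
```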

  

 

  

   Algorithm steps (a short Python sketch follows the list):

      Input: data set D, support threshold α

   Output: the largest frequent k-itemsets

   1) Scan the entire data set to obtain every item that appears; these form the candidate frequent 1-itemsets. Set k = 1; the set of frequent 0-itemsets is empty.

   2) Mine the frequent k-itemsets:

     a) Scan the data and compute the support of each candidate frequent k-itemset.

     b) Remove the candidate frequent k-itemsets whose support is below the threshold, which yields the frequent k-itemsets. If the resulting set of frequent k-itemsets is empty, return the frequent (k-1)-itemsets as the result of the algorithm and stop. If only one frequent k-itemset is obtained, return that frequent k-itemset as the result of the algorithm and stop.

     c) Join the frequent k-itemsets to generate the candidate frequent (k+1)-itemsets.

   3) Set k = k + 1 and go to step 2).
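
   A minimal sketch of these steps in Python (an assumed illustration, not the original post's code): transactions are represented as sets of items, and min_support is a fraction of the data set.

```python
def apriori(transactions, min_support):
    """Sketch of the steps above: scan, prune, join, repeat until no
    frequent (k+1)-itemsets are found."""
    n = len(transactions)

    def frequent(candidates):
        # Scan the data once and keep candidates meeting the support threshold.
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) / n >= min_support}

    # Step 1): candidate 1-itemsets are all items that appear in the data.
    current = frequent({frozenset([item]) for t in transactions for item in t})
    k = 1
    while True:
        # Step 2c): join frequent k-itemsets to build candidate (k+1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Steps 2a) and 2b): scan, prune, and check the stopping conditions.
        nxt = frequent(candidates)
        if len(nxt) <= 1:
            return nxt if nxt else current  # largest non-empty frequent level
        current, k = nxt, k + 1

transactions = [{"milk", "bread"}, {"milk", "diapers", "wine"},
                {"diapers", "wine"}, {"milk", "diapers", "bread"},
                {"milk", "diapers", "wine", "bread"}]
print(apriori(transactions, min_support=0.6))
```

   Note that the inner `frequent` helper scans the whole data set once per level, which is exactly the repeated scanning criticized below.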

   As the steps show, the Apriori algorithm must scan the data set in every iteration, so when the data set is large or contains many distinct items, the algorithm is inefficient.

2, FP Growth algorithm

   As an algorithm for mining frequent itemsets, Apriori requires multiple scans of the data, so I/O is a major bottleneck. To address this, the FP-Tree algorithm (also known as FP-Growth) uses several techniques so that, no matter how much data there is, only two scans of the data set are needed, which improves the algorithm's running efficiency.
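
  The sketch below (an assumed illustration, not the referenced post's code) shows where the two scans happen: the first scan counts item supports, and the second scan inserts each filtered, frequency-ordered transaction into the FP-tree. Mining the tree for frequent itemsets is omitted here.

```python
from collections import defaultdict

class FPNode:
    """A node of the FP-tree: an item, its count, and its child nodes."""
    def __init__(self, item=None, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support_count):
    # First scan: count each item's support and drop infrequent items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}

    # Second scan: remove infrequent items from each transaction, sort the
    # rest by descending support, and insert the ordered path into the tree.
    root = FPNode()
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root, frequent
```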

  Reference: http://www.cnblogs.com/pinard/p/6307064.html

 


