Based on the original post: https://www.cnblogs.com/pinard/p/6307064.html
Structure
The item header table and the FP-Tree:
The item header table stores all frequent 1-itemsets, sorted in decreasing order of support. Each entry carries a pointer to a linked list that chains together all tree nodes for that item.
Presorting the data against the item header table
All items are counted and sorted by support; items whose support falls below the threshold are deleted, and the surviving frequent 1-itemsets are placed in the header table. Within each transaction, the items are then sorted by support, with higher-support items first.
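The counting, filtering, and presorting steps above can be sketched as follows. This is a minimal illustration, not code from the original post; the function names `build_header_order` and `presort` and the tie-breaking rule are my own assumptions.

```python
from collections import Counter

def build_header_order(transactions, min_support):
    """Count item supports, drop infrequent items, and return the
    surviving items sorted by decreasing support (header-table order)."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    # Higher-support items first; ties broken by item name (an arbitrary
    # but deterministic choice, assumed here for reproducibility).
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    return order, frequent

def presort(transaction, order):
    """Keep only frequent items and sort them by the header-table order,
    so that transactions sharing a prefix will share tree nodes."""
    rank = {item: i for i, item in enumerate(order)}
    return sorted((x for x in transaction if x in rank), key=lambda x: rank[x])
```

With a support threshold of 2 and transactions such as ACEBF, ACG, and ACED, only A, C, and E survive, and every transaction is rewritten as its A-C-E-ordered frequent subset.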
Building the tree
The root node is null. Transactions are inserted one at a time; each node records how many times its item occurs on that path (much like a trie).
(1) Insert ACEBF
(2) Insert ACG. Because every transaction was presorted, the common prefix A-C reuses the existing nodes.
Insertion follows existing paths; where the new transaction no longer overlaps, the tree branches.
(3) Insert ACEC. The third transaction shares the prefix A-C-E, then branches off E with a new subtree.
Note that after each transaction is inserted, the newly created nodes are linked into the chain of the corresponding item header table entry.
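The insertion procedure described above can be sketched as a short routine. This is an illustrative sketch under my own naming (`FPNode`, `insert`), not the original author's code; it assumes transactions arrive already presorted.

```python
class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> child FPNode
        self.link = None     # next node in the tree holding the same item

def insert(root, sorted_items, header):
    """Insert one presorted transaction: counts are incremented along the
    shared prefix, and the tree branches where the path diverges. Every
    newly created node is appended to its item's header-table chain."""
    node = root
    for item in sorted_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, node)
            node.children[item] = child
            # Link the new node into the header table's chain for this item.
            head = header.get(item)
            if head is None:
                header[item] = child
            else:
                while head.link is not None:
                    head = head.link
                head.link = child
        child.count += 1
        node = child
```

Inserting ACE, AC, and ACE in turn yields counts A:3, C:3, E:2 on a single path, matching the trie-like sharing of common prefixes.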
Mining
After the FP-Tree is built, we mine it. For each item in the header table, working from the bottom up, find every path in the FP-Tree that ends at a node for that item (for example, the two D nodes below each terminate a path). Set the count of every node on such a path to the count of its ending node, then enumerate all itemsets of size 1 to n that end at that node.
For example, when mining the frequent itemsets ending in D: D appears at two nodes of the FP-Tree, so there are two prefix paths, ACD and ACEGD, each with count 1. First, the count of every node on a path is set to the count of the D node at its end (the larger counts on A and C were accumulated across many transactions). Then the nodes whose total count falls below the threshold (here E and G) are deleted, and the conditional tree collapses into a single chain: A:2 - C:2 - D:2 (the two paths merged). Finally, all itemsets of size 1 to n ending in D are enumerated from this chain to obtain the frequent itemsets containing D.
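Collecting the prefix paths for an item (the first step of the D example above) can be sketched as follows. The `Node` class and the specific counts below are illustrative assumptions of mine, chosen to mirror the two-D example; this is not the original post's code.

```python
class Node:
    """Minimal FP-Tree node: item, count, parent pointer, and the link
    to the next node holding the same item (the header-table chain)."""
    def __init__(self, item, count, parent, link=None):
        self.item = item
        self.count = count
        self.parent = parent
        self.link = link

def prefix_paths(head):
    """Follow an item's header chain; for each of its nodes, walk up to
    the root and collect the prefix path, weighted by that node's count
    (every item on the path is credited with the ending node's count)."""
    paths = []
    node = head
    while node is not None:
        path = []
        p = node.parent
        while p is not None and p.item is not None:  # stop at the null root
            path.append(p.item)
            p = p.parent
        paths.append((path[::-1], node.count))
        node = node.link
    return paths
```

Building a tree with the two D nodes at the ends of A-C-D and A-C-E-G-D (each D with count 1) and calling `prefix_paths` on the first D yields exactly the two weighted paths that are then merged and pruned as described above.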
Advantages
FP-Growth removes Apriori's bottleneck of scanning the dataset many times: it needs only two scans, the first to build the item header table and the second to build the FP-Tree.