FP-Tree construction and mining

First, the original article: https://www.cnblogs.com/pinard/p/6307064.html

Structure

Item header table and FP Tree:
The item header table stores all frequent 1-itemsets, sorted in descending order of support. Each entry also holds a pointer to a linked list that chains together every node of that item in the tree.
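The two structures can be sketched as follows. This is only an illustrative layout, not the original author's code; the names FPNode, HeaderEntry, link and chain are assumptions made for this example.

```python
class FPNode:
    """One node of the FP Tree (hypothetical layout)."""

    def __init__(self, item, count=0, parent=None):
        self.item = item        # None for the null root
        self.count = count      # number of transactions passing through this node
        self.parent = parent
        self.children = {}      # item -> child FPNode
        self.link = None        # next node in the tree that holds the same item


class HeaderEntry:
    """One row of the item header table (hypothetical layout)."""

    def __init__(self, item, support, head=None):
        self.item = item
        self.support = support
        self.head = head        # first node of this item in the tree


def chain(entry):
    """Yield every tree node of entry.item by following the node links."""
    node = entry.head
    while node is not None:
        yield node
        node = node.link


# Two D nodes on different branches, chained from the header entry for D.
d1, d2 = FPNode("D", count=1), FPNode("D", count=1)
d1.link = d2
entry = HeaderEntry("D", support=2, head=d1)
print(sum(node.count for node in chain(entry)))  # 2
```

Walking the chain visits every occurrence of an item without traversing the whole tree, which is what the mining step relies on.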

Building the item header table and presorting the data

Count the support of every item, delete the items whose support is below the threshold, and place the remaining frequent 1-itemsets in the header table in descending order of support. Then, within each transaction, sort the remaining items by their individual support, with the most frequent items first.
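A minimal sketch of this first scan, assuming transactions are lists of single-letter items. The function name build_header_and_sort, the alphabetical tie-break and the sample data are assumptions made for this example and do not come from the original article.

```python
from collections import Counter


def build_header_and_sort(transactions, min_support):
    """First scan: count supports, build the header table, presort each transaction."""
    # Support of an item = number of transactions that contain it.
    support = Counter(item for t in transactions for item in set(t))

    # Keep the frequent items, ordered by descending support.
    # Ties are broken alphabetically (an assumption, to keep the order deterministic).
    frequent = [i for i in support if support[i] >= min_support]
    frequent.sort(key=lambda i: (-support[i], i))
    header = [(item, support[item]) for item in frequent]

    # Drop infrequent items from every transaction and sort the rest by header order.
    rank = {item: r for r, item in enumerate(frequent)}
    presorted = [sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
                 for t in transactions]
    return header, presorted


# Hypothetical data, not taken from the article.
transactions = [list("ECBA"), list("GCA"), list("E"), list("CAE")]
header, presorted = build_header_and_sort(transactions, min_support=2)
print(header)     # [('A', 3), ('C', 3), ('E', 3)]
print(presorted)  # [['A', 'C', 'E'], ['A', 'C'], ['E'], ['A', 'C', 'E']]
```

Because every transaction is sorted by the same global order, transactions that share frequent items also share a prefix, which is what lets the tree reuse nodes.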

Building the tree

The root node is null. The transactions are then inserted one at a time, and the count on each node records how many transactions pass through that node on its path (somewhat like a trie).
(1) Insert ACEBF
(2) Insert ACG. Because every transaction was presorted, it reuses the common prefix nodes A and C, whose counts are incremented.
When a transaction is inserted along an existing path but does not overlap it completely, the path branches at the point where they diverge.
When the third transaction is inserted, E becomes a new right subtree directly under the null root.
(3) Insert ACEC
Note that after each transaction is inserted, every newly created node is appended to the end of the linked list of the corresponding item's entry in the header table.
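The insertion procedure can be sketched as below. This is an illustration under stated assumptions rather than the author's implementation: FPNode and insert are hypothetical names, the header table is simplified to a dict mapping each item to the [head, tail] of its node chain, and the third transaction is taken to be the single item E, which is one reading of the text above.

```python
class FPNode:
    """One node of the FP Tree (hypothetical layout)."""

    def __init__(self, item, parent=None):
        self.item = item      # None for the null root
        self.count = 0
        self.parent = parent
        self.children = {}    # item -> child FPNode
        self.link = None      # next node holding the same item


def insert(root, transaction, header):
    """Insert one presorted transaction, sharing any existing prefix."""
    node = root
    for item in transaction:
        child = node.children.get(item)
        if child is None:
            # The paths diverge here: create a new branch.
            child = FPNode(item, parent=node)
            node.children[item] = child
            # Append the new node to the tail of the item's chain.
            if item in header:
                header[item][1].link = child
                header[item][1] = child
            else:
                header[item] = [child, child]
        child.count += 1
        node = child


root, header = FPNode(None), {}
for transaction in ["ACEBF", "ACG", "E"]:
    insert(root, transaction, header)

a = root.children["A"]
c = a.children["C"]
print(a.count, c.count)       # 2 2
print(sorted(c.children))     # ['E', 'G']
print(sorted(root.children))  # ['A', 'E']
```

Keeping a tail pointer per item makes appending to the chain constant time; storing only a head pointer and walking to the end would also work for small trees.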

Mining

Once the FP Tree is built, it is mined. For each item in the header table, working from the bottom entry upward, find all paths in the FP Tree that end at a node of that item, treating that node as the leaf (for example, the two paths ending in D). Set the count of every node on each path to the count of its leaf node; this gives the conditional pattern base, from which all frequent itemsets of size 1 to n ending in that item are obtained.
For example, to find the frequent itemsets ending in D: D appears at two leaf nodes in the FP Tree, so the procedure is carried out twice, once per path.
First set the count of every node on each path to D to the count of that path's leaf node; the counts of A and C are then obtained by accumulating over the two paths. Nodes whose total count falls below the threshold (E and G) are deleted from the tree, which then becomes a single straight path.
We get the path A:2 - C:2 - D:2 (the result of merging the two paths), and from it every frequent itemset of size 1 to n ending in D.
E and G are useless here and are dropped. The two underlying paths are ACD and ACEGD, each with frequency 1; they are split into itemsets of size 1 to n and the counts are then combined.
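The D example can be reproduced with a short sketch. It takes the two prefix paths of D (AC and ACEG, each with count 1) as input instead of walking a real tree, and it only handles the case where the filtered conditional tree is a single path, as in this example; the general algorithm recurses on the conditional tree. The function name mine_suffix and the threshold of 2 are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def mine_suffix(suffix, prefix_paths, min_support):
    """Frequent itemsets ending in `suffix`, for a single-path conditional tree."""
    # Accumulate item counts over all prefix paths (each weighted by its leaf count).
    counts = Counter()
    for path, count in prefix_paths:
        for item in path:
            counts[item] += count

    # Delete items below the threshold from every path.
    filtered = [[i for i in path if counts[i] >= min_support]
                for path, _ in prefix_paths]
    longest = max(filtered, key=len, default=[])
    if any(p != longest[:len(p)] for p in filtered):
        raise ValueError("conditional tree is not a single path; recursion is needed")

    # The suffix alone, then every combination of path items joined with the suffix.
    result = {frozenset([suffix]): sum(count for _, count in prefix_paths)}
    for k in range(1, len(longest) + 1):
        for combo in combinations(longest, k):
            result[frozenset(combo) | {suffix}] = min(counts[i] for i in combo)
    return result


# Prefix paths of D from the example: ACD and ACEGD, each with frequency 1.
result = mine_suffix("D", [("AC", 1), ("ACEG", 1)], min_support=2)
for items, support in result.items():
    print("".join(sorted(items)), support)
# D 2
# AD 2
# CD 2
# ACD 2
```

On a single path, the support of a combination is the smallest count among the chosen nodes, which is why the sketch can enumerate combinations directly without building a conditional tree.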

Advantages

FP Tree removes the bottleneck of the Apriori algorithm, which has to scan the data set many times. It needs to scan the data set only twice: the first scan builds the item header table, and the second scan builds the tree.

Origin blog.csdn.net/dpengwang/article/details/92846444