one. Overview
1. Concept:
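"Association analysis" refers to discovering the associations that exist in a data set, and thereby describing the regularities among certain attributes of a thing.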
2. Related concepts:
The following table is part of the transaction records of a supermarket (TID is the transaction number, and Items are the products traded):
TID | Items |
---|---|
001 | Cola, Egg, Ham |
002 | Cola, Diaper, Beer |
003 | Cola, Diaper, Beer, Ham |
004 | Diaper, Beer |
By analyzing the data set, association rules can be found. For example, {Cola}→{Diaper} means that customers who bought Cola are likely to also buy Diaper. Here are some common concepts:
Concept | Description |
---|---|
Transaction | Each record is called a transaction. For example, the table above contains 4 transactions |
Item | Each product traded is called an item, such as Diaper, Beer, etc. |
Itemset | A set containing 0 or more items is called an itemset, such as {Beer, Diaper}, {Beer, Cola, Ham} |
k-itemset | An itemset containing k items is called a "k-itemset". For example, {Cola, Beer, Ham} is a 3-itemset |
Support count | The number of transactions in which an itemset appears. For example, {Diaper, Beer} appears in transactions 002, 003 and 004, so its support count is 3 |
Support | The support count divided by the total number of transactions. In the example above, the total number of transactions is 4 and the support count of {Diaper, Beer} is 3, so the support of {Diaper, Beer} is 75%, which means that 75% of the transactions contain both Diaper and Beer |
Frequent itemset | An itemset whose support is greater than or equal to a chosen threshold is called a frequent itemset. For example, with a threshold of 50%, {Diaper, Beer} is a frequent itemset because its support is 75% |
Antecedent and consequent | For the rule {A}→{B}, {A} is called the antecedent and {B} is called the consequent |
Confidence | For the rule {A}→{B}, the confidence is the support count of {A,B} divided by the support count of {A}. For example, the confidence of the rule {Diaper}→{Beer} is 3/3, or 100%, which means that every transaction containing Diaper also contains Beer |
Strong association rules | Rules whose support and confidence are greater than or equal to the minimum support threshold and the minimum confidence threshold, respectively, are called strong association rules. Generally speaking, "association rules" refers to strong association rules. The ultimate goal of association analysis is to find strong association rules |
Closed itemset | If none of the direct supersets of the itemset X has the same support count as X, then X is a closed itemset |
Frequent closed itemsets | If the itemset X is closed, and its support is greater than or equal to the minimum support threshold, then X is a frequent closed itemset |
Maximal frequent itemset | If the itemset X is a frequent itemset and none of its direct supersets is frequent, then X is a maximal frequent itemset |
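
The definitions in the table can be checked against the four example transactions with a few lines of Python. This is only a minimal sketch: the function names (`support_count`, `support`, `confidence`, `is_closed`, `is_maximal_frequent`) are made up for illustration and do not come from any library, and a "direct superset" is taken to mean the itemset plus exactly one more item.

```python
# Transactions from the example table (TID -> set of items).
transactions = {
    "001": {"Cola", "Egg", "Ham"},
    "002": {"Cola", "Diaper", "Beer"},
    "003": {"Cola", "Diaper", "Beer", "Ham"},
    "004": {"Diaper", "Beer"},
}

# Every distinct item that occurs in at least one transaction.
ALL_ITEMS = set().union(*transactions.values())


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)


def support(itemset):
    """Support count divided by the total number of transactions."""
    return support_count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Support count of A ∪ B divided by the support count of A.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


def is_closed(itemset):
    """True if no direct superset has the same support count."""
    items = set(itemset)
    return all(
        support_count(items | {extra}) != support_count(items)
        for extra in ALL_ITEMS - items
    )


def is_maximal_frequent(itemset, min_support):
    """True if `itemset` is frequent and no direct superset is frequent."""
    items = set(itemset)
    if support(items) < min_support:
        return False
    return all(
        support(items | {extra}) < min_support
        for extra in ALL_ITEMS - items
    )


print(support_count({"Diaper", "Beer"}))   # 3
print(support({"Diaper", "Beer"}))         # 0.75
print(confidence({"Diaper"}, {"Beer"}))    # 1.0
```

With a 50% threshold, {Diaper, Beer} is frequent and closed but not maximal, because its direct superset {Cola, Diaper, Beer} still reaches 50% support.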
3. Steps:
① Find the frequent itemsets
② Find the strong association rules
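
The two steps can be sketched on the example transactions as follows. This is a brute-force illustration that enumerates every candidate itemset, which is only practical for tiny data sets; it is not the Apriori algorithm. The function names and the 80% confidence threshold are assumptions made for the example (the 50% support threshold is the one used in the table above).

```python
from itertools import combinations

# The four example transactions (TIDs omitted, they are not needed here).
transactions = [
    {"Cola", "Egg", "Ham"},
    {"Cola", "Diaper", "Beer"},
    {"Cola", "Diaper", "Beer", "Ham"},
    {"Diaper", "Beer"},
]


def frequent_itemsets(transactions, min_support):
    """Step 1: map every frequent itemset to its support (brute force)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                result[itemset] = count / n
    return result


def strong_rules(freq, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent
    and keep the rules that meet the confidence threshold."""
    rules = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so freq[a] is always present.
                conf = sup / freq[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, sup, conf))
    return rules


freq = frequent_itemsets(transactions, min_support=0.5)
rules = strong_rules(freq, min_confidence=0.8)
for a, b, sup, conf in rules:
    print(sorted(a), "->", sorted(b), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Because the rules are built only from itemsets that already passed the support threshold, every rule returned by `strong_rules` satisfies both thresholds, which is what makes it a strong association rule.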
4. Common algorithm:
two. Apriori algorithm