Machine learning: association rule analysis


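I. Overview

1. Concept:

"Association rule analysis" (Association Analysis) means discovering the associations that exist in a data set, and using them to describe regularities among certain attributes of an entity.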

2. Related concepts:

The following table is part of the transaction records of a supermarket (TID is the transaction number, and Items are the products traded):

TID   Items
001   Cola, Egg, Ham
002   Cola, Diaper, Beer
003   Cola, Diaper, Beer, Ham
004   Diaper, Beer

Analyzing this data set reveals association rules. For example, the rule {Diaper}→{Beer} means that customers who bought Diaper are likely to also buy Beer. Some common concepts are listed below:

Concept: Description
Transaction: Each record is called a transaction. For example, the table above contains 4 transactions.
Item: Each product in a transaction is called an item, such as Diaper or Beer.
Itemset: A set containing 0 or more items is called an itemset, such as {Beer, Diaper} or {Beer, Cola, Ham}.
k-itemset: An itemset containing k items is called a k-itemset. For example, {Cola, Beer, Ham} is a 3-itemset.
Support count: The number of transactions that contain the itemset. For example, {Diaper, Beer} appears in transactions 002, 003, and 004, so its support count is 3.
Support: The support count divided by the total number of transactions. In the example above there are 4 transactions and the support count of {Diaper, Beer} is 3, so the support of {Diaper, Beer} is 75%, meaning that 75% of transactions contain both Diaper and Beer.
Frequent itemset: An itemset whose support is greater than or equal to a given threshold. For example, with a threshold of 50%, {Diaper, Beer} is a frequent itemset because its support is 75%.
Antecedent and consequent: For the rule {A}→{B}, {A} is called the antecedent and {B} is called the consequent.
Confidence: For the rule {A}→{B}, the confidence is the support count of {A,B} divided by the support count of {A}. For example, the confidence of the rule {Diaper}→{Beer} is 3/3, or 100%, meaning that every transaction containing Diaper also contains Beer.
Strong association rule: A rule whose support and confidence are greater than or equal to the minimum support threshold and the minimum confidence threshold, respectively. In general, "association rules" refers to strong association rules, and the ultimate goal of association analysis is to find them.
Closed itemset: If none of the immediate supersets of itemset X has the same support count as X, then X is a closed itemset.
Frequent closed itemset: If itemset X is closed and its support is greater than or equal to the minimum support threshold, then X is a frequent closed itemset.
Maximal frequent itemset: If itemset X is frequent and none of its immediate supersets is frequent, then X is a maximal frequent itemset.
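
The definitions above can be checked directly against the four example transactions. The following is a minimal Python sketch, not a library API: the names TRANSACTIONS, ALL_ITEMS, support_count, support, confidence, is_closed, and is_maximal_frequent are hypothetical helpers introduced only for illustration.

```python
# The four example transactions from the table above, one set per TID.
TRANSACTIONS = [
    {"Cola", "Egg", "Ham"},             # 001
    {"Cola", "Diaper", "Beer"},         # 002
    {"Cola", "Diaper", "Beer", "Ham"},  # 003
    {"Diaper", "Beer"},                 # 004
]
ALL_ITEMS = set().union(*TRANSACTIONS)


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    s = set(itemset)
    return sum(1 for t in TRANSACTIONS if s <= t)


def support(itemset):
    """Support count divided by the total number of transactions."""
    return support_count(itemset) / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """Support count of (A union B) divided by the support count of A.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)


def is_closed(itemset):
    """True if no immediate superset has the same support count."""
    s = set(itemset)
    return all(support_count(s | {i}) != support_count(s)
               for i in ALL_ITEMS - s)


def is_maximal_frequent(itemset, min_support):
    """True if the itemset is frequent and no immediate superset is."""
    s = set(itemset)
    if support(s) < min_support:
        return False
    return all(support(s | {i}) < min_support for i in ALL_ITEMS - s)
```

For example, support_count({"Diaper", "Beer"}) gives 3 and confidence({"Diaper"}, {"Beer"}) gives 1.0, matching the values in the table. With a 50% threshold, {Diaper, Beer} is closed but not maximal, because its superset {Cola, Diaper, Beer} is still frequent.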

3. Steps:

① Find the frequent itemsets
② Find the strong association rules
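
The two steps can be sketched with a plain brute-force enumeration, which is workable only for tiny data sets like the example table. This is an illustrative sketch, not the Apriori algorithm: the function names find_frequent_itemsets and find_strong_rules are hypothetical, the 50% support threshold comes from the example above, and the 80% confidence threshold is an assumption chosen for the demonstration.

```python
from itertools import combinations

# Example transactions from the table above.
TRANSACTIONS = [
    frozenset({"Cola", "Egg", "Ham"}),
    frozenset({"Cola", "Diaper", "Beer"}),
    frozenset({"Cola", "Diaper", "Beer", "Ham"}),
    frozenset({"Diaper", "Beer"}),
]


def find_frequent_itemsets(transactions, min_support):
    """Step 1: map each frequent itemset to its support count.

    Brute force: every non-empty combination of items is tested.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                frequent[candidate] = count
    return frequent


def find_strong_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent.

    Every subset of a frequent itemset is itself frequent, so the
    antecedent's support count can be looked up in `frequent`.
    """
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


frequent = find_frequent_itemsets(TRANSACTIONS, min_support=0.5)
rules = find_strong_rules(frequent, min_confidence=0.8)  # 0.8 is assumed

for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={conf:.0%}")
```

With these thresholds the example data yields 9 frequent itemsets and 5 strong rules, among them {Diaper}→{Beer} with 100% confidence. Because the enumeration grows exponentially with the number of distinct items, practical algorithms prune the candidate itemsets instead of testing them all.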

4. Common algorithms:

II. Apriori algorithm

Origin blog.csdn.net/weixin_46131409/article/details/113807201