Data Mining - Association Rule Analysis (1) Overview

These are my study notes. If anything here is wrong, please point it out.

2.1 Overview of association analysis

2.1.1 Definition and application of association analysis

1. Association analysis definition: discover co-occurrence associations or sequential relationships between objects or item sets in a data set.

       Application: Shopping basket data analysis
                   Related sales
                   Catalog layout
                   Promotion analysis

     Association rule mining is used for knowledge discovery, not prediction, so it is an unsupervised machine learning method.

2.1.2 Transactions and rules

 

Transaction set: T = {t1, t2, ..., tN}, where each transaction has an ID (TID). Example: TID = {1, 2, 3, 4, 5}

Item set: I = {i1, i2, ..., id} Example: I = {Bread, Milk, Diaper, Beer, Eggs, Coke}

k-item set: an item set containing k items. {Bread, Milk} is a 2-item set, {Bread, Diaper, Beer, Eggs} is a 4-item set

Transaction: a collection of several items. Example: {Bread, Milk} is a transaction
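To make these terms concrete, here is a minimal Python sketch. The five transactions are hypothetical data that I made up from the items listed above (the notes only give the item names and the TIDs), and `k_itemsets` is just an illustrative helper name.

```python
from itertools import combinations

# Hypothetical transaction table: TID -> set of items (illustrative data only).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

# The item set I is the union of all items that occur in any transaction.
items = set().union(*transactions.values())


def k_itemsets(items, k):
    """Return every k-item set that can be formed from `items`."""
    return [frozenset(c) for c in combinations(sorted(items), k)]


two_itemsets = k_itemsets(items, 2)
print(len(items))         # number of distinct items
print(len(two_itemsets))  # number of candidate 2-item sets
```

Using sets for transactions makes "transaction t contains item set X" a simple subset test, `X <= t`.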

2.1.3 Association rules

1. Association Rule: Generally written in the form of X→Y, it is used to represent the implicit correlation within the data.

2. The strength of an association rule is measured by three metrics: Support, Confidence, and Lift

3. Support, frequent itemsets

① Support: the fraction of all transactions in which X and Y appear together, that is, the probability that a transaction contains both X and Y;

②Minimum support (minsupport): the minimum support threshold, specified by the user, that an association rule must meet.

③Frequent Itemset: a non-empty item set whose support is greater than or equal to minsupport
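The definitions of support and frequent item set can be sketched in a few lines of Python. The transactions are hypothetical example data, and the function names `support` and `is_frequent` are my own choices, not part of any library.

```python
# Hypothetical transactions (illustrative data only).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """A frequent item set is non-empty and has support >= min_support."""
    return len(set(itemset)) > 0 and support(itemset, transactions) >= min_support


print(support({"Milk", "Diaper"}, transactions))           # 3 of 5 transactions
print(is_frequent({"Milk", "Diaper"}, transactions, 0.5))  # True
print(is_frequent({"Beer", "Eggs"}, transactions, 0.5))    # False, 1 of 5
```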

4. Confidence

①Confidence: the confidence of the association rule X→Y is the ratio of the number of transactions containing both X and Y to the number of transactions containing X, that is, Confidence(X→Y) = support(X, Y) / support(X). It follows that Confidence(X→Y) = P(Y|X)

②Minimum confidence (minconfidence): the minimum confidence threshold, specified by the user, that an association rule must meet. It reflects the minimum reliability of the association rules.

5. Strong Association Rule: Association rules that satisfy both minimum support (Minsupport) and minimum confidence (Minconfidence) are called strong association rules.
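As a rough sketch of the strong-rule test, the code below computes support and confidence for a rule X→Y and checks them against both thresholds. The data and the helper names (`count`, `rule_metrics`, `is_strong`) are assumptions for illustration, and treating an antecedent that never occurs as confidence 0 is my own convention.

```python
# Hypothetical transactions (illustrative data only).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(x, y, transactions):
    """Return (support, confidence) of the rule x -> y."""
    n_both = count(set(x) | set(y), transactions)
    n_x = count(x, transactions)
    support = n_both / len(transactions)
    # Assumption: if x never occurs, report confidence 0 instead of failing.
    confidence = n_both / n_x if n_x else 0.0
    return support, confidence


def is_strong(x, y, transactions, min_support, min_confidence):
    """A strong rule meets both the support and the confidence threshold."""
    s, c = rule_metrics(x, y, transactions)
    return s >= min_support and c >= min_confidence


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions)
print(f"support={s:.2f}, confidence={c:.2f}")
print(is_strong({"Milk", "Diaper"}, {"Beer"}, transactions, 0.4, 0.6))
```

Counting transactions directly gives the same confidence as dividing the two supports, because the total number of transactions cancels out.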

6. Examples of support and confidence calculations

Support and confidence of rule X → Y

Support s: the probability that a transaction contains both X and Y

Confidence c: the conditional probability that a transaction containing X also contains Y

Assuming the minimum support is 0.5 and the minimum confidence is 0.5, the following association rules are obtained:

A → C (0.5, 0.67)
The first value, 0.5, is the support: 2 (number of records containing both A and C) / 4 (total number of records)
The second value, 0.67, is the confidence: 2 (number of records containing both A and C) / 3 (number of records containing A)

C → A (0.5, 1)
The first value, 0.5, is the support: 2 (number of records containing both A and C) / 4 (total number of records)
The second value, 1, is the confidence: 2 (number of records containing both A and C) / 2 (number of records containing C)
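The numbers above can be reproduced with a short Python sketch. The record table itself is not shown in these notes, so the four records below are a hypothetical data set that I chose only to match the stated counts (4 records, A in 3 of them, C in 2, A and C together in 2).

```python
# Hypothetical records chosen to match the counts quoted in the example.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]


def support_confidence(antecedent, consequent, transactions):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    return n_both / len(transactions), n_both / n_antecedent


for x, y in [("A", "C"), ("C", "A")]:
    s, c = support_confidence({x}, {y}, transactions)
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
# A -> C: support=0.50, confidence=0.67
# C -> A: support=0.50, confidence=1.00
```

Both rules share the same support because support is symmetric in X and Y, while confidence depends on which side is the antecedent.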

7. Rule confidence, rule support, antecedent support, and consequent support

                Y
             1      0      total
X      1     A      B      R1
       0     C      D      R2
total        C1     C2     T

Rule confidence: A/R1 Rule support: A/T

Support for the antecedent: R1/T Support for the consequent: C1/T

Example (rule: Excellent → eat):

                 eat    Don't eat    total
Excellent        60     40           100
Not excellent    66     14           80
total            126    54           180

Rule confidence: 60/100=60% Rule support: 60/180=33.33%

Support for the antecedent: 100/180=55.56% Support for the consequent: 126/180=70%
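The four ratios can be sketched as a small Python function over the cells A, B, C, D of the 2x2 table. The function name and the dictionary keys are my own labels, not from any library.

```python
def contingency_metrics(a, b, c, d):
    """Rule metrics from a 2x2 table.

    a: X and Y both occur      b: X occurs, Y does not
    c: Y occurs, X does not    d: neither occurs
    """
    r1 = a + b          # row total for X = 1
    c1 = a + c          # column total for Y = 1
    t = a + b + c + d   # grand total
    return {
        "rule_confidence": a / r1,
        "rule_support": a / t,
        "antecedent_support": r1 / t,
        "consequent_support": c1 / t,
    }


# Cell values taken from the example table above.
metrics = contingency_metrics(60, 40, 66, 14)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```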

8. Lift

Lift = rule confidence / consequent support

That is, Lift(X→Y) = Confidence(X→Y) / P(Y) = P(Y|X) / P(Y)

Lift measures whether the occurrence of X increases the probability that Y occurs.

So there are three possible cases:

Lift(X→Y) > 1: X raises the probability of Y;

Lift(X→Y) = 1: X neither raises nor lowers the probability of Y (the two are independent);

Lift(X→Y) < 1: X lowers the probability of Y.
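As a quick check, lift can be computed for the earlier example, where the rule confidence is 60/100 and the consequent support is 126/180. This is only a sketch: the function names and the tolerance used to decide "equal to 1" are my own assumptions.

```python
def lift(confidence, consequent_support):
    """Lift = rule confidence / consequent support."""
    return confidence / consequent_support


def interpret(value, tol=1e-9):
    """Map a lift value to one of the three cases."""
    if abs(value - 1.0) <= tol:
        return "no change"
    return "increase" if value > 1.0 else "decrease"


value = lift(60 / 100, 126 / 180)
print(f"lift={value:.3f} ({interpret(value)})")  # lift=0.857 (decrease)
```

Here the lift is below 1, so although the rule has 60% confidence, the antecedent actually makes the consequent less likely than its 70% base rate.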

9. Association rule mining issues

①The problem of mining association rules is to find the association rules whose support and confidence are no less than the minimum thresholds given by the user.

② The problem of mining association rules can be divided into two sub-problems:

Discovering frequent item sets: find all frequent item sets, or the maximal frequent item sets, using the Minsupport given by the user.

Generating association rules: find association rules within the frequent item sets using the Minconfidence given by the user.

The first sub-problem has been the focus of research on association rule mining algorithms in recent years.
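The two sub-problems can be sketched with a brute-force Python example. This is not Apriori or any other optimized algorithm: it simply enumerates every candidate item set, which only works for tiny data. The records and function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate every item set and keep those with enough support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            sup = sum(1 for t in transactions if cset <= t) / n
            if sup >= min_support:
                result[cset] = sup
    return result


def generate_rules(freq, min_confidence):
    """Step 2: split each frequent item set into X -> Y and filter by confidence."""
    rules = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                x = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so freq[x] is always present.
                conf = sup / freq[x]
                if conf >= min_confidence:
                    rules.append((set(x), set(itemset - x), sup, conf))
    return rules


# Hypothetical records (illustrative data only).
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
freq = frequent_itemsets(transactions, min_support=0.5)
rules = generate_rules(freq, min_confidence=0.5)
for x, y, sup, conf in rules:
    print(f"{sorted(x)} -> {sorted(y)}: support={sup:.2f}, confidence={conf:.2f}")
```

The number of candidate item sets grows exponentially with the number of items, which is why the first step dominates the cost and attracts most of the algorithmic work.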

10. Basic model of association rule mining

2.1.4 Summary

The main point is to understand how to calculate support (S), confidence (C), and lift (L)


Origin blog.csdn.net/woxingliu2018/article/details/106269179