It's 2020: stop believing in beer and diapers! Here is the truth about data analysis

Whenever artificial intelligence or big data comes up, someone is bound to mention beer and diapers. Amusingly, it's 2020 and some people still believe this old urban legend. Today we will go through it systematically.

1. The association rules behind the scenes

The beer-and-diapers story rests on the association rule algorithm. Note: there is nothing wrong with the algorithm itself. It is a way to discover associations (note: not causality). Its principle is very simple and it needs very little data, so it is widely applicable.

Suppose a store sells six products, A through F. A customer buys A and B, and the cashier prints a receipt listing the name and price of A and B. We can use 1 or 0 to mark whether each product appears on the receipt, so this receipt becomes simply:
A=1, B=1, C=0, D=0, E=0, F=0
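A minimal Python sketch of this 0/1 encoding, assuming the six-product catalog from the example. The name `encode_receipt` is made up for illustration and does not come from the original post.

```python
# Catalog from the example: six products, A through F.
PRODUCTS = ["A", "B", "C", "D", "E", "F"]

def encode_receipt(items):
    """Return one 0/1 flag per catalog product: 1 if it is on the receipt, else 0."""
    bought = set(items)
    return [1 if product in bought else 0 for product in PRODUCTS]

# The customer who bought A and B:
row = encode_receipt(["A", "B"])
print(dict(zip(PRODUCTS, row)))
```

Stacking one such row per order gives the 0/1 table that the rest of the calculation works on.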

Similarly, five orders can be written as five such rows, one per order:
[Figure: the five example orders as rows of 0/1 values]

Even without any calculation, the naked eye can spot that products A, B, and C seem very likely to appear in the same order. That is the basic idea of association rules: find combinations that frequently occur together. But we need some metrics to decide what actually counts as "frequent".

With six products there are many possible combinations: A+B, A+B+C, and so on. We start with the simplest ones, pairs, and then move on to triples, quadruples, and so on. When evaluating a combination, we want it to occur as often as possible, which gives us the concept of support:
support(A, B) = (number of orders containing both A and B) / (total number of orders)
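To make support concrete, here is a small sketch over five hypothetical orders. The `ORDERS` list is invented for illustration and is not the table from the original post; it is chosen so that B appears in 80% of orders, the figure the text quotes for pushing B directly.

```python
# Hypothetical orders (an assumption, not the post's original table).
ORDERS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "C", "F"},
    {"B", "D"},
]

def support(itemset, orders):
    """Share of orders that contain every product in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for order in orders if wanted <= order)
    return hits / len(orders)

print(support({"A", "B"}, ORDERS))       # A and B together: 3 of 5 orders
print(support({"A", "B", "C"}, ORDERS))  # A, B and C together: 3 of 5 orders
print(support({"B"}, ORDERS))            # B alone: 4 of 5 orders
```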

Two products may also be bought in sequence, for example A first and then B. In that case we can compute the probability that a user buys B given that they have bought A, and use it to decide whether to push product B, C, or D after the user buys A. This introduces the concept of confidence:
confidence(A → B) = (number of orders containing both A and B) / (number of orders containing A)
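A sketch of the confidence calculation, again on hypothetical data. The `ORDERS` list is an assumption made up for illustration; it is built so that 3 of the 4 orders containing A also contain B, which reproduces the 75% figure used in the text.

```python
# Hypothetical orders (an assumption, not the post's original table).
ORDERS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "C", "F"},
    {"B", "D"},
]

def confidence(antecedent, consequent, orders):
    """Among orders containing `antecedent`, the share that also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [order for order in orders if antecedent <= order]
    if not with_antecedent:
        return 0.0  # the antecedent was never bought, so there is nothing to estimate
    both = sum(1 for order in with_antecedent if consequent <= order)
    return both / len(with_antecedent)

print(confidence({"A"}, {"B"}, ORDERS))  # 3 of the 4 orders with A also contain B
print(confidence({"A"}, {"C"}, ORDERS))  # every order with A also contains C
```

Confidence is directional: confidence(A → B) and confidence(B → A) divide by different order counts and need not be equal.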

Note that even though a user who bought A has a 75% probability of buying B, we do not necessarily have to wait until the user buys A before recommending B. In this small example, pushing B directly already has an 80% purchase rate. Waiting for the user to buy A and then pushing B is clearly not worthwhile, because the purchase rate drops. This introduces the concept of lift:
lift(A → B) = confidence(A → B) / support(B)
Here the lift is 75% / 80% ≈ 0.94, which is below 1: knowing that the user bought A does not make B any easier to sell.
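The lift calculation can be sketched as follows. The `ORDERS` data is hypothetical, constructed only to reproduce the 75% and 80% figures from the text. The function works on raw counts, which is algebraically the same as confidence(A → B) / support(B) but avoids floating-point rounding.

```python
# Hypothetical orders (an assumption, not the post's original table).
ORDERS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "C", "F"},
    {"B", "D"},
]

def lift(a, b, orders):
    """lift(a -> b) = confidence(a -> b) / support(b), computed from counts."""
    n = len(orders)
    n_a = sum(1 for order in orders if a in order)
    n_b = sum(1 for order in orders if b in order)
    n_ab = sum(1 for order in orders if a in order and b in order)
    if n_a == 0 or n_b == 0:
        return 0.0  # lift is undefined when either product never sells
    return (n_ab * n) / (n_a * n_b)

print(lift("A", "B", ORDERS))  # below 1: buying A does not raise the odds of B
print(lift("A", "C", ORDERS))  # above 1: buying A does raise the odds of C
```

A lift above 1 means the pair co-occurs more often than it would if the two products sold independently; a lift below 1 means less often.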

Support, confidence, and lift are all very simple to compute. In theory you only need to set minimum thresholds for support and confidence, then traverse the combinations in some order (for example with the Apriori algorithm) to find every combination that qualifies. The biggest advantage of this method is its simplicity: the calculation and its logic are straightforward, and it needs very little data. Order data alone is enough. Data collection is always an algorithm's number one enemy, so a method that needs this little data naturally gets used very widely, especially in shopping basket analysis.
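The level-by-level traversal can be sketched as below. This is a simplified illustration in the spirit of Apriori, not a tuned implementation. The `ORDERS` data and the function name are assumptions made up for the example, and the support threshold is given as a minimum order count (`min_count`) rather than a fraction to keep the comparison exact.

```python
from itertools import combinations

# Hypothetical orders (an assumption, not the post's original table).
ORDERS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C", "E"},
    {"A", "C", "F"},
    {"B", "D"},
]

def count(itemset, orders):
    """Number of orders that contain every product in `itemset`."""
    return sum(1 for order in orders if itemset <= order)

def frequent_itemsets(orders, min_count):
    """Map each itemset found in at least `min_count` orders to its order count."""
    items = sorted(set().union(*orders))
    level = [frozenset([item]) for item in items]
    level = [s for s in level if count(s, orders) >= min_count]
    found = {}
    k = 1
    while level:
        for itemset in level:
            found[itemset] = count(itemset, orders)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        previous = set(level)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        level = [c for c in candidates if count(c, orders) >= min_count]
    return found

for itemset, n in sorted(frequent_itemsets(ORDERS, 3).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The pruning relies on one observation: a combination can only be frequent if every smaller combination inside it is frequent too, so most of the possible combinations never need to be counted.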

But applications aside: in which supermarket have you actually seen beer and diapers stacked together? The storytellers soon noticed this hole and changed their line to "in foreign supermarkets...", counting on most listeners never having been abroad. So what is the truth?

2. Why it does not exist in reality

Unfortunately, beer and diapers do not exist in reality. First of all, the story was made up by Teradata's salespeople. It fits the "unexpected yet reasonable" principle of storytelling perfectly, which is exactly what sells technology products, so it spread widely. In actual use, whether you look at it from the technical side or the business side, a perfect case like "beer and diapers" does not exist.

From a technical point of view, association rules are an unsupervised rule-finding method. They are better suited to exploratory analysis than to pointing directly at a SKU combination that can be acted on. Note that the example above is highly simplified, which is why it looks simple and feasible. In reality, "beer" covers many factors: brand, packaging, price, whether it is on promotion, and whether it is close to its expiry date. The number of SKUs is extremely large, and the data for any single SKU is very sparse.

If you run the analysis only at the level of the "beer" category, the result offers almost no practical guidance. If you go down to a specific SKU with a specific price and a specific shelf life, such as "Corona beer 330ml*24 bottles, 178 yuan, not discounted, not near expiry" and "Pampers green-pack diapers S164, newborn, ultra-thin breathable dry type, 155 yuan", the support and confidence between those specific SKUs are very low, and it is hard to reach a level you can act on.
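A back-of-the-envelope sketch of why SKU-level support collapses. Every number here is hypothetical, and the function assumes co-purchases are spread evenly across all SKU pairs, which real sales are not; the point is only the order of magnitude.

```python
def sku_pair_support(category_support, n_skus_left, n_skus_right):
    """Average support per SKU pair if a category-level co-purchase rate is
    spread evenly over every SKU combination (a simplifying assumption)."""
    return category_support / (n_skus_left * n_skus_right)

# Hypothetical: 2% of orders contain some beer and some diapers,
# and the store carries 30 beer SKUs and 30 diaper SKUs.
per_pair = sku_pair_support(0.02, 30, 30)

# Under the same assumption, how many orders it takes on average
# to see one specific beer SKU next to one specific diaper SKU.
orders_per_hit = 1 / per_pair

print(f"support per SKU pair: {per_pair:.6%}")
print(f"orders per co-occurrence: {orders_per_hit:,.0f}")
```

Even a healthy-looking category-level association is divided among hundreds of SKU pairs, so any single pair rarely clears a useful support threshold.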

This is the fundamental reason why beer and diapers do not appear together in supermarkets. Even a small supermarket with 3.5 meters and 5 doors carries at least dozens of diaper SKUs and at least dozens of beers. Which ones should be placed together? You also have to consider that beer needs cold storage, and you can hardly put diapers in the chiller. As for large supermarkets with hundreds of square meters, they carry thousands of beer SKUs and thousands of diaper SKUs, on shelves tens of meters long. The products can only be stocked separately, in the beverage area and the mother-and-baby area. Anyone who put the two together would be beaten to death by the store supervisor.

From a business perspective, association rules, like all mathematical and statistical models, can only show that two numbers are associated. They cannot demonstrate any logical relationship that means something in practice. The explanation that "moms buy beer for dad when they buy diapers" exists purely to make the story hang together. From the point of view of a mother buying diapers, there are 100 more worthwhile things to buy, such as dry tissues and wet wipes. Anyone who has changed a baby's diaper knows that tissues get used up like water. Those purchases have a far more direct and clear driver.

3. How it is done in reality

In essence, consumer decisions are driven by many factors: physiological needs, level of awareness, product price, materials, advertising, and publicity all affect the final decision. So there are many ways to drive cross-selling.

The most direct approach is recommendation based on business rules, commonly known as hard rules. For example, some books come in three volumes, and opening just one leaves you with no beginning or no end; some medicines must be taken together, and taking them carelessly can kill. These products follow a fixed pattern. Here you do not need to look at the data at all; just recommend according to the business rules.

Some are not hard rules but established customs. For example, a barbecue outing calls for charcoal, a grill, skewers, soy sauce, chicken wings, and cola; beer goes with peanuts, crayfish, and cucumber; instant noodles go with ham. These are soft rules. Soft rules based on user habits can also serve as recommendation tools. For example, if you run a fresh food e-commerce business, you can sell items one by one, or you can put together an "autumn fattening-up hot pot set" that bundles hot pot ingredients such as lamb rolls, soup base, meatballs, and shiitake mushrooms.

Some were never rules at all, but merchants' advertising planted them in users' minds. For example, all kinds of cosmetics for girls and all kinds of game skins and outfits for boys; the classic slogan "afraid of getting overheated? Drink XXX"; or "data analysis requires the ESP package". These are pseudo-rules based on marketing. There is no scientific reasoning behind them, but as long as users accept them, they can serve as recommendation criteria.

Of course, there is also the simplest and crudest kind: rules based on discounts. The simplest case: after adding items to the cart, the user finds that the total is already 400 and that they hold a coupon for 100 off an order of 500. At that point, what the user is eager to find is "something for 100 yuan that is not a waste of money". They will very likely pick storable hard currency such as paper towels, shower gel, rice, flour, and cooking oil.

These are all rules that the business side creates on its own initiative. So please remember this and forward it to the business side: there is no mysterious force that requires no effort from you and just lies quietly in the database, waiting for your data analyst to discover it. In 2020, nobody's product is 100% unique. If you want to outperform others, the key is to put in the work.

Of course, if the business side wants to take the initiative, it also needs data support (as shown below):
[Figure: data support for the methods above]

So here is the question again: which method would you like to see more of? If you are interested, follow the WeChat public account [Down-to-Earth School]. We will pick one method to share in the next article, so stay tuned.

Author: Down-to-Earth Chen, WeChat public account: Down-to-Earth School. A data analyst with ten years of experience who has launched a series of data analysis courses with more than 20,000 students.

Origin blog.csdn.net/weixin_45534843/article/details/109160767