Moved to tears! A boon for data analysts: cross-view granularity calculation

Author: Wang Wenkai

 

At NetEase Youshu, our goal is to make data analysis a pleasant activity.

Personally, I think an excellent data analysis tool should achieve this: while users are analyzing data with it, they forget the tool exists and can focus on discovering the stories behind the data. This can be called a flow experience, a state of joyful immersion in data analysis.

At the same time, you have surely run into this situation: a question is very easy to describe, but when you try to present and answer it in NetEase Youshu, it turns out to be quite difficult. At that point the flow experience is gone, and you have to start thinking about how to reprocess the data. Your analytical thinking is interrupted, and you can no longer focus on the question and its answer. This is frustrating for business analysts, who get stuck and do not know how to proceed.

At the core of these problems is one requirement:

Data needs to be freely aggregated to different granularities, and data of different granularities can be displayed in one table.

This is rather abstract, so let's take a concrete e-commerce scenario as an example.

An e-commerce company was established in 2013. One day, a data analyst was using NetEase Youshu to analyze its data. First, she wanted to see the annual sales growth, and in Youshu it was easy to build the following bar chart. She was pleased: sales had been growing year after year.

                                                       sales trend

 

At this point, a new idea came to her mind. She wanted to see: "For each year's sales, which year's customers contributed them?"

First, let's analyze her question. What does "contributed by which year's customers" mean?

So she first made a simple table in Youshu, put "customer name", "order ID", and "order date" on the Y-axis, and got the following chart. It shows that the customer Ding Jun has placed 4 orders in total, and the earliest purchase date is "2013-03-01", so we can consider Ding Jun a 2013 customer. What we actually want to calculate in Youshu, then, is the first purchase time of each customer, which tells us which year each customer belongs to.

                                                     Tentative form

 

Now, our goal is to calculate the "first purchase time of each customer".

How do we do that? If the first thought that pops into your mind is min([order date]), congratulations, you are halfway there.

First, we create a calculated metric, as shown below:

                                                                                 Earliest purchase date

 

Then we add this field to the chart, only to find that the result is not what we expected. The columns "Order Date" and "Earliest Purchase Date" are exactly the same. Why is this happening? It comes down to the aggregation granularity of the chart, which I will elaborate on in subsequent articles.

The point here is that the current chart contains "customer name", "order ID", and "order date". These dimension fields determine the aggregation granularity of the chart, so the aggregation min([order date]) is evaluated at that granularity. Each combination of customer, order, and date corresponds to a single order date, so the minimum is simply the order date itself, and the two columns come out identical.
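The effect can be reproduced outside the tool. The following is a minimal Python sketch, not Youshu's implementation: the helper `min_order_date` and the order IDs are hypothetical, and only Ding Jun's four orders and the earliest date 2013-03-01 come from the example above (the other three dates are made up for illustration).

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (customer name, order ID, order date).
# Only the earliest date, 2013-03-01, is taken from the article.
orders = [
    ("Ding Jun", "A-001", date(2013, 3, 1)),
    ("Ding Jun", "A-002", date(2013, 9, 15)),
    ("Ding Jun", "A-003", date(2014, 6, 2)),
    ("Ding Jun", "A-004", date(2015, 1, 20)),
]

def min_order_date(rows, key):
    """Group rows by key(row) and return the minimum order date per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {group: min(dates) for group, dates in groups.items()}

# Chart granularity: customer name + order ID + order date.
chart_level = min_order_date(orders, key=lambda row: row)

# Every group contains exactly one order, so the minimum is the order date itself.
same_as_order_date = all(group[2] == earliest for group, earliest in chart_level.items())
print(same_as_order_date)
```

Grouping by all three dimensions leaves one row per group, which is why the aggregated column simply repeats the order date.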

                                                           Earliest purchase date

So, we now need a way to specify the aggregate granularity (dimension) of this date, independent of the granularity on the chart.

NetEase Youshu has added a major feature in this version: Cross Level Calculation, hereinafter referred to as CLC.

CLC comes in three forms: INCLUDE, EXCLUDE, and FIXED. We will unveil them one by one later.

In the current example, we will need to use FIXED expressions, which look like this:

                                                                   FIXED expression

Note that the blue part looks familiar: it is the basic aggregation formula from before, the minimum value. The red part is a new container that tells Youshu at which granularity to calculate the minimum date in the blue part. In this FIXED formula, it tells Youshu to take the minimum order date at the granularity of "customer name" only, without considering any other granularity, that is, without considering the granularity of the current chart.
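In plain code, the idea behind FIXED can be sketched as a two-step computation: aggregate at the fixed granularity first, then attach the result to every row of the chart. This is only an illustration of the semantics described above, not Youshu's actual implementation; the function `fixed_min_order_date`, the second customer "Customer B", the order IDs, and all dates other than 2013-03-01 are assumptions.

```python
from datetime import date

# Hypothetical sample rows: (customer name, order ID, order date).
orders = [
    ("Ding Jun", "A-001", date(2013, 3, 1)),
    ("Ding Jun", "A-002", date(2013, 9, 15)),
    ("Ding Jun", "A-003", date(2014, 6, 2)),
    ("Ding Jun", "A-004", date(2015, 1, 20)),
    ("Customer B", "B-001", date(2014, 2, 10)),
    ("Customer B", "B-002", date(2015, 7, 5)),
]

def fixed_min_order_date(rows):
    """Minimum order date per customer, ignoring every other dimension."""
    first = {}
    for customer, _order_id, order_date in rows:
        if customer not in first or order_date < first[customer]:
            first[customer] = order_date
    return first

# Step 1: aggregate at the fixed granularity (customer name only).
first_purchase = fixed_min_order_date(orders)

# Step 2: attach the result to each chart row, whatever the chart granularity is.
table = [
    (customer, order_id, order_date, first_purchase[customer])
    for customer, order_id, order_date in orders
]

for row in table:
    print(row)
```

Because the fourth column depends only on the customer, it stays the same on every row of that customer, regardless of which other dimensions the chart contains.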

So we create a calculated dimension and write the FIXED expression as follows:

                                       First purchase time per customer

Now drag "First purchase time per customer" into the chart, and that's it. This field is not affected by the granularity of the chart.

                                                   First purchase time per customer

OK, everything is ready. Now you only need to put "First purchase time per customer" on the color bar, and within each year's sales you can tell how much was contributed by 2013 customers, how much by 2014 customers, and so on.

For example, it can be seen from the following figure:

1. 2013 is the first year of this e-commerce business, so all of that year's sales were necessarily contributed by customers who first purchased that year

2. Of the 2014 sales, 2.243 million were contributed by 2013 customers, and 1.168 million were contributed by 2014 customers

                                               New customer sales contribution
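The colored breakdown above amounts to grouping sales by two keys: the year of the order and the year of the customer's first purchase. Here is a minimal sketch under stated assumptions; the customers "Customer B" and "Customer C", all dates except 2013-03-01, and all sales amounts are hypothetical and are not the article's data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (customer name, order date, sales amount).
orders = [
    ("Ding Jun", date(2013, 3, 1), 100.0),
    ("Ding Jun", date(2014, 6, 2), 250.0),
    ("Customer B", date(2014, 2, 10), 80.0),
    ("Customer B", date(2015, 7, 5), 120.0),
    ("Customer C", date(2015, 4, 18), 60.0),
]

# FIXED-style step: first purchase year per customer, independent of the chart.
first_year = {}
for customer, order_date, _amount in orders:
    year = order_date.year
    first_year[customer] = min(first_year.get(customer, year), year)

# Chart step: sum sales by (order year, first purchase year of the customer).
sales = defaultdict(float)
for customer, order_date, amount in orders:
    sales[(order_date.year, first_year[customer])] += amount

for (order_year, cohort_year), amount in sorted(sales.items()):
    print(order_year, cohort_year, amount)
```

In the sample data, as in the chart, the first year has a single color because every 2013 order comes from a customer whose first purchase was in 2013.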

If you don't want to see absolute values, you can change the chart to a stacked percentage bar chart. In this way, you can see the contribution percentage of customers in each year in each year's sales.

This reveals a problem:

Although the sales of this e-commerce company are increasing year by year, the contribution rate of new customers is getting lower and lower.

In 2014 sales, the contribution rate of new customers was only 34.24%

In 2015 sales, the contribution rate of new customers dropped to 9.31%

In 2016 sales, the contribution rate of new customers was even lower

                                                            New customer sales ratio
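The percentage view is simple arithmetic on the stacked values. As a quick check, the sketch below recomputes the 2014 ratio from the two 2014 figures quoted earlier (2.243 million and 1.168 million); the function name `cohort_shares` is an assumption made for this illustration.

```python
def cohort_shares(sales_by_cohort):
    """Return each cohort's percentage share of one year's total sales."""
    total = sum(sales_by_cohort.values())
    return {cohort: 100.0 * amount / total for cohort, amount in sales_by_cohort.items()}

# 2014 sales in millions, keyed by the customers' first purchase year
# (the two figures quoted in the article).
sales_2014 = {2013: 2.243, 2014: 1.168}

shares_2014 = cohort_shares(sales_2014)
new_customer_share = round(shares_2014[2014], 2)
print(new_customer_share)  # 34.24
```

The result, 1.168 / (2.243 + 1.168), agrees with the 34.24% new-customer contribution read from the stacked percentage chart.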

This is one small case of NetEase Youshu's cross-view granularity calculation, and it is only the tip of the iceberg. Please look forward to more cases in the future!

Original article: NetEase Big Data
