Getting to Know Apache Kylin by Way of Squeezing Juice for My Mother-in-Law (book giveaway at the end)

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please include a link to the original source and this statement.
Original link: https://blog.csdn.net/a934079371/article/details/102774291

First, the opening

Second, what is Apache Kylin?

Third, why use Apache Kylin?

Fourth, Apache Kylin architecture

Fifth, a Kylin use case

Sixth, the Kylin + AI outlook

Seventh, summary

First, the opening

I have a habit of drinking a glass of juice every morning after getting up and brushing my teeth. I only like three kinds of fruit: apples, pears, and bananas, so every morning I agonize over the choice: apple + pear, banana + pear, apple + pear + banana, and so on.

At first I was the only one drinking, so getting up a few minutes early to squeeze the juice was easy. Later my girlfriend wanted juice too, so I had to get up a little earlier: more fruit meant more squeezing time. Then my mother-in-law wanted some as well, but I did not want to get up any earlier, so I bought another juicer.

After that, my parents wanted juice too. I did not want to get up earlier, and I did not want to buy yet another juicer either, because I figured that if more family members kept joining in I would have to keep buying juicers, and I could not afford that.

So I traded in my two ordinary juicers for a multifunction one. This machine is powerful: I just load the fruit a day in advance and set a timer, and it squeezes all the juice ahead of time. Whoever wants a drink then simply picks the mix they like.

The only thing that costs more time is adding more types of fruit, because the machine has a limited number of fruit slots. Otherwise, no matter how many people drink, there is always enough. The machine takes up a fair amount of space, but it saves me time and money, so it is worth it!

Apache Kylin works on the same principle as this multifunction juicer.

Second, what is Apache Kylin?

1. Apache Kylin is a big data analysis framework. It can be understood as an evolution of Hive; it is an OLAP-on-Hadoop engine and is often used in data warehouse solutions.

2. It is the first top-level Apache open source project to come out of China, earlier than Alibaba's Apache Dubbo and RocketMQ.

3. For the same amount of data, a query may take hours in MySQL and minutes in Hive, while Apache Kylin responds at the sub-second level.

Third, why use Apache Kylin?

1. Moving from traditional databases (MySQL, etc.) to SQL on Hadoop (Hive, Spark SQL, etc.), you run into one problem: query time grows as the amount of data grows.

2. If the amount of data keeps growing and you want query time to stay the same, you have to scale out horizontally with more machines to speed up parallel computation. But that raises the cost of machine resources on the one hand, and the labor cost of operating and maintaining those machines on the other.

3. Apache Kylin uses precomputation, trading space for time, to solve both problems. Analysts can spend more time on business modeling rather than waiting for query results.

Fourth, Apache Kylin architecture

[Figure: Apache Kylin architecture diagram. Source: Apache Kylin Definitive Guide (Second Edition)]

1. On the far left of the figure are Hadoop, Hive, Kafka, and so on. These are the data sources: the data may be stored in Hive or MySQL, or arrive as a Kafka stream.

2. Once a data source is configured, the compute engine in the middle pulls the data and, following the dimensional model the analyst has configured, computes and generates a Cube. The user can choose either MapReduce or Spark for the computation.

3. The resulting Cube data is stored in HBase, on the right, where it waits to be queried.

4. At the top are the query entry points: the REST API, ODBC, and JDBC. The precomputed Cube is invisible to users, so they just run ordinary queries and do not need to care about the technical implementation details.
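As a rough sketch of what "just run ordinary queries" can look like through the REST entry point, the Python below builds, but does not send, a request for Kylin's REST query endpoint (POST /kylin/api/query with HTTP basic authentication, as I understand the Kylin documentation). The host and port, the ADMIN/KYLIN credentials, the learn_kylin project, and the kylin_sales table are assumptions borrowed from a default sample setup, not from this article, and build_kylin_query is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=50000):
    """Build (without sending) an HTTP request for Kylin's REST query endpoint.

    Hypothetical helper: the endpoint path and payload fields follow the Kylin
    REST documentation as I understand it; verify them against your version.
    """
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP basic auth: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed values from a default sample installation; replace with your own.
request = build_kylin_query(
    base_url="http://localhost:7070",
    project="learn_kylin",
    sql="SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
    user="ADMIN",
    password="KYLIN",
)
print(request.get_method(), request.full_url)

# To run the query against a live server, send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Note that the SQL is a plain aggregate query: nothing in it mentions the Cube, which is the point of item 4 above.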

Fifth, a Kylin use case

Scenario: there are 100 million transaction records on Double 11 day.

Requirement: query the top-selling goods on Double 11 day.

1. Traditional solution: scan all the records, find the sales records for Double 11, aggregate the sales by product, then sort and return the result. With 100 million records, the statistics have to run over all 100 million; if Double 11 produces 500 million records, the query time is multiplied by five.

2. Kylin solution: the analyst models in advance along two dimensions [sale date, product], and SUM(sales amount) is computed and stored. In this way Kylin works out the sales amount for every product on every date ahead of time, and at query time we only need one final sort.

That is, once the totals have been computed, we only need to sort the products for Double 11 day. Assuming there are 1,000 products, we only have to sort 1,000 records.
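The difference between the two solutions can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how Kylin is implemented: the sample records and the function names rank_by_scan, precompute_sums, and rank_from_precomputed are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (sale_date, product, sales_amount).
TRANSACTIONS = [
    ("2019-11-10", "phone", 300),
    ("2019-11-11", "phone", 500),
    ("2019-11-11", "laptop", 900),
    ("2019-11-11", "phone", 450),
    ("2019-11-11", "headphones", 120),
]


def rank_by_scan(records, day):
    """Traditional approach: scan every record at query time, then aggregate and sort."""
    totals = defaultdict(int)
    for sale_date, product, amount in records:
        if sale_date == day:
            totals[product] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def precompute_sums(records):
    """Precomputation: store SUM(sales_amount) for every (sale_date, product) pair."""
    sums = defaultdict(lambda: defaultdict(int))
    for sale_date, product, amount in records:
        sums[sale_date][product] += amount
    return sums


def rank_from_precomputed(sums, day):
    """Query time: only the stored per-product totals for one day are sorted."""
    return sorted(sums[day].items(), key=lambda item: item[1], reverse=True)


sums = precompute_sums(TRANSACTIONS)  # done once, ahead of any query
print(rank_by_scan(TRANSACTIONS, "2019-11-11"))
print(rank_from_precomputed(sums, "2019-11-11"))
```

Both calls return the same ranking, but the second one touches one stored total per product instead of every transaction, so its cost depends on the number of products rather than on the number of records.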

Kylin's precomputation resembles the combinations we learned in junior high school: the total number of combinations is 2^n, where n is the number of dimensions.

For example, suppose there are two kinds of fruit: apples and pears.
The juice can then come out in four ways: 1. apple 2. pear 3. apple + pear 4. none
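The 2^n count can be checked with a short Python sketch that lists every subset of the dimensions. The function name enumerate_combinations is made up for this example; it only illustrates the counting argument, not Kylin's build process.

```python
from itertools import combinations


def enumerate_combinations(dimensions):
    """Return every subset of the given dimensions, from the empty set to the full set."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


two_fruits = enumerate_combinations(["apple", "pear"])
print(two_fruits)       # [(), ('apple',), ('pear',), ('apple', 'pear')]
print(len(two_fruits))  # 4, i.e. 2^2

three_fruits = enumerate_combinations(["apple", "pear", "banana"])
print(len(three_fruits))  # 8, i.e. 2^3
```

Each added dimension doubles the number of combinations, which is why adding fruit types is the one thing that makes the juicer take longer.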

[Figure: what a cube looks like]

Sixth, the Kylin + AI outlook

Suppose Kylin is combined with AI; then scenes like the following become possible. As analysts build more and more models, the AI can automatically identify the dimensional models that are used most often and proactively recommend them to analysts. When an analyst finishes building a model, the AI can diagnose it: whether the dimensions are set reasonably, where there is room for optimization, and so on.

This feature may already be implemented, may be in progress, or may be coming soon; for details, refer to "Apache Kylin Definitive Guide (2nd Edition)".

Seventh, summary

1. Think back to the juicers from the opening: as more and more people drink juice, the demand for freshly squeezed juice keeps growing (growing data volume), yet I do not want to get up earlier and I do not want to spend money on more juicers. (This corresponds to why we use Kylin.)

2. The types of fruit correspond to the dimensions in modeling: only when the number of dimensions increases does Kylin's working time increase. We run the juicer on a timer, trading space for time, so I do not have to get up early.

3. We do not need to know how the juicer works; we only care about which juice we get to drink. This corresponds to Cube generation: we only care about the final result.

4. If we use the juicer often, it could judge from the amount of juice we drink every day which fruit we prefer, and load the right quantity. This is what Kylin + AI would deliver.

Brother Feng's comment: Apache Kylin itself integrates many big data components and also covers a lot of mainstream data warehouse technology and concepts. You can use Apache Kylin as the entry point for learning the major big data components, and as the entry point for learning data warehousing as well.

"Apache Kylin Definitive Guide (2nd Edition)" is carefully written by the Apache Kylin core R&D team and based on the newer Apache Kylin 2.5. It explains Apache Kylin from every angle, including architectural design, the individual modules, integration with third-party tools, and open source and secondary development practices, to help you unlock big data productivity.

Reader giveaway
Leave a comment about your understanding of Apache Kylin. The three readers whose comments are the most heartfelt will each receive a genuine copy of "Apache Kylin Definitive Guide (2nd Edition)".

--end--
