Study Note: The Practice of Presto & Alluxio in an E-Commerce Big Data Platform

This note is based on a user talk at PrestoCon. The slides and video are available online.
Some information about the speaker:

Hello everyone, my name is Wenjun Tao, a senior software engineer at JD.com.
I'm also a core member of our Presto team.
It is my honor to share our work on Presto and Alluxio.

The talk covers four aspects: JD BDP, Presto practice in BDP, the Presto & Alluxio stack, and ongoing exploration.

JD BDP

Introduction to the architecture of JD.com's BDP (Big Data Platform).

The scale of JD BDP.
BDP architecture:

They deploy HDFS as their underlying distributed file system.
Hive, Presto, and Spark serve as their computing stack.
YARN serves as the resource manager and job scheduler, and also works as a bridge between HDFS and the computing frameworks.

Practice with Presto in BDP

Introduction to Presto and its practice in JD BDP

As you know, a Presto cluster consists of a coordinator and multiple workers.

  • The coordinator is responsible for receiving queries from clients, then parsing, analyzing, and optimizing them.
  • Workers are responsible for reading data from storage such as HDFS, processing it, and returning the results to the coordinator.
  • The coordinator then returns the results to the client.
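To make this division of labour concrete, here is a toy scatter-gather model in Python. It is not Presto code: the splits, `worker_task`, and `coordinator` are hypothetical stand-ins, and real Presto plans a query into stages connected by exchanges rather than a single map-and-merge step. For brevity the final merge runs in the `coordinator` function here; the sketch only shows the shape of the flow, where each split goes to a worker, each worker computes a partial aggregate, and the partials are combined before being returned to the client.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "storage": each split is a list of (region, amount) rows.
SPLITS = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 2), ("east", 8)],
]


def worker_task(split):
    """Worker: scan one split and compute a partial SUM(amount) GROUP BY region."""
    partial = {}
    for region, amount in split:
        partial[region] = partial.get(region, 0) + amount
    return partial


def coordinator(splits, num_workers=3):
    """Coordinator (simplified): schedule one task per split, then merge the partials."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))  # map keeps split order
    final = {}
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] = final.get(region, 0) + subtotal
    return final  # this is what would be sent back to the client


print(coordinator(SPLITS))
# {'north': 17, 'south': 7, 'east': 11}
```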

Since they use YARN as the unified resource manager, they made several modifications:

  1. They deploy the Presto cluster on the YARN cluster in their production environment, which makes it easy to maintain and scale.

  2. Users have high demands on isolation, so they implemented job isolation.
    Job isolation scheduling makes the cluster more stable. In addition, they integrated Presto with their ERP authorization system.

  3. To allow dynamic management, they deployed a powerful server through which they can modify cluster configurations at runtime; the speaker explores this system in detail later in the talk.

  4. Based on the characteristics of their queries, they also developed a query result cache.
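Item 3 describes changing cluster configuration at runtime. The talk does not detail how the powerful server does this, so the following is only a minimal sketch under an assumed design: a versioned in-memory store that pushes every change to subscribed components so they can react without a restart. `RuntimeConfig` and the key `max_concurrent_queries` are hypothetical names, not part of Presto.

```python
import threading


class RuntimeConfig:
    """Toy runtime configuration store: updates are versioned and pushed to listeners."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._values = dict(initial)
        self._version = 0
        self._listeners = []

    def get(self, key):
        with self._lock:
            return self._values[key]

    def subscribe(self, callback):
        """Register callback(key, old_value, new_value, version)."""
        with self._lock:
            self._listeners.append(callback)

    def update(self, key, value):
        """Apply a change, bump the version, and notify every subscriber."""
        with self._lock:
            old = self._values.get(key)
            self._values[key] = value
            self._version += 1
            version = self._version
            listeners = list(self._listeners)
        # Notify outside the lock so a callback may call get() without deadlocking.
        for callback in listeners:
            callback(key, old, value, version)
        return version


cfg = RuntimeConfig({"max_concurrent_queries": 100})  # hypothetical key
cfg.subscribe(lambda key, old, new, version: print(f"v{version}: {key} {old} -> {new}"))
cfg.update("max_concurrent_queries", 150)
# v1: max_concurrent_queries 100 -> 150
```

The version counter lets a component that missed a notification detect that its view is out of date.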

This is their Presto on YARN architecture; with this unified resource manager, they can dynamically scale the number of workers.
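The note does not give JD's scaling policy. As an illustration only, a scaling loop on YARN has to turn some load signal into a target worker count and then ask the resource manager for more or fewer containers. The sketch below assumes the signal is the number of pending splits; `desired_workers`, `scaling_action`, and the action names are hypothetical, and nothing here calls the real YARN API.

```python
import math


def desired_workers(pending_splits, splits_per_worker, min_workers, max_workers):
    """Hypothetical rule: enough workers to drain pending splits, clamped to [min, max]."""
    if splits_per_worker <= 0:
        raise ValueError("splits_per_worker must be positive")
    needed = math.ceil(pending_splits / splits_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_action(current_workers, target_workers):
    """Turn a target size into a (hypothetical) request for the YARN integration."""
    if target_workers > current_workers:
        return ("request_containers", target_workers - current_workers)
    if target_workers < current_workers:
        return ("release_containers", current_workers - target_workers)
    return ("noop", 0)


target = desired_workers(pending_splits=1000, splits_per_worker=40, min_workers=10, max_workers=200)
print(target, scaling_action(current_workers=20, target_workers=target))
# 25 ('request_containers', 5)
```

Clamping to a minimum keeps a warm pool for small queries; the maximum caps how much of the shared YARN cluster Presto can take.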
Powerful server
Intelligent Scheduler
They divide queries into two categories.
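The slide that named the two categories is not reproduced in this note, so the categories below ("small" and "large", split by estimated input size) are hypothetical placeholders. The sketch only shows the general idea of an intelligent scheduler: classify each query before execution and route each class to its own queue or resource pool. `QueryInfo`, the 10 GiB threshold, and the table names are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold separating the two categories (10 GiB of estimated input).
SMALL_QUERY_LIMIT_BYTES = 10 * 1024**3


@dataclass
class QueryInfo:
    sql: str
    estimated_input_bytes: int  # assumed to come from table statistics


def classify(query: QueryInfo) -> str:
    """Assign a query to one of two hypothetical categories."""
    return "small" if query.estimated_input_bytes <= SMALL_QUERY_LIMIT_BYTES else "large"


def route(queries):
    """Group queries by category so each group can go to its own queue or pool."""
    queues = {"small": [], "large": []}
    for query in queries:
        queues[classify(query)].append(query.sql)
    return queues


print(route([
    QueryInfo("SELECT count(*) FROM dim_region", 5 * 1024**2),
    QueryInfo("SELECT * FROM fact_orders", 50 * 1024**3),
]))
# {'small': ['SELECT count(*) FROM dim_region'], 'large': ['SELECT * FROM fact_orders']}
```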
Caching improves query efficiency
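The note does not describe how JD's query result cache is keyed or invalidated. A minimal sketch, assuming a time-to-live policy and a key derived from the SQL text, looks like this; `QueryResultCache` and `get_or_run` are hypothetical names. The clock is injectable so expiry can be tested without sleeping.

```python
import hashlib
import time


class QueryResultCache:
    """Toy query-result cache keyed by SQL text, with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, result)

    @staticmethod
    def key(sql):
        # Only trim surrounding whitespace: collapsing inner whitespace or
        # lowercasing could merge queries whose string literals differ.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, sql, run_query):
        """Return (result, hit). On a miss or an expired entry, run and store."""
        k = self.key(sql)
        now = self.clock()
        entry = self._entries.get(k)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1], True
        result = run_query(sql)
        self._entries[k] = (now, result)
        return result, False
```

A production cache would also need to skip non-deterministic queries (for example ones calling `now()`) and to invalidate entries when the underlying tables change; the note does not say how JD handles either.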
Over the last 6 months, there have been about 1 million queries per day, and each query takes about 5 seconds.
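A rough back-of-envelope reading of these figures, assuming 5 s is the mean latency and the queries were spread evenly over the day (real traffic has peaks, so peak load is higher), uses Little's law $L = \lambda W$ to estimate how many queries are running at once on average:

```latex
\lambda = \frac{10^{6}\ \text{queries}}{86\,400\ \text{s}} \approx 11.6\ \text{queries/s},
\qquad
L = \lambda W \approx 11.6\ \text{queries/s} \times 5\ \text{s} \approx 58\ \text{concurrent queries}.
```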

Presto & Alluxio Stack

Our use case of Presto & Alluxio

They deployed Alluxio in their Presto production environment two years ago.
Alluxio is a virtual distributed file system that unifies data access between storage and computing.
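Alluxio presents one namespace on top of several storage systems by mounting each of them at a path. The sketch below illustrates that idea with a longest-prefix lookup over a mount table. It is a simplified model rather than Alluxio's implementation, and the mount points and `hdfs://` hosts are hypothetical.

```python
def resolve(alluxio_path, mount_table):
    """Map a path in the unified namespace to an under-storage URI, using the
    longest mount point that is a path prefix of it."""
    best = None
    for mount_point in mount_table:
        covers = (
            mount_point == "/"
            or alluxio_path == mount_point
            or alluxio_path.startswith(mount_point + "/")
        )
        if covers and (best is None or len(mount_point) > len(best)):
            best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {alluxio_path}")
    suffix = alluxio_path[len(best):].lstrip("/")
    target = mount_table[best].rstrip("/")
    return f"{target}/{suffix}" if suffix else target


# Hypothetical mount table: root on one HDFS cluster, one directory on another.
MOUNTS = {
    "/": "hdfs://namenode-a:8020/",
    "/warehouse/logs": "hdfs://namenode-b:8020/logs",
}
print(resolve("/warehouse/logs/dt=2021-01-01/part-0", MOUNTS))
# hdfs://namenode-b:8020/logs/dt=2021-01-01/part-0
print(resolve("/user/alice/report.csv", MOUNTS))
# hdfs://namenode-a:8020/user/alice/report.csv
```

Because Presto only sees the unified path, data can be moved between storage clusters by changing the mount table instead of rewriting table locations.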
JD (Jingdong) has contributed several features to Alluxio, such as new web UI features and improved shell commands.
For example, a watermark-based eviction strategy.
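The note does not describe JD's strategy in detail. A common form of watermark eviction, sketched here as an assumption, lets the cache fill until usage passes a high watermark and then evicts until usage falls to a low watermark, so eviction happens in batches instead of on every write. Victims are chosen least-recently-used first. `WatermarkCache`, the default ratios, and the LRU choice are illustrative, not Alluxio's code.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy block cache with high/low watermark eviction and LRU victim selection."""

    def __init__(self, capacity, high_ratio=0.9, low_ratio=0.7):
        if not 0 < low_ratio < high_ratio <= 1:
            raise ValueError("require 0 < low_ratio < high_ratio <= 1")
        self.capacity = capacity
        self.high_mark = capacity * high_ratio
        self.low_mark = capacity * low_ratio
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size, least recently used first

    def access(self, block_id):
        """Return True on a hit and mark the block as most recently used."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        return False

    def put(self, block_id, size):
        """Cache a block; if usage crosses the high watermark, evict LRU blocks
        until usage is at or below the low watermark. Returns evicted block ids."""
        if size > self.capacity:
            raise ValueError("block is larger than the whole cache")
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        # This toy admits the block first and evicts afterwards; a real store
        # would reserve space before writing.
        self.blocks[block_id] = size
        self.used += size
        evicted = []
        if self.used > self.high_mark:
            # Never evict the block that was just written (it is the last entry).
            while self.used > self.low_mark and len(self.blocks) > 1:
                victim, victim_size = self.blocks.popitem(last=False)
                self.used -= victim_size
                evicted.append(victim)
        return evicted
```

The gap between the two watermarks is the design knob: a wider gap means fewer, larger eviction rounds; a narrower gap keeps more data cached but evicts more often.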
Cache consistency.
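The note does not say which cache-consistency scheme JD uses. One common approach, sketched here as an assumption, records a fingerprint of each under-storage file (here its length and modification time) when it is cached, and compares it with fresh metadata on every read. The trade-off is one metadata call per read in exchange for detecting any change that alters the fingerprint. `Fingerprint`, `ConsistentCache`, `ufs_stat`, and `ufs_read` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Cheap metadata used to detect that an under-storage file has changed."""
    length: int
    mtime_ms: int


class ConsistentCache:
    """Toy read-through cache that revalidates each entry against the under storage."""

    def __init__(self, ufs_stat, ufs_read):
        self.ufs_stat = ufs_stat  # path -> Fingerprint (a metadata call)
        self.ufs_read = ufs_read  # path -> bytes (a full data read)
        self.entries = {}  # path -> (Fingerprint, bytes)

    def read(self, path):
        """Return (data, hit). A changed fingerprint invalidates the cached copy."""
        current = self.ufs_stat(path)
        cached = self.entries.get(path)
        if cached is not None and cached[0] == current:
            return cached[1], True
        # Simplification: a file modified between stat and read is not detected here.
        data = self.ufs_read(path)
        self.entries[path] = (current, data)
        return data, False
```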
This scheme is very effective.
They executed the same query many times.

Ongoing Exploration

The features we are exploring



Origin blog.csdn.net/weixin_44112790/article/details/112753070