Study Note about The Practice of Presto & Alluxio in E-Commerce Big Data Platform

This article is a study note on a user talk given at PrestoCon. The slides and video are available online.
Some information about the speaker

Hello everyone, my name is Wenjun Tao, and I am a senior software engineer at JD.com.
I am also a core member of our Presto team.
It is my honor to share our work on Presto and Alluxio.

The talk covers four topics, which the sections below follow.

JD BDP

Introduction to the JD.com BDP architecture.

Some scale information about JD BDP is given in the slides.
BDP architecture:
They deploy HDFS as the underlying distributed file system.
Hive, Presto, and Spark serve as the computing stack.
YARN serves as the resource manager and job scheduler, and also works as a bridge between HDFS and the computing frameworks.

Practice with Presto in BDP

Introduction to Presto and its practice in JD BDP

As you know, a Presto cluster consists of a coordinator and multiple workers.

  • The coordinator is responsible for receiving queries from clients, then analyzing and optimizing them.
  • Each worker reads data from a store such as HDFS, processes it, and returns the results to the coordinator.
  • The coordinator then returns the results to the client.
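
The coordinator and worker roles above can be illustrated with a toy scatter-gather sketch. This is not Presto code: `worker_scan`, `coordinator_run`, and the use of in-memory lists as "splits" are hypothetical names and simplifications chosen only to show the flow of work.

```python
from concurrent.futures import ThreadPoolExecutor


def worker_scan(split):
    """Worker side (hypothetical): read one split and compute a partial sum."""
    return sum(split)


def coordinator_run(splits, num_workers=3):
    """Coordinator side (hypothetical): hand splits to workers, merge partials."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each split is processed independently, as a worker would process it.
        partials = list(pool.map(worker_scan, splits))
    # The coordinator combines the partial results before answering the client.
    return sum(partials)


if __name__ == "__main__":
    print(coordinator_run([[1, 2], [3], [4, 5, 6]]))
```

In a real cluster the splits would be ranges of files in a store such as HDFS and the workers would be separate processes, but the division of responsibility is the same.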

They made some modifications because they use YARN as the unified resource manager.

  1. They deploy the Presto cluster on a YARN cluster in their production environment, which makes it easy to maintain and scale.

  2. Users have high demands on isolation, so they implemented job isolation.
    Job isolation scheduling makes the cluster more stable. In addition, they integrated an ERP authorization system with Presto.

  3. To allow dynamic management, a "powerful server" was deployed. With it they can modify cluster configurations at runtime; the speaker explores this system in detail.

  4. Based on the characteristics of their queries, they also developed a query result cache.
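
The talk does not describe how the query result cache is built, so the following is only a minimal sketch of the general idea, assuming a time-to-live (TTL) cache keyed by the query text. `QueryResultCache`, `get_or_run`, and `run_query` are hypothetical names, not part of Presto or of JD's system.

```python
import hashlib
import time


class QueryResultCache:
    """Toy TTL cache keyed by SQL text (an assumption, not JD's design)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expiry_time, rows)

    @staticmethod
    def _key(sql):
        # Only surrounding whitespace is trimmed; a real system would more
        # likely normalize the parsed query than the raw text.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def get_or_run(self, sql, run_query):
        """Return cached rows if still fresh, otherwise run and store them."""
        key = self._key(sql)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        rows = run_query(sql)
        self._entries[key] = (now + self.ttl_seconds, rows)
        return rows
```

A caller would wrap its normal execution path, for example `cache.get_or_run(sql, execute_on_presto)`, where `execute_on_presto` stands for whatever function actually submits the query. A TTL is the simplest freshness rule; it trades some staleness for fewer repeated executions.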

Their Presto on YARN architecture uses the unified resource manager to scale the number of workers dynamically.
The powerful server.
The intelligent scheduler.
They divide queries into two categories.
The cache improves query efficiency.
Over the last 6 months, there have been about 1 million queries per day, and each query takes about 5 seconds.
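
To put these figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the 5 seconds is an average latency and that queries arrive evenly over the day, neither of which the talk states; under those assumptions Little's law (in-flight = arrival rate × latency) gives the average number of queries running at once.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 1_000_000  # figure quoted in the talk
avg_latency_s = 5.0          # assumed to be the average per-query latency

# Average arrival rate, assuming a uniform load across the day.
arrival_rate_qps = queries_per_day / SECONDS_PER_DAY

# Little's law: average number of queries in flight at any moment.
avg_in_flight = arrival_rate_qps * avg_latency_s

print(f"{arrival_rate_qps:.2f} queries/s, about {avg_in_flight:.1f} in flight")
```

This works out to roughly 11.6 queries per second and about 58 concurrent queries on average; real traffic is likely burstier, so peak concurrency would be higher.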

Presto & Alluxio Stack

Our use case for Presto & Alluxio

They deployed Alluxio in their Presto production environment two years ago.
Alluxio is a virtual distributed file system that unifies data access between storage and computing.
Some features were contributed by JD.com, such as new web UI features and improved shell commands.
For example, a watermark eviction strategy.
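
The note gives no details of the strategy, so the sketch below shows only the general idea of watermark-based eviction, under the assumption that eviction starts once usage passes a high watermark and continues, least recently used first, until usage falls to a low watermark. `WatermarkCache` and its parameters are hypothetical and are not Alluxio APIs.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy block cache with high/low watermark eviction (illustrative only)."""

    def __init__(self, high_watermark_bytes, low_watermark_bytes):
        if not 0 <= low_watermark_bytes < high_watermark_bytes:
            raise ValueError("need 0 <= low watermark < high watermark")
        self.high = high_watermark_bytes
        self.low = low_watermark_bytes
        self.used = 0
        self._blocks = OrderedDict()  # block_id -> size, least recent first

    def access(self, block_id, size):
        """Record an access; return the list of block ids evicted, if any."""
        if block_id in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_id)
            return []
        self._blocks[block_id] = size
        self.used += size
        evicted = []
        if self.used > self.high:
            # Evict in one batch down to the low watermark, never removing
            # the block that was just added.
            while self.used > self.low and len(self._blocks) > 1:
                victim, victim_size = self._blocks.popitem(last=False)
                self.used -= victim_size
                evicted.append(victim)
        return evicted
```

Evicting in a batch down to the low watermark, rather than one block at a time at the capacity limit, leaves headroom so that the next several writes do not each have to wait for an eviction.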
Cache consistency.
This scheme is very effective.
They executed the same query many times.

Ongoing Exploration

The features we are exploring


Reposted from blog.csdn.net/weixin_44112790/article/details/112753070