This article is based on a user talk at PrestoCon. The slides and video are available online.
Some information about the speaker
Hello everyone, my name is Wenjun Tao, a senior software engineer at JD.com.
I'm also a core member of our Presto team.
It is my honor to share our work on Presto and Alluxio.
He presents the work from four aspects.
JD BDP
Introduction to the JD.com BDP architecture.
Some scale information about JD BDP.
BDP architecture: They deploy HDFS as their underlying distributed file system.
Hive, Presto, and Spark serve as their computing stack.
YARN serves as the resource manager and job scheduler, and also works as a bridge between HDFS and the computing frameworks.
Practice with Presto in BDP
Introduction to Presto and its practice in JD BDP
As you know, a Presto cluster consists of a coordinator and multiple workers.
- The coordinator is responsible for receiving queries from clients, then analyzing and optimizing them.
- Workers are responsible for reading data from stores such as HDFS, processing it, and returning the results to the coordinator.
- The coordinator then returns the results to the client.
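The division of labor described above can be sketched in a few lines of Python. This is a toy model and not Presto code: the `SPLITS` data, the function names, and the thread pool standing in for workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "storage": each inner list stands in for one
# split (a chunk of a table that would normally live in HDFS).
SPLITS = [
    [3, 8, 1],
    [7, 2],
    [5, 9, 4],
]


def worker_sum_over_threshold(split, threshold):
    """Worker step: scan one split, filter rows, return a partial sum."""
    return sum(row for row in split if row > threshold)


def coordinator_run(threshold, num_workers=2):
    """Coordinator step: hand one task per split to the workers,
    gather the partial results, and merge them into the final answer."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(
            pool.map(lambda s: worker_sum_over_threshold(s, threshold), SPLITS)
        )
    return sum(partials)


# Roughly "SELECT sum(x) FROM t WHERE x > 4" over the toy data.
result = coordinator_run(threshold=4)
```

The point of the sketch is only the shape of the flow: the coordinator plans and merges, while the scan and filter work happens on the workers, close to the data.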
They made some modifications, since they use YARN as the unified resource manager.
- They deploy the Presto cluster on a YARN cluster in their production environment, which makes it easy to maintain and scale.
- Users have high demands on isolation, so they implemented job isolation and scheduling to make the cluster more stable. In addition, they integrated an ERP authorization system with Presto.
- To allow dynamic management, a powerful server was deployed. With it they can modify cluster configurations at runtime, and he gives a detailed explanation of this system.
- Based on the query characteristics, they also developed a query result cache.
This is their Presto on YARN architecture. With the unified resource manager, they can dynamically scale the number of workers.
Here is the powerful server.
Intelligent Scheduler
They divide queries into two categories.
Cache improves query efficiency
Over the last 6 months, there have been about 1 million queries every day, and each query takes about 5 seconds.
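To make the idea of a query result cache concrete, here is a minimal sketch. It is not JD's implementation, which the talk does not detail here. The class name `QueryResultCache`, the time-to-live policy, and keying on whitespace-normalized SQL text are all assumptions chosen for illustration.

```python
import hashlib
import time


class QueryResultCache:
    """Toy result cache keyed by normalized SQL text, with a time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (expires_at, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql):
        # Naive normalization: collapse runs of whitespace so that
        # trivially different query texts share one cache entry.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, sql, run_query):
        """Return a cached result if still fresh, else run the query."""
        key = self._key(sql)
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = run_query(sql)
        self._entries[key] = (now + self.ttl_seconds, result)
        return result
```

A caller would wrap its real query function, for example `cache.get_or_compute(sql, run_on_presto)`, where `run_on_presto` is a hypothetical function that submits the SQL to the cluster. A real cache would also need invalidation when the underlying tables change, which this sketch leaves out.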
Presto & Alluxio Stack
Our use case of Presto & Alluxio
They deployed Alluxio in their Presto production environment two years ago.
Alluxio is a virtual distributed file system that unifies data access between storage and computing.
Some features were contributed by JD.com, such as new web UI features and improved shell commands.
For example, the watermark eviction strategy.
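As a rough illustration of what a watermark-based eviction strategy does, the sketch below evicts in batches: once usage rises above a high watermark, the least recently used blocks are dropped until usage falls to a low watermark. This is a generic toy, not Alluxio's code. The class name `WatermarkCache`, the default ratios, and the LRU ordering are assumptions.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy block cache: when usage passes the high watermark, evict
    least-recently-used blocks until usage drops to the low watermark."""

    def __init__(self, capacity_bytes, high_ratio=0.9, low_ratio=0.7):
        assert 0 < low_ratio < high_ratio <= 1
        self.high_mark = capacity_bytes * high_ratio
        self.low_mark = capacity_bytes * low_ratio
        self.used = 0
        self.blocks = OrderedDict()  # block_id -> size, oldest first

    def access(self, block_id):
        """Mark a cached block as recently used; return True on a hit."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        return False

    def put(self, block_id, size):
        """Insert a block, then evict in one batch if the high
        watermark is exceeded. Returns the list of evicted block ids."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        evicted = []
        if self.used > self.high_mark:
            # Keep at least the block that was just written.
            while self.used > self.low_mark and len(self.blocks) > 1:
                old_id, old_size = self.blocks.popitem(last=False)
                self.used -= old_size
                evicted.append(old_id)
        return evicted
```

Evicting down to a low watermark, instead of freeing just enough for the incoming block, means eviction runs less often and leaves headroom for the next writes.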
Cache consistency.
This scheme is very effective.
They execute the same query many times.
Ongoing Exploration
The features we are exploring