SQL on Hadoop in Practice and Optimization on the Kuaishou Big Data Platform | Talk Transcript


Kuaishou Big Data Architect Zhong Liang

This article is based on the talk "SQL on Hadoop in Practice and Optimization on the Kuaishou Big Data Platform," given by Kuaishou big data architect Zhong Liang at the A2M Artificial Intelligence and Machine Learning Innovation Summit held May 18-19.

Description: this article introduces SQL on Hadoop in four parts: an introduction to SQL on Hadoop architectures, an overview of Kuaishou's SQL on Hadoop platform, analysis and improvements drawn from Kuaishou's experience with SQL on Hadoop, and Kuaishou's future plans for SQL on Hadoop.

01 SQL on Hadoop Introduction

SQL on Hadoop, as the name suggests, refers to SQL engine architectures built on the Hadoop ecosystem; in practice it covers the architectures we often hear about, such as Hive, SparkSQL, Presto, and Impala. Below I briefly describe the common ones.

SQL on Hadoop Introduction - Hive

Hive is a data warehouse system. It maps structured data files to tables and lets you read, write, and manage large-scale data on distributed storage through SQL.


Based on the defined data schema and the storage input/output formats, Hive compiles and optimizes the incoming SQL, generates the corresponding execution tasks, and then schedules the generated tasks for execution.

The engine types Hive currently supports are MR, Spark, and Tez.


On top of its core architecture, Hive provides additional services, such as HiveServer2 and MetaStoreServer, both of which are Thrift-based.

HiveServer2 lets remote clients submit SQL tasks, and MetaStoreServer lets remote clients operate on metadata.


SQL on Hadoop Introduction - Spark

Spark is a fast, easy-to-use, unified analytics engine for large-scale data processing. It uses a DAG as its execution model, and its main modules are the SQL engine, stream processing, machine learning, and graph processing.


SQL on Hadoop Introduction - SparkSQL

SparkSQL is a compute engine based on Spark. It provides unified data access, integrates with Hive, and supports standard JDBC connections. A common scenario for SparkSQL is interactive data analysis.


The main execution logic of SparkSQL: first, the SQL is parsed into a syntax tree and semantic analysis produces a logical plan; next, the logical plan is optimized using metadata; finally, the logical plan is translated into a physical plan, i.e. an RDD lineage, and the tasks are executed.


SQL on Hadoop Introduction - Presto

Presto is an open-source distributed SQL query engine for interactive analytic queries.

Because it computes in memory, Presto outperforms the MR and Spark engines, which involve large amounts of IO. It also scales elastically with ease and supports pluggable connectors.

There are many large-scale deployments in industry: Facebook, Airbnb, Meituan, and others all use it at scale.


SQL on Hadoop Introduction - Other industry solutions


The sheer number of SQL on Hadoop architectures shows how practical and mature this approach has become. With a SQL on Hadoop architecture, we can support massive data processing needs.

02 Kuaishou SQL on Hadoop Platform Overview

Kuaishou SQL on Hadoop Platform Overview - Platform scale


The platform runs about 700,000 SQL queries per day in total, of which about 180,000 are DQL. The AdHoc cluster is mainly used for interactive analysis and queries, with an average DQL latency of 300s; AdHoc runs local tasks internally and applies acceleration engines, so its queries take less time.

The ETL cluster is mainly used for ETL processing and report generation. Its average DQL latency is 1000s, with a DQL P50 of 100s and a DQL P90 of 4000s. Besides these two clusters, other small clusters mainly provide services for individual business use.

Kuaishou SQL on Hadoop Platform Overview - Service layers


The service layer holds the upper-level applications. At the top level there are four modules: the synchronization service, the ETL platform, the AdHoc platform, and user programs. On the scheduling side there are likewise four kinds of data: server logs, for example, go straight into HDFS after post-processing, and we clean and process them further; event-tracking data and database records are moved into the corresponding data stores by the synchronization service, and their metadata is kept in our back-end metadata system.

Crawled web page data is stored in HBase, where subsequent cleaning and processing take place.

Kuaishou SQL on Hadoop Platform Overview - Platform components


HUE and NoteBook mainly provide the interactive query system. The reporting and BI systems mainly handle common ETL processing and report generation, while the metadata system serves external consumers. The engines Kuaishou currently supports are MR, Presto, and Spark.

The management system is mainly used to manage our current clusters. The HiveServer2 cluster routing system is mainly used for engine selection. The monitoring and operations systems mainly handle operations and maintenance for the HiveServer2 engines.

We ran into many problems while operating HiveServer2. Next, I will explain in detail how Kuaishou analyzed and optimized them in practice.

03 SQL on Hadoop at Kuaishou: Analysis and Improvements from Experience

HiveServer2 multi-cluster architecture

There are currently multiple HiveServer2 clusters, namely the AdHoc and ETL clusters, plus other small clusters. Each cluster connects to its corresponding ZK nodes, and clients reach a cluster's HiveServer2 through ZK.

To guarantee the stability of core tasks, the ETL cluster is divided into a core cluster and a general cluster. When a client connects to HS2, we determine the task's priority: high-priority tasks are routed to the core cluster, low-priority tasks to the general cluster.
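
The routing decision described above can be sketched as a simple priority check. Cluster names and the priority threshold below are illustrative assumptions, not Kuaishou's actual configuration:

```python
def route_etl_task(priority: int, high_priority_threshold: int = 7) -> str:
    """Pick an ETL sub-cluster for a task based on its priority.

    Hypothetical sketch: the threshold and cluster names are made up
    for illustration; the real routing lives in the HS2 connect path.
    """
    if priority >= high_priority_threshold:
        return "etl-core"     # core cluster: reserved for high-priority jobs
    return "etl-general"      # general cluster: everything else
```

In the real system this check runs when the client connects to HS2, before the session is bound to a cluster.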


HiveServer2 service internal flow chart


BeaconServer Service

BeaconServer is the back-end service behind the Hooks in HS2; together with the HS2 Hooks, it implements functionality beyond what the HS2 service itself provides. The modules currently supported include routing, auditing, SQL rewriting, task control, error analysis, and optimization suggestions.

Stateless: the BeaconServer service supports horizontal scaling, so its scale can be adjusted flexibly according to request volume.

Dynamic configuration loading: the BeaconServer service supports dynamically loaded configuration. Each module has an on/off switch, and service offlining is likewise achieved through dynamically loaded configuration. The routing module, for example, can adjust its routing ratio, or even disable routing entirely, according to the resources of the back-end acceleration engine cluster.

Seamless upgrades: each BeaconServer back-end module can be taken offline and upgraded individually without affecting the Hook side in HS2.

Pain points encountered in using the SQL on Hadoop platform


Problems faced in accelerating with new engines

Hive supports the Spark and Tez engines, but they were not ready for our production environment.

Each SQL-on-Hadoop engine has its own strengths and weaknesses, so the threshold for users to learn and use them is high.

Syntax and feature support differ between SQL engines, requiring a great deal of testing and compatibility work; full compatibility is very costly.

Lineage management, access control, operations management, resource usage: if each SQL engine provided these separately, it would be inconvenient in several respects.

Intelligent engine solution

Implement a custom execution engine inside Hive.

Automatic routing: users need not specify an engine; the appropriate acceleration engine is selected automatically.

Rule-based SQL matching: only compatible SQL is pushed down to the acceleration engine.

Reuse the HiveServer2 cluster architecture.

Intelligent engine: comparison of mainstream engine schemes


Intelligent engine: design of the custom HiveServer2 execution engine module

Based on HiveServer2, we implemented two modes. In JDBC mode, SQL is sent through the JDBC interface to an acceleration engine cluster started on the back end. In PROXY mode, the SQL is pushed down to an acceleration engine client started locally.

Clusters started on the back end in JDBC mode are YARN-based, which enables time-division multiplexing of resources. For example, the AdHoc cluster's resources are automatically reclaimed at night and reused as resources for the reporting system.


Intelligent engine: SQL routing architecture design

The HS2 routing scheme is based on the Hook architecture: a Hook implemented on the HS2 side switches engines, while the BeaconServer back-end routing service matches the SQL against routing rules. Different clusters can be configured with different routing rules.

To guarantee the stability of the routing service, the team also designed a Rewrite Hook that rewrites SQL on the AdHoc cluster, automatically adding an upper LIMIT to prevent scans of large amounts of data.
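
The idea behind the Rewrite Hook can be sketched as below. A real Hook works on Hive's AST rather than raw text, and the default cap is my assumption; this regex version only illustrates the rewrite:

```python
import re

def add_limit(sql: str, max_rows: int = 100000) -> str:
    """Append a LIMIT if the query has none, capping AdHoc scans.

    Illustrative sketch only: the cap value is hypothetical, and a
    production Rewrite Hook rewrites the parsed query, not the string.
    """
    if re.search(r"\blimit\s+\d+\s*;?\s*$", sql, re.IGNORECASE):
        return sql  # the user already bounded the result set
    return sql.rstrip().rstrip(";") + f" LIMIT {max_rows}"
```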


Intelligent engine: SQL routing rules list


Intelligent engine: advantages

Easy integration: mainstream SQL engines can all be integrated easily via the JDBC or PROXY mode. Through configuration alone, new query engines such as Impala or Drill can be plugged in.

Automatic engine selection reduces users' engine-selection cost and also makes migration easier. When the acceleration engine is overloaded, the routing ratio can be adjusted dynamically, preventing overload from degrading acceleration performance.

Automatic downgrade guarantees execution reliability. The SQL routing module supports failback: depending on configuration, a query can be rolled back to run on MR when the acceleration engine fails.

Module reuse: new engines can reuse the lineage collection, authentication and authorization, and lock/concurrency control schemes already built for HiveServer2, greatly reducing cost.

Resource reuse: resources for ad-hoc queries can be shared and dynamically adjusted, ensuring effective utilization of cluster resources.
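
The automatic downgrade described above amounts to trying engines in preference order and falling back on failure. This is a minimal sketch; the engine names and executor callables are placeholders for the real submission paths:

```python
def run_with_failback(sql, engines, executors):
    """Try engines in order (e.g. acceleration engine first, MR last).

    `executors` maps engine name -> callable that runs the SQL; both
    are illustrative stand-ins for the real routing module's backends.
    """
    last_error = None
    for engine in engines:
        try:
            return engine, executors[engine](sql)
        except Exception as exc:  # this engine failed: try the next one
            last_error = exc
    raise last_error  # every engine failed, surface the last error
```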

The effect of the intelligent engine on DQL


HiveServer2 existing performance issues


FetchTask Acceleration: presorting and logic optimization

When a query completes, the result files are polled locally until the amount fetched reaches the LIMIT, and results are then returned. In this process, a bad case arises when there are many small files plus a large file at the tail: HS2 interacts with HDFS over and over, fetching file information and file data, which greatly lengthens the running time.

Pre-sorting the result files by size before the Fetch can improve performance several hundredfold.

Example: there are currently 200 files: 199 small files with one record each, and 1 large file with mixed records, 200 records in total for the test; the large file's name sorts after the small files.
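
The bad case and the pre-sorting fix can be modeled with a tiny simulation. The row counts below mirror the example above but are otherwise illustrative, and each opened file is treated as one HDFS round trip:

```python
def files_opened(files, limit):
    """Count how many result files must be opened before LIMIT rows are
    collected, reading (name, row_count) pairs in the given order.

    Simplified model: each opened file costs one round trip to HDFS.
    """
    fetched = opened = 0
    for _name, rows in files:
        opened += 1
        fetched += rows
        if fetched >= limit:
            break
    return opened

# The example above: 199 one-row small files, plus one large file whose
# name sorts after them (its row count here is illustrative).
by_name = [(f"part-{i:05d}", 1) for i in range(199)] + [("part-99999", 1000)]
by_size = sorted(by_name, key=lambda f: f[1], reverse=True)  # pre-sorted
```

With LIMIT 200, reading in name order opens all 200 files, while reading the largest file first opens just one, which is where the hundredfold speedups come from.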


FetchTask Acceleration: presorting and logic optimization

Hive's SimpleFetchOptimizer generates a FetchTask directly, saving scheduling time and resource-request time. However, this optimization has a bottleneck: when the data volume is small but the number of files is large, and many rows could be screened out by Filter conditions, the input files are read serially, causing high query latency instead of any acceleration.

We added a file-count condition to the SimpleFetchOptimizer's decision: above the threshold, the task is submitted to the cluster, where increased concurrency achieves the acceleration.

Example: the partition currently being read contains 500 files; after the optimization, the file-count threshold is 100.
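
The amended decision can be sketched as follows. The size check stands in for Hive's existing `hive.fetch.task.conversion.threshold`; the default values simply mirror the example and are not the production settings:

```python
def should_fetch_locally(file_count: int, data_bytes: int,
                         file_threshold: int = 100,
                         size_threshold: int = 1 << 30) -> bool:
    """Decide between a direct FetchTask (serial, local read) and a
    cluster job (parallel read).

    Sketch only: thresholds are illustrative; the size condition stands
    in for Hive's fetch-conversion size threshold.
    """
    return data_bytes <= size_threshold and file_count <= file_threshold
```

For the 500-file partition in the example, the check fails and the task goes to the cluster, where the files are read concurrently.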


Desc Table optimization for large tables

For a table with a large number of partitions, the DESC process interacts with the metastore and fetches all of the partitions, yet in the end returns only the table-level information.

This metastore interaction delays the entire DESC query; when the metastore is under pressure, it may not return a result at all.

For DESC TABLE, we removed the partition-fetching interaction with the metastore entirely; the resulting speedup is proportional to the number of partitions.

Example: DESC of a large table with one hundred thousand partitions.


Other improvements

Reuse the data computed during split calculation to skip the repeated listing done when estimating reducer counts. For input-data-intensive tasks, this speeds up scheduling by up to 50%.

ParquetSerde init acceleration: optimize pruning to skip repeated columns of the same table, preventing map-task operator init timeouts.

New LazyOutputFormat: the output file is created only once there is a record to write, avoiding empty files that cost downstream jobs large amounts of time to read.

StatsTask supports multi-threaded aggregation of statistics, preventing slow aggregation, and thus longer running time, when there are too many intermediate files.

AdHoc enables parallel compilation, preventing serial SQL compilation from increasing overall query latency.

Pain points encountered in using the SQL on Hadoop platform


SQL on Hadoop at Kuaishou: common usability issues


HiveServer2 service startup optimization

At startup, HS2 initializes the materialized view function by polling the entire metastore, which makes HS2 startup extremely slow; the gap between going offline and coming back online is too large, and availability suffers.

We changed materialized views to delayed, lazy loading in a separate thread, so loading does not block HS2 service startup. During loading, materialized view lookups are served from cached information, keeping the feature available.

HS2 startup time drops from 5min+ to under 5s.
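
The lazy-loading pattern can be sketched with a background thread. Class and method names here are hypothetical, not Hive's actual classes; the point is that startup returns immediately while the slow metastore poll runs in the background:

```python
import threading

class MaterializedViewRegistry:
    """Sketch of lazy, non-blocking init: the server starts at once
    while a background thread populates the view cache."""

    def __init__(self, load_fn):
        self._views = {}
        self._loaded = threading.Event()
        self._thread = threading.Thread(
            target=self._load, args=(load_fn,), daemon=True)
        self._thread.start()  # HS2 startup is not blocked by loading

    def _load(self, load_fn):
        self._views.update(load_fn())  # the slow metastore poll happens here
        self._loaded.set()

    def get(self, name, timeout=10):
        self._loaded.wait(timeout)  # callers see cached data once ready
        return self._views.get(name)
```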


HiveServer2 hot configuration loading

Taking HS2 itself offline and back online is costly, since all tasks running on the service must be allowed to finish first. Yet configuration changes can be frequent, so hot loading is needed.

We added an interface to HS2 at the ThriftServer layer and connected it to the operations system: pushing a configuration update automatically invokes the interface, so configuration changes take effect through hot loading.


HiveServer2 Scratchdir optimization

HiveServer2's scratchdir is mainly used to store temporary files during execution. When HS2 creates a session, it creates a scratchdir. When HDFS is under pressure, large numbers of sessions block on scratchdir creation, connections pile up to the limit, the HS2 service can no longer accept new connections, and availability suffers.

To address this, we first separated the scratch directories of general queries from those of CREATE TEMPORARY TABLE queries, and made scratch creation lazy for CREATE TEMPORARY TABLE queries. Now, when CREATE TEMPORARY TABLE creates many temporary files and drives up the latency of its HDFS NameNode, the NameNode serving general queries' scratchdir can still respond normally.

In addition, HS2 was further extended to support multiple scratch locations, with configurable load ratios for each, in order to spread load evenly across HDFS.
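
Selecting among multiple scratch locations by configured ratio is essentially weighted random choice. Paths and ratios below are illustrative; the `rand` parameter exists only to make the sketch deterministic for testing:

```python
import random

def pick_scratchdir(dirs_with_ratio, rand=random.random):
    """Choose one of several HDFS scratch locations according to the
    configured load ratios.

    Sketch with hypothetical paths/ratios; `rand` is injectable so the
    weighted choice can be exercised deterministically.
    """
    total = sum(ratio for _d, ratio in dirs_with_ratio)
    point = rand() * total           # land somewhere on the ratio line
    for d, ratio in dirs_with_ratio:
        point -= ratio
        if point <= 0:
            return d
    return dirs_with_ratio[-1][0]    # guard against float rounding
```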


Fix for Hive Stage concurrent-scheduling anomalies

Hive's scheduler has two problems.

First, a sub-Task still in the "not executed" state can be judged complete; when the parent Task's scheduling round includes such sub-Tasks, the sub-Task gets added to the scheduling queue repeatedly. For this case, the "not executed" state needs to be changed to an initialized state.

Second, when determining whether a sub-Task is executable, abnormal state detection can keep a normal sub-Task from joining scheduling, losing a Stage of the query. For this case, our approach is to check the execution status of every Stage after execution completes: if a downstream Stage is found unfinished, we throw an error directly, giving us a completeness check on the query results.


Other improvements

HS2 implements an interface for terminating a running SQL query. With this feature, abnormal SQL can be terminated promptly.

Metastore JDOQuery optimization: skip abnormal keywords to prevent long metastore stalls, or partial metadata anomalies, from affecting queries.

Added a control switch to force-overwrite external table directories, fixing the file-rename error during INSERT OVERWRITE on external tables.

Added a configuration to disable Hive's Parquet pushdown, avoiding incorrect results caused by abnormal pushdown of Parquet OR conditions.

The join function's executeForArray can OOM on very large strings; added a limit as an optimization.

Added the ability to read data according to the partition's own schema, avoiding schema exceptions when reading data after a partition schema change that was not cascaded.

Pain points encountered in using the SQL on Hadoop platform


Why build a SQL expert system

Some users lack development experience and cannot handle the errors the processing engine returns.

Some error messages are unclear, so users cannot correctly understand the cause of the error.

Investigating task failures is costly, requiring deep familiarity with the entire Hadoop system.

Users' erroneous SQL, and SQL in need of optimization, have a great deal in common. Maintaining this knowledge by hand is expensive, while analyzing it with a system is cheap.

SQL Expert System

The SQL expert system is based on HS2 Hooks, with three main modules implemented on the BeaconServer back end: SQL rule control, SQL error analysis, and SQL optimization suggestions. The expert system's knowledge base, which holds keywords, cause explanations, and handling procedures as its main information, is stored in a back-end database and is continuously accumulated.

Through the SQL expert system, abnormal SQL can be controlled on the back end, preventing it from wasting resources or affecting cluster stability. When users run into problems, they can obtain a handling procedure directly, reducing the cost of problem resolution.

Example: control of empty-partition queries.
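
The error-analysis module boils down to matching engine error messages against the knowledge base. The toy entries below are invented for illustration; real entries live in BeaconServer's database and carry much richer keywords, causes, and handling procedures:

```python
# Hypothetical knowledge-base entries, invented for illustration.
KNOWLEDGE_BASE = [
    {"keyword": "OutOfMemoryError",
     "cause": "container memory too small",
     "advice": "raise container memory or reduce data per task"},
    {"keyword": "does not have partition",
     "cause": "query hit an empty or non-existent partition",
     "advice": "check the partition value or backfill the partition"},
]

def diagnose(error_message: str):
    """Match an engine error message against the knowledge base and
    return (cause, advice), or None if nothing matches."""
    for entry in KNOWLEDGE_BASE:
        if entry["keyword"] in error_message:
            return entry["cause"], entry["advice"]
    return None
```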


Job Diagnostic System

The SQL expert system covers part of the error-diagnosis needs for HS2 task execution, but determining causes such as job health or task anomalies requires a dedicated system, so we designed the job diagnosis system.

At the YARN level, the job diagnosis system collects and analyzes the Counters and configuration of each execution engine. At the job level, it offers relevant optimization suggestions.

The job diagnosis system's data is also available via API to the SQL expert system, supplementing its analysis of problem causes.


The job diagnosis system provides a query page for looking up tasks. Below is the query flow for a task that hit the "too many map tasks" rule:


On the job page, you can also see further job diagnosis information and suggested job modifications.


Pain points encountered in using the SQL on Hadoop platform


SQL on Hadoop at Kuaishou: common operations and maintenance problems


Audit analysis - architecture diagram

Auditing is one functional module of the BeaconServer service.

Through a Hook configured in HS2, the required SQL, IP, user, and other information is sent to the back end, where parsing extracts the databases, tables, columns, and operation information; the analysis results are then stored in the Druid system. Part of the data is opened to users through a visualization platform.
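
The extraction step can be approximated as below. Production auditing parses the SQL properly (e.g. via Hive's parser in the Hook); a regex only gestures at the idea and would miss many constructs:

```python
import re

def extract_tables(sql: str):
    """Pull table names out of FROM/JOIN/INSERT clauses.

    Rough illustrative sketch: real audit parsing walks the query's
    AST; this regex misses subqueries, CTEs, quoting, etc.
    """
    pattern = (r"\b(?:from|join|insert\s+into|insert\s+overwrite\s+table)"
               r"\s+([\w.]+)")
    return sorted({m.lower() for m in re.findall(pattern, sql, re.IGNORECASE)})
```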


Audit analysis - hotspot queries

Hotspot queries show, for a given period, the operations users focused on: which databases were accessed, which tables, and which types of operation.


Audit analysis - lineage queries

The figure shows lineage information: the upstream tables a table's creation depends on, typically used for impact-scope statistics.


Audit analysis - historical operation queries

Historical operation queries trace back the operations on a particular table over a period of time, obtaining the user, client, platform, and operation time. Typically used for tracking CRUD activity on a table.


HiveServer2 cluster AB switching scheme

Because taking the HiveServer2 service itself offline and back online is costly, upgrades often take a long time and affect availability. The HiveServer2 cluster AB switching scheme relies on keeping cluster A online while cluster B stands by; by switching which cluster's machines are registered online in ZK, we achieve a seamless upgrade.
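
The AB switch boils down to repointing the ZK path clients resolve HS2 addresses from. The sketch below uses an in-memory stand-in for ZK (a real deployment would use a client such as kazoo); paths and host names are invented:

```python
class FakeZK:
    """In-memory stand-in for the ZK namespace clients read HS2
    addresses from; purely for illustration."""

    def __init__(self):
        self.nodes = {}

    def register(self, path, hosts):
        self.nodes[path] = list(hosts)

    def discover(self, path):
        return self.nodes.get(path, [])

def ab_switch(zk, path, standby_hosts):
    """Point clients at the standby (B) cluster, returning the old (A)
    hosts so they can be drained and upgraded."""
    old = zk.discover(path)
    zk.register(path, standby_hosts)  # new sessions now resolve to B
    return old
```

Existing sessions on A drain naturally while new connections land on B, which is what makes the upgrade seamless.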


Dynamic HiveServer2 cluster offlining

The HiveServer2 cluster has Metrics monitoring deployed, so the cluster's service usage can be tracked in real time. In addition, we modified the HS2 service to implement a ZK-based offlining interface and a request-Cancel interface.

When the external Monitor perceives sustained high memory usage, it automatically triggers an FGC on the HS2 service process; if memory stays high, the service is taken offline directly via ZK, and queries are stopped one by one in submission-time order until memory recovers, guaranteeing that the remaining tasks on the service run normally.


HiveServer2 cluster management platform

With multiple HiveServer2 clusters, we need to know the status of each cluster and of each HS2 service. Through the management platform we can view each service's version, start time, resource usage, and online/offline state.

Once the platform is connected to the operations system, upgrades and gray releases will become even easier.


Summary of Kuaishou's query platform improvements


04 Kuaishou SQL on Hadoop Future Plans

Upgrade the expert system to automate SQL optimization and parameter tuning

AdHoc query cache acceleration

Research and application of new engines


That concludes the content shared by Zhong Liang. Want to hear more talks from Kuaishou? Join the GIAC Global Internet Architecture Conference in Shenzhen on June 21-23, where we have invited Mi Qun, head of application R&D and testing at Kuaishou, to speak on online quality monitoring for the mobile client.


In addition, the organizing committee has invited 105 speakers from leading Internet companies including Google, Microsoft, Oracle, eBay, Baidu, Alibaba, Tencent, SenseTime, TuSimple, ByteDance, Sina, and Meituan-Dianping to attend and share their experience on topics such as AI, large and medium-sized front ends, Cloud-Native, IoT, chaos engineering, Fintech, data and business intelligence, engineering management and culture, and classic architecture, covering the problems they encountered and their solutions. Fill in the registration information now to get all the GIAC summit slides for free!




Reproduced from: https://juejin.im/post/5cf48545f265da1bb96fc73d
