[Big Data] Linkis: Data middleware that connects upper-level applications and underlying computing engines

Linkis is a data middleware open sourced by WeBank. It solves the connection, access, and reuse problems between the various upper-level tools and applications and the various underlying computing and storage engines.

1. Introduction

Linkis is a data middleware that connects multiple computing and storage engines such as Spark, TiSpark, Hive, Python, and HBase. It exposes unified REST, WebSocket, and JDBC interfaces to the outside world, through which it submits and executes SQL, PySpark, HiveQL, Scala, and other scripts.

Built on a microservice architecture, Linkis provides enterprise-grade features such as financial-grade multi-tenant isolation, resource management and control, and permission isolation. It supports unified management of variables, UDFs, functions, and user resource files, and offers high-concurrency, high-performance, highly available full-lifecycle management of big data jobs and requests.
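As a concrete illustration of the unified interface described above, the sketch below builds a job-submission request body such as a client might send to Linkis' REST entry point. The field names (`executeUser`, `engineType`, `runType`, `code`) are illustrative assumptions, not the exact Linkis API.

```python
import json

def build_submit_payload(user, engine_type, run_type, code):
    """Build a hypothetical job-submission body for a unified REST entrance.

    All field names here are assumptions for illustration; consult the
    official Linkis REST API documentation for the real contract.
    """
    return {
        "executeUser": user,       # the tenant/user the job runs as
        "engineType": engine_type, # e.g. "spark", "hive", "python"
        "runType": run_type,       # script language of `code`, e.g. "sql"
        "code": code,              # the script to submit and execute
    }

payload = build_submit_payload("alice", "spark", "sql", "SELECT 1")
body = json.dumps(payload)  # serialized request body ready to POST
```

The same payload shape would cover HiveQL, PySpark, or Scala scripts by varying `engineType` and `runType`, which is the point of a single unified entry.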

2. Background

The wide application of big data technology has spawned an endless stream of upper-level applications and lower-level computing engines.

At this stage it is common practice for almost all enterprises to implement business needs by introducing multiple open source components, continuously updating and enriching their big data platform architecture.

As shown in the figure below, when we have more and more upper-level applications, tool systems, and underlying computing and storage components, the entire data platform becomes a mesh-like network structure.

[Figure: the data platform as a mesh of point-to-point connections between tools and engines]
As new components are continuously introduced to meet business needs, more and more pain points emerge:

  • Business requirements vary widely and each upper-level component has its own characteristics, so users feel strong fragmentation when switching between tools, and the learning cost is high.
  • Data comes in many types, and storage and computation are complex; a single component typically solves only one problem, so developers must master a complete technology stack.
  • Newly introduced components are often incompatible with the existing platform's multi-tenant isolation, user resource management, and user permission management; top-down customized development is not only a huge undertaking but also reinvents the wheel.
  • Upper-layer applications connect directly to the underlying computing and storage engines, so any change in the underlying environment directly disrupts business products.

3. Original design intention

How can we provide a unified data middleware that interfaces with upper-layer application tools and shields all the calling and usage details of the bottom layer, so that business users only need to focus on their business logic, and even a machine-room expansion or a full relocation of the underlying platform leaves them unaffected? This is what Linkis was designed for from the ground up.

[Figure: Linkis as a unified middleware layer between upper-level applications and underlying engines]

4. Technical Architecture

[Figure: Linkis technical architecture]
As shown in the figure above, we built multiple microservice clusters on top of Spring Cloud microservice technology to form Linkis' middleware capabilities.

Each microservice cluster undertakes part of the system's functional responsibilities, which we have divided clearly as follows:

  • Unified Job Execution Service (UJES): a distributed REST/WebSocket service that receives the various access requests submitted by upper-level systems.
    • Currently supported computation engines: Spark, Python, TiSpark, Hive, Shell, etc.
    • Supported scripting languages: SparkSQL, Spark Scala, PySpark, R, Python, HQL, Shell, etc.
  • Resource Management Service (RM): controls the resource usage of each system and user in real time, limits their resource usage and concurrency, and provides real-time dynamic resource charts for viewing and managing system and user resources.
    • Currently supported resource types: Yarn queue resources, servers (CPU and memory), concurrent users, etc.
  • Unified Storage Service (Storage): a universal IO architecture that can quickly connect to various storage systems, provides a unified call entry, supports all common data formats, is highly integrated, and is easy to use.
  • Unified Context Service (CS): unifies user and system resource files (user scripts, JAR, ZIP, Properties, etc.) and manages the parameters and variables of users, systems, and computing engines in one place: set once, referenced automatically everywhere.
  • Material Library Service (BML): system- and user-level material management that supports sharing and transfer, with automatic full-lifecycle management.
  • Metadata Service (Database): real-time display of Hive database table structures and partition status.

Relying on the cooperation of these microservice clusters, we have improved how the entire big data platform provides services externally.

5. Business Architecture

[Figure: Linkis business architecture]

  • Gateway : based on Spring Cloud Gateway, with enhanced plug-in capabilities and added 1-to-N support between the front-end Client and multiple back-end WebSocket microservices; it mainly parses user requests and routes and forwards them to the specified microservice.

  • Unified entry (Entrance) : the Entrance is the job lifecycle manager for a given type of engine job on behalf of the user. It manages the entire job lifecycle: receiving the job, submitting it to the execution engine, feeding execution information back to the user, and completing the job.

  • Engine manager : responsible for managing the full lifecycle of engines: applying for and locking resources from the Resource Management Service, instantiating new engines, and monitoring engine liveness.

  • Execution engine : the microservice that actually executes user jobs, started by the engine manager. To improve interaction performance, the execution engine talks directly to the Entrance, pushing execution logs, progress, status, and result sets to it in real time.

  • Resource management service : controls the resource usage of each system and each user in real time, manages the engine managers' resource usage and actual load, and limits the resource usage and concurrency of systems and users.

  • Eureka : Eureka is a service discovery framework developed by Netflix; Spring Cloud integrates it into its sub-project spring-cloud-netflix to implement Spring Cloud's service discovery capability. Every microservice has a built-in Eureka Client that can reach the Eureka Server and obtain service discovery in real time.
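To make the Eureka-style discovery above concrete, here is a toy client-side view of a service registry: instances register under a service name, and a client resolves the name to one live instance. The service names and address format are illustrative assumptions, not real Linkis deployment values.

```python
import random

# A toy in-memory registry, standing in for the data a Eureka Client
# would fetch from the Eureka Server. Names/addresses are made up.
REGISTRY = {
    "entrance":       ["entrance-1:9001", "entrance-2:9001"],
    "engine-manager": ["em-1:9002", "em-2:9002", "em-3:9002"],
}

def discover(service_name, registry=REGISTRY):
    """Return all registered instances of a service, as a client would see them."""
    return registry.get(service_name, [])

def pick_instance(service_name, registry=REGISTRY):
    """Randomly pick one live instance (simple client-side load balancing)."""
    instances = discover(service_name, registry)
    if not instances:
        raise LookupError(f"no live instances for {service_name}")
    return random.choice(instances)
```

This random pick is exactly the "simple rule" behavior that the later RPC section contrasts with Linkis' targeted Sender/Receiver scheme.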

6. Process flow

How does Linkis handle a SparkSQL query submitted by an upper-level system?

[Figure: processing flow of a SparkSQL job through Gateway, Entrance, Engine Manager, RM, and Engine]

  • A user of an upper-level system submits a SQL statement, which first passes through the Gateway; the Gateway parses the user request and routes and forwards it to the appropriate Entrance.
  • The Entrance first checks whether an available Spark engine service already exists for that system's user; if so, it submits the request to that Spark engine service directly.
  • If no Spark engine service is available, it uses Eureka's service registration and discovery to obtain the list of all engine managers, then queries the RM for the engine managers' actual load in real time.
  • The Entrance picks the engine manager with the lowest load and asks it to start a new Spark engine service.
  • The engine manager receives the request and asks the RM whether that system's user is allowed to start a new engine.
  • If so, it requests and locks the resources; otherwise, it returns a start-failure exception to the Entrance.
  • Once the resources are locked, the new Spark engine service is started; after a successful start, the new Spark engine is returned to the Entrance.
  • Having obtained the new engine, the Entrance submits the SQL to it.
  • The new Spark engine receives the SQL request, submits and executes the SQL on Yarn, and pushes logs, progress, and status to the Entrance.
  • The Entrance pushes the received logs, progress, and status to the Gateway in real time.
  • The Gateway pushes the logs, progress, and status back to the front end.
  • Once the SQL executes successfully, the Engine proactively pushes the result set to the Entrance, and the Entrance notifies the front end to fetch the result.
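The engine-manager selection step in the flow above can be sketched as a one-liner: the Entrance collects each manager's load metric from the RM and picks the minimum. The data shape (a mapping from manager id to a utilisation figure) is an assumption for illustration.

```python
def pick_engine_manager(loads):
    """Return the id of the engine manager with the lowest reported load.

    `loads` maps manager id -> a load metric as reported by the RM service
    (assumed here to be a 0.0-1.0 utilisation figure).
    """
    if not loads:
        raise LookupError("no engine managers registered")
    return min(loads, key=loads.get)

# Hypothetical real-time loads fetched from the RM for three managers:
loads = {"em-1": 0.82, "em-2": 0.35, "em-3": 0.61}
chosen = pick_engine_manager(loads)  # "em-2", the least-loaded manager
```

The Entrance would then ask `chosen` to start the new Spark engine service, falling back to the start-failure path if the RM refuses the resources.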

How to ensure high real-time performance

As we all know, Spring Cloud integrates Feign as its inter-microservice communication tool.

With Feign's HTTP-based calls between microservices, an instance of microservice A can only reach a randomly chosen instance of microservice B according to simple rules.

However, Linkis' execution engine can directly and proactively push logs, progress, and status to the specific Entrance that requested it. How does Linkis do this?

Linkis implements its own underlying RPC communication scheme on top of Feign.

[Figure: Sender/Receiver RPC encapsulation on top of Feign]
As shown in the figure above, we have encapsulated Sender and Receiver on top of Feign.

The Sender is directly usable as the sending end: the user can target a specific microservice instance or a random one, and broadcasting is also supported.

The Receiver is the receiving end: the user implements the Receiver interface to handle the actual business logic.

The Sender provides three access methods, as follows:

  • The ask method is a synchronous request-response call, which requires the receiver to return a response synchronously.

  • The send method is a synchronous request call, which only sends the request to the receiver synchronously and does not require a reply.

  • The deliver method is an asynchronous request call: as long as the sender's process does not exit abnormally, the request will be sent to the receiver later via another thread.
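The three access methods above can be modeled in a few lines. The class and method names mirror the article's Sender/Receiver vocabulary, but the implementation is a toy in-process sketch, not Linkis' actual Feign-based code.

```python
import queue
import threading

class Receiver:
    """Interface the user implements to handle real business logic."""
    def receive(self, message):
        """Handle a one-way message (send/deliver)."""
        raise NotImplementedError

    def receive_and_reply(self, message):
        """Handle a request that expects a synchronous response (ask)."""
        raise NotImplementedError

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver  # stands in for a resolved remote instance

    def ask(self, message):
        # Synchronous request-response: block until the receiver answers.
        return self.receiver.receive_and_reply(message)

    def send(self, message):
        # Synchronous fire-and-forget: deliver now, expect no reply.
        self.receiver.receive(message)

    def deliver(self, message):
        # Asynchronous: another thread pushes the request to the receiver later.
        t = threading.Thread(target=self.receiver.receive, args=(message,))
        t.start()
        return t

class EchoReceiver(Receiver):
    """A trivial Receiver implementation for demonstration."""
    def __init__(self):
        self.inbox = queue.Queue()

    def receive(self, message):
        self.inbox.put(message)

    def receive_and_reply(self, message):
        return f"echo: {message}"
```

In the real system the Sender would resolve the target microservice instance (specific, random, or broadcast) before dispatching; here a direct object reference stands in for that routing.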

7. How to support high concurrency

Linkis designed five large asynchronous message queues and thread pools, with each job occupying a pool thread for less than 1 millisecond, ensuring that each unified Entrance can sustain more than 10,000 TPS of resident job requests.

[Figure: the Entrance's asynchronous queues and thread pools]

  • How to improve the request throughput of the upper layer?
    • The Entrance's WebSocket processor has a built-in processing thread pool and processing queue to receive the upper-layer requests forwarded by Spring Cloud Gateway.
  • How to ensure that the execution requests of different users in different systems are isolated from each other?
    • In the Entrance's job scheduling pool, each user of each system has a dedicated thread, guaranteeing isolation.
  • How to ensure efficient job execution?
    • The job execution pool is only used to submit jobs. Once a job is submitted to the Engine side, it is immediately placed in the job execution queue, ensuring that no job occupies an execution-pool thread for more than 1 millisecond.
    • The RPC request receiving pool receives and processes the logs, progress, status, and result sets pushed by the Engine side, and updates the job's information in real time.
  • How to push the job's log, progress and status to the upper system in real time?
    • The WebSocket sending pool is dedicated to processing the job's logs, progress, and status, and pushing this information to the upper-level system.
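The per-tenant isolation answer above boils down to routing each job into a queue owned by one (system, user) pair, so one tenant's backlog never delays another's. The sketch below models just that grouping; the structure is an illustrative assumption, not the Entrance's actual implementation.

```python
from collections import defaultdict, deque

class JobSchedulerPool:
    """Toy model of the Entrance job scheduling pool: one FIFO queue per
    (system, user) pair, so tenants are isolated from each other."""

    def __init__(self):
        self.queues = defaultdict(deque)  # (system, user) -> pending jobs

    def submit(self, system, user, job):
        """Enqueue a job into its tenant's dedicated queue."""
        self.queues[(system, user)].append(job)

    def next_job(self, system, user):
        """Pop the next pending job for one tenant, or None if it is idle."""
        q = self.queues[(system, user)]
        return q.popleft() if q else None

pool = JobSchedulerPool()
pool.submit("ide", "alice", "job-1")
pool.submit("ide", "bob", "job-2")
pool.submit("ide", "alice", "job-3")
```

In the real pool a dedicated thread would drain each tenant's queue; here `next_job` plays that role synchronously for clarity.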

8. User-level isolation and scheduling timeliness

Linkis designed the Scheduler module, a group-based scheduling and consumption module with intelligent monitoring and scale-out, to realize Linkis' high-concurrency capability.

[Figure: Scheduler group-based scheduling and consumption]
Each user of each system is grouped separately, guaranteeing system-level and user-level isolation.

Each consumer has an independent monitoring thread that tracks metrics such as the length of the consumer's waiting queue, the number of events being executed, and the growth rate of execution time.

The group object corresponding to a consumer sets thresholds and alarm ratios for these metrics. Once a metric exceeds its threshold, or the ratio between several metrics exceeds the allowed range (for example, if the monitored average execution time is greater than the distribution-interval parameter, it is considered over threshold), the monitoring thread immediately scales the consumer out accordingly.

When scaling out, it makes full use of the parameter-tuning process above, increasing one parameter in a targeted way while the other parameters expand automatically in step.
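The monitor-and-expand loop described above can be sketched as a simple threshold check followed by a capacity increase. The metric names, threshold values, and the doubling factor are all assumptions for illustration, not Linkis' actual Scheduler parameters.

```python
def should_expand(metrics, thresholds):
    """Return True if any monitored metric exceeds its configured threshold."""
    return any(metrics[k] > thresholds[k] for k in thresholds)

def expand(capacity, factor=2):
    """Grow one consumer parameter; related limits would scale with it."""
    return capacity * factor

# Hypothetical sample collected by a consumer's monitoring thread:
metrics = {"queue_length": 120, "running": 8, "avg_exec_ms": 900}
thresholds = {"queue_length": 100, "running": 16, "avg_exec_ms": 1000}

capacity = 10
if should_expand(metrics, thresholds):
    capacity = expand(capacity)
# queue_length (120) exceeded its threshold (100), so capacity doubled to 20
```

A ratio-based trigger (e.g. average execution time versus a distribution-interval parameter) would slot in as one more predicate inside `should_expand`.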

9. Summary

As a data middleware, Linkis has made many attempts and efforts to shield the details of lower-level calls.

For example: How does Linkis implement unified storage services? How does Linkis unify UDF, functions and user variables?

Due to limited space, this article does not cover these in detail. If you are interested, you are welcome to visit the official website: https://linkis.apache.org

Is there a data middleware that is truly built on open source, has been developed and refined through financial-grade production environments and scenarios, and then given back to the open-source community, so that people can use it in production with confidence, supporting financial-grade business with enterprise-level feature guarantees?

We hope Linkis is the answer.


Origin: blog.csdn.net/be_racle/article/details/132436264