Brooklin: Near Real-Time, Large-Scale Data Streaming

Brooklin is a near real-time, large-scale distributed data streaming service. LinkedIn has been running it in production since 2016, where it supports thousands of data streams and over 2 trillion messages per day.

Brooklin Overview

Why develop Brooklin?

As demand for scalable, low-latency data processing pipelines keeps growing, LinkedIn's data infrastructure has had to evolve with it. Moving large volumes of data reliably at high speed is hard enough on its own; on top of that, there is another big problem to solve: supporting the fast-growing variety of data storage and messaging systems. LinkedIn developed Brooklin to meet both needs, providing scalability with respect to data volume and extensibility across systems.

What is Brooklin?

Brooklin is a distributed system designed to reliably transfer data at scale across many different data storage and messaging systems. It exposes a set of abstractions that make its capabilities extensible: support for consuming from or producing to a new system is added by writing new Brooklin consumers and producers. At LinkedIn, Brooklin is used to move data across multiple data stores (e.g., Espresso and Oracle) and messaging systems (e.g., Kafka, Azure Event Hubs, and AWS Kinesis).

Scenarios

Brooklin has two main use cases: serving as a data transfer bridge, and change data capture (CDC).

Data transfer bridge

Data may be spread across different environments (public clouds and company data centers), deployment groups, or locations. Each environment typically has its own complications, owing to differences in access mechanisms, serialization formats, or compliance and security requirements. Brooklin acts as a "bridge" between these environments, transferring data among them. For example, Brooklin can move data between different cloud services (such as AWS Kinesis and Microsoft Azure), between different clusters within the same data center, or even between different data centers.

Brooklin aims to provide cross-environment data transfer as a service, so that a single service handles all of this complexity and application developers can focus on processing data rather than moving it. As a centralized, managed, and extensible framework, it also makes it easier to enforce company-wide policies and implement data governance. For example, Brooklin can be configured to enforce a uniform policy across the company, such as requiring that all incoming data be in JSON format, or that all outgoing data be encrypted.

Before developing Brooklin, LinkedIn used Kafka MirrorMaker (KMM) to mirror data from one Kafka cluster to another, but ran into scalability problems with that tool. Since Brooklin can serve as a general-purpose data transfer bridge, it can just as easily move large volumes of Kafka data. For that reason, LinkedIn eventually abandoned KMM and folded its Kafka mirroring solution into Brooklin.

One of Brooklin's biggest use cases at LinkedIn is mirroring Kafka data between clusters and data centers. LinkedIn depends heavily on Kafka, using it to store all kinds of data, such as logs, tracking events, metrics, and the like. To make all of this data easy to access in a centralized way, LinkedIn uses Brooklin to aggregate data from all of its data centers. Brooklin is also used to move large volumes of data between LinkedIn's Kafka clusters and Azure.

The illustration shows a hypothetical example in which Brooklin is used to aggregate Kafka data from two data centers, so that the full data set is easily accessible from either data center. Each data center's Brooklin cluster can handle multiple source and destination Kafka clusters.

Brooklin's Kafka mirroring solution has been tested at scale and has now fully replaced KMM, mirroring several trillion messages per day. Brooklin addresses the pain points encountered with KMM and improves both stability and operability. Because the Kafka mirroring solution is built on Brooklin, it benefits from Brooklin's key features, which are described in more detail below.

Multi-tenancy

In the KMM deployment model, each KMM cluster can only mirror data between two Kafka clusters. As a result, KMM users often end up running dozens or even hundreds of separate KMM clusters, which is very difficult to manage. Because Brooklin can handle multiple independent data pipelines at the same time, a single Brooklin cluster can be used to mirror data among multiple Kafka clusters, eliminating the complexity of maintaining hundreds of KMM clusters.

The illustration shows a hypothetical example of using KMM to aggregate Kafka data from two data centers. By contrast with Brooklin, many more KMM clusters are required (one KMM cluster per source-destination pair).

Dynamic configuration and management

With the Brooklin service, creating a new data pipeline (also known as a datastream) or modifying an existing one is as simple as an HTTP call to a REST endpoint. For Kafka mirroring, this means new mirroring pipelines can be created, and the mirroring whitelist of an existing pipeline modified, without code changes or static configuration deployments.

Although mirroring pipelines can coexist in the same cluster, Brooklin controls and configures each pipeline individually. That means modifying a pipeline's mirroring whitelist, or adding more resources to it, does not affect any other pipeline. In addition, Brooklin allows individual pipelines to be paused and resumed on demand, which is very useful for temporary maintenance or pipeline changes. For Kafka mirroring, Brooklin supports pausing and resuming an entire pipeline, a single topic in the whitelist, or even a single topic partition.
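As a rough illustration of what creating a datastream over HTTP might look like, the sketch below builds and posts a JSON datastream definition. The field names, connector name, port, and endpoint path are all assumptions made for illustration; consult the Brooklin documentation for the actual request schema.

```python
import json
from urllib import request

# Hypothetical datastream definition. The field names follow the general
# shape of a Brooklin datastream, but the exact schema, connector name,
# and connection strings below are assumptions, not the real API.
datastream = {
    "name": "mirror-tracking-events",
    "connectorName": "kafkaMirroringConnector",
    "source": {"connectionString": "kafka://source-cluster:9092/^tracking-.*$"},
    "destination": {"connectionString": "kafka://dest-cluster:9092"},
}

body = json.dumps(datastream).encode("utf-8")
req = request.Request(
    "http://localhost:32311/datastream",  # assumed management-service address
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req)  # uncomment to send against a live Brooklin cluster
```

Modifying the whitelist of an existing pipeline would follow the same pattern: an HTTP call with an updated definition, rather than a redeploy of static configuration.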

Diagnostics

Brooklin also exposes a diagnostics REST endpoint that allows the status of a datastream to be queried on demand. This API makes it easy to inspect the internal state of a pipeline, including whether any individual topic partition is lagging or hitting errors. Because the diagnostics endpoint aggregates information across the entire Brooklin cluster, problems with a specific partition can be diagnosed quickly, without scanning through log files.
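To make the idea concrete, the sketch below processes a hypothetical diagnostics response to flag unhealthy partitions. The response shape (field names, units) is invented for illustration; the real diagnostics API returns its own schema.

```python
import json

# Hypothetical diagnostics payload; the actual Brooklin diagnostics
# response schema differs, this only illustrates the kind of per-partition
# state the endpoint aggregates.
response = json.loads("""
{
  "datastream": "mirror-tracking-events",
  "partitions": [
    {"topicPartition": "PageViewEvent-0", "lag": 0, "errors": 0},
    {"topicPartition": "PageViewEvent-1", "lag": 52341, "errors": 3}
  ]
}
""")

# Flag partitions that are lagging or erroring, without scanning logs.
problem_partitions = [
    p["topicPartition"]
    for p in response["partitions"]
    if p["lag"] > 0 or p["errors"] > 0
]
```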

Special features

Since Brooklin was developed to replace KMM, its Kafka mirroring solution was optimized for stability and operability. To that end, a number of improvements specific to Kafka mirroring were made.

One focus was better fault isolation, so that errors hitting a particular partition or topic do not affect the whole pipeline or cluster. Brooklin can detect errors at the partition level and automatically pause mirroring for the affected partitions. After a configurable amount of time, the auto-paused partitions are automatically resumed, with no manual intervention needed. This is especially useful for transient errors. Meanwhile, processing of the other partitions and pipelines continues unaffected.
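The auto-pause behavior described above can be sketched as a small bookkeeping structure: errors pause only the offending partition, and a configurable cooldown determines when it becomes eligible to resume. This is an illustrative model, not Brooklin's actual implementation.

```python
import time

class PartitionErrorTracker:
    """Illustrative sketch of partition-level auto-pause: an error pauses
    only that partition, and paused partitions become resumable after a
    configurable cooldown, with no manual intervention."""

    def __init__(self, retry_after_secs=300.0):
        self.retry_after_secs = retry_after_secs
        self.paused = {}  # partition -> time at which it was paused

    def record_error(self, partition, now=None):
        # Pause just this partition; others keep mirroring.
        self.paused[partition] = time.time() if now is None else now

    def is_paused(self, partition):
        return partition in self.paused

    def resumable(self, now=None):
        """Return and un-pause partitions whose cooldown has elapsed."""
        now = time.time() if now is None else now
        ready = [p for p, t in self.paused.items()
                 if now - t >= self.retry_after_secs]
        for p in ready:
            del self.paused[p]
        return ready
```

A mirroring loop would call `record_error` on a produce/consume failure, skip partitions where `is_paused` is true, and periodically re-enable everything returned by `resumable`.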

To improve mirroring latency and throughput, Brooklin can also run in a "flushless" production mode, in which Kafka progress is tracked at the partition level. Checkpointing is done per partition rather than per pipeline, which lets Brooklin avoid Kafka producer flushes (a blocking, synchronous call that can stall the entire pipeline for several minutes).
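One way to picture flushless production is as per-partition acknowledgment bookkeeping: instead of blocking on a producer flush before committing, track which produced offsets have been acknowledged and checkpoint only up to the lowest still-outstanding offset. The sketch below illustrates that idea under those assumptions; it is not Brooklin's actual code.

```python
class FlushlessCheckpointer:
    """Sketch of per-partition checkpointing: avoid a blocking producer
    flush() by recording async acks per source partition and computing
    the highest offset below which everything has been acknowledged."""

    def __init__(self):
        self.pending = {}  # source partition -> offsets sent but not yet acked

    def on_send(self, partition, offset):
        self.pending.setdefault(partition, set()).add(offset)

    def on_ack(self, partition, offset):
        # Invoked from the producer's asynchronous ack callback.
        self.pending.get(partition, set()).discard(offset)

    def safe_checkpoint(self, partition, latest_sent):
        """Largest offset for which every offset <= it has been acked."""
        pending = self.pending.get(partition, set())
        if not pending:
            return latest_sent
        return min(pending) - 1
```

Because each partition commits independently, a slow partition delays only its own checkpoint rather than stalling the whole pipeline, which is the pain point that producer flushes cause.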

By migrating all of LinkedIn's KMM deployments to Brooklin, the number of mirroring clusters was cut from hundreds to about a dozen. Using Brooklin for Kafka mirroring also speeds up iteration, since new features and improvements are continuously being added.

Change data capture (CDC)

Brooklin's second major use case is change data capture, whose main task is to stream database updates with low latency. For example, most of LinkedIn's source-of-truth data (such as jobs, connections, and profile information) lives in various databases. Several applications need to know when a new job is posted, a new professional connection is made, or a member's profile is updated. Rather than having each of these applications query the online databases to detect changes, Brooklin streams the database updates in near real time. One of the biggest advantages of using Brooklin for change data capture is better resource isolation between applications and the online data stores: applications can scale independently of the database, greatly reducing the risk of overwhelming it. LinkedIn has used Brooklin to build change data capture solutions for Oracle, Espresso, and MySQL; moreover, Brooklin's extensible model makes it possible to write new connectors that add CDC support for other database sources.

Change data capture makes updates to an online data source available as a stream that can be propagated to many nearline applications. A typical use case is a notification service: an application monitors profile updates and sends notifications to the relevant users.
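The notification use case above boils down to a consumer that reacts to change events rather than polling the database. The event shape and handler below are hypothetical; each CDC connector defines its own change-event schema.

```python
def handle_change_event(event, notify):
    """Route one database change event to the notification use case
    described above. The event fields here (table, op, key) are made up
    for illustration; real CDC event schemas are connector-specific."""
    if event["table"] == "member_profile" and event["op"] == "UPDATE":
        notify(event["key"], "profile-updated")

# A nearline notification app consumes the change stream instead of
# querying the online database.
sent = []
handle_change_event(
    {"table": "member_profile", "op": "UPDATE", "key": "member-42"},
    notify=lambda member_id, kind: sent.append((member_id, kind)),
)
```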

Bootstrap support

Sometimes an application needs a complete snapshot of the data before it can process incremental updates. This can happen when the application starts for the first time, or when a change in its processing logic requires reprocessing the entire data set. Brooklin's connector model is extensible enough to support such use cases.
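Conceptually, a bootstrap-capable connector replays a full snapshot first and then switches to the incremental change stream, as in this minimal sketch (names and record shapes are illustrative only):

```python
def bootstrap_then_stream(snapshot, changes):
    """Sketch of bootstrap support: deliver a complete snapshot of the
    data set first, then continue with incremental change events, so a
    newly started application sees a consistent starting state."""
    for record in snapshot:
        yield ("snapshot", record)
    for change in changes:
        yield ("change", change)
```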

Transaction support

Many databases support transactions. For these data sources, Brooklin connectors can ensure that transaction boundaries are preserved.
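Preserving transaction boundaries typically means buffering a transaction's change events and emitting them only once a commit marker is seen, so consumers never observe a partially applied transaction. The log format in this sketch is invented for illustration:

```python
def emit_committed(log_entries):
    """Sketch of honoring transaction boundaries: buffer change events
    per transaction and emit them only on commit; discard them on
    rollback. Entry format ("change", x) / ("commit",) / ("rollback",)
    is hypothetical."""
    buffer = []
    for entry in log_entries:
        kind = entry[0]
        if kind == "change":
            buffer.append(entry[1])
        elif kind == "commit":
            yield from buffer
            buffer = []
        elif kind == "rollback":
            buffer = []  # the aborted transaction is never exposed downstream
```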

More information

The first open-source release of Brooklin ships with Kafka mirroring, which you can try out with a few simple commands and scripts. The Brooklin team is working to add support for more data sources and destinations, so stay tuned!


Origin www.cnblogs.com/fewfwf/p/11832552.html